v0.8.0

@muupan released this on 14 Feb 05:32

Announcement

This release will probably be the final major update under the name ChainerRL. The development team plans to switch the backend from Chainer to PyTorch and continue development as open-source software.

Important enhancements

Important bugfixes

  • Fixed a bug where the update of CategoricalDoubleDQN was identical to that of CategoricalDQN.
  • Fixed a bug where batch training with N-step or episodic replay buffers did not work.
  • Fixed a bug where weight normalization in PrioritizedReplayBuffer with normalize_by_max == 'batch' was incorrect.
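
As a rough sketch of the configuration touched by the last fix above, this is how a prioritized buffer with per-batch weight normalization is constructed. The import path assumes the replay_buffers module introduced by #506, and the argument names are assumptions about the v0.8.0 signature rather than a definitive reference:

    # Minimal sketch: prioritized replay with importance weights normalized
    # per sampled batch (the mode whose normalization #570 corrected).
    from chainerrl.replay_buffers import PrioritizedReplayBuffer

    rbuf = PrioritizedReplayBuffer(
        capacity=10 ** 6,          # maximum number of stored transitions
        alpha=0.6,                 # how strongly TD errors bias sampling
        beta0=0.4,                 # initial importance-sampling correction
        betasteps=2 * 10 ** 5,     # steps over which beta is annealed to 1
        normalize_by_max='batch',  # normalize weights by the max within each batch
    )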

Important destructive changes

All updates

Enhancements

  • Recurrent DQN families with a new interface (#436)
  • Recurrent and batched TRPO (#446)
  • Add Soft Actor-Critic agent (#457)
  • Code to collect demonstrations from an agent. (#468)
  • Monitor with ContinuingTimeLimit support (#491)
  • Fix B007: Loop control variable not used within the loop body (#502)
  • Double IQN (#503)
  • Fix B006: Do not use mutable data structures for argument defaults. (#504)
  • Splits Replay Buffers into separate files in a replay_buffers module (#506)
  • Use chainer.grad in ACER (#511)
  • Prioritized Double IQN (#518)
  • Add policy loss to TD3's logged statistics (#524)
  • Adds checkpoint frequencies for serial and batch Agents (#525; see the sketch after this list)
  • Add a deterministic mode to IQN for stable tests (#529)
  • Use Link.cleargrads instead of Link.zerograds in REINFORCE (#536)
  • Use cupyx.scatter_add instead of cupy.scatter_add (#537)
  • Avoid cupy.zeros_like with numpy.ndarray (#538)
  • Use get_device_from_id since get_device is deprecated (#539)
  • Releases trained models for all reproduced agents (#565)
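
One of the new knobs above, the checkpoint frequency from #525, is exposed through the training loops in chainerrl.experiments. The sketch below assumes the v0.8.0 keyword name checkpoint_freq and uses placeholder agent and env objects; it is an illustration, not the canonical example:

    # Minimal sketch: save periodic agent checkpoints during training (#525).
    # `agent` and `env` stand for an already-constructed ChainerRL agent and
    # Gym environment; checkpoint_freq is assumed to be the v0.8.0 keyword.
    from chainerrl import experiments

    experiments.train_agent_with_evaluation(
        agent=agent,
        env=env,
        steps=10 ** 6,            # total training steps
        eval_n_steps=None,        # evaluate by episodes rather than steps
        eval_n_episodes=10,       # episodes per evaluation
        eval_interval=10 ** 5,    # steps between evaluations
        outdir='results',         # where scores and checkpoints are written
        checkpoint_freq=10 ** 5,  # save the agent every 100k steps (new in v0.8.0)
    )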

Documentation

  • Typo fix in Replay Buffer Docs (#507)
  • Fixes typo in docstring for AsyncEvaluator (#508)
  • Improve the algorithm list on README (#509)
  • Add Explorers to Documentation (#514)
  • Fixes syntax errors in ReplayBuffer docs. (#515)
  • Adds policies to the documentation (#516)
  • Adds demonstration collection to experiments docs (#517)
  • Adds List of Batch Agents to the README (#543)
  • Add documentation for Q-functions and some missing details in docstrings (#556)
  • Add comment on environment version difference (#582)
  • Adds ChainerRL Bibtex to the README (#584)
  • Minor Typo Fix (#585)

Examples

  • Rename examples directories (#487)
  • Adds training times for reproduced Mujoco results (#497)
  • Adds additional information to Grasping Example README (#501)
  • Fixes a comment in PPO example (#521)
  • Rainbow Scores (#546)
  • Update train_a3c.py (#547, thanks @xinyuewang1!)
  • Update train_a3c.py (#548, thanks @xinyuewang1!)
  • Improves formatting of IQN training times (#549)
  • Corrects Scores in Examples (#552)
  • Removes GPU option from README (#564)
  • Releases trained models for all reproduced agents (#565)
  • Add an example script for RoboschoolAtlasForwardWalk-v1 (#577)
  • Corrects Rainbow Results (#580)
  • Adds proper A3C scores (#581)

Testing

  • Add CI configs (#478)
  • Specify ubuntu 16.04 for Travis CI and modify a dependency accordingly (#520)
  • Remove a trailing space in DoubleIQN (#526)
  • Add a deterministic mode to IQN for stable tests (#529)
  • Fix import error when chainer==7.0.0b3 (#531)
  • Make test_monitor.py work on flexCI (#533)
  • Improve parameter distributions used in TestGaussianDistribution (#540)
  • Increase flexCI's time limit to 20min (#550)
  • Decrease the number of decimal digits required to 4 (#554)
  • Use attrs<19.2.0 with pytest (#569)
  • Run slow tests with flexCI (#575)
  • Typo fix in CI comment. (#576)
  • Adds time to DDPG Tests (#587)
  • Fix CI errors due to pyglet, zipp, mock, and gym (#592)

Bugfixes

  • Fix a bug in batch_recurrent_experiences regarding next_action (#528)
  • Fix ValueError in SARSA with GPU (#534)
  • Fix function call (#541)
  • Pass env_id to replay_buffer methods to fix batch training (#558; see the sketch after this list)
  • Fixes Categorical Double DQN Error. (#567)
  • Fix weight normalization inside prioritized experience replay (#570)
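
For reference, the batch-training fix in #558 is about tagging each transition with the index of the parallel environment it came from, so that N-step and episodic buffers keep per-environment ordering. The sketch below assumes that ReplayBuffer.append and stop_current_episode accept an env_id keyword in v0.8.0, and the store helper and transition dict layout are hypothetical:

    # Rough sketch: several parallel environments feeding one shared buffer,
    # each under its own env_id (the mechanism #558 repaired).
    from chainerrl.replay_buffers import ReplayBuffer

    rbuf = ReplayBuffer(capacity=10 ** 5, num_steps=3)  # 3-step returns

    def store(transitions):
        # `transitions` is a hypothetical list with one dict per parallel env.
        for env_id, t in enumerate(transitions):
            rbuf.append(
                state=t['state'],
                action=t['action'],
                reward=t['reward'],
                next_state=t['next_state'],
                is_state_terminal=t['done'],
                env_id=env_id,  # keep each environment's transition stream separate
            )
            if t['done']:
                rbuf.stop_current_episode(env_id=env_id)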