Recycling dormant neurons

PyTorch reimplementation of ReDo (The Dormant Neuron Phenomenon in Deep Reinforcement Learning). The paper establishes the dormant neuron phenomenon: over the course of training a network on nonstationary targets, a significant portion of the neurons in a deep network become dormant, i.e. their activations become negligible compared to those of the other neurons in the same layer. The phenomenon is particularly prevalent in value-based deep reinforcement learning algorithms such as DQN and its variants. As a remedy, the authors propose to periodically check for dormant neurons and reinitialize them.

Dormant neurons

The score $s_i^{\ell}$ of neuron $i$ in layer $\ell$ is defined as its expected absolute activation $\mathbb{E}_{x \in D} |h_i^{\ell}(x)|$ normalized by the average expected absolute activation of the $H^{\ell}$ neurons in the same layer, $\frac{1}{H^{\ell}} \sum_{k \in h} \mathbb{E}_{x \in D}|h_k^{\ell}(x)|$:

$$s_i^{\ell}=\frac{\mathbb{E}_{x \in D}|h_i^{\ell}(x)|}{\frac{1}{H^{\ell}} \sum_{k \in h} \mathbb{E}_{x \in D}|h_k^{\ell}(x)|}$$

A neuron is defined as $\tau$-dormant when $s_i^{\ell} \leq \tau$.
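
As an illustration, here is a minimal PyTorch sketch of this score computation (not the repository's actual code). It assumes the layer's post-activations have been collected into a tensor of shape (batch, num_neurons); the function name and the epsilon guard against division by zero are my own additions.

```python
import torch

@torch.no_grad()
def dormancy_scores(activations: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    # activations: post-activation outputs h^l(x) over a batch x ~ D,
    # with shape (batch, num_neurons).
    mean_abs = activations.abs().mean(dim=0)  # E_{x in D} |h_i(x)| for each neuron i
    layer_avg = mean_abs.mean()               # average over the H^l neurons in the layer
    return mean_abs / (layer_avg + eps)       # s_i for every neuron in the layer

# A neuron i is tau-dormant when dormancy_scores(activations)[i] <= tau.
```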

ReDo

Every $F$-th time step:

  1. Check whether a neuron $i$ is $\tau$-dormant.
  2. If a neuron $i$ is $\tau$-dormant (sketched below):
     - Re-initialize the input weights and bias of $i$.
     - Set the outgoing weights of $i$ to $0$.
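
Below is a minimal sketch of the reset step, assuming two consecutive fully connected layers (a real implementation also needs to handle convolutional layers and the pairing between a layer and the layer consuming its outputs). `redo_reset` and its arguments are illustrative names, not this repository's API.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def redo_reset(layer: nn.Linear, next_layer: nn.Linear, dormant_mask: torch.Tensor) -> None:
    # dormant_mask: boolean tensor of shape (layer.out_features,), True for tau-dormant neurons.
    if not dormant_mask.any():
        return
    # Re-initialize the input weights and bias of the dormant neurons with a fresh initialization.
    fresh = nn.Linear(layer.in_features, layer.out_features, device=layer.weight.device)
    layer.weight[dormant_mask] = fresh.weight[dormant_mask]
    if layer.bias is not None:
        layer.bias[dormant_mask] = fresh.bias[dormant_mask]
    # Zero the outgoing weights so the recycled neurons initially have no downstream effect.
    next_layer.weight[:, dormant_mask] = 0.0
```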

Results

These results were generated using 3 seeds. Note that I did not use the typical DQN hyperparameters; instead, I chose a hyperparameter set that exaggerates the dormant neuron phenomenon.
In particular:

  • Updates are done every environment step instead of every 4 steps.
  • Target network updates every 2000 steps instead of every 8000.
  • Fewer random samples before learning starts.
  • $\tau=0.1$ instead of $\tau=0.025$.

Episodic Return

Dormant count $\tau=0.0$

Dormant count $\tau=0.1$

I've skipped running the 10M and 100M experiments because they are very expensive in terms of compute.

Implementation progress

Update 1:
Fixed and simplified the for-loop in the redo resets.

Update 2: The reset check in the main function was placed at the wrong level; the re-initializations are now properly done in-place and work as intended.

Update 3: Resetting Adam's moment estimates and step count is crucial for performance; otherwise, the Adam updates immediately create dormant neurons again.
Preliminary results now look promising.
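
For context, here is a hedged sketch of what such an optimizer-state reset can look like for a single weight matrix; the exact handling in this repository may differ, and `reset_adam_state` is a hypothetical helper, not its actual API. Note that the step count here is reset for the whole parameter, not only for the recycled rows.

```python
import torch

@torch.no_grad()
def reset_adam_state(optimizer: torch.optim.Adam, param: torch.nn.Parameter,
                     dormant_mask: torch.Tensor) -> None:
    # Zero Adam's first/second moments for the rows of `param` that were re-initialized,
    # and reset the step count so stale statistics don't produce huge first updates.
    state = optimizer.state.get(param)
    if not state:
        return  # the parameter has no optimizer state yet
    state["exp_avg"][dormant_mask] = 0.0
    state["exp_avg_sq"][dormant_mask] = 0.0
    # `step` is a scalar tensor in recent PyTorch versions (a plain int in older ones).
    if torch.is_tensor(state["step"]):
        state["step"].zero_()
    else:
        state["step"] = 0
```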

Update 4: Fixed the outgoing weight resets, where the mask was generated incorrectly and not applied to the outgoing weights. See this issue. Thanks @SaminYeasar!

Citations

Paper:

@inproceedings{sokar2023dormant,
  title={The dormant neuron phenomenon in deep reinforcement learning},
  author={Sokar, Ghada and Agarwal, Rishabh and Castro, Pablo Samuel and Evci, Utku},
  booktitle={International Conference on Machine Learning},
  pages={32145--32168},
  year={2023},
  organization={PMLR}
}

Training code is based on CleanRL:

@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}

Replay buffer and wrappers are from Stable Baselines 3:

@misc{raffin2019stable,
  title={Stable baselines3},
  author={Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
  year={2019}
}
