
Commit a5cd4af

Update readme about NoisyNet DQN
1 parent cf2f9ed commit a5cd4af

File tree

4 files changed: +44 −0 lines changed

Image/NoisyNet_Algorithm.PNG (172 KB)
Image/NoisyNet_Description.PNG (280 KB)
Image/NoisyNet_Loss.PNG (13 KB)
README.md (+44 lines)
@@ -9,6 +9,7 @@ This repository is the codes for `Deep Reinforcement Learning`
* [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
* [Dueling Network Architecture for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
* [Deep Recurrent Q-Learning for Partially Observable MDPs](https://arxiv.org/abs/1507.06527)
* [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)

@@ -381,3 +382,46 @@ The graph of average score is as follows.

<br> The `average testing score is 18.22`

---
## NoisyNet Deep Q Network
I studied the `Noisy Deep Q Network` through the paper [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295).

This algorithm trains a deep reinforcement learning agent with parametric noise added to its weights; the parameters of the noise are learned with gradient descent along with the remaining network weights.

- In NoisyNet, learned perturbations of the network weights are used to drive exploration.
- Noise is added to the policy at every step.
- The perturbations are sampled from a noise distribution.
- The variance of the perturbations is a learned parameter that can be regarded as the energy of the injected noise.
- The variance parameters are learned using gradients from the reinforcement learning loss function.
- Epsilon-greedy exploration is no longer used; instead, the policy greedily optimises the (noisy) value function. A toy sketch of this idea follows the list.
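As a quick illustration of the last point, here is a minimal NumPy sketch (my own toy example, not this repository's code): because the weights themselves are perturbed, the greedy action varies from step to step, so exploration happens without any epsilon schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

state = rng.standard_normal(4)           # toy 4-dimensional state
mu = 0.1 * rng.standard_normal((4, 3))   # learnable weight means (4 inputs, 3 actions)
sigma = np.full((4, 3), 0.5)             # learnable noise scales: the "energy" of the noise

for step in range(5):
    eps = rng.standard_normal((4, 3))    # fresh noise sample at every step
    w = mu + sigma * eps                 # perturbed weights
    q = state @ w                        # noisy Q-values for the 3 actions
    print(step, int(np.argmax(q)))       # the greedy action itself varies -> exploration
```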
In the paper, the NoisyNet is described as follows: it replaces the linear layers of the network with noisy layers.

<img src="./Image/NoisyNet_Description.PNG" alt="NoisyNet_Description" />
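Below is a minimal NumPy sketch of such a noisy layer with the factorised Gaussian noise from the paper. The class name and the absence of a training step are my own simplifications, not this repository's implementation; each forward pass draws fresh noise, so `w = w_mu + w_sigma * eps_w` and `b = b_mu + b_sigma * eps_b` change on every call.

```python
import numpy as np

def f(x):
    # Noise-scaling function from the paper: f(x) = sign(x) * sqrt(|x|)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """y = (w_mu + w_sigma * eps_w) @ x + (b_mu + b_sigma * eps_b)."""

    def __init__(self, in_dim, out_dim, sigma0=0.5, rng=None):
        self.rng = rng or np.random.default_rng()
        bound = 1.0 / np.sqrt(in_dim)
        # Learnable means, initialised uniformly as in the paper
        self.w_mu = self.rng.uniform(-bound, bound, (out_dim, in_dim))
        self.b_mu = self.rng.uniform(-bound, bound, out_dim)
        # Learnable noise scales, initialised to sigma0 / sqrt(in_dim)
        self.w_sigma = np.full((out_dim, in_dim), sigma0 / np.sqrt(in_dim))
        self.b_sigma = np.full(out_dim, sigma0 / np.sqrt(in_dim))

    def __call__(self, x):
        # Factorised Gaussian noise: one vector over inputs, one over outputs
        eps_in = f(self.rng.standard_normal(self.w_mu.shape[1]))
        eps_out = f(self.rng.standard_normal(self.w_mu.shape[0]))
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return w @ x + b

layer = NoisyLinear(4, 3)
print(layer(np.ones(4)))  # a different output on every call: fresh noise per forward pass
print(layer(np.ones(4)))
```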
NoisyNet introduces additional parameters (`mu` and `sigma`), so the loss function also changes: the DQN loss becomes the NoisyNet-DQN loss shown below.

<img src="./Image/NoisyNet_Loss.PNG" width="500" alt="NoisyNet_Loss" />
The outer expectation is taken with respect to the distribution of the noise variable *epsilon* of the noisy value function Q(x, a, epsilon; zeta) and of the independent noise variable *epsilon′* of the noisy target value function Q(y, b, epsilon′; zeta⁻), where zeta⁻ denotes the target network parameters.
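In code, this is the ordinary DQN TD error except that independent noise samples are drawn for the online and target networks. Here is a hedged NumPy sketch, where `q_net` and `target_net` are assumed names for callables (built from noisy layers) that resample their noise on every call and map a batch of states to per-action Q-values:

```python
import numpy as np

def noisynet_dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Squared TD error with independent noise in the online and target nets."""
    x, a, r, y, done = batch                  # states, actions, rewards, next states, terminal flags
    q = q_net(x)[np.arange(len(a)), a]        # Q(x, a, eps; zeta): noise eps sampled inside q_net
    q_next = target_net(y).max(axis=1)        # max_b Q(y, b, eps'; zeta^-): independent noise eps'
    target = r + gamma * (1.0 - done) * q_next
    return np.mean((target - q) ** 2)         # Monte Carlo estimate of the outer expectation

# Smoke test with dummy "networks" that redraw noise on each call
rng = np.random.default_rng(0)
net = lambda s: s @ (0.1 + 0.5 * rng.standard_normal((4, 3)))
batch = (rng.standard_normal((32, 4)), rng.integers(0, 3, 32),
         rng.standard_normal(32), rng.standard_normal((32, 4)), np.zeros(32))
print(noisynet_dqn_loss(net, net, batch))
```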
The algorithm from the paper is as follows.

<img src="./Image/NoisyNet_Algorithm.PNG" width="500" alt="NoisyNet_Algorithm" />
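To make the control flow concrete, here is a self-contained toy paraphrase of that loop in NumPy. A random "environment", a single noisy linear layer as the Q-network, and hand-derived gradient updates stand in for the real agent, so treat it as a sketch of the algorithm's structure, not a faithful reimplementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action, gamma, lr = 4, 3, 0.99, 0.01

mu = 0.1 * rng.standard_normal((n_action, n_state))  # learnable weight means
sigma = np.full((n_action, n_state), 0.5)            # learnable noise scales
mu_t, sigma_t = mu.copy(), sigma.copy()              # target network parameters zeta^-

state = rng.standard_normal(n_state)
for t in range(1000):
    # Act: sample noise, then be greedy w.r.t. the noisy Q (no epsilon-greedy)
    eps = rng.standard_normal(mu.shape)
    a = int(np.argmax((mu + sigma * eps) @ state))

    # Toy random transition in place of an Atari step
    reward = float(rng.standard_normal())
    next_state = rng.standard_normal(n_state)

    # Learn: independent noise eps1 for the online net, eps2 for the target net
    eps1 = rng.standard_normal(mu.shape)
    eps2 = rng.standard_normal(mu.shape)
    q = ((mu + sigma * eps1) @ state)[a]
    target = reward + gamma * ((mu_t + sigma_t * eps2) @ next_state).max()
    td = target - q

    # Gradient step on zeta = (mu, sigma) for the squared TD error
    mu[a] += lr * td * state
    sigma[a] += lr * td * eps1[a] * state

    if t % 100 == 0:                                 # periodic target-network sync
        mu_t, sigma_t = mu.copy(), sigma.copy()

    state = next_state
```

In the real agent the same structure holds, but the Q-network is the full convolutional network with noisy fully connected layers, and the update is an optimiser step over minibatches drawn from a replay buffer.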
<br> I verified the algorithm with the game `breakout`.

The graph of the average score is as follows.

<img src="./Plot/2017-10-16_11_20_Noisy_DQN_breakout47.7882352941.png" width="500" alt="Plot NoisyNet-DQN" />

<br> The `average testing score is 47.79`! Wow!
