* [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
* [Deep Recurrent Q-Learning for Partially Observable MDPs](https://arxiv.org/abs/1507.06527)
* [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)
@@ -381,3 +382,46 @@ The graph of average score is as follows.
<br> The `average testing score is 18.22`

---

## NoisyNet Deep Q Network

I studied `NoisyNet Deep Q Network` from the paper [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295).

This algorithm is a deep reinforcement learning agent with parametric noise added to its weights. The parameters of the noise are learned with gradient descent along with the remaining network weights.

- Learned perturbations of the network weights are used to drive exploration.
- Noise is added to the policy at every step.
- The perturbations are sampled from a noise distribution.
- The variance of the perturbations is a parameter that can be considered as the energy of the injected noise.
- The variance parameters are learned using gradients from the reinforcement learning loss function.
- Epsilon-greedy is no longer used; instead, the policy greedily optimises the noisy value function.

The paper describes NoisyNet as follows: the linear layers of the network are replaced by noisy layers.
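
As a rough illustration of such a noisy layer, the following pure-Python sketch computes `y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b` with independent Gaussian noise per weight and bias. The class name `NoisyLinear`, the `seed` argument, and the `noisy` flag are assumptions made for this example and are not code from this repository. The initial values follow the paper's independent-noise setup as I understand it, so please check them against the paper. Gradient updates of `mu` and `sigma` are left out.

```python
import math
import random


class NoisyLinear:
    """Linear layer with weights mu + sigma * eps (hypothetical sketch)."""

    def __init__(self, in_features, out_features, sigma_init=0.017, seed=0):
        self.in_features = in_features
        self.out_features = out_features
        self.rng = random.Random(seed)
        # Assumed initialisation: mu ~ U[-sqrt(3/p), sqrt(3/p)], sigma constant.
        bound = math.sqrt(3.0 / in_features)
        self.mu_w = [[self.rng.uniform(-bound, bound) for _ in range(in_features)]
                     for _ in range(out_features)]
        self.mu_b = [self.rng.uniform(-bound, bound) for _ in range(out_features)]
        # One learnable noise scale per weight and per bias.
        self.sigma_w = [[sigma_init] * in_features for _ in range(out_features)]
        self.sigma_b = [sigma_init] * out_features
        self.sample_noise()

    def sample_noise(self):
        """Draw fresh unit Gaussian noise for every weight and bias."""
        self.eps_w = [[self.rng.gauss(0.0, 1.0) for _ in range(self.in_features)]
                      for _ in range(self.out_features)]
        self.eps_b = [self.rng.gauss(0.0, 1.0) for _ in range(self.out_features)]

    def forward(self, x, noisy=True):
        """Return (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b."""
        scale = 1.0 if noisy else 0.0  # noisy=False uses the mean weights only
        out = []
        for i in range(self.out_features):
            total = self.mu_b[i] + scale * self.sigma_b[i] * self.eps_b[i]
            for j in range(self.in_features):
                w = self.mu_w[i][j] + scale * self.sigma_w[i][j] * self.eps_w[i][j]
                total += w * x[j]
            out.append(total)
        return out


layer = NoisyLinear(in_features=3, out_features=2, seed=1)
x = [1.0, -2.0, 0.5]
first = layer.forward(x)
layer.sample_noise()          # new perturbation of the same learned parameters
second = layer.forward(x)
```

The output changes between `first` and `second` only because the noise was resampled; `mu` and `sigma` stay the same until a gradient step updates them.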
A noisy layer has additional learnable parameters (`mu` and `sigma`), so the loss function of the algorithm changes as well. The DQN loss becomes the NoisyNet-DQN loss as follows.
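
For reference, the NoisyNet-DQN loss from the paper can be written in LaTeX as below. The notation is reconstructed from the paper: `D` is the replay buffer, `\gamma` the discount factor, `A` the action set, and `\zeta^{-}` the parameters of the target network.

$$
\bar{L}(\zeta) = \mathbb{E}\Big[\, \mathbb{E}_{(x,a,r,y)\sim D}\big[\, r + \gamma \max_{b \in A} Q(y, b, \varepsilon'; \zeta^{-}) - Q(x, a, \varepsilon; \zeta) \,\big]^{2} \Big]
$$
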
The outer expectation is taken with respect to the distribution of the noise variables *epsilon* of the noisy value function Q(x, a, epsilon; zeta) and the noise variables *epsilon'* of the noisy target value function Q(y, b, epsilon'; zeta^-), where zeta^- denotes the parameters of the target network.
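
The sketch below shows one way to estimate this loss on a batch, with one noise sample for the online network and an independent one for the target network. It also shows action selection without epsilon-greedy. Everything here is a toy assumption for illustration: `noisy_q` is a hypothetical one-parameter-per-action Q-function, a state is a single float, and terminal transitions are ignored to stay close to the formula.

```python
import random


def noisy_q(state, zeta, eps):
    """Toy noisy Q-function (hypothetical): Q(s, a) = (mu[a] + sigma[a] * eps[a]) * s."""
    mu, sigma = zeta
    return [(m + s * e) * state for m, s, e in zip(mu, sigma, eps)]


def sample_noise(n_actions, rng):
    """One unit Gaussian noise variable per action."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_actions)]


def greedy_action(state, zeta, rng):
    """Act greedily on the noisy Q-values; exploration comes from the noise."""
    q_values = noisy_q(state, zeta, sample_noise(len(zeta[0]), rng))
    return max(range(len(q_values)), key=q_values.__getitem__)


def noisynet_dqn_loss(batch, zeta, zeta_target, gamma, rng):
    """Mean squared TD error with independent noise for online and target nets."""
    n_actions = len(zeta[0])
    eps = sample_noise(n_actions, rng)         # noise of Q(x, a, eps; zeta)
    eps_target = sample_noise(n_actions, rng)  # noise of Q(y, b, eps'; zeta^-)
    total = 0.0
    for x, a, r, y in batch:                   # transitions (x, a, r, y)
        target = r + gamma * max(noisy_q(y, zeta_target, eps_target))
        td_error = target - noisy_q(x, zeta, eps)[a]
        total += td_error ** 2
    return total / len(batch)


rng = random.Random(0)
zeta = ([1.0, 2.0], [0.5, 0.5])                # (mu, sigma) per action
batch = [(1.0, 0, 1.0, 1.0), (0.5, 1, 0.0, 2.0)]
loss = noisynet_dqn_loss(batch, zeta, zeta, gamma=0.99, rng=rng)
action = greedy_action(1.0, zeta, rng)
```

Averaging this estimate over many noise draws approximates the outer expectation; a single draw per optimisation step is a common simplification.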