ch8/rl/README.md (+2 -2)
@@ -58,11 +58,11 @@ Now the situation is reversed: the `Rew` layer shows that the reward has been pr
* Do `Step Trial` to process the rest of the trial, and switch to viewing `Train Trial Plot`.

-The plot shows that the "dopamine spike" of TD delta has moved forward one step in time. This is the critical feature of the TD algorithm: by learning to anticipate rewards one time step later, it ends up moving the dopamine spike earlier in time.
+The plot shows that the "dopamine spike" of TD delta has moved backward (earlier) one step in time. This is the critical feature of the TD algorithm: by learning to anticipate rewards one time step later, it ends up moving the dopamine spike earlier in time.
* Keep doing more `Step Trial` (or just `Train`).

-You should see that the spike moves "forward" in time with each `Step Trial`, but can't move any further than the onset of the CS at time step 10.
+You should see that the spike moves "backward" in time with each `Step Trial`, but can't move any further than the onset of the CS at time step 10.
We can also examine the weights to see what the network has learned.
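
The backward movement described in the changed lines can be reproduced with a small tabular TD sketch. This is not the simulation's own code: the function name `run_td_trials`, the reward time `us_t=15`, and the learning rate and discount of 1.0 are all assumptions chosen for illustration. Only the CS onset at time step 10 comes from the README text. Each time step from CS onset onward gets its own value weight, and the weight for step `t-1` is updated by the TD delta computed at step `t`.

```python
def run_td_trials(n_trials, n_steps=20, cs_on=10, us_t=15, lrate=1.0, gamma=1.0):
    """Return the time step of the largest TD delta on each trial.

    Assumed setup (hypothetical, for illustration only):
    - one value weight per time step, all starting at 0
    - reward of 1.0 delivered at time step us_t
    - weights can only learn while the CS is on (t >= cs_on)
    """
    w = [0.0] * n_steps  # w[t] is the predicted future reward at step t
    peaks = []
    for _ in range(n_trials):
        delta = [0.0] * n_steps
        for t in range(1, n_steps):
            reward = 1.0 if t == us_t else 0.0
            # TD error: reward now, plus what is predicted next,
            # minus what was predicted one step ago
            delta[t] = reward + gamma * w[t] - w[t - 1]
            # Only steps with the CS present have input to learn from
            if t - 1 >= cs_on:
                w[t - 1] += lrate * delta[t]
        # Record where the "dopamine spike" (largest delta) occurred
        peaks.append(max(range(n_steps), key=lambda i: delta[i]))
    return peaks


if __name__ == "__main__":
    print(run_td_trials(8))
```

Under these assumptions the spike starts at the reward time and shifts one step earlier per trial until it reaches CS onset, where it stays, because no weight before step 10 can learn to predict it. With a learning rate below 1.0 the shift would be gradual rather than one clean step per trial.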