fix: reworked frame skipping and max-pooling for Atari #20
Pulled frame skipping out of the gymnasium environment so that consecutive frames can be max-pooled, as done in the DQN Zoo codebase. The agent's stream of experience should now follow the pipeline described below:
At every environment step, the Atari simulator advances 4 frames by repeating the selected action. This simplifies the RL problem and speeds up execution. If the agent loses a life or the episode terminates, the frame-skipping loop ends early and the environment's discount factor is set to 0. The agent receives the total reward accumulated during the frame-skipping loop. The last two consecutive frames of each loop are max-pooled to handle screen flickering caused by the Atari 2600's hardware limitations. After max-pooling, the frames are resized to `(84, 84)` and converted to grayscale. At each step, the agent receives a stack of the past 4 observed (not skipped) processed frames (observation shape `(84, 84, 4)`).

In the diagram below, `~` denotes skipped frames, small letters denote max-pooled frames (e.g. `b = max pool(3, 4)`), and capital letters denote max-pooled frames after resizing and conversion to grayscale (e.g. `C = max pool(7, 8)`).
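A minimal sketch of the frame-skipping loop described above, assuming a hypothetical emulator object with `act`, `screen`, `lives` and `game_over` methods. `ToyEmulator` and `skip_and_pool` are illustrative names, not code from this PR or from DQN Zoo. The sketch repeats the action for up to 4 raw frames, sums the reward, sets the discount to 0 on life loss or game over, and max-pools the last two raw frames (as in the `max pool(3, 4)` example).

```python
import numpy as np


class ToyEmulator:
    """Hypothetical stand-in for an Atari emulator (not a real library API).

    Each act() call advances one raw frame and returns reward 1.0. A bright
    pixel is drawn only on odd frames to mimic sprite flicker.
    """

    def __init__(self, lives=3, life_lost_at=None):
        self.frame = 0
        self._lives = lives
        self.life_lost_at = life_lost_at  # raw frame at which a life is lost

    def act(self, action):
        self.frame += 1
        if self.frame == self.life_lost_at:
            self._lives -= 1
        return 1.0

    def screen(self):
        img = np.zeros((210, 160, 3), dtype=np.uint8)
        if self.frame % 2 == 1:
            img[100, 80] = 255  # flickering sprite, visible on odd frames only
        return img

    def lives(self):
        return self._lives

    def game_over(self):
        return self._lives == 0


def skip_and_pool(emulator, action, num_repeats=4):
    """Repeat `action` for up to `num_repeats` raw frames.

    Returns (pooled_frame, total_reward, discount). The loop stops early on
    life loss or game over, in which case discount is 0.0.
    """
    start_lives = emulator.lives()
    total_reward = 0.0
    discount = 1.0
    frames = []
    for _ in range(num_repeats):
        total_reward += emulator.act(action)
        frames.append(emulator.screen())
        if emulator.lives() < start_lives or emulator.game_over():
            discount = 0.0
            break
    # Element-wise max over the last two raw frames removes flicker; if the
    # loop ended after a single frame, that frame is returned as-is.
    if len(frames) > 1:
        pooled = np.maximum(frames[-2], frames[-1])
    else:
        pooled = frames[-1]
    return pooled, total_reward, discount
```

With the toy emulator, the sprite is missing from the final raw frame (frame 4) but survives in the pooled frame because it was drawn on frame 3. Handling a loop that ends after one frame by returning that frame alone is an assumption of this sketch; a real implementation might instead reuse a frame buffered from the previous step.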
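The remaining preprocessing (grayscale conversion, resizing to `(84, 84)` and stacking the last 4 processed frames into an `(84, 84, 4)` observation) can be sketched as follows. The BT.601 luminance weights, nearest-neighbour resizing and filling the stack with copies of the first frame after reset are all assumptions chosen to keep the example NumPy-only; the PR may use a different interpolation (e.g. bilinear or area via OpenCV/PIL) or zero padding. `to_grayscale`, `resize_nearest`, `FrameStack` and `preprocess` are hypothetical names.

```python
from collections import deque

import numpy as np


def to_grayscale(rgb):
    """RGB (H, W, 3) uint8 -> (H, W) uint8 using BT.601 weights (assumed)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights  # contracts the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def resize_nearest(img, shape=(84, 84)):
    """Nearest-neighbour resize of a 2-D image (a simplifying assumption)."""
    h, w = img.shape
    rows = np.arange(shape[0]) * h // shape[0]
    cols = np.arange(shape[1]) * w // shape[1]
    return img[rows[:, None], cols[None, :]]


def preprocess(pooled_rgb):
    """Max-pooled RGB frame -> (84, 84) grayscale frame."""
    return resize_nearest(to_grayscale(pooled_rgb))


class FrameStack:
    """Keeps the last `num_frames` processed frames as an (84, 84, k) array."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)  # oldest frame drops out first

    def reset(self, frame):
        # Assumed choice: pad the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(frame)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Newest frame ends up in the last channel.
        return np.stack(list(self.frames), axis=-1)
```

Only frames produced by the skip-and-pool step are pushed, so the stack holds the 4 most recent observed (not skipped) frames, matching the `(84, 84, 4)` observation shape above.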