[WIP] Fixups #14
Hi, thanks for posting this! I was wondering whether you were going to push an updated version. I would be interested to know whether running on a GPU substantially increases performance for Atari. Also, is running 36 processes always better than running 16? I don't recall the paper addressing this. (If you have any ideas for improving the code that you haven't had time to try yourself, I would be interested in those too.) BR, Max
Thanks for the nice fixes! I'll merge them after checking.
Yes, I have done some refactoring and implemented training for gym environments and continuous tasks (not so successful so far), but I haven't had enough time to push them.
I'm also interested in it, but using a GPU would be tricky for my multi-process implementation. I suppose there's no way to share GPU memory among different processes.
I didn't compare the scores of 16 vs. 36 processes. My implementation is apparently slower than DeepMind's, so I needed more processes to complete the same number of training steps in one day.
In general we want the largest possible diversity between the processes, to prevent learning from degenerating. Randomness is present in the environment through the random seed of the Atari emulator and the number of no-ops at the beginning of the game. It is present in the model through the sampling of discrete actions. This patch makes sure there is a training-level random seed, which is saved to the args.txt file (even if it has been generated). This seed is in turn used to create process-level random seeds, which are used for both the environment and the model. The environment random seed is used for the emulator too.
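As a rough illustration of the seeding scheme described above, here is a minimal sketch using only the Python standard library. It is not the code from this patch: the function names `make_training_seed` and `make_process_seeds` and the 31-bit seed range are assumptions made for the example.

```python
import random

SEED_RANGE = 2 ** 31  # assumed upper bound for seeds, chosen for illustration


def make_training_seed(seed=None):
    """Return the user-supplied seed, or generate one if none was given.

    Returning the generated value lets the caller record it (for example
    in args.txt) so the run can be reproduced later.
    """
    if seed is None:
        seed = random.SystemRandom().randrange(SEED_RANGE)
    return seed


def make_process_seeds(training_seed, n_processes):
    """Derive one distinct seed per process from the training-level seed."""
    rng = random.Random(training_seed)
    # Sampling without replacement guarantees no two processes share a seed.
    return rng.sample(range(SEED_RANGE), n_processes)


if __name__ == "__main__":
    training_seed = make_training_seed()
    for idx, process_seed in enumerate(make_process_seeds(training_seed, 4)):
        # Each process would seed both its environment and its model with this.
        print(idx, process_seed)
```

Because the per-process seeds are a deterministic function of the training seed, saving that single value should be enough to recreate every process's seed in a later run.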
Hi @muupan, I updated the PR to fix the problem of all processes starting with the same random number (and a bit more). BR, Max
This is a collection of my fixups.