[WIP] Fixups #14

Open · wants to merge 4 commits into master

Conversation

@BlGene commented Jun 23, 2016

This is a collection of my fixups.

@BlGene (Author) commented Jun 23, 2016

Hi,

Thanks for posting this!

I was wondering if you were going to push an updated version? I would be interested to know whether running on GPU substantially increases performance for Atari. Also, is running 36 processes always better than running 16? I don't recall the paper addressing this.

(If you have any ideas for how to improve the code that you haven't had time to try yourself I would be interested in these.)

BR, Max

@muupan (Owner) commented Jun 24, 2016

Thanks for the nice fixes! I'll merge them after checking.

> I was wondering if you were going to push an updated version?

Yes, I have done some refactoring and implemented training for gym environments and continuous tasks (not very successful so far), but I haven't had enough time to push them.

> I would be interested to know whether running on GPU substantially increases performance for Atari.

I'm also interested in that, but using a GPU would be tricky for my multi-process implementation. I suppose there's no way to share GPU memory among different processes.

> Also, is running 36 processes always better than running 16? I don't recall the paper addressing this.

I didn't compare the scores of 16 vs. 36 processes. My implementation is apparently slower than DeepMind's, and in order to complete the same number of training steps in one day I needed to use more processes.

In general we want to have the largest possible diversity between the processes, to prevent learning from degenerating.

Randomness is present in the environment through the random seed of the Atari emulator and the number of no-ops at the beginning of the game. It is present in the model through the sampling of discrete actions.

This patch makes sure there is a training-level random seed, which is saved to the args.txt file (even if it has been generated). This seed is in turn used to create process-level random seeds, which are used for both the environment and the model. The environment random seed is used for the emulator too.
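The seeding hierarchy described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; the names `make_process_seeds` and `train_seed` are hypothetical.

```python
import random

def make_process_seeds(train_seed, n_processes):
    """Derive one seed per worker process from a single training-level
    seed, so that no two workers share the same random stream."""
    # A dedicated RNG seeded with the training-level seed keeps the
    # derivation deterministic and reproducible from args.txt alone.
    rng = random.Random(train_seed)
    # Draw a distinct 32-bit seed for each process; each worker would
    # seed both its environment (emulator) and its model sampling from it.
    return [rng.randrange(2 ** 32) for _ in range(n_processes)]

seeds = make_process_seeds(train_seed=42, n_processes=4)
```

Because the derivation is deterministic, saving only the training-level seed is enough to reproduce every process-level seed later.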
@BlGene (Author) commented Jul 8, 2016

Hi @muupan,

I updated the PR to fix the fact that all processes were starting with the same random seed (and a bit more).

BR, Max
