
Success Rate? #11

Open
dikke opened this issue Aug 5, 2018 · 14 comments

@dikke

dikke commented Aug 5, 2018

Hello

I am amazed by your work. I am wondering if you have tested the Sokoban game with standard RL methods (Q-learning, A2C, etc.), and whether you have success rates for this kind of game?

@mpSchrader
Owner

mpSchrader commented Aug 5, 2018

Hey,

Currently I do not have any reliable success rates myself. I would recommend reading DeepMind's paper on Imagination-Augmented Agents. In it they presented results and compared them to a baseline RL algorithm. The I2A architecture solved over 80% of the levels. In a very computationally expensive configuration they were able to solve over 90%.

During a class project we implemented the architecture, but we were not able to replicate the results due to limited computational power.

@mpSchrader
Owner

This weekend I will have a look at OpenAI's baselines repo and check how to make gym-sokoban usable with it. After that I will upload some baselines to the documentation.

@Olloxan
Contributor

Olloxan commented Aug 23, 2018

Hey, I am currently trying to use your Sokoban environment with the I2A agent for my Master's thesis. I am using this repo as a starting point. So far, parallelizing the Sokoban environments seems to work. If you need any hints for your baseline implementation, I suggest having a look at the parallel implementation of the Pacman environment in the I2A implementation.

@mpSchrader
Owner

@Olloxan Thanks for the hint. I was planning to first run the simple subpackages from https://github.com/openai/baselines#subpackages-1 as a very first baseline and later on try more advanced techniques. ;-)

Maybe you could add the results of your I2A implementation. By the way, it would be great if you could share the results of your thesis in the end. ;-)

@wrongbattery

I found that env.reset() takes at least 14 seconds to create a new game. If we play 100k games, that is at least 400 hours of level generation alone during training.
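For anyone who wants to reproduce this measurement, a minimal timing sketch (the environment id and the availability of gym/gym_sokoban are assumptions; check the repo's README for the actual ids):

```python
import time

def mean_reset_time(env, n_resets=5):
    """Average wall-clock seconds per env.reset() call."""
    start = time.perf_counter()
    for _ in range(n_resets):
        env.reset()
    return (time.perf_counter() - start) / n_resets

# Hypothetical usage, assuming gym and gym_sokoban are installed:
#   import gym, gym_sokoban
#   env = gym.make('Sokoban-v0')
#   print(mean_reset_time(env))
```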

@mpSchrader
Owner

mpSchrader commented Nov 14, 2018 via email

@Olloxan
Contributor

Olloxan commented Nov 14, 2018

Hi,
In order to use this environment for the Imagination-Augmented Agent, I had to scale the size of the tiles down to 8x8 pixels, which is what they used in their paper. I experienced the same problem with the level generation: the fastest levels were generated in about two seconds, while the slowest one took over two minutes. They already stated in their paper that they used an A3C agent for their Sokoban task, as that at least solves the problem of synchronized level generation. I think the level generation algorithm is fine as it is; an A2C algorithm is just not suitable for this task. I still implemented a solution where an A2C could be used: I started 16 processes playing 16 different Sokoban games to generate training data, plus an additional 16 processes that generated Sokoban levels and stored them in a multiprocessing buffer. This buffer served as an asynchronous source of new levels, and that number of generating processes was just enough to satisfy the demand for new levels.
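The producer/consumer setup described above can be sketched roughly like this (a minimal sketch; `generate_level` is just a placeholder for the actual, expensive gym-sokoban room generator, and all names are illustrative):

```python
import multiprocessing as mp
import random

def generate_level(seed):
    # Placeholder for the real room generator, which can take
    # seconds to minutes per level in gym-sokoban.
    rng = random.Random(seed)
    return [[rng.randint(0, 4) for _ in range(10)] for _ in range(10)]

def producer(buffer, seed, n_levels):
    # Pre-generate levels and push them into the shared buffer.
    for i in range(n_levels):
        buffer.put(generate_level(seed + i))

def start_level_buffer(n_producers=16, levels_each=1000, maxsize=64):
    # A bounded queue stops producers from racing far ahead of the
    # consumers; training workers call buffer.get() for a fresh level
    # instead of paying the generation cost inside env.reset().
    buffer = mp.Queue(maxsize=maxsize)
    workers = [
        mp.Process(target=producer, args=(buffer, 10_000 * k, levels_each),
                   daemon=True)
        for k in range(n_producers)
    ]
    for w in workers:
        w.start()
    return buffer, workers
```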

The problem in general is the lack of computing power. For this environment to be learned by a model-free actor-critic network, or even by the I2A, you need at least 32 parallel agents/environments; otherwise your network will just overfit as a result of the very sparse rewards. For the reasons stated above, I could not use this Sokoban environment for my Master's thesis, as it was just too computationally expensive. I developed a different environment that is computationally very lightweight and still offers sparse rewards. Just for comparison: I train the I2A agent for 1e6 training epochs, which takes around 20 days. I hope the network converges faster so that I can stop the process earlier. In the paper you can see a significant increase in the learning curve after about 5e8 epochs, and they train their network for 1e9 epochs. That is not possible unless you have a datacenter.

So my tip if you want to try the I2A approach with Sokoban: build a working A3C and train it on a system that has at least 28 to 32 cores, otherwise your network will overfit. You need asynchronous level generation, otherwise the training takes several months. And get a Snickers, because the training will still take very long...

@mpSchrader
Owner

Hi @Olloxan,

thanks for your insights. Regarding the pixel size: did you scale it down yourself, or did you use the tiny_world rendering modes? Do you have some results to share? If so, you could add a subpage with the current high score. ;-)

This spring I had the chance to attend a lecture by a DeepMind employee. After the lecture we talked with him about the implementation and the training process of I2A. During the conversation the guest lecturer, who was not part of the I2A team, said that we as students probably won't have the computing resources to train that architecture in a reasonable time. Just FYI ;-)

@Olloxan
Contributor

Olloxan commented Nov 16, 2018

In order to use the proposed network structure and make use of the kernels, I had to use the pixel version. I just scaled your 16x16 images down to 8x8; that was no problem. I use a 16-core and a 28-core system, which is not bad for a start. And I got pretty good results with the model-free actor-critics. Unfortunately, as I said, I could not use your Sokoban environment because I didn't implement an A3C. But I will post some results when I finish my Master's thesis.
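For reference, averaging 2x2 pixel blocks is one way to do that kind of downscaling (a sketch assuming the observation is an (H, W, C) numpy array; the exact frame shape depends on the room size and rendering mode):

```python
import numpy as np

def downscale_2x(frame):
    # Average-pool an (H, W, C) image by a factor of 2 in both
    # spatial dimensions, e.g. 16x16-pixel tiles -> 8x8-pixel tiles.
    h, w, c = frame.shape
    pooled = frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return pooled.astype(frame.dtype)
```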

@mpSchrader
Owner

Awesome! I am looking forward to reading your thesis.

@wrongbattery

Actually, I am trying to implement I2A with your env. However, your env fails many times with "RuntimeError/Warning: Generated Model with score == 0. Retry". Have you calculated the success rate of level generation for your env yet?
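Not an official number, but the rate is easy to estimate yourself by calling the generator repeatedly (a sketch; `generate` stands for any callable that raises on a failed attempt, e.g. the room generator behind env.reset()):

```python
def generation_success_rate(generate, n_trials=100):
    # Fraction of calls to `generate` that complete without raising
    # the "Generated Model with score == 0" RuntimeError/Warning.
    successes = 0
    for _ in range(n_trials):
        try:
            generate()
            successes += 1
        except (RuntimeError, RuntimeWarning):
            pass
    return successes / n_trials
```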

@mpSchrader
Owner

mpSchrader commented Nov 21, 2018 via email

@Olloxan
Contributor

Olloxan commented Feb 26, 2019

> Awesome! I am looking forward to reading your thesis.

Hi,
I wrote you an email, just so you are not wondering where it came from ^^
Best regards

@yangzhao-666

> > Awesome! I am looking forward to reading your thesis.
>
> Hi,
> I wrote you an email, just so you are not wondering where it came from ^^
> Best regards

Thanks to @mpSchrader for this amazing work.

Also, I really appreciate @Olloxan's hints; they helped me a lot in understanding the setup. I'm also trying to implement I2A on Sokoban as the start of my PhD work. I was wondering: did you get any results? Do they look similar to the results shown in the original paper?

Looking forward to your reply and have a nice day.

Best regards
