Success Rate? #11
Hey, currently I do not have any reliable success rates myself. I would recommend reading DeepMind's paper on Imagination-Augmented Agents, in which they present results and compare them to a baseline RL algorithm. The I2A architecture solved over 80% of levels; in a very computationally expensive configuration it solved over 90%. During a class project we implemented the architecture, but we were not able to replicate the results due to limited computational power.
This weekend I will have a look at OpenAI's baselines repo and check how to make gym-sokoban usable with it. After that I will add some baselines to the documentation.
Hey, I am currently trying to use your Sokoban environment with the I2A agent for my Master's thesis, using this repo as a starting point. So far, parallelization of Sokoban environments seems to work. If you need any hints for your baseline implementation, I suggest having a look at the parallel implementation of the Pacman environment in the I2A implementation.
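The parallelization hint above can be illustrated with a minimal, self-contained sketch of the batched-step interface that baselines-style vectorized environments (like the Pacman setup mentioned) expose. `ToyEnv` is a stand-in for a real Sokoban environment, and the batching here is sequential rather than multi-process, just to show the interface:

```python
class ToyEnv:
    """Stand-in for a Sokoban environment: episode ends after 5 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        reward = 1.0 if done else 0.0
        return self.t, reward, done, {}


class BatchedEnv:
    """Steps N environments with one call, mirroring the VecEnv interface.

    baselines' SubprocVecEnv moves each env into its own worker process;
    here they simply live in a list to keep the sketch self-contained.
    """
    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:  # auto-reset on episode end, as VecEnv wrappers do
                obs = env.reset()
            results.append((obs, reward, done, info))
        obs, rewards, dones, infos = map(list, zip(*results))
        return obs, rewards, dones, infos


venv = BatchedEnv([ToyEnv() for _ in range(8)])
first_obs = venv.reset()                      # 8 observations, one per env
obs, rewards, dones, infos = venv.step([0] * 8)
```

The agent then sees a batch of 8 observations per step, which is exactly the shape an A2C/I2A training loop consumes.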
@Olloxan Thanks for the hint. I was planning to first run the simple subpackages from https://github.com/openai/baselines#subpackages-1 as a very first baseline and later on try more advanced techniques. ;-) Maybe you could add the results of your I2A implementation. By the way, it would be great if you could share the results of your thesis in the end. ;-)
I found that env.reset() takes at least 14 seconds to create a new game; if we play 100k games, that alone takes at least 400 hours of training time.
Hi @wrongbattery,
This is currently due to the level generation, which always generates a solvable environment. The generation algorithm uses a depth-first search to build the room by playing it in reverse, starting from the solved state. The algorithm is based on the DeepMind paper linked in the README file.
If you have an idea how to improve the algorithm, please let me know and I will implement it. ;-)
Best,
Max
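For reference, the reverse-play idea described above can be sketched in a few lines: start from the solved state and randomly "pull" a single box around; every state reached this way is solvable by replaying the moves forwards. This is a toy illustration, not gym-sokoban's actual generator (no walls, one box, and `reverse_play_generate` is a made-up name):

```python
import random

def reverse_play_generate(size=5, steps=30, seed=0):
    """Build a solvable single-box room by playing backwards.

    Start from the solved state (box on its target) and apply random
    reverse moves; a "pull" drags the box along behind the player.
    Solvability of the resulting start state is guaranteed because the
    recorded reverse moves can simply be replayed forwards.
    """
    rng = random.Random(seed)
    target = (size // 2, size // 2)
    box = [target[0], target[1]]
    player = [target[0], target[1] + 1]  # player starts next to the box
    for _ in range(steps):
        dr, dc = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        nr, nc = player[0] + dr, player[1] + dc
        if not (0 <= nr < size and 0 <= nc < size) or [nr, nc] == box:
            continue  # cannot leave the grid or walk into the box
        if (player[0] - dr, player[1] - dc) == tuple(box):
            box = [player[0], player[1]]  # pull the box along
        player = [nr, nc]
    return tuple(box), tuple(player), target

box, player, target = reverse_play_generate()
```

The real generator additionally scores candidate rooms (hence the "score == 0" retries discussed later in this thread) and uses depth-first search over reverse moves rather than a single random walk.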
Hi, the problem in general is the lack of computing power. For this environment to be learned by a model-free actor-critic network, or even by I2A, you need at least 32 parallel agents/environments; otherwise your network will just overfit as a result of the very sparse rewards. For the reasons stated above, I could not use this Sokoban environment for my Master's thesis, as it was just too computationally expensive. I developed a different environment that is computationally very lightweight and still offers sparse rewards.

Just for comparison: I train the I2A agent for 1e6 training epochs, which takes around 20 days. I hope the network converges faster so that I can stop the process earlier. In the paper you can see a significant increase in the learning curve after about 5e8 epochs; they train their network for 1e9 epochs. That is not possible unless you have a data center.

So my tips if you want to try the I2A approach with Sokoban: build a working A3C and train it on a system with at least 28 to 32 cores, otherwise your network will overfit. You need asynchronous level generation, otherwise the training takes several months. And get a Snickers, because the training will still take very long...
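The asynchronous level generation suggested above could look roughly like this: a background worker keeps a small pool of pre-generated levels so that reset() only has to pop one. `slow_generate` is a stand-in for the real room generator, and a real implementation would likely use worker processes rather than a thread, since generation is CPU-bound:

```python
import queue
import threading
import time

def slow_generate(seed):
    """Stand-in for gym-sokoban's reverse-play room generation."""
    time.sleep(0.01)  # the real generator can take seconds per room
    return f"level-{seed}"


class LevelPool:
    """Background worker keeps up to `capacity` levels ready to use."""
    def __init__(self, capacity=4):
        self.levels = queue.Queue(maxsize=capacity)
        self.counter = 0
        worker = threading.Thread(target=self._fill, daemon=True)
        worker.start()

    def _fill(self):
        while True:
            level = slow_generate(self.counter)
            self.counter += 1
            self.levels.put(level)  # blocks once the pool is full

    def reset(self):
        return self.levels.get()  # near-instant once the pool is warm


pool = LevelPool()
first = pool.reset()
second = pool.reset()
```

With a warm pool, generation overlaps with training instead of stalling every episode boundary, which is the main cost identified earlier in this thread.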
Hi @Olloxan, thanks for your insights. Regarding the pixel size: did you scale it down, or did you use the tiny_world rendering modes? Do you have some results to share? If so, you could add a subpage with the current high score. ;-) This spring I had the chance to attend a lecture by a DeepMind employee. After the lecture we talked with him about the implementation and the training process of I2A. During the conversation the guest lecturer, who was not part of the I2A team, said that we as students probably won't have the computing resources to train that architecture in a reasonable time. Just FYI ;-)
In order to use the proposed network structure and make use of the kernels, I had to use the pixel version. I just scaled your 16x16 images down to 8x8; that was no problem. I use a 16-core and a 28-core system, which is not bad for a start. And I got pretty good results with the model-free actor-critics. Unfortunately, as I said, I could not use your Sokoban environment, because I didn't implement an A3C. But I will post some results once I have finished my Master's thesis.
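The 16x16 → 8x8 downscaling mentioned above can be done by averaging 2x2 pixel blocks with NumPy. A minimal sketch, assuming HxWxC uint8 observations (`downscale` is an illustrative helper, not part of gym-sokoban):

```python
import numpy as np

def downscale(obs, factor=2):
    """Average `factor` x `factor` pixel blocks of an HxWxC observation."""
    h, w, c = obs.shape
    assert h % factor == 0 and w % factor == 0
    blocks = obs.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3)).astype(obs.dtype)

# A fake 16x16 RGB observation, shrunk to 8x8 for a smaller network input.
obs = np.random.default_rng(0).integers(0, 256, size=(16, 16, 3),
                                        dtype=np.uint8)
small = downscale(obs)  # shape (8, 8, 3)
```

Block averaging keeps the sprite colors distinguishable, which matters for Sokoban since the network must still tell boxes, targets, and walls apart at 8x8.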
Awesome! I am looking forward to reading your thesis.
Actually, I am trying to implement I2A with your env. However, your env fails many times with "Runtime Error/Warning: Generated Model with score == 0. Retry". Have you calculated the generation success rate for your env yet?
Hey,
Thanks for that input. Could you open a new ticket for the issue of failing room generation?
I already have an idea on how to fix this issue.
By the way which environments are you using?
Best,
Max
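One way to measure the generation success rate empirically is to wrap generation in a retry loop and count attempts. `flaky_generate` below is a stand-in for the real generator, with an assumed 30% failure probability chosen purely for illustration; only the counting logic is the point:

```python
import random

rng = random.Random(42)

def flaky_generate():
    """Stand-in: the real generator raises when reverse play scores 0."""
    if rng.random() < 0.3:  # assumed failure probability, for illustration
        raise RuntimeError("Generated Model with score == 0. Retry")
    return "level"

def reset_with_retries(max_retries=100):
    """Keep regenerating until a valid level appears, counting attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return flaky_generate(), attempt
        except RuntimeError:
            continue
    raise RuntimeError(f"no valid level after {max_retries} attempts")

# Estimate the success rate: successful resets per generation call.
attempts = []
for _ in range(200):
    level, n = reset_with_retries()
    attempts.append(n)

success_rate = len(attempts) / sum(attempts)
```

Running the same loop against the real environment (catching its actual retry error) would give the per-environment generation success rate asked about above.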
Hi,
Thanks to @mpSchrader for the amazing work, and I really appreciate @Olloxan's hints; they helped me a lot. I am also trying to implement I2A on Sokoban as the start of my PhD work. I was wondering whether you got any results, and whether they are similar to the results shown in the original paper. Looking forward to your reply, and have a nice day. Best regards
Hello,
I am amazed by your work. I am wondering whether you tested the Sokoban game with standard RL methods (Q-learning, A2C, etc.), and whether you have a success rate for this kind of game.