
Update ddqn_agent.py to prevent RuntimeError with newer PyTorch versions #3

Open
wants to merge 1 commit into base: master
Conversation

atlevesque

When running the DDQN agent on PyTorch v1.5.0 I get the following RuntimeError:

RuntimeError: range.second - range.first == t.size() INTERNAL ASSERT FAILED at ..\torch\csrc\autograd\generated\Functions.cpp:57, please report a bug to PyTorch. inconsistent range for TensorList output (copy_range at ..\torch\csrc\autograd\generated\Functions.cpp:57)
(no backtrace available)

My guess is that there is a diamond-shaped dependency when running the backward method, since the `self.q_eval` network parameters affect the loss via both `q_pred` and `q_eval`.

I fixed the issue by explicitly detaching the `max_actions` tensor from the computational graph: it is a discrete value, and small changes in the `self.q_eval` network parameters should not change the `max_actions` taken. The derivative of the loss with respect to the `self.q_eval` network parameters therefore comes only from the `q_pred` calculation.

I tested this change on my computer and got good performance and (more importantly) no RuntimeError.

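For anyone else hitting this, here is a minimal sketch of what the detach amounts to in a double-DQN update step. The function name, argument names, and batch layout below are my own assumptions for illustration, not the repo's actual `learn()` code; the point is only that `self.q_eval` should enter the autograd graph once, via `q_pred`.

```python
import torch as T

def ddqn_learn_step(q_eval, q_next, optimizer, loss_fn, batch, gamma=0.99):
    """One double-DQN update with the action-selection branch detached."""
    states, actions, rewards, states_, dones = batch
    indices = T.arange(states.shape[0])

    optimizer.zero_grad()

    # Path 1: Q-values of the actions actually taken -- gradients should
    # flow into q_eval through this path only.
    q_pred = q_eval(states)[indices, actions]

    # Path 2: action selection for the next states also uses q_eval.
    # Detaching its output removes this second path from the graph, so
    # backward() no longer sees q_eval twice.
    max_actions = T.argmax(q_eval(states_).detach(), dim=1)

    # The target network evaluates the actions chosen by the online network.
    with T.no_grad():
        q_next_vals = q_next(states_)
        q_next_vals[dones] = 0.0
        q_target = rewards + gamma * q_next_vals[indices, max_actions]

    loss = loss_fn(q_target, q_pred)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Since `max_actions` only indexes into the target network's output, detaching it does not change the values used for the target, only which branches autograd walks during `backward()`.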
@atlevesque (Author)

Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old 6 GB RAM PC 😞

[Plot: DDQNAgent_PongNoFrameskip-v4_lr0 0001_500games]

@srikanthkb

> Here are the results I got when running the same Pong test case as you did in the course. It is marginally better than my run with the DQN algorithm and slightly worse than the score you had in your demo, as I had to dramatically decrease the ReplayMemory size to fit in my old 6 GB RAM PC 😞
>
> [Plot: DDQNAgent_PongNoFrameskip-v4_lr0 0001_500games]

Hi,
Did you make any other changes before running `main_ddqn.py`?
When I tried to run it, the agent does not learn and the average scores stay around -17.0. Can you let me know how you were able to obtain these results?

Thanks in advance!
