Hello,

I am a student currently learning about reinforcement learning, and I came across your PPO implementation in the ManiSkill repository. I noticed a comment in the code mentioning that the GAE is normalized by the sum of lambda^i rather than by the standard (1 - lambda) factor.
Here is the relevant code snippet.
I am curious about the reasoning behind this approach. Is there a specific advantage or scenario in which this method is preferred? I would greatly appreciate any insights or references to related literature.
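To make sure I am reading the comment correctly, here is a minimal sketch of how I understand the two normalizations. This is my own illustration in NumPy, not the actual ManiSkill code; the function names, the single-trajectory setup, and the assumption of no episode terminations are all mine:

```python
import numpy as np

def gae_standard(rewards, values, gamma=0.99, lam=0.95):
    """Textbook GAE: A_t = sum_i (gamma * lam)^i * delta_{t+i}.
    Equivalent to a (1 - lam)-weighted average of n-step returns,
    which is only exact over an infinite horizon.
    rewards has length T; values has length T + 1 (bootstrap at the end)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def gae_finite_horizon(rewards, values, gamma=0.99, lam=0.95):
    """Finite-horizon variant as I understand it: the value target at step t
    is the average of the available n-step returns weighted by lam^(n-1),
    normalized by sum_{i=0}^{N-1} lam^i instead of the infinite-horizon
    factor 1 / (1 - lam)."""
    T = len(rewards)
    adv = np.zeros(T)
    for t in range(T):
        n_max = T - t                      # number of n-step returns available
        weights = lam ** np.arange(n_max)  # lam^0, lam^1, ..., lam^(n_max - 1)
        returns = np.zeros(n_max)
        g = 0.0
        for n in range(1, n_max + 1):      # n-step return R_t^(n)
            g += gamma ** (n - 1) * rewards[t + n - 1]
            returns[n - 1] = g + gamma ** n * values[t + n]
        target = np.dot(weights, returns) / weights.sum()
        adv[t] = target - values[t]
    return adv

# Tiny demo on random data, just to compare the two outputs.
rng = np.random.default_rng(0)
T = 5
rewards = rng.standard_normal(T)
values = rng.standard_normal(T + 1)
print(gae_standard(rewards, values))
print(gae_finite_horizon(rewards, values))
```

If I follow the idea, over an infinite horizon the weights (1 - lam) * lam^(n - 1) on the n-step returns sum to 1, but with only N returns available they sum to 1 - lam^N, so dividing by sum_{i=0}^{N-1} lam^i keeps the target a properly weighted average of the returns that actually exist near the end of a finite rollout. Please correct me if this reading is wrong.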
Thank you for your time and for providing such a comprehensive resource!
Best regards,
Yucheng Song