Question about GAE normalization strategy in PPO implementation #804

songyuc · 2025-01-24T06:10:34Z

Hello,

I am a student currently learning about reinforcement learning, and I came across your PPO implementation in the ManiSkill repository. I noticed a comment in the code that mentions normalizing the GAE by the sum of lambda^i instead of using the standard (1 - lambda) factor.

Here is the relevant code snippet.

I am curious about the reasoning behind this approach. Is there a specific advantage or scenario in which this method is preferred? I would greatly appreciate any insights or references to related literature.
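
For reference, the difference between the two weightings can be sketched as below. This is only an illustration under stated assumptions, not the ManiSkill code: the function names, the value of `lam`, and the idea of truncating to `n` terms are all hypothetical choices made here. It shows that geometric weights scaled by (1 - lambda) sum to 1 - lambda^n when only `n` terms are available, whereas dividing by the sum of lambda^i always gives weights that sum to 1. Whether this is the motivation behind the comment in the repository is not something the snippet settles.

```python
def one_minus_lambda_weights(lam: float, n: int) -> list[float]:
    """Geometric weights (1 - lam) * lam**i for i = 0..n-1 (truncated to n terms)."""
    return [(1.0 - lam) * lam**i for i in range(n)]


def sum_normalized_weights(lam: float, n: int) -> list[float]:
    """Weights lam**i divided by sum_{j<n} lam**j, so they sum to 1 for any n."""
    total = sum(lam**j for j in range(n))
    return [lam**i / total for i in range(n)]


# Hypothetical values chosen only for illustration.
lam, n = 0.9, 5
w_std = one_minus_lambda_weights(lam, n)
w_sum = sum_normalized_weights(lam, n)

print(sum(w_std))  # equals 1 - lam**n, which is below 1 for finite n
print(sum(w_sum))  # equals 1 up to floating-point error
```

Both schemes give the same relative weighting across terms; they differ only in the overall scale when the horizon is finite.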

Thank you for your time and for providing such a comprehensive resource!

Best regards,
Yucheng Song
