@MeFredFeng

This pull request implements the core algorithms for the Monte Carlo, Policy Iteration, and Value Iteration reinforcement learning solvers and adds a Conda environment configuration for macOS. With these changes, the solvers are ready for experimentation and further development.

Algorithm Implementations

  • Added the core Monte Carlo algorithm in Monte_Carlo.py: episode generation, Q-value updates, and policy functions for both on-policy and off-policy learning, including epsilon-soft and greedy policies.
  • Implemented Policy Iteration in Policy_Iteration.py: policy improvement uses a one-step lookahead, and policy evaluation uses matrix methods.
  • Completed Value Iteration in Value_Iteration.py: added a one-step lookahead for value updates and policy extraction, and improved the prioritized sweeping logic.
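
For reviewers, the three building blocks named above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request. It assumes a tabular MDP stored as NumPy arrays `P[s, a, s']` (transition probabilities) and `R[s, a]` (expected rewards), and it reads "matrix methods" as solving the linear Bellman system directly. All function names here are hypothetical.

```python
import numpy as np


def one_step_lookahead(P, R, V, gamma):
    """Return Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']."""
    # P has shape (S, A, S) and V has shape (S,), so P @ V has shape (S, A).
    return R + gamma * (P @ V)


def evaluate_policy_exact(P, R, policy, gamma):
    """Solve (I - gamma * P_pi) V = R_pi for a deterministic policy.

    Assumption: `policy` is an integer array giving one action per state.
    """
    n_states = P.shape[0]
    idx = np.arange(n_states)
    P_pi = P[idx, policy]  # (S, S): transitions under the chosen actions
    R_pi = R[idx, policy]  # (S,): rewards under the chosen actions
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)


def policy_iteration(P, R, gamma):
    """Alternate exact evaluation and greedy improvement until stable."""
    policy = np.zeros(P.shape[0], dtype=int)
    while True:
        V = evaluate_policy_exact(P, R, policy, gamma)
        new_policy = np.argmax(one_step_lookahead(P, R, V, gamma), axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy


def epsilon_soft_probs(q_row, epsilon):
    """Action probabilities of an epsilon-soft policy for one state.

    Every action gets epsilon / n; the greedy action gets the remaining mass.
    """
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


# Hypothetical two-state, two-action MDP used only for illustration.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
R = np.array([[0.0, 1.0], [2.0, 0.0]])

policy, V = policy_iteration(P, R, gamma=0.5)
probs = epsilon_soft_probs(np.array([1.0, 3.0, 2.0]), epsilon=0.3)
```

The same `one_step_lookahead` helper serves both solvers in this sketch: Policy Iteration takes the argmax over actions to improve the policy, while Value Iteration would take the max to update the state values. Solving the linear system is exact but scales cubically with the number of states, so an iterative evaluation may be preferable for larger environments.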

Environment and Project Setup

  • Added a new Conda environment file environment_mac_mod.yml to facilitate reproducible setup on macOS, including all necessary Python dependencies for running the solvers.
  • Updated .idea/.gitignore to ignore IDE-specific files and folders, improving repository cleanliness for development.

Add environment configuration for macOS and IntelliJ IDEA gitignore
Implement value updates and action selection in Value Iteration
Implement policy evaluation and update in Policy Iteration; optimize value updates in Value Iteration
1. Implement Monte Carlo and Off-Policy Monte Carlo methods in Monte_Carlo.py;
2. Refine policy functions in Policy_Iteration.py and Value_Iteration.py.