
Commit 6766c8f

update the plan of week 7 (#468)
Co-authored-by: Jun Tian <[email protected]>
1 parent: 31d133e

7 files changed: +11 −27 lines changed

docs/homepage/blog/index.md

Lines changed: 2 additions & 2 deletions
@@ -2,14 +2,14 @@
 @def description = ""
 @def is_enable_toc = false
 
-- [Implement Multi-Agent Reinforcement Learning Algorithms in Julia (Summer OSPP Project 210370190) Mid-term Report](/blog/ospp_mid-term_report_210370190)
-
 - [An Introduction to ReinforcementLearning.jl: Design, Implementations and Thoughts](/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts)
 
 - [Phase 1 Technical Report of Enriching Offline Reinforcement Learning Algorithms in ReinforcementLearning.jl](/blog/offline_reinforcement_learning_algorithm_phase1)
 
 - [Establish a General Pipeline for Offline Reinforcement Learning Evaluation (Summer OSPP Project 210370741) Mid-term Report](/blog/ospp_mid-term_report_210370741)
 
+- [Implement Multi-Agent Reinforcement Learning Algorithms in Julia (Summer OSPP Project 210370190) Report](/blog/ospp_report_210370190)
+
 - Notebooks for the book: [*Reinforcement Learning: an Introduction 2nd
   Edition*](https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl)
 
docs/homepage/blog/ospp_mid-term_report_210370190/index.md renamed to docs/homepage/blog/ospp_report_210370190/index.md

Lines changed: 9 additions & 25 deletions
@@ -1,6 +1,6 @@
 @def title = "Implement Multi-Agent Reinforcement Learning Algorithms in Julia"
 @def description = """
-This is a technical report of the summer OSPP project [Implement Multi-Agent Reinforcement Learning Algorithms in Julia](https://summer.iscas.ac.cn/#/org/prodetail/210370190?lang=en). In this report, the following three parts are covered: the first section is a basic introduction to the project, the second section contains the implementation details of several multi-agent algorithms, and in the last section, we discuss our future plan.
+This is a technical report of the summer OSPP project [Implement Multi-Agent Reinforcement Learning Algorithms in Julia](https://summer.iscas.ac.cn/#/org/prodetail/210370190?lang=en). In this report, the following two parts are covered: the first section is a basic introduction to the project, and the second section contains the implementation details of several multi-agent algorithms.
 """
 @def is_enable_toc = true
 @def has_code = true
@@ -16,7 +16,7 @@
 "affiliationURL":"http://english.ecnu.edu.cn/"
 }
 ],
-"publishedDate":"2021-08-16",
+"publishedDate":"2021-08-17",
 "citationText":"Peter Chen, 2021"
 }"""
 
@@ -34,24 +34,23 @@ Recent advances in reinforcement learning led to many breakthroughs in artificia
 | 07/15 -- 07/29 | Add the **NFSP** algorithm to [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/), and test it on the [`KuhnPokerEnv`](https://juliareinforcementlearning.org/docs/rlenvs/#ReinforcementLearningEnvironments.KuhnPokerEnv). |
 | 07/30 -- 08/07 | Fix the existing bugs of **NFSP** and implement the **MADDPG** algorithm in ReinforcementLearningZoo.jl. |
 | 08/08 -- 08/15 | Update the **MADDPG** algorithm and test it on the `KuhnPokerEnv`; also complete the **mid-term report**. |
-| 08/16 -- 08/30 | Test the **MADDPG** algorithm on more envs and consider implementing the **ED**\dcite{DBLP:journals/corr/abs-1903-05614} algorithm in ReinforcementLearningZoo.jl. |
-| 08/31 -- 09/07 | Complete the **ED** implementation, and add related experiments. |
-| 09/08 -- 09/14 | Consider implementing the **PSRO** algorithm in ReinforcementLearningZoo.jl. |
-| 09/15 -- 09/30 | Complete the **PSRO** implementation and add related experiments; also complete the **final-term report**. |
+| 08/16 -- 08/23 | Add support for environments of [`FULL_ACTION_SET`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.FULL_ACTION_SET) in **MADDPG** and test it on more games, such as [`simple_adversary`](https://github.com/openai/multiagent-particle-envs/blob/master/multiagent/scenarios/simple_adversary.py). |
+| 08/24 -- 08/30 | ... |
 
 ### Accomplished Work
 
-From July 1st to now, I have mainly implemented the **Neural Fictitious Self-Play (NFSP)** algorithm and added it to [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/). A workable [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/NFSP/JuliaRL_NFSP_KuhnPoker/#JuliaRL\\_NFSP\\_KuhnPoker) is also added to the documentation. Besides, a semi-finished implementation of the **Multi-Agent Deep Deterministic Policy Gradient (MADDPG)** algorithm has been placed in ReinforcementLearningZoo.jl, and I will test it on more envs in the next few weeks. Related commits are listed below:
+From July 1st to now, I have implemented the **Neural Fictitious Self-Play (NFSP)** algorithm and added it to [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/). A workable [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/NFSP/JuliaRL_NFSP_KuhnPoker/#JuliaRL\\_NFSP\\_KuhnPoker) is also added to the documentation. Besides, a semi-finished implementation of the **Multi-Agent Deep Deterministic Policy Gradient (MADDPG)** algorithm has been placed in ReinforcementLearningZoo.jl, and I will test it on more envs in the next few weeks. Related commits are listed below:
 
 - [add Base.:(==) and Base.hash for AbstractEnv and test nash_conv on KuhnPokerEnv #348](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/348)
 - [Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/390)
 - [Implementation of NFSP and NFSP_KuhnPoker experiment #402](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/402)
 - [correct nfsp implementation #439](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/439)
 - [add MADDPG algorithm #444](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/444)
+- ...
 
 ## 2. Implementation and Usage
 
-In this section, I will first briefly review the [`Agent`](https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.Agent) structure defined in [ReinforcementLearningCore.jl](https://juliareinforcementlearning.org/docs/rlcore/). Then I'll explain how **NFSP** and **MADDPG** are implemented, followed by a short example to demonstrate how others can use them in their customized environments.
+In this section, I will first briefly review the [`Agent`](https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.Agent) structure defined in [ReinforcementLearningCore.jl](https://juliareinforcementlearning.org/docs/rlcore/). Then I'll explain how these multi-agent algorithms (**NFSP**, **MADDPG**, ...) are implemented, followed by a short example to demonstrate how others can use them in their customized environments.
 
 ### 2.1 An Introduction to `Agent`
 
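For readers skimming this diff, the `Agent` reviewed in section 2.1 is RLCore's wrapper tying a policy to a trajectory. Below is a minimal self-contained sketch of that shape, assuming the fields described in the RLCore docs; it is illustrative, not the package source.

```Julia
# Illustrative sketch of the Agent pattern (not the package source): an Agent
# pairs a policy with a trajectory (the buffer storing states, actions,
# rewards, and terminal flags). Stage hooks (PreActStage, PostActStage, ...)
# fill the trajectory, while action selection is delegated to the policy.
abstract type AbstractPolicy end
abstract type AbstractTrajectory end

struct Agent{P<:AbstractPolicy,T<:AbstractTrajectory} <: AbstractPolicy
    policy::P
    trajectory::T
end

# Acting: forward the environment to the inner policy.
(agent::Agent)(env) = agent.policy(env)
```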
@@ -141,7 +140,7 @@ end
 
 - PostActStage
 
-After executing the action, the `NFSPAgent` needs to add the personal **reward** and the **is_terminated** result of the current state into the **RL** agent's trajectory.
+After executing the action, the `NFSPAgent` needs to add the personal **reward** and the **is_terminated** results of the current state into the **RL** agent's trajectory.
 ```Julia
 function (π::NFSPAgent)(::PostActStage, env::AbstractEnv, player::Any)
     push!(π.rl_agent.trajectory[:reward], reward(env, player))
@@ -151,7 +150,7 @@ end
 
 - PostEpisodeStage
 
-When one episode is terminated, the agent should push the **terminated state** and a **dummy action** (see also the [note](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/4e5d258798088b1c628401b6b9de18aa8cbb3ab3/src/ReinforcementLearningCore/src/policies/agents/agent.jl#L134)) into the **RL** agent's trajectory. Also, the **reward** and **is_terminated** result need to be corrected to avoid getting wrong samples when playing the [`SEQUENTIAL`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.SEQUENTIAL) or [`TERMINAL_REWARD`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.TERMINAL_REWARD) games.
+When one episode is terminated, the agent should push the **terminated state** and a **dummy action** (see also the [note](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/4e5d258798088b1c628401b6b9de18aa8cbb3ab3/src/ReinforcementLearningCore/src/policies/agents/agent.jl#L134)) into the **RL** agent's trajectory. Also, the **reward** and **is_terminated** results need to be corrected to avoid getting the wrong samples when playing [`SEQUENTIAL`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.SEQUENTIAL) or [`TERMINAL_REWARD`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.TERMINAL_REWARD) games.
 ```Julia
 function (π::NFSPAgent)(::PostEpisodeStage, env::AbstractEnv, player::Any)
     rl = π.rl_agent
@@ -389,18 +388,3 @@ Plus on the [`stop_condition`](https://github.com/JuliaReinforcementLearning/Rei
 \dfig{body;JuliaRL_MADDPG_KuhnPoker.png;Result of the experiment.}
 
 **Note that** the current `MADDPGManager` still only works on envs of [`MINIMAL_ACTION_SET`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.MINIMAL_ACTION_SET) (see the sketch after this diff). And since **MADDPG** is a deterministic algorithm, i.e., each state maps to one deterministic action, the Kuhn Poker game may not be suitable for testing its performance. In the coming weeks, I'll update the algorithm and try to test it on other games.
-
-## 3. Reviews and Future Plan
-
-### 3.1 Reviews
-
-From applying for the project until now, my progress was slow in the initial weeks because I spent much time getting familiar with the algorithms and the structure of RL.jl. However, thanks to my mentor's patient guidance, I have come to appreciate the convenience of the general workflow in RL.jl and improved my comprehension of the algorithms.
-
-### 3.2 Future Plan
-
-In the first section, I have listed a rough plan for the next several weeks. In detail, I want to complete the following missions:
-
-- Test **MADDPG** on more suitable envs and add related experiments. (08/16 - 08/23)
-- Consider implementing the **Exploitability Descent (ED)**\dcite{DBLP:journals/corr/abs-1903-05614} algorithm and add related experiments. (08/24 - 09/07)
-- Consider implementing the **Policy-Space Response Oracles (PSRO)**\dcite{DBLP:journals/corr/abs-1909-12823} algorithm and add related experiments. (09/08 - 09/22)
-- Fix the existing bugs of the algorithms and finish the **final-term report**. (09/23 - 09/30)
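As a side note on the `MINIMAL_ACTION_SET` restriction flagged in the note above: RLBase exposes an environment's action-set style through the `ActionStyle` trait, so a guard of the following shape could reject unsupported envs early. This is an illustrative sketch, not code from the commit; the helper name `assert_minimal_action_set` is hypothetical.

```Julia
using ReinforcementLearningBase

# Hypothetical guard (not from the commit): per the note above, the current
# MADDPGManager assumes the set of legal actions never changes, i.e. the
# environment is of MINIMAL_ACTION_SET rather than FULL_ACTION_SET.
function assert_minimal_action_set(env::AbstractEnv)
    ActionStyle(env) === MINIMAL_ACTION_SET ||
        error("MADDPGManager currently supports only MINIMAL_ACTION_SET environments")
    return env
end
```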
