docs/homepage/blog/index.md (2 additions, 2 deletions)
@@ -2,14 +2,14 @@
 @def description = ""
 @def is_enable_toc = false
 
-- [Implement Multi-Agent Reinforcement Learning Algorithms in Julia (Summer OSPP Project 210370190) Mid-term Report](/blog/ospp_mid-term_report_210370190)
-
 - [An Introduction to ReinforcementLearning.jl: Design, Implementations and Thoughts](/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts)
 
 - [Phase 1 Technical Report of Enriching Offline Reinforcement Learning Algorithms in ReinforcementLearning.jl](/blog/offline_reinforcement_learning_algorithm_phase1)
 
 - [Establish a General Pipeline for Offline Reinforcement Learning Evaluation (Summer OSPP Project 210370741) Mid-term Report](/blog/ospp_mid-term_report_210370741)
 
+- [Implement Multi-Agent Reinforcement Learning Algorithms in Julia (Summer OSPP Project 210370190) Report](/blog/ospp_report_210370190)
+
 - Notebooks for the book: [*Reinforcement Learning: an Introduction 2nd
docs/homepage/blog/ospp_report_210370190/index.md (9 additions, 25 deletions)
@@ -1,6 +1,6 @@
 @def title = "Implement Multi-Agent Reinforcement Learning Algorithms in Julia"
 @def description = """
-This is a technical report of the summer OSPP project [Implement Multi-Agent Reinforcement Learning Algorithms in Julia](https://summer.iscas.ac.cn/#/org/prodetail/210370190?lang=en). In this report, the following three parts are covered: the first section is a basic introduction to the project, the second section contains the implementation details of several multi-agent algorithms, and in the last section, we discussed our future plan.
+This is a technical report of the summer OSPP project [Implement Multi-Agent Reinforcement Learning Algorithms in Julia](https://summer.iscas.ac.cn/#/org/prodetail/210370190?lang=en). In this report, the following two parts are covered: the first section is a basic introduction to the project, and the second section contains the implementation details of several multi-agent algorithms.
 """
 @def is_enable_toc = true
 @def has_code = true
@@ -16,7 +16,7 @@
 "affiliationURL":"http://english.ecnu.edu.cn/"
 }
 ],
-"publishedDate":"2021-08-16",
+"publishedDate":"2021-08-17",
 "citationText":"Peter Chen, 2021"
 }"""
 
@@ -34,24 +34,23 @@ Recent advances in reinforcement learning led to many breakthroughs in artificia
 | 07/15 -- 07/29 | Add **NFSP** algorithm into [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/), and test it on the [`KuhnPokerEnv`](https://juliareinforcementlearning.org/docs/rlenvs/#ReinforcementLearningEnvironments.KuhnPokerEnv). |
 | 07/30 -- 08/07 | Fix the existing bugs of **NFSP** and implement the **MADDPG** algorithm into ReinforcementLearningZoo.jl. |
 | 08/08 -- 08/15 | Update the **MADDPG** algorithm and test it on the `KuhnPokerEnv`, also complete the **mid-term report**. |
-| 08/16 -- 08/30 | Test **MADDPG** algorithm on more envs and consider implementing the **ED**\dcite{DBLP:journals/corr/abs-1903-05614} algorithm into ReinforcementLearningZoo.jl. |
-| 08/31 -- 09/07 | Complete the **ED** implementation, and add relative experiments. |
-| 09/15 -- 09/30 | Complete **PSRO** implementation and add relative experiments, also complete the **final-term report**. |
+| 08/16 -- 08/23 | Add support for environments of [`FULL_ACTION_SET`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.FULL_ACTION_SET) in **MADDPG** and test it on more games, such as [`simple_adversary`](https://github.com/openai/multiagent-particle-envs/blob/master/multiagent/scenarios/simple_adversary.py). |
+| 08/24 -- 08/30 | ... |
 
 ### Accomplished Work
 
-From July 1st to now, I mainly have implemented the **Neural Fictitious Self-play(NFSP)** algorithm and added it into [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/). A workable [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/NFSP/JuliaRL_NFSP_KuhnPoker/#JuliaRL\\_NFSP\\_KuhnPoker) is also added to the documentation. Besides, the **Multi-agent Deep Deterministic Policy Gradient(MADDPG)** algorithm's semi-finished implementation has been placed into ReinforcementLearningZoo.jl, and I will test it on more envs in the next weeks. Related commits are listed below:
+From July 1st to now, I have implemented the **Neural Fictitious Self-play(NFSP)** algorithm and added it into [ReinforcementLearningZoo.jl](https://juliareinforcementlearning.org/docs/rlzoo/). A workable [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/NFSP/JuliaRL_NFSP_KuhnPoker/#JuliaRL\\_NFSP\\_KuhnPoker) is also added to the documentation. Besides, the **Multi-agent Deep Deterministic Policy Gradient(MADDPG)** algorithm's semi-finished implementation has been placed into ReinforcementLearningZoo.jl, and I will test it on more envs in the next weeks. Related commits are listed below:
 
 - [add Base.:(==) and Base.hash for AbstractEnv and test nash_conv on KuhnPokerEnv#348](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/348)
 - [Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/390)
 - [Implementation of NFSP and NFSP_KuhnPoker experiment #402](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/402)
-In this section, I will first briefly review the [`Agent`](https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.Agent) structure defined in [ReinforcementLearningCore.jl](https://juliareinforcementlearning.org/docs/rlcore/). Then I'll explain how **NFSP** and **MADDPG** are implemented, followed by a short example to demonstrate how others can use them in their customized environments.
+In this section, I will first briefly review the [`Agent`](https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.Agent) structure defined in [ReinforcementLearningCore.jl](https://juliareinforcementlearning.org/docs/rlcore/). Then I'll explain how these multi-agent algorithms (**NFSP**, **MADDPG**, ...) are implemented, followed by a short example to demonstrate how others can use them in their customized environments.
 
 ### 2.1 An Introduction to `Agent`
 
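For context on the `Agent` structure that the revised paragraph above refers to, here is a minimal sketch of the pattern: an `Agent` couples a policy with a trajectory and fills the trajectory at each stage hook. This is an illustration assuming the 2021-era RL.jl API (`Agent`, `RandomPolicy`, `CircularArraySARTTrajectory`, `StopAfterEpisode`), not the report's own code; keyword defaults may differ.

```Julia
using ReinforcementLearning

# An Agent couples a policy with a trajectory; at each stage hook
# (PreActStage, PostActStage, ...) it records transitions into the
# trajectory so the inner policy can later learn from them.
env = RandomWalk1D()
agent = Agent(
    policy = RandomPolicy(action_space(env)),
    trajectory = CircularArraySARTTrajectory(
        capacity = 100,
        state = Int => (),   # RandomWalk1D states are integers
        action = Int => (),
    ),
)
run(agent, env, StopAfterEpisode(10))
```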
@@ -141,7 +140,7 @@ end
 
 - PostActStage
 
-After executing the action, the `NFSPAgent` needs to add the personal **reward** and the **is_terminated** result of the current state into the **RL** agent's trajectory.
+After executing the action, the `NFSPAgent` needs to add the personal **reward** and the **is_terminated** results of the current state into the **RL** agent's trajectory.
 ```Julia
 function (π::NFSPAgent)(::PostActStage, env::AbstractEnv, player::Any)
 
-When one episode is terminated, the agent should push the **terminated state** and a **dummy action** (see also the [note](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/4e5d258798088b1c628401b6b9de18aa8cbb3ab3/src/ReinforcementLearningCore/src/policies/agents/agent.jl#L134)) into the **RL** agent's trajectory. Also, the **reward** and **is_terminated** result need to be corrected to avoid getting wrong samples when playing the [`SEQUENTIAL`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.SEQUENTIAL) or [`TERMINAL_REWARD`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.TERMINAL_REWARD) games.
+When one episode is terminated, the agent should push the **terminated state** and a **dummy action** (see also the [note](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/4e5d258798088b1c628401b6b9de18aa8cbb3ab3/src/ReinforcementLearningCore/src/policies/agents/agent.jl#L134)) into the **RL** agent's trajectory. Also, the **reward** and **is_terminated** results need to be corrected to avoid getting the wrong samples when playing the games of [`SEQUENTIAL`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.SEQUENTIAL) or [`TERMINAL_REWARD`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.TERMINAL_REWARD).
 ```Julia
 function (π::NFSPAgent)(::PostEpisodeStage, env::AbstractEnv, player::Any)
     rl = π.rl_agent
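Both function bodies are truncated in the hunk above. As a reading aid, here is a hedged sketch of what the two stages just described could look like; the `rl_agent` field and the by-name trace access (`:reward`, `:terminal`, `:state`, `:action`) are taken from the surrounding prose, not from the merged source.

```Julia
# Sketch only: assumes NFSPAgent wraps its inner RL agent in an `rl_agent`
# field whose trajectory exposes traces by name.
function (π::NFSPAgent)(::PostActStage, env::AbstractEnv, player::Any)
    # Record this player's own reward and terminal flag right after acting.
    rl = π.rl_agent
    push!(rl.trajectory[:reward], reward(env, player))
    push!(rl.trajectory[:terminal], is_terminated(env))
end

function (π::NFSPAgent)(::PostEpisodeStage, env::AbstractEnv, player::Any)
    rl = π.rl_agent
    # Correct the last stored reward/terminal flag: in SEQUENTIAL or
    # TERMINAL_REWARD games the true reward only becomes visible once the
    # episode ends, after the player's final action.
    if !rl.trajectory[:terminal][end]
        rl.trajectory[:reward][end] = reward(env, player)
        rl.trajectory[:terminal][end] = is_terminated(env)
    end
    # Push the terminated state plus a dummy action so the SARTSA layout
    # stays aligned for the next episode.
    push!(rl.trajectory[:state], state(env, player))
    push!(rl.trajectory[:action], rand(action_space(env, player)))
end
```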
@@ -389,18 +388,3 @@ Plus on the [`stop_condition`](https://github.com/JuliaReinforcementLearning/Rei
 \dfig{body;JuliaRL_MADDPG_KuhnPoker.png;Result of the experiment.}
 
 **Note that** the current `MADDPGManager` still only works on the envs of [`MINIMAL_ACTION_SET`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.MINIMAL_ACTION_SET). And since **MADDPG** is one deterministic algorithm, i.e., the state's response is one deterministic action, the Kuhn Poker game may not be suitable for testing the performance. In the next weeks, I'll update the algorithm and try to test it on other games.
-
-## 3. Reviews and Future Plan
-
-### 3.1 Reviews
-
-From applying the project to now, since spending much time on getting familiar with the algorithm and structure of RL.jl, my progress was slow in the initial weeks. However, thanks to the mentor's patience in leading, I realize the convenience of the general workflow in RL.jl and improve my comprehension of the algorithms.
-
-### 3.2 Future Plan
-
-In the first section, I have listed a rough plan for the next serval weeks. In detail, I want to complete the following missions:
-
-- Test **MADDPG** on more suitable envs and add relative experiments. (08/16 - 08/23)
-- Consider implementing the **Exploitability Descent(ED)**\dcite{DBLP:journals/corr/abs-1903-05614} algorithm and add related experiments. (08/24 - 09/07)
-- Consider implementing the **Policy-Spaced Response Oracles(PSRO)**\dcite{DBLP:journals/corr/abs-1909-12823} algorithm and add related experiments. (09/08 - 09/22)
-- Fix the existing bugs of algorithms and finish the **final-term report**. (09/23 - 09/30)
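As a closing usage note, the Kuhn Poker experiments referenced throughout can presumably be reproduced via RL.jl's experiment interface; a sketch, assuming the registered experiment name matches the documentation link in the "Accomplished Work" hunk above:

```Julia
using ReinforcementLearning

# Look up the registered NFSP Kuhn Poker experiment by name and run it.
# The exact name `JuliaRL_NFSP_KuhnPoker` is assumed from the docs URL.
run(E`JuliaRL_NFSP_KuhnPoker`)
```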