docs/homepage/blog/ospp_report_210370190/index.md (5 additions, 4 deletions)
@@ -301,7 +301,7 @@ As for updating the policy, the process is mainly the same as the [`DDPGPolicy`]

 #### Usage

-Here `MADDPGManager` is used for simultaneous games, or you can add an [action-related wrapper](https://juliareinforcementlearning.org/docs/rlenvs/#ReinforcementLearningEnvironments.ActionTransformedEnv-Tuple{Any}) to the sequential game to drop the dummy action of other players. And there is one [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/Policy%20Gradient/JuliaRL_MADDPG_KuhnPoker/#JuliaRL\\_MADDPG\\_KuhnPoker)`JuliaRL_MADDPG_KuhnPoker` as one usage example, which tests the algorithm on the Kuhn Poker game. Since the Kuhn Poker is one sequential game, I wrap the game just like the following:
+Here `MADDPGManager` is used for environments of [`SIMULTANEOUS`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.SIMULTANEOUS) style with a continuous action space (see the blog post [Diagonal Gaussian Policies](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#stochastic-policies)); alternatively, you can add an [action-related wrapper](https://juliareinforcementlearning.org/docs/rlenvs/#ReinforcementLearningEnvironments.ActionTransformedEnv-Tuple{Any}) to the environment to ensure it can work with the algorithm. There is one [experiment](https://juliareinforcementlearning.org/docs/experiments/experiments/Policy%20Gradient/JuliaRL_MADDPG_KuhnPoker/#JuliaRL\\_MADDPG\\_KuhnPoker), `JuliaRL_MADDPG_KuhnPoker`, as a usage example, which tests the algorithm on the Kuhn Poker game. Since Kuhn Poker is a [`SEQUENTIAL`](ReinforcementLearningBase.SEQUENTIAL) game with a discrete action space (see also [Diagonal Gaussian Policies](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#stochastic-policies)), I wrap the environment as follows:
 state_space_mapping = ss -> [[findfirst(==(s), state_space(env))] for s in state_space(env)]
 ),
 ## drop the dummy action of the other agent.
-action_mapping = x -> length(x) == 1 ? x : Int(x[current_player(env)] + 1),
+action_mapping = x -> length(x) == 1 ? x : Int(ceil(x[current_player(env)]) + 1),
 )
 ```
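The hunk above only shows the tail of the wrapper. For context, here is a hedged sketch of what the full Kuhn Poker wrapping could look like; the `wrapped_env` name, the `KuhnPokerEnv()` constructor, the `StateTransformedEnv` call, and its `state_mapping` are assumptions not shown in the diff, while `action_mapping` and `state_space_mapping` are taken from the context lines.

```julia
using ReinforcementLearning

env = KuhnPokerEnv()  # assumed: the environment used by the experiment

# Sketch only: turn the sequential, discrete-action game into a form MADDPG can consume.
wrapped_env = ActionTransformedEnv(
    StateTransformedEnv(
        env;
        # encode each state as a one-element integer vector (assumed, by analogy with
        # the state_space_mapping shown in the hunk above)
        state_mapping = s -> [findfirst(==(s), state_space(env))],
        state_space_mapping = ss -> [[findfirst(==(s), state_space(env))] for s in state_space(env)]
    ),
    ## drop the dummy action of the other agent.
    action_mapping = x -> length(x) == 1 ? x : Int(ceil(x[current_player(env)]) + 1),
)
```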
@@ -376,9 +376,10 @@ agents = MADDPGManager(

 policy = NamedPolicy(player, deepcopy(policy)),
 trajectory = deepcopy(trajectory),
 )) for player in players(env) if player != chance_player(env)),
+SARTS, # traces
 128, # batch_size
 128, # update_freq
-0, # update_step
+0, # initial update_step
 rng
 )
 ```
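Since the hunk shows only the tail of the call, the following is a hedged sketch of how the positional arguments might line up in the full `MADDPGManager` construction; `policy`, `trajectory`, `env`, and `rng` are assumed to be defined earlier in the experiment, and the elided opening lines are reconstructed from the context above rather than verified against the source.

```julia
using ReinforcementLearning

# Sketch only: one Agent (a DDPG policy plus its own trajectory) per non-chance player,
# followed by the positional arguments visible in the hunk above.
agents = MADDPGManager(
    Dict(
        (player => Agent(
            policy = NamedPolicy(player, deepcopy(policy)),
            trajectory = deepcopy(trajectory),
        )) for player in players(env) if player != chance_player(env)
    ),
    SARTS, # traces
    128,   # batch_size
    128,   # update_freq
    0,     # initial update_step
    rng,
)
```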
@@ -387,4 +388,4 @@ Plus on the [`stop_condition`](https://github.com/JuliaReinforcementLearning/Rei

 \dfig{body;JuliaRL_MADDPG_KuhnPoker.png;Result of the experiment.}

-**Note that**the current `MADDPGManager` still only works on the envs of [`MINIMAL_ACTION_SET`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.MINIMAL_ACTION_SET). And since **MADDPG** is one deterministic algorithm, i.e., the state's response is one deterministic action, the Kuhn Poker game may not be suitable for testing the performance. In the next weeks, I'll update the algorithm and try to test it on other games.
+**Note that** since **MADDPG** is a deterministic algorithm, i.e., its response to a given state is one deterministic action, the Kuhn Poker game may not be suitable for testing its performance. In the coming weeks, I'll update the algorithm and try to test it on other games.
src/ReinforcementLearningZoo/src/algorithms/policy_gradient/maddpg.jl (42 additions, 9 deletions)
@@ -6,26 +6,28 @@ Multi-agent Deep Deterministic Policy Gradient(MADDPG) implemented in Julia. Her

 See the paper https://arxiv.org/abs/1706.02275 for more details.

 # Keyword arguments
-- `agents::Dict{<:Any, <:NamedPolicy{<:Agent{<:DDPGPolicy, <:AbstractTrajectory}, <:Any}}`, here each agent collects its own information. While updating the policy, each `critic` will assemble all agents' trajectory to update its own network.
+- `agents::Dict{<:Any, <:NamedPolicy{<:Agent{<:DDPGPolicy, <:AbstractTrajectory}, <:Any}}`, here each agent collects its own information. While updating the policy, each **critic** will assemble all agents' trajectories to update its own network.
+- `traces`, set to `SARTS` if you apply the algorithm to an environment of `MINIMAL_ACTION_SET`, or `SLARTSL` if you apply it to an environment of `FULL_ACTION_SET`.
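
As a quick illustration of the `traces` keyword documented above, one could in principle derive the value from the environment's `ActionStyle` trait. This is only a sketch: the `env` variable is assumed to be any `AbstractEnv`, and `SARTS`/`SLARTSL` are assumed to be available as exported trace constants.

```julia
using ReinforcementLearning

env = KuhnPokerEnv()  # any AbstractEnv; KuhnPokerEnv is just an example

# MINIMAL_ACTION_SET envs have no legal-action mask, so plain SARTS traces suffice;
# FULL_ACTION_SET envs expose a legal-action mask, which SLARTSL additionally stores.
traces = ActionStyle(env) === MINIMAL_ACTION_SET ? SARTS : SLARTSL
```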