# PASSIVE-ADP-AGENT

## AIMA3e
__function__ Passive-ADP-Agent(_percept_) __returns__ an action
 __inputs__: _percept_, a percept indicating the current state _s'_ and reward signal _r'_
 __persistent__: _π_, a fixed policy
       _mdp_, an MDP with model _P_, rewards _R_, discount γ
       _U_, a table of utilities, initially empty
       _N<sub>sa</sub>_, a table of frequencies for state-action pairs, initially zero
       _N<sub>s'|sa</sub>_, a table of outcome frequencies given state-action pairs, initially zero
       _s_, _a_, the previous state and action, initially null
 __if__ _s'_ is new __then__ _U_[_s'_] ← _r'_; _R_[_s'_] ← _r'_
 __if__ _s_ is not null __then__
   increment _N<sub>sa</sub>_[_s_, _a_] and _N<sub>s'|sa</sub>_[_s'_, _s_, _a_]
   __for each__ _t_ such that _N<sub>s'|sa</sub>_[_t_, _s_, _a_] is nonzero __do__
     _P_(_t_ | _s_, _a_) ← _N<sub>s'|sa</sub>_[_t_, _s_, _a_] / _N<sub>sa</sub>_[_s_, _a_]
 _U_ ← Policy-Evaluation(_π_, _U_, _mdp_)
 __if__ _s'_.Terminal? __then__ _s_, _a_ ← null __else__ _s_, _a_ ← _s'_, _π_[_s'_]
 __return__ _a_

---
__Figure ??__ A passive reinforcement learning agent based on adaptive dynamic programming. The Policy-Evaluation function solves the fixed-policy Bellman equations, as described on page ??.
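
Below is a minimal Python sketch of the agent, written as a direct translation of the pseudocode above rather than any official reference implementation. The class name `PassiveADPAgent`, the `terminal` flag, and the simplified iterative `policy_evaluation` (a few sweeps of the fixed-policy Bellman update instead of an exact linear-system solve) are all assumptions made for illustration.

```python
from collections import defaultdict


class PassiveADPAgent:
    """Illustrative sketch of PASSIVE-ADP-AGENT (not the book's code).

    The agent follows a fixed policy pi, learns the transition model P
    and rewards R from observed frequencies, and keeps utilities U up
    to date by policy evaluation.
    """

    def __init__(self, pi, gamma=0.9):
        self.pi = pi                    # fixed policy: state -> action
        self.gamma = gamma              # discount factor
        self.U = {}                     # table of utilities, initially empty
        self.R = {}                     # learned reward for each state
        self.P = defaultdict(dict)      # P[(s, a)][t] = estimated P(t | s, a)
        self.N_sa = defaultdict(int)    # frequencies for state-action pairs
        self.N_s_sa = defaultdict(int)  # outcome frequencies, keyed (t, s, a)
        self.s = self.a = None          # previous state and action

    def __call__(self, s1, r1, terminal=False):
        """One percept: observe state s1 and reward r1, return an action."""
        if s1 not in self.U:            # if s' is new: U[s'] <- r'; R[s'] <- r'
            self.U[s1] = r1
            self.R[s1] = r1
        if self.s is not None:          # update counts and transition model
            self.N_sa[(self.s, self.a)] += 1
            self.N_s_sa[(s1, self.s, self.a)] += 1
            for (t, s, a), n in self.N_s_sa.items():
                if (s, a) == (self.s, self.a):
                    self.P[(s, a)][t] = n / self.N_sa[(s, a)]
        self.policy_evaluation()
        if terminal:
            self.s = self.a = None
        else:
            self.s, self.a = s1, self.pi[s1]  # assumes pi covers s1
        return self.a

    def policy_evaluation(self, iterations=20):
        """Simplified Policy-Evaluation: iterate the fixed-policy Bellman
        update U(s) = R(s) + gamma * sum_t P(t | s, pi(s)) * U(t)."""
        for _ in range(iterations):
            for s in self.U:
                probs = self.P.get((s, self.pi.get(s)), {})
                self.U[s] = self.R[s] + self.gamma * sum(
                    p * self.U.get(t, 0.0) for t, p in probs.items())
```

Driving this agent through repeated trials, feeding it each observed state and reward in turn, gradually converges the frequency estimates in `P` toward the true transition probabilities, and `U` toward the utilities of the fixed policy. Using a bounded number of iterative sweeps in place of an exact solve is a deliberate simplification: since each percept changes the model only slightly, a few Bellman updates starting from the previous utilities are typically enough.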