Commit cfc7594

doc : move and update fddp algo docpage
1 parent 6ce19a2 commit cfc7594

4 files changed: 47 additions and 39 deletions

doc/Doxyfile.extra.in

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ INCLUDE_PATH = @PROJECT_SOURCE_DIR@/include
 EXCLUDE_SYMLINKS = YES
 
 EXAMPLE_PATH = @PROJECT_SOURCE_DIR@/examples \
-@PROJECT_SOURCE_DIR@/doc/fddp
+@PROJECT_SOURCE_DIR@/doc
 
 EXTRA_PACKAGES = {bm,stmaryrd}
 FORMULA_MACROFILE = @PROJECT_SOURCE_DIR@/doc/macros.inc

doc/fddp/fddp.html renamed to doc/fddp.html

Lines changed: 44 additions & 36 deletions
@@ -15,7 +15,8 @@ <h2 id="problem-definition">Problem definition</h2>
 <span class="math inline">\(\ell_T\)</span> is the terminal cost, <span
 class="math inline">\(f_0\)</span> is the initial state value, <span
 class="math inline">\(f\)</span> is the robot dynamics and <span
-class="math inline">\(T\)</span>, the time interval, is fixed.</p>
+class="math inline">\(\mathbb{T}\)</span>, the time interval, is
+fixed.</p>
 <p>The decision variables are <span
 class="math inline">\(\underline{x},\underline{u}\)</span>, both of
 infinite dimension. We approximate this problem using a discrete version
@@ -404,11 +405,12 @@ <h2 id="solving-the-ddp-forward-pass-with-a-partial-step">Solving the
 class="math display">\[\Delta x_0 = \alpha f_0 = \alpha \Delta
 x_0^*\]</span> Now, assuming the <span class="math inline">\(\Delta x_t
 = \alpha \Delta x_t^*\)</span>, we have: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \forall t=0..T\!\!-\!\!1,\quad\quad\quad \Delta u_t &amp;= -\alpha k_t
 - K_t \alpha \Delta x_t^* = \alpha \Delta u_t^*\\
 \Delta x_{t+1} &amp;= F_x \alpha \Delta x_t^* + F_u \alpha \Delta u_t +
-\alpha f_t = \alpha \Delta x_{t+1}\end{aligned}\]</span></p>
+\alpha f_t = \alpha \Delta x_{t+1}
+\end{align*}\]</span> <span class="math inline">\(\square\)</span></p>
 <h1 id="a-feasibility-prone-ocp-solver-using-ddp">A feasibility-prone
 OCP solver using DDP</h1>
 <p>So far we have detailed a method to solve a LQR program. Let’s now
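
Reading note on the hunk above (not part of the commit): the induction it closes with the added \(\square\) says that a partial step of length \(\alpha\) yields exactly \(\alpha\) times the full-step trajectory. The sketch below checks this numerically on a scalar, time-varying linear system. It is an illustration under assumptions: the function name `partial_step_rollout`, the random coefficients, and the convention that `f[0]` is the initial gap and `f[t + 1]` the gap closing interval `t` are all hypothetical choices, not taken from the documented solver.

```python
import random


def partial_step_rollout(alpha, Fx, Fu, K, k, f):
    """Linear forward pass with step length alpha (scalar state and control)."""
    dx = [alpha * f[0]]  # dx_0 = alpha * f_0
    du = []
    for t in range(len(Fx)):
        # du_t = -alpha * k_t - K_t * dx_t
        du.append(-alpha * k[t] - K[t] * dx[t])
        # dx_{t+1} = F_x dx_t + F_u du_t + alpha * gap
        dx.append(Fx[t] * dx[t] + Fu[t] * du[t] + alpha * f[t + 1])
    return dx, du


random.seed(0)
T = 5
Fx = [random.uniform(-1, 1) for _ in range(T)]
Fu = [random.uniform(-1, 1) for _ in range(T)]
K = [random.uniform(-1, 1) for _ in range(T)]
k = [random.uniform(-1, 1) for _ in range(T)]
f = [random.uniform(-1, 1) for _ in range(T + 1)]  # hypothetical gaps

# Full step (alpha = 1) versus a partial step.
dx_full, du_full = partial_step_rollout(1.0, Fx, Fu, K, k, f)
alpha = 0.3
dx_part, du_part = partial_step_rollout(alpha, Fx, Fu, K, k, f)

# Largest deviation from "partial = alpha * full" over states and controls.
max_err = max(
    abs(p - alpha * q)
    for p, q in zip(dx_part + du_part, dx_full + du_full)
)
print(max_err)
```

The deviation should sit at floating-point rounding level, because every update is linear in the pair (previous \(\Delta x_t\), \(\alpha\)).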
@@ -607,11 +609,12 @@ <h2 id="expectation-model-in-underlinexunderlineu">Expectation model in
 sum at each shooting node of the cost gradient times the change in <span
 class="math inline">\(x\)</span> and <span
 class="math inline">\(u\)</span>: <span
-class="math display">\[\label{eq:d1_nogaps}
-\Delta_1 = \sum_{t=0}^T L_{xt}^T x_t + L_{ut}^T u_t\]</span> (to keep
-the sum simpler, we treat <span class="math inline">\(T\)</span>
-similarly to the other nodes, by introducing <span
-class="math inline">\(L_{uT} = 0\)</span>).</p>
+class="math display">\[\begin{equation}
+\label{eq:d1_nogaps}
+\Delta_1 = \sum_{t=0}^T L_{xt}^T x_t + L_{ut}^T u_t
+\end{equation}\]</span> (to keep the sum simpler, we treat <span
+class="math inline">\(T\)</span> similarly to the other nodes, by
+introducing <span class="math inline">\(L_{uT} = 0\)</span>).</p>
 <h3 id="linear-rollout">Linear rollout</h3>
 <p>The states and controls are obtained from a linear roll-out as: <span
 class="math display">\[x_{t+1} = F_{xt} x_t + F_{ut} u_t +
@@ -622,31 +625,34 @@ <h3 id="linear-rollout">Linear rollout</h3>
 class="math inline">\(F_{t} = F_{xt} + F_{ut} K_t\)</span> and <span
 class="math inline">\(c_{t+1} = F_{ut} k_{t} + f_{t+1}\)</span> (with
 <span class="math inline">\(c_0 = f_0\)</span>). And finally: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align}
 x_t &amp;= F_{t-1} ... F_0 c_0 + F_{t-1} ... F_1 c_1 + ... + F_{t-1}
 c_{t-1} + c_t \\
-&amp;= \sum_{i=0}^t F_{t-1} ... F_i c_i
-\label{eq:lroll}\end{aligned}\]</span></p>
+&amp;= \sum_{i=0}^t F_{t-1} ... F_i c_i \label{eq:lroll}
+\end{align}\]</span></p>
 <h3 id="first-order-model-delta_1">First-order model <span
 class="math inline">\(\Delta_1\)</span></h3>
 <p>Replacing <span class="math inline">\(u_t\)</span> by <span
 class="math inline">\(k_t + K_t x_t\)</span>, the first-order term is:
-<span class="math display">\[\label{eq:d1}
+<span class="math display">\[\begin{equation}
+\label{eq:d1}
 \Delta_1 = \sum_{t=0}^T (L_{xt} + K_t^T L_{ut}) ^T x_t + \sum_{t=0}^T
-L_{ut}^T k_t\]</span> where we denote <span class="math inline">\(l_t =
-L_{xt} + K_t^T L_{ut}\)</span> to simplify the notation. Putting <a
+L_{ut}^T k_t
+\end{equation}\]</span> where we denote <span class="math inline">\(l_t
+= L_{xt} + K_t^T L_{ut}\)</span> to simplify the notation. Putting <a
 href="#eq:lroll" data-reference-type="eqref"
 data-reference="eq:lroll">[eq:lroll]</a> in <a href="#eq:d1"
 data-reference-type="eqref" data-reference="eq:d1">[eq:d1]</a>, we get:
-<span class="math display">\[\begin{aligned}
+<span class="math display">\[\begin{align}
 \Delta_1 &amp;= \sum_{t=0}^{T} l_t \sum_{i=0}^{t} F_{t-1} ... F_i c_i
 + L_{ut}^T k_t \\
 &amp; = \sum_{i=0}^{T} c_i^T \sum_{t=i}^{T} F_t^T ... F_T^T l_t +
-k_i^T L_{ui}\end{aligned}\]</span> Each term of the sum is composed of a
-product of <span class="math inline">\(f_i\)</span> and a product of
-<span class="math inline">\(k_i\)</span>, and can then be evaluated from
-the result of the backward pass. Let’s exhibit these 2 terms. The term
-in <span class="math inline">\(f_i\)</span> is: <span
+k_i^T L_{ui}
+\end{align}\]</span> Each term of the sum is composed of a product of
+<span class="math inline">\(f_i\)</span> and a product of <span
+class="math inline">\(k_i\)</span>, and can then be evaluated from the
+result of the backward pass. Let’s exhibit these 2 terms. The term in
+<span class="math inline">\(f_i\)</span> is: <span
 class="math display">\[\Delta_{ft} = F_i^T ... F_T^T l_i = L_{xi} +
 F_{xi}^T \Delta_{fi+1} + K_i^T (L_{ui} + F_{ui} \Delta_{fi+1})\]</span>
 The term in <span class="math inline">\(k_i\)</span> is: <span
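
Reading note on the hunk above (not part of the commit): the closed-form roll-out labelled eq:lroll and the feedback form of \(\Delta_1\) labelled eq:d1 can be sanity-checked numerically. The sketch below does so for a scalar system, so transposes disappear. Several things are assumptions: the roll-out is taken to be \(x_{t+1} = F_{xt} x_t + F_{ut} u_t + f_{t+1}\), which is inferred from the definition of \(c_{t+1}\) because the diff context cuts that equation short, and all variable names and random values are hypothetical.

```python
import random

random.seed(1)
T = 4
Fx = [random.uniform(-1, 1) for _ in range(T)]
Fu = [random.uniform(-1, 1) for _ in range(T)]
K = [random.uniform(-1, 1) for _ in range(T)]
k = [random.uniform(-1, 1) for _ in range(T)]
f = [random.uniform(-1, 1) for _ in range(T + 1)]   # gaps f_0 .. f_T
Lx = [random.uniform(-1, 1) for _ in range(T + 1)]  # state gradients
Lu = [random.uniform(-1, 1) for _ in range(T)]      # L_uT = 0 is implicit

# Recursive linear roll-out with u_t = k_t + K_t x_t.
x = [f[0]]
u = []
for t in range(T):
    u.append(k[t] + K[t] * x[t])
    x.append(Fx[t] * x[t] + Fu[t] * u[t] + f[t + 1])

# Closed form: x_t = sum_{i=0}^{t} F_{t-1} ... F_i c_i.
F = [Fx[t] + Fu[t] * K[t] for t in range(T)]
c = [f[0]] + [Fu[t] * k[t] + f[t + 1] for t in range(T)]


def closed_form_state(t):
    total = 0.0
    for i in range(t + 1):
        prod = 1.0
        for j in range(i, t):  # empty product when i == t
            prod *= F[j]
        total += prod * c[i]
    return total


rollout_err = max(abs(x[t] - closed_form_state(t)) for t in range(T + 1))

# Delta_1 written with (x, u), then with the feedback substituted.
d1_nogaps = sum(Lx[t] * x[t] for t in range(T + 1)) + sum(
    Lu[t] * u[t] for t in range(T)
)
d1_feedback = (
    sum((Lx[t] + K[t] * Lu[t]) * x[t] for t in range(T))
    + Lx[T] * x[T]
    + sum(Lu[t] * k[t] for t in range(T))
)
d1_err = abs(d1_nogaps - d1_feedback)
print(rollout_err, d1_err)
```

Both errors should be at rounding level. The terminal node contributes only \(L_{xT} x_T\), which is what setting \(L_{uT} = 0\) achieves in the text.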
@@ -678,12 +684,13 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 the Value and Hamiltonian functions.</p>
 <p>In the case where we only consider one control <span
 class="math inline">\(u_0\)</span>, the expectation model is: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta_1 &amp;= L_{x0}^T x_0 + L_{u0}^T u_0 + L_{x1}^T x1 \\
 &amp;= L_{0}^T f_0 + L_{u0} k_0 + L_{x1} F_{0} f_0 + L_{x1} F_{u0}
 k_0 + L_{x1} f_1 \\
 &amp;= (L_0 + F_0^T L_{x1})^T f_0 + (L_{u0} + F_{u0}^T L_{x1})^T k_0 +
-L_{x1} f_1\end{aligned}\]</span> We nearly recognize the gradients <span
+L_{x1} f_1
+\end{align*}\]</span> We nearly recognize the gradients <span
 class="math inline">\(V_{x0}, Q_{u0}, V_{x1}\)</span> respectively in
 factor of <span class="math inline">\(f_0,k_0,f_1\)</span>, but some
 terms are missing: <span class="math display">\[V_{x0} = L_0 + F_0^T
@@ -693,23 +700,24 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 f_1\]</span> Basically, the missing terms correspond to the
 re-linearization of the gradient at the <span
 class="math inline">\(f_t\)</span> points at the end of the intervals.
-Then, we get: <span class="math display">\[\begin{aligned}
+Then, we get: <span class="math display">\[\begin{align*}
 \Delta_1 &amp;= V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T f_1 - \left(
 f_0^T V_{xx0} f_0 + f_0^T F_0^T L_{xx1} f_1 + k_0^T F_{u0} L_{xx1} f_1
 + f_1^T V_{xx1} f_1\right) \\
 &amp;= V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T f_1 - \left( f_0^T
-V_{xx0} x_0 + f_1^T V_{xx1} x_1 \right)\end{aligned}\]</span></p>
-<p>The second-order term is: <span
-class="math display">\[\begin{aligned}
+V_{xx0} x_0 + f_1^T V_{xx1} x_1 \right)
+\end{align*}\]</span></p>
+<p>The second-order term is: <span class="math display">\[\begin{align*}
 \Delta_2 &amp;= f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 + f_1^T V_{xx1}
 f_1 + 2(f_0^T F_0^T L_{xx1} f_1 + k_0^T F_{u0} L_{xx1} f_1) \\
 &amp;= f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 + f_1^T V_{xx1} f_1 +
 2\big(f_1^T V_{xx1} (x_1-f_1) \big) \\
 &amp;= -f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 - f_1^T V_{xx1} f_1 +
-2\big(f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 \big)\end{aligned}\]</span>
-We can recognize in the additional terms (the 2 last ones) the same
-terms as in <span class="math inline">\(\Delta_1\)</span>. Nicely, they
-will cancel out in the case we make a full step <span
+2\big(f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 \big)
+\end{align*}\]</span> We can recognize in the additional terms (the 2
+last ones) the same terms as in <span
+class="math inline">\(\Delta_1\)</span>. Nicely, they will cancel out in
+the case we make a full step <span
 class="math inline">\(\alpha=1\)</span>: <span
 class="math display">\[\Delta(\alpha) = \alpha(
 \Delta_1+\frac{\alpha}{2} \Delta_2)\]</span> <span
@@ -718,13 +726,13 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 - \frac{1}{2} f_0^T V_{xx0} f_0 + \frac{1}{2} k_0^T Q_{uu0}^T k_0 -
 \frac{1}{2} f_1^T V_{xx1} f_1\]</span></p>
 <p>But they do not cancel out in the general case: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta(\alpha) = \alpha \Big( V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T
 f_1
 + \frac{\alpha}{2} ( - f_0^T V_{xx0} f_0 - f_1^T V_{xx1} f_1 + k_0^T
 Q_{uu0}^T k_0 ) \\
-+ (\alpha-1) ( f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 )
-\Big)\end{aligned}\]</span></p>
++ (\alpha-1) ( f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 ) \Big)
+\end{align*}\]</span></p>
 <h2 id="extending-to-t1-by-recurence">Extending to <span
 class="math inline">\(T&gt;1\)</span> by recurence</h2>
 <p>We can now work by recurence to extend the exact same shape to <span
@@ -762,11 +770,11 @@ <h2 id="extending-to-t1-by-recurence">Extending to <span
 <h2 id="line-search-algorithm">Line-search algorithm</h2>
 <p>First, let us note that if all the gaps <span
 class="math inline">\(f_t\)</span> are null, it is simply: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta(\alpha) &amp;= \alpha \big( \sum Q_u^T k + \frac{\alpha}{2} k^T
 Q_{uu} k \big) \\
-&amp;= \alpha(\frac{\alpha}{2} - 1) \sum \ Q_u^T\ Q_{uu}^{-1} \
-Q_u\end{aligned}\]</span> This is always negative.</p>
+&amp;= \alpha(\frac{\alpha}{2} - 1) \sum \ Q_u^T\ Q_{uu}^{-1} \ Q_u
+\end{align*}\]</span> This is always negative.</p>
 <h3 id="merit-function-...-or-not">Merit function ... or not</h3>
 <p>However, <span class="math inline">\(\Delta\)</span> can be positive
 (i.e. corresponds to an increase of the cost function) when some gap
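
Reading note on the hunk above (not part of the commit): the gap-free expectation \(\Delta(\alpha)\) and its closed form can be compared numerically. The sketch below uses scalar controls and assumes \(k = -Q_{uu}^{-1} Q_u\) with \(Q_{uu} > 0\), a sign convention inferred from the second line of the displayed equation. The names `expected_change` and `closed_form` and the random data are hypothetical.

```python
import random

random.seed(2)
N = 6
Qu = [random.uniform(-1, 1) for _ in range(N)]
Quu = [random.uniform(0.5, 2.0) for _ in range(N)]  # positive curvature
k = [-qu / quu for qu, quu in zip(Qu, Quu)]         # assumed feedforward


def expected_change(alpha):
    """Delta(alpha) = alpha * (sum Qu k + alpha/2 * sum k Quu k)."""
    first = sum(qu * kt for qu, kt in zip(Qu, k))
    second = sum(kt * quu * kt for kt, quu in zip(k, Quu))
    return alpha * (first + 0.5 * alpha * second)


def closed_form(alpha):
    """alpha * (alpha/2 - 1) * sum Qu^2 / Quu."""
    return alpha * (0.5 * alpha - 1.0) * sum(
        qu * qu / quu for qu, quu in zip(Qu, Quu)
    )


# Typical backtracking step lengths.
alphas = [1.0, 0.5, 0.25, 0.125]
gap = max(abs(expected_change(a) - closed_form(a)) for a in alphas)
all_negative = all(expected_change(a) < 0.0 for a in alphas)
print(gap, all_negative)
```

Under these assumptions the factor \(\alpha(\alpha/2 - 1)\) is negative for \(0 < \alpha < 2\), so the expected change is negative whenever some \(Q_u\) is nonzero, and it is zero at a stationary point where every \(Q_u\) vanishes.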

doc/fddp/Makefile

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ bib:
 $(GZIP) $< > $@
 
 html:
-pandoc -f latex root.tex -o fddp.html --mathjax
+pandoc -f latex root.tex -o ../fddp.html --mathjax
 
 # --->
 # --->
# --->

doc/fddp/root.tex

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ \subsection{Problem definition}
 $$\min_{\xtraj,\utraj} \int_0^\Treal \ell(x(t),u(t),t) dt + \ell_\Treal(x(\Treal))$$
 $$s.t. \quad x(0) = f_0$$
 $$\quad \forall t \in [0,\Treal], \quad \dot{x}(t) = f(x(t),u(t),t))$$
-where $\xtraj: t \rightarrow x(t)$ is the state trajectory, $\utraj: t \rightarrow u(t)$ is the control trajectory, $\ell$ is the integral --running-- cost, $\ell_T$ is the terminal cost, $f_0$ is the initial state value, $f$ is the robot dynamics and $T$, the time interval, is fixed.
+where $\xtraj: t \rightarrow x(t)$ is the state trajectory, $\utraj: t \rightarrow u(t)$ is the control trajectory, $\ell$ is the integral --running-- cost, $\ell_T$ is the terminal cost, $f_0$ is the initial state value, $f$ is the robot dynamics and $\Treal$, the time interval, is fixed.
 
 The decision variables are $\xtraj,\utraj$, both of infinite dimension.
 We approximate this problem using a discrete version of it, by following the so-called direct --discretize first, solve second -- approach.
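
Reading note on the hunk above (not part of the commit): the change from `$T$` to `$\Treal$` matches the rendered HTML, where the same symbol now appears as `\(\mathbb{T}\)`, and `\xtraj`, `\utraj` appear as `\(\underline{x}\)`, `\(\underline{u}\)`. The macro definitions themselves are not shown in this commit, so the fragment below is only a guess at what they might look like, reconstructed from the rendered output.

```latex
% Hypothetical definitions, inferred only from the rendered HTML in this commit.
\newcommand{\Treal}{\mathbb{T}}      % continuous-time horizon
\newcommand{\xtraj}{\underline{x}}   % state trajectory
\newcommand{\utraj}{\underline{u}}   % control trajectory
```

Note that the unchanged text still writes the terminal cost as `$\ell_T$` while the objective uses `\ell_\Treal`, so the two notations coexist after this commit.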
