@@ -15,7 +15,8 @@ <h2 id="problem-definition">Problem definition</h2>
 <span class="math inline">\(\ell_T\)</span> is the terminal cost, <span
 class="math inline">\(f_0\)</span> is the initial state value, <span
 class="math inline">\(f\)</span> is the robot dynamics and <span
-class="math inline">\(T\)</span>, the time interval, is fixed.</p>
+class="math inline">\(\mathbb{T}\)</span>, the time interval, is
+fixed.</p>
 <p>The decision variables are <span
 class="math inline">\(\underline{x},\underline{u}\)</span>, both of
 infinite dimension. We approximate this problem using a discrete version
@@ -404,11 +405,12 @@ <h2 id="solving-the-ddp-forward-pass-with-a-partial-step">Solving the
 class="math display">\[\Delta x_0 = \alpha f_0 = \alpha \Delta
 x_0^*\]</span> Now, assuming that <span class="math inline">\(\Delta x_t
 = \alpha \Delta x_t^*\)</span>, we have: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \forall t=0..T\!\!-\!\!1,\quad\quad\quad \Delta u_t &= -\alpha k_t
 - K_t \alpha \Delta x_t^* = \alpha \Delta u_t^*\\
 \Delta x_{t+1} &= F_x \alpha \Delta x_t^* + F_u \alpha \Delta u_t^* +
-\alpha f_t = \alpha \Delta x_{t+1}\end{aligned}\]</span></p>
+\alpha f_{t+1} = \alpha \Delta x_{t+1}^*
+\end{align*}\]</span> <span class="math inline">\(\square\)</span></p>
 <h1 id="a-feasibility-prone-ocp-solver-using-ddp">A feasibility-prone
 OCP solver using DDP</h1>
 <p>So far we have detailed a method to solve an LQR program. Let’s now
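The induction above says every state and control perturbation of the forward pass is linear in \(\alpha\), so a partial step is exactly \(\alpha\) times the full step. This is easy to check numerically; the sketch below is not part of the original notes and uses hypothetical random LQR data, with the names `Fx`, `Fu`, `k`, `K`, `f` mirroring the text's notation.

```python
import numpy as np

# Hypothetical toy LQR data; Fx, Fu are dynamics Jacobians, k, K the
# feedforward/feedback gains, f the gaps (f[0] is the initial gap f_0).
rng = np.random.default_rng(0)
T, nx, nu = 5, 3, 2
Fx = [rng.standard_normal((nx, nx)) for _ in range(T)]
Fu = [rng.standard_normal((nx, nu)) for _ in range(T)]
k = [rng.standard_normal(nu) for _ in range(T)]
K = [rng.standard_normal((nu, nx)) for _ in range(T)]
f = [rng.standard_normal(nx) for _ in range(T + 1)]

def linear_rollout(alpha):
    """Forward pass with partial step: dx_0 = alpha*f_0,
    du_t = -alpha*k_t - K_t dx_t, dx_{t+1} = Fx dx_t + Fu du_t + alpha*f_{t+1}."""
    dx, du = [alpha * f[0]], []
    for t in range(T):
        du.append(-alpha * k[t] - K[t] @ dx[t])
        dx.append(Fx[t] @ dx[t] + Fu[t] @ du[t] + alpha * f[t + 1])
    return dx, du

# Claim proved in the text: the alpha-rollout is alpha times the alpha=1 rollout.
dx1, du1 = linear_rollout(1.0)
dxa, dua = linear_rollout(0.3)
assert all(np.allclose(a, 0.3 * b) for a, b in zip(dxa, dx1))
assert all(np.allclose(a, 0.3 * b) for a, b in zip(dua, du1))
```

This linearity in \(\alpha\) is what makes the expected-improvement model of the next sections tractable.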
@@ -607,11 +609,12 @@ <h2 id="expectation-model-in-underlinexunderlineu">Expectation model in
 sum at each shooting node of the cost gradient times the change in <span
 class="math inline">\(x\)</span> and <span
 class="math inline">\(u\)</span>: <span
-class="math display">\[\label{eq:d1_nogaps}
-\Delta_1 = \sum_{t=0}^T L_{xt}^T x_t + L_{ut}^T u_t\]</span> (to keep
-the sum simpler, we treat <span class="math inline">\(T\)</span>
-similarly to the other nodes, by introducing <span
-class="math inline">\(L_{uT} = 0\)</span>).</p>
+class="math display">\[\begin{equation}
+\label{eq:d1_nogaps}
+\Delta_1 = \sum_{t=0}^T L_{xt}^T x_t + L_{ut}^T u_t
+\end{equation}\]</span> (to keep the sum simpler, we treat <span
+class="math inline">\(T\)</span> similarly to the other nodes, by
+introducing <span class="math inline">\(L_{uT} = 0\)</span>).</p>
 <h3 id="linear-rollout">Linear rollout</h3>
 <p>The states and controls are obtained from a linear roll-out as: <span
 class="math display">\[x_{t+1} = F_{xt} x_t + F_{ut} u_t +
@@ -622,31 +625,34 @@ <h3 id="linear-rollout">Linear rollout</h3>
 class="math inline">\(F_{t} = F_{xt} + F_{ut} K_t\)</span> and <span
 class="math inline">\(c_{t+1} = F_{ut} k_{t} + f_{t+1}\)</span> (with
 <span class="math inline">\(c_0 = f_0\)</span>). And finally: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align}
 x_t &= F_{t-1} ... F_0 c_0 + F_{t-1} ... F_1 c_1 + ... + F_{t-1}
 c_{t-1} + c_t \\
-&= \sum_{i=0}^t F_{t-1} ... F_i c_i
-\label{eq:lroll}\end{aligned}\]</span></p>
+&= \sum_{i=0}^t F_{t-1} ... F_i c_i \label{eq:lroll}
+\end{align}\]</span></p>
 <h3 id="first-order-model-delta_1">First-order model <span
 class="math inline">\(\Delta_1\)</span></h3>
 <p>Replacing <span class="math inline">\(u_t\)</span> by <span
 class="math inline">\(k_t + K_t x_t\)</span>, the first-order term is:
-<span class="math display">\[\label{eq:d1}
+<span class="math display">\[\begin{equation}
+\label{eq:d1}
 \Delta_1 = \sum_{t=0}^T (L_{xt} + K_t^T L_{ut})^T x_t + \sum_{t=0}^T
-L_{ut}^T k_t\]</span> where we denote <span class="math inline">\(l_t =
-L_{xt} + K_t^T L_{ut}\)</span> to simplify the notation. Putting <a
+L_{ut}^T k_t
+\end{equation}\]</span> where we denote <span class="math inline">\(l_t
+= L_{xt} + K_t^T L_{ut}\)</span> to simplify the notation. Putting <a
 href="#eq:lroll" data-reference-type="eqref"
 data-reference="eq:lroll">[eq:lroll]</a> in <a href="#eq:d1"
 data-reference-type="eqref" data-reference="eq:d1">[eq:d1]</a>, we get:
-<span class="math display">\[\begin{aligned}
+<span class="math display">\[\begin{align}
 \Delta_1 &= \sum_{t=0}^{T} l_t^T \sum_{i=0}^{t} F_{t-1} ... F_i c_i
 + L_{ut}^T k_t \\
 & = \sum_{i=0}^{T} c_i^T \sum_{t=i}^{T} F_i^T ... F_{t-1}^T l_t +
-k_i^T L_{ui}\end{aligned}\]</span> Each term of the sum is composed of a
-product of <span class="math inline">\(f_i\)</span> and a product of
-<span class="math inline">\(k_i\)</span>, and can then be evaluated from
-the result of the backward pass. Let’s exhibit these 2 terms. The term
-in <span class="math inline">\(f_i\)</span> is: <span
+k_i^T L_{ui}
+\end{align}\]</span> Each term of the sum is composed of a product of
+<span class="math inline">\(f_i\)</span> and a product of <span
+class="math inline">\(k_i\)</span>, and can then be evaluated from the
+result of the backward pass. Let’s exhibit these 2 terms. The term in
+<span class="math inline">\(f_i\)</span> is: <span
 class="math display">\[\Delta_{fi} = \sum_{t=i}^{T} F_i^T ... F_{t-1}^T l_t = L_{xi} +
 F_{xi}^T \Delta_{fi+1} + K_i^T (L_{ui} + F_{ui}^T \Delta_{fi+1})\]</span>
 The term in <span class="math inline">\(k_i\)</span> is: <span
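The backward recursion for the term in \(f_i\) can be checked against the direct product-sum it summarizes. The sketch below is not from the original notes; it uses hypothetical random data, with `Lx`, `Lu`, `Fx`, `Fu`, `K` in the text's notation and the convention \(L_{uT} = 0\).

```python
import numpy as np

# Hypothetical toy data: Lx, Lu are cost gradients, Fx, Fu the dynamics
# Jacobians, K the feedback gains.
rng = np.random.default_rng(1)
T, nx, nu = 4, 3, 2
Lx = [rng.standard_normal(nx) for _ in range(T + 1)]
Lu = [rng.standard_normal(nu) for _ in range(T)] + [np.zeros(nu)]  # L_uT = 0
Fx = [rng.standard_normal((nx, nx)) for _ in range(T)]
Fu = [rng.standard_normal((nx, nu)) for _ in range(T)]
K = [rng.standard_normal((nu, nx)) for _ in range(T + 1)]

l = [Lx[t] + K[t].T @ Lu[t] for t in range(T + 1)]  # l_t = L_xt + K_t^T L_ut
F = [Fx[t] + Fu[t] @ K[t] for t in range(T)]        # F_t = F_xt + F_ut K_t

def delta_f_direct(i):
    """Direct evaluation of the coefficient of c_i:
    sum over t >= i of F_i^T ... F_{t-1}^T l_t."""
    total = np.zeros(nx)
    for t in range(i, T + 1):
        v = l[t]
        for s in range(t - 1, i - 1, -1):
            v = F[s].T @ v
        total += v
    return total

# Backward recursion from the text: Df_T = l_T, Df_i = l_i + F_i^T Df_{i+1}
Df = [None] * (T + 1)
Df[T] = l[T]
for i in range(T - 1, -1, -1):
    Df[i] = l[i] + F[i].T @ Df[i + 1]

assert all(np.allclose(Df[i], delta_f_direct(i)) for i in range(T + 1))
```

The recursion evaluates all the coefficients in one backward sweep, instead of the quadratic number of matrix products the direct sum would need.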
@@ -678,12 +684,13 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 the Value and Hamiltonian functions.</p>
 <p>In the case where we only consider one control <span
 class="math inline">\(u_0\)</span>, the expectation model is: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta_1 &= L_{x0}^T x_0 + L_{u0}^T u_0 + L_{x1}^T x_1 \\
 &= L_{0}^T f_0 + L_{u0}^T k_0 + L_{x1}^T F_{0} f_0 + L_{x1}^T F_{u0}
 k_0 + L_{x1}^T f_1 \\
 &= (L_0 + F_0^T L_{x1})^T f_0 + (L_{u0} + F_{u0}^T L_{x1})^T k_0 +
-L_{x1} f_1\end{aligned}\]</span> We nearly recognize the gradients <span
+L_{x1}^T f_1
+\end{align*}\]</span> We nearly recognize the gradients <span
 class="math inline">\(V_{x0}, Q_{u0}, V_{x1}\)</span> as the respective
 factors of <span class="math inline">\(f_0,k_0,f_1\)</span>, but some
 terms are missing: <span class="math display">\[V_{x0} = L_0 + F_0^T
@@ -693,23 +700,24 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 f_1\]</span> Basically, the missing terms correspond to the
 re-linearization of the gradient at the <span
 class="math inline">\(f_t\)</span> points at the end of the intervals.
-Then, we get: <span class="math display">\[\begin{aligned}
+Then, we get: <span class="math display">\[\begin{align*}
 \Delta_1 &= V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T f_1 - \left(
 f_0^T V_{xx0} f_0 + f_0^T F_0^T L_{xx1} f_1 + k_0^T F_{u0}^T L_{xx1} f_1
 + f_1^T V_{xx1} f_1\right) \\
 &= V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T f_1 - \left( f_0^T
-V_{xx0} x_0 + f_1^T V_{xx1} x_1 \right)\end{aligned}\]</span></p>
-<p>The second-order term is: <span
-class="math display">\[\begin{aligned}
+V_{xx0} x_0 + f_1^T V_{xx1} x_1 \right)
+\end{align*}\]</span></p>
+<p>The second-order term is: <span class="math display">\[\begin{align*}
 \Delta_2 &= f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 + f_1^T V_{xx1}
 f_1 + 2(f_0^T F_0^T L_{xx1} f_1 + k_0^T F_{u0}^T L_{xx1} f_1) \\
 &= f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 + f_1^T V_{xx1} f_1 +
 2\big(f_1^T V_{xx1} (x_1-f_1) \big) \\
 &= -f_0^T V_{xx0} f_0 + k_0^T Q_{uu0} k_0 - f_1^T V_{xx1} f_1 +
-2\big(f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 \big)\end{aligned}\]</span>
-We can recognize in the additional terms (the 2 last ones) the same
-terms as in <span class="math inline">\(\Delta_1\)</span>. Nicely, they
-will cancel out in the case we make a full step <span
+2\big(f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 \big)
+\end{align*}\]</span> We can recognize in the additional terms (the
+last two) the same terms as in <span
+class="math inline">\(\Delta_1\)</span>. Nicely, they will cancel out
+when we make a full step <span
 class="math inline">\(\alpha=1\)</span>: <span
 class="math display">\[\Delta(\alpha) = \alpha(
 \Delta_1+\frac{\alpha}{2} \Delta_2)\]</span> <span
@@ -718,13 +726,13 @@ <h3 id="the-simple-case-where-t1">The simple case where <span
 - \frac{1}{2} f_0^T V_{xx0} f_0 + \frac{1}{2} k_0^T Q_{uu0}^T k_0 -
 \frac{1}{2} f_1^T V_{xx1} f_1\]</span></p>
 <p>But they do not cancel out in the general case: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta(\alpha) = \alpha \Big( V_{x0}^T f_0 + Q_{u0}^T k_0 + V_{x1}^T
 f_1
 + \frac{\alpha}{2} ( - f_0^T V_{xx0} f_0 - f_1^T V_{xx1} f_1 + k_0^T
 Q_{uu0}^T k_0 ) \\
-+ (\alpha-1) ( f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 )
-\Big)\end{aligned}\]</span></p>
++ (\alpha-1) ( f_0^T V_{xx0} x_0 + f_1^T V_{xx1} x_1 ) \Big)
+\end{align*}\]</span></p>
 <h2 id="extending-to-t1-by-recurence">Extending to <span
 class="math inline">\(T>1\)</span> by recurrence</h2>
 <p>We can now work by recurrence to extend the exact same shape to <span
@@ -762,11 +770,11 @@ <h2 id="extending-to-t1-by-recurence">Extending to <span
 <h2 id="line-search-algorithm">Line-search algorithm</h2>
 <p>First, let us note that if all the gaps <span
 class="math inline">\(f_t\)</span> are null, it is simply: <span
-class="math display">\[\begin{aligned}
+class="math display">\[\begin{align*}
 \Delta(\alpha) &= \alpha \big( \sum Q_u^T k + \frac{\alpha}{2} k^T
 Q_{uu} k \big) \\
-&= \alpha(\frac{\alpha}{2} - 1) \sum \ Q_u^T\ Q_{uu}^{-1} \
-Q_u\end{aligned}\]</span> This is always negative.</p>
+&= \alpha(\frac{\alpha}{2} - 1) \sum Q_u^T Q_{uu}^{-1} Q_u
+\end{align*}\]</span> This is always negative for <span class="math inline">\(0 < \alpha \le 1\)</span>.</p>
 <h3 id="merit-function-...-or-not">Merit function ... or not</h3>
 <p>However, <span class="math inline">\(\Delta\)</span> can be positive
 (i.e. corresponds to an increase of the cost function) when some gap
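A line search driven by the expectation model \(\Delta(\alpha) = \alpha(\Delta_1 + \frac{\alpha}{2}\Delta_2)\) can be illustrated on a one-node quadratic model. This is a minimal sketch, not the notes' implementation: `Qu`, `Quu`, the candidate steps, and the acceptance ratio `b = 0.1` are assumed toy values.

```python
import numpy as np

# Hypothetical one-node quadratic model q(u) = Qu^T u + 1/2 u^T Quu u,
# reusing the text's Qu/Quu notation.
Quu = np.array([[4.0, 1.0], [1.0, 3.0]])
Qu = np.array([1.0, -2.0])
k = -np.linalg.solve(Quu, Qu)  # Newton/DDP step
d1 = Qu @ k                    # first-order expected change Delta_1 (negative)
d2 = k @ Quu @ k               # curvature term Delta_2

def q(u):
    return Qu @ u + 0.5 * u @ Quu @ u

def expected(alpha):
    # Expectation model Delta(alpha) = alpha * (Delta_1 + alpha/2 * Delta_2)
    return alpha * (d1 + 0.5 * alpha * d2)

# Backtracking line search: accept the first alpha whose actual decrease
# reaches a fraction b of the expected improvement.
b = 0.1
accepted = None
for alpha in (1.0, 0.5, 0.25, 0.125):
    if q(alpha * k) - q(np.zeros(2)) <= b * expected(alpha):
        accepted = alpha
        break

assert accepted == 1.0  # on an exact quadratic the full step is accepted
```

On an exact quadratic the model predicts the decrease exactly, so the full step passes the test at once; on a nonlinear problem the same loop would backtrack until the prediction is sufficiently realized.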