Optimal control: Stochastic optimal control

Definition

A system includes control inputs that can be chosen within certain limits and is to be stabilized through these inputs. The dynamics of the system, i.e., the influence of control input and current state on the future state, are known, but the system's progression is still influenced by random disturbances.

This is an extension of optimal control with known dynamics and state to include random disturbances. The Linear-Quadratic Gaussian (LQG) control problem is an example of this problem class, where the disturbances are normally distributed.

LQG control

The dynamic model in the LQG problem can be written as
$$ x_{k+1} = Ax_k + Bu_k + w_k $$
where \(x_k\) and \(x_{k+1}\) are the (typically vector-valued) states at times \(t=k\) and \(t=k+1\), \(u_k\) is the control input, \(w_k\) is the random disturbance, and \(A, B\) are matrices. The control inputs are to be chosen such that the system is stabilized.
The system matrices \(A, B\) are assumed to be known, and the variability of \(w_k\) is quantified by the covariance matrix \(W\). The control inputs \(u_k, k=1, \ldots, T\) are to be chosen such that the average energy is minimized. Formally, the optimization problem is
$$\min_{x_1, \ldots, x_T, u_1, \ldots, u_T} \lim_{T\rightarrow \infty} \frac{1}{T} E\left[ \sum_{k=0}^T \left( \|x_k\|^2_Q + \|u_k\|^2_R \right) \right] $$
where \(\|x_k\|^2_Q=x_k^TQx_k\) and \(\|u_k\|^2_R=u_k^TRu_k\) measure the costs of state \(x_k\) and control input \(u_k\), and \(E[\cdot]\) denotes the expected value.
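The averaged cost above can be estimated by simulation for any fixed linear feedback \(u_k = Kx_k\). The following is a minimal sketch, not the tutorial's own code: the double-integrator matrices, the weights, the noise covariance, and the hand-picked gain `K` are all assumptions chosen for illustration, and `K` is stabilizing but not claimed to be optimal.

```python
import numpy as np

def average_cost(A, B, K, Q, R, W, steps=5000, seed=0):
    """Estimate (1/T) * sum_k (x_k^T Q x_k + u_k^T R u_k) under u_k = K x_k."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Draw all disturbances w_k ~ N(0, W) up front.
    noise = rng.multivariate_normal(np.zeros(n), W, size=steps)
    x = np.zeros(n)
    total = 0.0
    for w in noise:
        u = K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + w          # x_{k+1} = A x_k + B u_k + w_k
    return total / steps

# Hypothetical example system (double integrator), not taken from the article.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.eye(1)
W = np.diag([0.0, 0.01])
K = np.array([[-1.0, -2.0]])           # assumed gain; A + B K has eigenvalues 0.9

print(average_cost(A, B, K, Q, R, W))
```

Comparing the returned value for different gains gives a rough empirical ranking of controllers with respect to this cost.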

SDP Formulation

The optimal control input \(u_k\) is determined by the equation [1] \(u_k=Z_{xu}^T(Z_{xx})^{-1} x_k\), where \(Z_{xx}\) and \(Z_{xu}\) are solutions to the following semidefinite program [2].

$$\begin{align} \min_{Z_{xx}, Z_{xu}, Z_{uu}} ~~~&\operatorname{tr} (QZ_{xx}) + \operatorname{tr} (R Z_{uu}) \\ \text{s.t.} ~~~& \begin{bmatrix} Z_{xx} & Z_{xu} \\ Z_{xu}^T & Z_{uu} \end{bmatrix} \succeq 0 \\ & Z_{xx}-AZ_{xx}A^T-AZ_{xu}B^T - BZ_{xu}^TA^T - B Z_{uu}B^T=W\end{align}$$

The function to be minimized, \(\operatorname{tr} (QZ_{xx}) + \operatorname{tr} (R Z_{uu})\), quantifies expected costs (money, time, energy, errors) through the relationship $$E[x^TQx+u^TRu]=\operatorname{tr}(Q E[xx^T])+\operatorname{tr}(RE[uu^T])=\operatorname{tr}(QZ_{xx})+\operatorname{tr}(RZ_{uu}),$$ which follows from \(x^TQx=\operatorname{tr}(Qxx^T)\) and the linearity of the expectation. Thus, \(Z_{xx}\) and \(Z_{uu}\) can be interpreted as covariance matrices of the randomly distributed variables \(x\) and \(u\), and \(Z_{xu}\) as their cross-covariance. The equation \(u_k=Z_{xu}^T(Z_{xx})^{-1}x_k\) is precisely the conditional expectation of \(u_k\) given \(x_k\) under the zero-mean jointly Gaussian distribution determined by those \(Z_{xx}, Z_{xu}, Z_{uu}\) that minimize the energy.
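The covariance interpretation can be checked numerically without an SDP solver. The sketch below assumes a small hypothetical system and a hand-picked stabilizing gain `K` (neither is from the article), computes the steady-state second moments induced by \(u_k = Kx_k\) through a plain fixed-point iteration, and then verifies that they satisfy the equality constraint of the semidefinite program and that \(Z_{xu}^T(Z_{xx})^{-1}\) returns the gain. It demonstrates feasibility only; it does not solve the optimization problem.

```python
import numpy as np

def stationary_moments(A, B, K, W, iters=2000):
    """Steady-state second moments of x and u under the feedback u_k = K x_k."""
    A_cl = A + B @ K
    Zxx = np.zeros_like(W)
    for _ in range(iters):               # Zxx <- A_cl Zxx A_cl^T + W
        Zxx = A_cl @ Zxx @ A_cl.T + W
    Zxu = Zxx @ K.T                      # E[x u^T]
    Zuu = K @ Zxx @ K.T                  # E[u u^T]
    return Zxx, Zxu, Zuu

# Hypothetical example data (assumptions for illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = 0.01 * np.eye(2)
K = np.array([[-1.0, -2.0]])             # stabilizing, not necessarily optimal

Zxx, Zxu, Zuu = stationary_moments(A, B, K, W)

# Left-hand side minus right-hand side of the SDP equality constraint.
residual = (Zxx - A @ Zxx @ A.T - A @ Zxu @ B.T
            - B @ Zxu.T @ A.T - B @ Zuu @ B.T - W)
K_recovered = Zxu.T @ np.linalg.inv(Zxx)

print(np.abs(residual).max())
print(K_recovered)
```

Any stabilizing linear feedback yields a feasible point in this way; the semidefinite program then selects, among all feasible \(Z\), the one with the lowest expected cost.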

Example: Pendulum control

In the following detailed example, the objective is to stabilize a pendulum affected by random effects. We assume that the pendulum starts from a random (but small) deviation \(\varphi\) from the equilibrium position.

The pendulum motion then follows the usual physical laws, except that additional random effects, which we can neither predict nor control, exert a force on the pendulum and thus cause an acceleration.

Figure 1: Illustration of the pendulum dynamics, showing the temporal evolution of the displacement angle \(\varphi\) and its time derivatives \(\dot{\varphi}\) and \(\ddot{\varphi}\). The graphs in (a) show the system evolution in the absence of random effects, while the graphs in (b) show the system evolution in the presence of random effects on \(\ddot{\varphi}\).

Note that the graphs show the expected behavior for pendulum oscillations. When the displacement \(\varphi\) is maximal, the velocity is \(\dot{\varphi}=0\) and the acceleration \(\ddot{\varphi}\) is minimal. When the displacement is \(\varphi=0\), the velocity \(\dot{\varphi}\) is maximal in magnitude, and the acceleration \(\ddot{\varphi}\) switches from positive to negative or vice versa.

Pendulum equations

The acceleration induced by gravity in the direction of oscillation is \(-g\sin(\varphi)\), where \(g\) is the gravitational acceleration (the pendulum length is taken as 1 here). Furthermore, random effects \(w\) are acting, and we assume that the control input \(u\) we choose directly affects the acceleration, such that

$$\ddot{\varphi}_k = -g \sin(\varphi_k) + w_k + u_k, \qquad k=0, \ldots, T.$$
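The acceleration equation can be turned into a small simulation. The sketch below is an illustration under stated assumptions rather than the tutorial's code: the step size, the noise level `sigma_w`, the semi-implicit Euler integration, and the proportional-derivative rule \(u = -k_p\varphi - k_d\dot{\varphi}\) with gains `k_p`, `k_d` are all hypothetical choices. In particular, the feedback rule is a hand-tuned stand-in and not the controller obtained from the semidefinite program.

```python
import math
import random

def simulate_pendulum(phi0, steps, dt=0.01, g=9.81, sigma_w=0.0,
                      k_p=0.0, k_d=0.0, seed=0):
    """Integrate phi_ddot = -g*sin(phi) + w + u with u = -k_p*phi - k_d*phi_dot."""
    rng = random.Random(seed)
    phi, phi_dot = phi0, 0.0
    angles = [phi]
    for _ in range(steps):
        w = rng.gauss(0.0, sigma_w)          # random effect on the acceleration
        u = -k_p * phi - k_d * phi_dot       # assumed feedback rule
        phi_ddot = -g * math.sin(phi) + w + u
        phi_dot += dt * phi_ddot             # semi-implicit Euler: velocity first
        phi += dt * phi_dot
        angles.append(phi)
    return angles

def rms(values):
    """Root mean square of a sequence of angles."""
    return math.sqrt(sum(v * v for v in values) / len(values))

free = simulate_pendulum(0.1, 2000, sigma_w=0.1)
damped = simulate_pendulum(0.1, 2000, sigma_w=0.1, k_p=5.0, k_d=2.0)
print(rms(free), rms(damped))
```

With `k_p = k_d = 0` the pendulum keeps oscillating and is perturbed by the noise; with the assumed gains the displacement decays towards the equilibrium.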

The state equation coupling the different quantities over successive time steps can be written as

$$ \underbrace{\begin{bmatrix} \varphi_{k+1} \\ \dot{\varphi}_{k+1} \\ \ddot{\varphi}_{k+1} \\ \sin(\varphi_{k+1}) \\ \cos(\varphi_{k+1}) \end{bmatrix}}_{x_{k+1}} = \underbrace{\begin{bmatrix} 1 & \Delta t & 0 & 0 & 0 \\ 0 & 1 & \Delta t & 0 & 0 \\ 0 & 0 & 0 & -g & 0 \\ 0 & \Delta t & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \varphi_{k} \\ \dot{\varphi}_{k} \\ \ddot{\varphi}_{k} \\ \sin(\varphi_{k}) \\ \cos(\varphi_{k}) \end{bmatrix}}_{x_{k}} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{B} \underbrace{\begin{bmatrix} u_k \end{bmatrix}}_{u_k} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ w_k \\ 0 \\ 0 \end{bmatrix}}_{w_k}.$$

The small-angle approximations

$$\begin{align} \sin (\varphi+\Delta \varphi)&\approx \sin\varphi+\Delta \varphi \cos \varphi\approx \sin \varphi + \Delta t \dot{\varphi} \cos \varphi \approx \sin \varphi + \Delta t \dot{\varphi} \\ \cos(\varphi+\Delta\varphi) &\approx \cos \varphi - \Delta \varphi \sin \varphi \approx \cos \varphi - \Delta t \dot{\varphi} \sin \varphi \approx \cos \varphi \end{align}$$

are used to create the last two rows of the equation.
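To get a feeling for the quality of these approximations, one step of the linear model can be compared against exact trigonometric values. The sketch below is illustrative only: the values of `dt`, `g`, and the test state are assumptions, and the last row of `A` is written so that \(\cos\varphi\) is carried forward unchanged, which is our reading of the approximation \(\cos(\varphi+\Delta\varphi)\approx\cos\varphi\).

```python
import numpy as np

dt, g = 0.01, 9.81   # assumed step size and gravitational acceleration

A = np.array([
    [1.0, dt,  0.0, 0.0, 0.0],   # phi      <- phi + dt * phi_dot
    [0.0, 1.0, dt,  0.0, 0.0],   # phi_dot  <- phi_dot + dt * phi_ddot
    [0.0, 0.0, 0.0, -g,  0.0],   # phi_ddot <- -g * sin(phi)  (+ u + w)
    [0.0, dt,  0.0, 1.0, 0.0],   # sin(phi) <- sin(phi) + dt * phi_dot
    [0.0, 0.0, 0.0, 0.0, 1.0],   # cos(phi) <- cos(phi)  (assumed, see text)
])
B = np.array([[0.0], [0.0], [1.0], [0.0], [0.0]])

def lift(phi, phi_dot):
    """Build the 5-dimensional state from angle and angular velocity."""
    return np.array([phi, phi_dot, -g * np.sin(phi), np.sin(phi), np.cos(phi)])

x = lift(0.05, 0.1)                       # small angle, small velocity
x_next = A @ x + B @ np.array([0.0])      # one step without control or noise

phi_next = x_next[0]
sin_error = abs(x_next[3] - np.sin(phi_next))
cos_error = abs(x_next[4] - np.cos(phi_next))
print(sin_error, cos_error)
```

Both errors grow with the angle and with the step size, which is why the linear model should only be trusted near the equilibrium position.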

Solution

To find a matrix \(K\) such that the control input \(u_k=Kx_k\) stabilizes the system, the optimization problem

$$\begin{align} \min_{Z_{xx}, Z_{xu}, Z_{uu}} ~~~& \sum_{i=1}^3 (Z_{xx})_{ii} + Z_{uu} \\ \text{s.t.} ~~~& \begin{bmatrix} Z_{xx} & Z_{xu} \\ Z_{xu}^T & Z_{uu} \end{bmatrix} \succeq 0 \\ & Z_{xx}-AZ_{xx}A^T-AZ_{xu}B^T - BZ_{xu}^TA^T - B Z_{uu}B^T=W\end{align}$$

needs to be solved. Here, \(A\) and \(B\) are as described in the previous section, and \(W\) is a \(5 \times 5\) matrix of zeros except for the entry \(W_{33}=\sigma_w^2\), which represents the variance of the random effect on acceleration. Since the control input is scalar, \(Z_{uu}\) is a \(1 \times 1\) matrix, and the objective corresponds to the choice \(Q=\operatorname{diag}(1,1,1,0,0)\) and \(R=1\) in the general formulation. The resulting control signal is \(u_k= Kx_k=Z_{xu}^T(Z_{xx})^{-1}x_k\).
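The problem data for the pendulum can be assembled as follows. This is a sketch under our own reading of the objective as \(\operatorname{tr}(QZ_{xx})+\operatorname{tr}(RZ_{uu})\) with \(Q=\operatorname{diag}(1,1,1,0,0)\) and \(R=1\); the value of `sigma_w` and the helper name `objective` are assumptions made for illustration.

```python
import numpy as np

sigma_w = 0.1                          # assumed noise standard deviation

W = np.zeros((5, 5))
W[2, 2] = sigma_w ** 2                 # W_33: only the acceleration is disturbed

Q = np.diag([1.0, 1.0, 1.0, 0.0, 0.0])  # penalize phi, phi_dot, phi_ddot
R = np.array([[1.0]])                   # penalize the scalar input u

def objective(Zxx, Zuu):
    """Expected cost tr(Q Zxx) + tr(R Zuu) for candidate second moments."""
    return np.trace(Q @ Zxx) + np.trace(R @ Zuu)

# Example evaluation on an arbitrary candidate, for illustration only.
print(objective(np.diag([1.0, 2.0, 3.0, 4.0, 5.0]), np.array([[6.0]])))
```

The auxiliary entries for \(\sin\varphi_k\) and \(\cos\varphi_k\) receive zero weight, so they influence the cost only through the dynamics.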

Figure 2: The results of the optimization are the matrices \(Z_{xu}, Z_{xx}, K\) for deriving the optimal control signal \(u_k\) given the state \(x_k\). The two graphs illustrate the evolution of a pendulum controlled according to this scheme and influenced by random effects.

Practical aspects

To adapt the above solution approach to other systems, several considerations must be kept in mind. The system dynamics must be linear, i.e., expressible as $$x_{k+1}=Ax_k+Bu_k+w_k$$ where the \(w_k\) must be independent random variables with a known covariance matrix. If the \(w_k\) are correlated with each other, the entire optimization problem must be embedded into a higher-dimensional space that incorporates both \(x_1, \ldots, x_k\) and \(w_1, \ldots, w_k\) simultaneously.
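One common way to carry out such an embedding, sketched here under the assumption that the disturbance follows a first-order autoregressive model \(w_{k+1}=\rho w_k+e_k\) with independent \(e_k\), is to stack state and disturbance into a single vector \(z_k=(x_k, w_k)\). The function name, the parameter `rho`, and the example matrices are hypothetical; other correlation structures would need a different augmentation.

```python
import numpy as np

def augment_for_ar1_noise(A, B, rho):
    """Return (A_aug, B_aug) for z = [x; w] with w_{k+1} = rho * w_k + e_k."""
    n, m = B.shape
    A_aug = np.block([
        [A,                np.eye(n)],        # x_{k+1} = A x_k + w_k (+ B u_k)
        [np.zeros((n, n)), rho * np.eye(n)],  # w_{k+1} = rho * w_k (+ e_k)
    ])
    B_aug = np.vstack([B, np.zeros((n, m))])  # the input acts on x only
    return A_aug, B_aug

# Hypothetical example system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
A_aug, B_aug = augment_for_ar1_noise(A, B, rho=0.8)
print(A_aug.shape, B_aug.shape)
```

In the augmented system the driving noise \((0, e_k)\) is independent across time steps, so the formulation with a known covariance matrix applies again, at the price of a state of twice the dimension.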

Even in the pendulum example, the state \(x_k\) was 5-dimensional, with two auxiliary dimensions tracking \(\sin \varphi_k\) and \(\cos \varphi_k\). The state vector must be defined by the user, which can become quite complicated. Often, tricks and approximations are necessary to transform nonlinear equations into a linearized form that can be usefully applied.

Code & Sources

Example code: OC_stochastic_control_1.py, OC_stochastic_control_2.py in our tutorial folder.

[1] Balakrishnan, V., & Vandenberghe, L. (2003). Semidefinite programming duality and linear time-invariant systems. IEEE Transactions on Automatic Control, 48(1), 30–41.

[2] Kamgarpour, M., & Summers, T. (2017). On infinite dimensional linear programming approach to stochastic control. 20th IFAC World Congress (IFAC 2017), Toulouse, France, July 9–14. IFAC-PapersOnLine, 50(1), 6148–6153.