Optimal control: Overview

Definition

The term optimal control encompasses the theory and methods for the optimal control of systems. Control signals are to be chosen in such a way that a specific system state is reached at minimal cost; optimal control problems are often highly difficult due to their active and sequential nature.

Although the systems are often of a physical, technical nature for historical reasons of development, they can also represent business management, economic, or information technology processes. The control signals are then, for example, dynamic resource flows, fiscal measures, or memory allocation rules. Common to all problem constellations is the goal of making optimal decisions under uncertainty.

Optimal control differs from optimal estimation and optimal design due to the presence of an opportunity for active control of the investigated system. Therefore, optimal control problems often have a temporal or at least sequential aspect, raising questions about the system's dynamics and its controllability.

Example

A remote-controlled car is to be driven from position \(x=0\) to position \(x=1\) in time \(T\). The acceleration \(\ddot{x}\) can be influenced within certain limits by the control signal \(u\). It should be chosen over the entire duration \(T\) such that the total energy consumption \(\sum_{t=0}^T \ddot{x}_t^2\) is minimized.

Figure 1: Illustration of an optimal control problem, where a temporally variable acceleration must be adjusted so that a vehicle minimizes the total energy used to reach a target position.
The optimization problem takes the form
 $$ \begin{align} \min_{x_0, …, x_T, u_0, …, u_T} ~~~& \sum_{t=0}^T\ddot{x}_t^2 \\ \text{s.t.} ~~~& x_{t+1}=Ax_t+Bu_t\\~~~&x_0=[0,0,0]^T ~~~~~~~~~ x_T=[1,0,0]^T \\  ~~~& A=\begin{bmatrix} 1 & \Delta t & 0 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1\end{bmatrix} ~~~~ B=\begin{bmatrix} 0 \\ 0 \\ \Delta t \end{bmatrix} \end{align}$$
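
This finite-horizon linear-quadratic planning problem can be set up and solved directly with an off-the-shelf convex solver. The following is a minimal sketch using cvxpy; the horizon T and step size dt are illustrative assumptions, and the snippet is not the tutorial code linked at the end of this article.

```python
import cvxpy as cp
import numpy as np

# Assumed horizon and step size; the article leaves T and dt unspecified
T, dt = 50, 0.02

A = np.array([[1, dt, 0],
              [0, 1, dt],
              [0, 0, 1]])
B = np.array([[0], [0], [dt]])

X = cp.Variable((3, T + 1))   # columns are the states [position, velocity, acceleration]
U = cp.Variable((1, T + 1))   # control signal acting on the acceleration

constraints = [X[:, 0] == np.array([0.0, 0.0, 0.0]),   # start at x = 0, at rest
               X[:, T] == np.array([1.0, 0.0, 0.0])]   # end at x = 1, at rest
constraints += [X[:, t + 1] == A @ X[:, t] + B @ U[:, t] for t in range(T)]

# Total energy: sum of squared accelerations (third state component)
objective = cp.Minimize(cp.sum_squares(X[2, :]))

problem = cp.Problem(objective, constraints)
problem.solve()
print("minimal energy:", problem.value)
```

Because the objective is a convex quadratic and all constraints are linear, the solver returns the globally optimal plan \(u_0, …, u_T\) in a single call.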

Classes of problems

The above example is characterized by a high degree of predictability and requires precise knowledge of the dynamic relationships as well as an absence of random effects. In practice, this is not always the case, which is why formulations of the optimal control problem have also been developed for more complex situations.

Instead of a predefined plan \(u_0, …, u_T\), current observations of the actual state must be incorporated into the control, resulting in a feedback loop [2, p. 102].

Figure 2: Illustration of a feedback loop for system control. The observed state \(x_{t+1}\) is used to determine the control signal \(u_{t+1}\).

Then, a function \(K(x_t)\) must be found such that the cost \(J\), as a measure of the inadequacy of the sequence \(f(x_0,u_0), …, f(x_T,u_T)\), is minimized.
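
For linear dynamics with a quadratic cost \(J\), one standard way to obtain such a feedback function is the linear-quadratic regulator (LQR), treated in depth in [2]. The sketch below computes a constant gain with SciPy and simulates the resulting closed loop; the weight matrices Q and R are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.02
A = np.array([[1, dt, 0], [0, 1, dt], [0, 0, 1]])
B = np.array([[0], [0], [dt]])
Q = np.diag([1.0, 0.1, 0.1])   # assumed weights on position, velocity, acceleration errors
R = np.array([[0.01]])         # assumed weight on the control effort

# The discrete-time algebraic Riccati equation yields the optimal quadratic cost-to-go
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal feedback gain

# Closed loop: the observed state is fed back instead of following a fixed plan,
# and the feedback drives the state toward the origin despite the perturbed start
x = np.array([0.5, 0.0, 0.0])   # some disturbed initial state
for _ in range(200):
    u = -K @ x                   # u_t = K(x_t)
    x = A @ x + B @ u
print("state after 200 steps:", np.round(x, 4))
```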

Class: Robust optimal control

Robust optimal control extends the optimization of control signals to situations where the relationship between system states \(x_t\), control variables \(u_t\), and the subsequent system state \(x_{t+1}\) is not fully known.

In the above example, this corresponds to the case where, for instance due to slippage or power transmission problems, the matrices \(A\) and \(B\) contain partially unknown elements [1, p. 428].

A control input \(K(x_t) = u_t\) must then be found that satisfactorily controls the system
$$ x_{t+1} = Ax_t + Bu_t ~~~~~ A \in \mathcal{A}, B \in \mathcal{B} $$

regardless of the specific matrices \(A \in \mathcal{A}\) and \(B \in \mathcal{B}\). This amounts to the simultaneous satisfaction of several linear matrix inequalities and results in a semidefinite program.
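
One standard formulation from the robust control literature is quadratic stabilization of a polytopic system: a single feedback gain is sought that satisfies one Lyapunov-type linear matrix inequality per vertex of the uncertainty set. The sketch below sets this up in cvxpy; the two vertices (nominal dynamics and reduced power transmission) are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

dt = 0.02
# Assumed vertices of the uncertainty sets: nominal dynamics and 20 % loss of actuation
A_list = [np.array([[1, dt, 0], [0, 1, dt], [0, 0, 1]]),
          np.array([[1, dt, 0], [0, 1, 0.8 * dt], [0, 0, 1]])]
B_list = [np.array([[0], [0], [dt]]),
          np.array([[0], [0], [0.8 * dt]])]

Qv = cp.Variable((3, 3), symmetric=True)   # Q = P^{-1}, a common Lyapunov matrix
Y = cp.Variable((1, 3))                    # Y = K Q, linearizing change of variables

eps = 1e-6
constraints = [Qv >> eps * np.eye(3)]
for A, B in zip(A_list, B_list):
    # Schur complement of the condition A_cl^T P A_cl - P < 0 with A_cl = A + B K
    M = cp.bmat([[Qv, (A @ Qv + B @ Y).T],
                 [A @ Qv + B @ Y, Qv]])
    constraints.append(M >> eps * np.eye(6))

cp.Problem(cp.Minimize(0), constraints).solve(solver=cp.SCS)
K = Y.value @ np.linalg.inv(Qv.value)      # u_t = K x_t stabilizes every vertex
print("robust feedback gain:", np.round(K, 3))
```

Since the constraints are affine in the vertex matrices, the same gain stabilizes every system in the convex hull of the listed vertices.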

Class: Stochastic optimal control

In stochastic optimal control, the system's evolution is modeled as random, and instead of directly available system states \(x_t\), only noisy observations are accessible.

If, in the above example with the remote-controlled vehicle, there were random uncontrollable effects such as wind or inaccuracies in signal transmission, then the system relationships could be described as follows.

$$ \begin{align} x_{t+1}&=Ax_t+Bu_t+w\\ y_t&=Cx_t+v \end{align}$$

Here, \(x_t, A, B, u_t\) are as before, \(y_t\) are the noisy observations of the states, and \(w\) and \(v\) are random variables [1, p. 428], [3]. If only probabilities of state changes are known and nonlinearities possibly exist, then Markov decision processes can be used for problem formulation.
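
In the linear case with Gaussian noise, the standard approach is to estimate the state from the noisy observations \(y_t\) with a Kalman filter and feed the estimate into a feedback law (the separation principle of linear-quadratic-Gaussian control). Below is a minimal filtering sketch; the observation matrix C, the noise covariances, and the fixed control input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

dt = 0.02
A = np.array([[1, dt, 0], [0, 1, dt], [0, 0, 1]])
B = np.array([[0], [0], [dt]])
C = np.array([[1.0, 0.0, 0.0]])          # assumed: only the position is measured
W = 1e-4 * np.eye(3)                     # assumed process noise covariance (for w)
V = np.array([[1e-2]])                   # assumed measurement noise covariance (for v)

x = np.zeros(3)                          # true state (hidden from the controller)
x_hat = np.zeros(3)                      # state estimate
P = np.eye(3)                            # estimate covariance

for t in range(100):
    u = np.array([0.1])                  # some control input, e.g. from a feedback law
    # True system: x_{t+1} = A x_t + B u_t + w,  y_t = C x_t + v
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(3), W)
    y = C @ x + rng.normal(0, np.sqrt(V[0, 0]), size=1)

    # Kalman filter: predict with the model, then correct with the noisy observation
    x_hat = A @ x_hat + B @ u
    P = A @ P @ A.T + W
    K_gain = P @ C.T @ np.linalg.inv(C @ P @ C.T + V)
    x_hat = x_hat + K_gain @ (y - C @ x_hat)
    P = (np.eye(3) - K_gain @ C) @ P

print("true position:", x[0], "estimated position:", x_hat[0])
```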

Class: Reinforcement learning

If neither dynamic relationships nor state transition probabilities are known, but the system to be controlled can in principle be simulated using a computer model, then reinforcement learning can be applied.

In the context of the above example with the remote-controlled car, this situation would correspond to a scenario where no a priori information about the behavior of the system state and control input is available.

In reinforcement learning, the system behavior is instead empirically explored by executing control suggestions on a trial-and-error basis and observing their effects. This leads to a set of data used to train neural networks. Their task is to replicate an optimal function \(u_t=K(x_t)\), which suggests the best action \(u_t\) for each state \(x_t\).

Due to the use of neural networks, the control inputs are of a nonlinear nature, and the approach is very data-hungry without guaranteeing optimality. However, it makes it possible to solve problems for which there is no hope with conventional methods. Examples include autonomous driving or the development of self-learning algorithms for the movement of robots or for controlling computer opponents in games [4].
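
To make the trial-and-error idea concrete, here is a minimal tabular Q-learning sketch for a crudely discretized one-dimensional version of the vehicle problem. The lookup table plays the role that a neural network plays in large problems such as [4]; the discretization, reward, and learning parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy discretization: 21 position bins, three acceleration commands
n_pos = 21
actions = np.array([-1.0, 0.0, 1.0])
goal = n_pos - 1

Q = np.zeros((n_pos, len(actions)))        # action-value table instead of a neural network
alpha, gamma, eps = 0.1, 0.95, 0.2         # learning rate, discount factor, exploration rate

def step(pos, a_idx):
    """Black-box environment: only sampled by trial and error, never handed to the learner."""
    new_pos = int(np.clip(pos + actions[a_idx] + rng.normal(0, 0.5), 0, n_pos - 1))
    reward = 0.0 if new_pos == goal else -1.0 - 0.1 * actions[a_idx] ** 2   # time + energy penalty
    return new_pos, reward, new_pos == goal

for episode in range(2000):
    pos = 0
    for _ in range(200):
        # epsilon-greedy exploration of control suggestions
        a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[pos]))
        new_pos, r, done = step(pos, a)
        # Temporal-difference update towards reward plus discounted future value
        Q[pos, a] += alpha * (r + gamma * np.max(Q[new_pos]) * (not done) - Q[pos, a])
        pos = new_pos
        if done:
            break

policy = np.argmax(Q, axis=1)              # learned approximation of u_t = K(x_t)
print("greedy acceleration per position bin:", actions[policy])
```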

Solutions

Depending on its class, an optimal control problem can range from trivial to practically impossible to solve. Purely deterministic planning problems can often be solved through search algorithms or linear programming (LP). The presence of clearly defined random effects or model uncertainties requires the generation of stabilizing feedback loops and the minimization of error variances, necessitating the use of semidefinite programming (SDP). If no good process models are available, Markov decision processes can be used for stochastic modeling, or reinforcement learning is applied directly.

Iterative solution heuristics and individual examination of the proposed solutions are then inevitable. The methods are partly experimental and define the limit of what is currently achievable scientifically. A solution to an optimal control problem is then called a policy: a directive for every possible situation, indicating which behavior is optimal.

Figure 3: Two-dimensional variant of the problem for motion planning of a remote-controlled vehicle. Shown are the \(x,y\) coordinate values (a) and speeds (b) needed for the most energy-efficient transfer of a movable object to the center \(\otimes\). The different curves correspond to different starting positions; the initial acceleration is \([0.2, 0.2]\).

Applications

Optimal control offers countless application possibilities. These include technical applications such as the regulation of chemical reactions, the control of robotic arms, or the coordination of harbor crane arms. Logistic problems include adaptive traffic light control, adaptive scheduling of overtime, patient appointments, ambulance deployments, vehicle routing, and many issues from production planning and supply chain management.

The transition to financial or macroeconomic applications, such as the pricing of financial derivatives, stock portfolio management, and the control of economic activity levels, is seamless. Reinforcement learning is considered a practically applicable method of artificial intelligence.

In practical terms, there are many potential pitfalls with optimal control problems. Aside from the difficulty of formulating a specific goal as a convex optimality criterion, analyses must be conducted on how stable the developed controls are against random deviations. It is also not guaranteed that a desired state can be reached using the available control options.

Code & Sources

Example code: OC_vehicle_control_1.py, OC_vehicle_control_2.py in our tutorial folder.

[1] Wolkowicz, H., Saigal, R., & Vandenberghe, L. (2012). Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Berlin Heidelberg: Springer Science & Business Media.

[2] Anderson, B. D. O., & Moore, J. B. (2007). Optimal Control: Linear Quadratic Methods. New York: Courier Corporation.

[3] Balakrishnan, V., & Vandenberghe, L. (2003). Semidefinite programming duality and linear time-invariant systems. IEEE Transactions on Automatic Control, 48(1), 30–41.

[4] Vinyals, O., Babuschkin, I., Czarnecki, W. M., et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575, 350–354.