Theory: Custom solutions

Problem statement

Not every practically relevant optimization problem can be formulated as an LP, QP, etc., and solved with existing software. However, if the problem is convex, or if the goal is to improve an existing solution without claiming global optimality, the chances of success are still good. The development of custom algorithms is indispensable in such cases.

KKT conditions

Under the hypothesis that both the objective function \(f(x)\) and the inequality constraints are twice differentiable and convex, and the equality constraints are affine, the solution to the optimization problem

$$\begin{align}
\min_x &~~~ f(x) \\
\text{s.t.}&~~~ (Ax)_k = b_k ~~~~~~~~~~~ k=1, \dots, m_1 \\
& ~~~ g_l(x) \le 0 ~~~~~~~~~~~~~ l=1, \dots, m_2
\end{align}$$

is characterized by a set of equations and inequalities. These are called the Karush-Kuhn-Tucker (KKT) conditions and state that \(x^*, \lambda^*, \nu^*\) are optimal primal and dual optimization variables if the system of equations

$$\begin{align}
\nabla f (x^*) + \sum_{k=1}^{m_1} \nu_k^* \nabla (Ax^*)_k + \sum_{l=1}^{m_2}\lambda^*_l \nabla g_l(x^*) &=0 \\
(Ax^*)_k - b_k &=0 ~~~~~ k=1, \dots, m_1 \\
g_l(x^*) &\le 0 ~~~~~ l=1, \dots, m_2 \\
\lambda^*_l &\ge 0 ~~~~~ l=1, \dots, m_2 \\
\lambda^*_l g_l(x^*) &=0 ~~~~~ l=1, \dots, m_2
\end{align}$$

are satisfied [1, p. 244]. The five equations reflect the optimality condition concerning the objective function, the feasibility of the primal and dual variables, as well as a complementarity condition that couples the dual variables and the constraints.
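For a concrete feel of these conditions, the following minimal sketch checks the five KKT residuals numerically for a small illustrative problem (minimizing \(x_1^2 + x_2^2\) subject to \(x_1 + x_2 = 1\) and \(-x_1 \le 0\)); the problem data and the candidate optimum are not taken from the text and serve only as an example.

```python
import numpy as np

# Illustrative toy problem: min x1^2 + x2^2  s.t.  x1 + x2 = 1,  -x1 <= 0.
f_grad = lambda x: 2 * x                       # gradient of the objective
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
g = lambda x: np.array([-x[0]])                # inequality constraints g(x) <= 0
Dg = lambda x: np.array([[-1.0, 0.0]])         # rows = gradients of the g_l

# Candidate primal and dual variables (the known optimum of this toy problem)
x_s, lam_s, nu_s = np.array([0.5, 0.5]), np.array([0.0]), np.array([-1.0])

# The five KKT residuals: stationarity, primal feasibility (equalities and
# inequalities), dual feasibility, and complementarity
stationarity = f_grad(x_s) + A.T @ nu_s + Dg(x_s).T @ lam_s   # should be 0
primal_eq = A @ x_s - b                                       # should be 0
primal_ineq = g(x_s)                                          # should be <= 0
dual_feas = lam_s                                             # should be >= 0
complementarity = lam_s * g(x_s)                              # should be 0
print(stationarity, primal_eq, primal_ineq, dual_feas, complementarity)
```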

Figure 1: Different equations from the KKT system and their interpretation.

Newton's method with constraints

In the KKT system, particularly the first and third rows are nonlinear, and finding a direct solution is challenging. As described in the section on numerics, the inequalities can be replaced by barrier functions, and the remaining nonlinear equations can be solved using the Newton method. The updates for the primal and dual variables

$$x_{k+1}=x_k + \Delta x ~~~~ \lambda_{k+1}= \lambda_k + \Delta \lambda ~~~~ \nu_{k+1}= \nu_k+\Delta \nu $$

are derived from a sequence of equation systems for the search direction \((\Delta x, \Delta \lambda, \Delta \nu)\). These systems are indexed by a sequence of numbers \(\mu_0, \mu_1, \dots\) approaching \(0\) and are given by [1, p. 610]:

$$\begin{bmatrix} \nabla^2 f (x_k) + \sum_{l=1}^{m_2}\lambda_l\nabla^2 g_l(x_k) & \nabla g(x_k) & A^T \\
-\operatorname{diag}(\lambda)\nabla g^T(x_k) & -\operatorname{diag}(g(x_k)) & 0 \\
A & 0 & 0\end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta \nu \end{bmatrix} = - \begin{bmatrix} \nabla f(x_k) + \nabla g(x_k)\lambda + A^T \nu \\ -\operatorname{diag}(\lambda) g(x_k) - \mu_k \vec{1} \\ Ax_k - b \end{bmatrix} $$

where \(\vec{1}=[1, \dots, 1]^T\), \(g(x_k)=[g_1(x_k), \dots, g_{m_2}(x_k)]^T\), and \(\nabla g(x_k) =[\nabla g_1(x_k), \dots, \nabla g_{m_2}(x_k)]\), with \(\nabla g_l\) being the gradient of \(g_l\) and \((\nabla^2 g_l(x_k))_{ij}=(\partial_{x_i} \partial_{x_j} g_l)(x_k)\) its Hessian. The term \(\operatorname{diag}(v)\) denotes the matrix with the entries of the vector \(v\) on the diagonal. Effective methods for solving these equation systems lead to search directions \((\Delta x, \Delta \lambda, \Delta \nu)\) and ultimately to the solution of the optimization problem associated with the KKT system.
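To make the scheme tangible, the sketch below assembles and solves this Newton system for the toy problem introduced after the KKT conditions; the fixed sequence of \(\mu_k\) values, the full (undamped) steps, and the starting point are illustrative simplifications rather than a production scheme.

```python
import numpy as np

# One primal-dual Newton step for min x1^2 + x2^2  s.t.  x1 + x2 = 1,  -x1 <= 0.
A, b = np.array([[1.0, 1.0]]), np.array([1.0])

def newton_step(x, lam, nu, mu):
    hess = 2 * np.eye(2)                        # nabla^2 f; the Hessians of the linear g_l vanish
    Dg = np.array([[-1.0, 0.0]])                # rows = gradients of the g_l
    gx = np.array([-x[0]])
    kkt = np.block([
        [hess,                Dg.T,             A.T],
        [-np.diag(lam) @ Dg, -np.diag(gx),      np.zeros((1, 1))],
        [A,                   np.zeros((1, 1)), np.zeros((1, 1))],
    ])
    rhs = -np.concatenate([
        2 * x + Dg.T @ lam + A.T @ nu,          # dual residual
        -lam * gx - mu,                         # centering residual
        A @ x - b,                              # primal residual
    ])
    delta = np.linalg.solve(kkt, rhs)
    return delta[:2], delta[2:3], delta[3:]

x, lam, nu = np.array([0.8, 0.2]), np.array([1.0]), np.array([0.0])
for mu in [1.0, 0.1, 0.01, 0.001]:              # sequence mu_k approaching 0
    dx, dlam, dnu = newton_step(x, lam, nu, mu)
    x, lam, nu = x + dx, lam + dlam, nu + dnu   # full steps; practical solvers damp these
print(x, lam, nu)                               # x approaches [0.5, 0.5]
```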

Numerical stability

The above system of equations gives the impression of providing a complete and implementable scheme for convex optimization. However, this is not the case. In practice, additional steps must be taken to ensure numerical stability. Search directions must be scaled, centered, and corrected, among other adjustments [2]. Without these measures, the Newton method may face convergence issues. Additionally, in the presence of multiple local minima, the relationship between the starting point and the endpoint of the Newton method is far from obvious. See the following illustrations based on calculations from [3, p. 34] and [4, p. 21].
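As an indication of what such safeguards look like, the sketch below combines a fraction-to-the-boundary step-length rule with a backtracking line search on the residual norm. The function signature, the helper arguments `g` and `residual`, and the constants are illustrative; Mehrotra's full predictor-corrector scheme [2] is not reproduced here.

```python
import numpy as np

def safeguarded_update(x, lam, nu, dx, dlam, dnu, g, residual, beta=0.5, frac=0.99):
    # Fraction-to-the-boundary rule: largest step keeping the dual variables positive
    shrink = dlam < 0
    alpha = min(1.0, frac * np.min(-lam[shrink] / dlam[shrink])) if shrink.any() else 1.0
    # Shrink further until the inequality constraints remain strictly feasible
    while np.any(g(x + alpha * dx) >= 0):
        alpha *= beta
    # Backtracking line search: accept only steps that reduce the KKT residual norm
    r0 = np.linalg.norm(residual(x, lam, nu))
    while np.linalg.norm(
        residual(x + alpha * dx, lam + alpha * dlam, nu + alpha * dnu)
    ) > (1 - 0.01 * alpha) * r0:
        alpha *= beta
    return x + alpha * dx, lam + alpha * dlam, nu + alpha * dnu
```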

Figure 2: Convergence and divergence of the Newton method. The upper graphics show the results of the Newton method for solving the equation \(t(\sqrt{1+t^2})^{-1}=0\). The lower graphics illustrate which of the three solutions of the equation \(x^3-1=0\) is reached from which starting point.
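The behaviour in the upper panels can be reproduced in a few lines: for \(h(t)=t(\sqrt{1+t^2})^{-1}\) the Newton update simplifies to \(t_{k+1}=-t_k^3\), so the iteration converges for \(|t_0|<1\) and diverges for \(|t_0|>1\). The starting points below are chosen for illustration only.

```python
import numpy as np

# Newton iteration for t / sqrt(1 + t^2) = 0, cf. the upper panels of Figure 2.
h = lambda t: t / np.sqrt(1 + t**2)
dh = lambda t: (1 + t**2) ** (-1.5)            # derivative of h

for t0 in [0.9, 1.1]:                          # starting points on either side of |t| = 1
    t = t0
    for _ in range(6):
        t = t - h(t) / dh(t)                   # plain, undamped Newton step
    print(f"t0 = {t0}: t after 6 steps = {t:.3e}")
```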

Example

Optimal estimation of correlations in data is essential for high-dimensional statistics related to stochastic processes. It can be considered as maximizing the likelihood of a Wishart-distributed random variable, ultimately leading to the optimization problem

\begin{align}
\min_{\gamma} & ~~~ \log \det C_{\gamma} + \langle S, C_{\gamma}^{-1} \rangle_F + r\left[- \log \det \gamma + \langle \operatorname{diag}(\lambda) ^{-1}, \gamma \rangle_F\right] \\
\text{s.t.} & ~~~ A \gamma =b \\
& ~~~ \gamma \succeq 0
\end{align}

[5, p. 193]. Here, \(\gamma\) is a positive semidefinite matrix of optimization variables, \(\langle \cdot, \cdot\rangle_F\) denotes the Frobenius inner product, \(S\) is an empirical covariance matrix, \(\lambda\) is a vector of eigenvalues, and \(C_\gamma\) is a covariance matrix constructed from \(\gamma\), with \(r\) being a regularization coefficient. The Newton steps are explicitly calculated as

\begin{align}
\gamma_{k+1} =& \gamma_{k}+\Delta \gamma_{1}- \gamma A^T(A\gamma A^T)^{-1}A\Delta \gamma_1 \\
&\text{ with } \Delta \gamma_1 = (1+r)^{-1}[(r-1)\gamma +S_{\psi} - r\gamma \operatorname{diag}(\lambda)^{-1}\gamma] \\
& \text{ and } S_{\psi} \text{ a transformed covariance matrix}
\end{align}

and the optimization can be implemented, for example, in Python.
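A minimal sketch of this update in NumPy is given below. The matrix \(S_\psi\), the eigenvalues \(\lambda\), the constraint data \(A\), the regularization \(r\), and the starting value of \(\gamma\) are placeholders that would be supplied by the concrete estimation problem in [5]; damping and checks for positive semidefiniteness are omitted.

```python
import numpy as np

def wishart_newton_update(gamma, S_psi, lam, A, r):
    """One Newton update gamma_{k+1} following the formula above (sketch only)."""
    # unconstrained step Delta gamma_1
    d1 = ((r - 1) * gamma + S_psi - r * gamma @ np.diag(1.0 / lam) @ gamma) / (1 + r)
    # correction term projecting the step onto the affine constraint A gamma = b
    correction = gamma @ A.T @ np.linalg.solve(A @ gamma @ A.T, A @ d1)
    return gamma + d1 - correction

# Placeholder data: a 3 x 3 problem with a single linear constraint
rng = np.random.default_rng(0)
S_psi = np.cov(rng.normal(size=(3, 50)))       # stand-in for the transformed covariance
lam = np.ones(3)                               # stand-in eigenvalues
A = np.ones((1, 3))
gamma = np.eye(3)
gamma = wishart_newton_update(gamma, S_psi, lam, A, r=0.1)
```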

Figure 3: Exemplary implementation of an optimization in terms of Python code. Code execution can be initialized by interacting with a graphical user interface.

Implementation

Practical implementations of a custom optimization algorithm require a reliable ecosystem of numerical programs. The Python programming language provides such an environment with NumPy [6] and is characterized by easy code interpretability and the availability of user-friendly graphical interfaces. This ease of entry is advantageous for both developers and clients.

Code & Sources

Example code: Theory_newton_convergence.py in our tutorial folder

[1] Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.

[2] Mehrotra, S. (1992). On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4), 575–601.

[3] Nesterov, Y. (2013). Introductory Lectures on Convex Optimization: A Basic Course. Berlin Heidelberg: Springer Science & Business Media.

[4] Sternberg, S. (2014). Dynamical Systems. New York: Courier Corporation.

[5] Butt, J.A. (2019). An RKHS approach to modelling and inference for spatiotemporal geodetic data with applications to terrestrial radar interferometry. Dissertation ETH Zürich, Zürich.

[6] Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M.H., Brett, M., Haldane, A., Fernández del Río, J., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., & Oliphant, T.E. (2020). Array programming with NumPy. Nature, Vol. 585, No. 7825.