Theory: Numerics and solvers

Definition

Numerical analysis deals with the algorithmic, computer-aided solution of specific problems in applied mathematics. The algorithms designed in numerical analysis lead, in a finite number of steps, from standardized problem formulations to an approximate solution.

The implementation of an algorithm for solving optimization problems in the form of executable program code is called a solver.

Simple brute-force approaches

Solution algorithms for problems of the form \(\min_x f(x) : x \in D\) are designed to generate a sequence \(x_1, …, x_m \in D\) such that \(f(x_m) - f(x^*) \le \epsilon\), where \(x^*\) is the true optimum and \(\epsilon\) is a small positive number specifying the required accuracy of the approximate solution \(x_m\).

The solution algorithm requires access to the objective function \(f\) and the constraint set \(D\). In the most general case, no global structural information is known, and only individual function values \(f(x_1), …, f(x_m)\) can be provided. With this sparse information, achieving an approximate solution \(x_m\) with \(f(x_m)-f(x^*)\le \epsilon\) requires at least [1, p. 11]

$$m \ge \left(\frac{L}{2 \epsilon} - 1 \right)^n $$

optimization steps, where \(n\) is the number of dimensions of the optimization variable \(x\) and \(L\) is an upper bound on the derivative of \(f\). This bound shows that finding the optimum \(x^*\) by systematically trying different values of \(x\) is a futile endeavor: even for an \(x\) with 10 components, a standard computer would need several million years, and each additional dimension multiplies this duration by a hundredfold [1, p. 12].
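To make the exponential scaling concrete, the following sketch (not from the source; the values of \(L\) and \(\epsilon\) are illustrative assumptions) counts the grid points an exhaustive search would have to evaluate as the dimension \(n\) grows:

```python
# Illustrative only: number of function evaluations a naive grid search needs
# to guarantee accuracy eps when L bounds the derivative of f (values assumed).
def grid_search_evaluations(n, L=2.0, eps=1e-3):
    points_per_axis = int(L / (2 * eps))  # grid resolution required per coordinate
    return points_per_axis ** n           # total grid points grow exponentially in n

for n in (1, 2, 5, 10):
    print(f"n = {n:2d}: about {grid_search_evaluations(n):.2e} evaluations")
```

The count for \(n = 10\) already dwarfs what any standard computer could evaluate in a human lifetime.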

Simple derivative-based approaches

If the function value of \(f\) at the points \(x_1, x_2, …\), the vector \(\nabla f\) of first derivatives (gradient), and the matrix \(\nabla^2 f\) of second derivatives (Hessian matrix) are all available, the number of required optimization steps is reduced significantly. Gradient-based methods

$$ x_{k+1} = x_k - \alpha \nabla f(x_k)~~~~~~k=0, 1, … ~~~~~\alpha >0 \text{ step size}$$

or the Newton method

$$ x_{k+1}= x_k - \nabla^2 f(x_k)^{-1}\nabla f(x_k) ~~~~~~k=0, 1, …$$

typically have linear or quadratic convergence rates [1, p. 37]. This means that with each optimization step, the number of correct decimal places of \(x_k\) increases by a constant amount or doubles, respectively. The problem mentioned above, which is only solvable by investing millions of years of computation time when relying purely on function values \(f(x_1), f(x_2), …\), is solved in a few milliseconds with the Newton method.
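As a minimal illustration of the two update rules (a sketch, not the code behind Figure 1; the step size, tolerance, and starting point are arbitrary choices), both methods can be applied to the quadratic \(f(x) = x^\top x\):

```python
# Gradient descent vs. Newton's method on f(x) = x^T x (illustrative sketch).
import numpy as np

def grad(x):     return 2.0 * x                # first derivatives of f
def hessian(x):  return 2.0 * np.eye(x.size)   # second derivatives of f

def gradient_descent(x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    x = x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            return x, k
        x = x - alpha * grad(x)                       # x_{k+1} = x_k - alpha * grad f(x_k)
    return x, max_iter

def newton(x0, tol=1e-8, max_iter=100):
    x = x0.copy()
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            return x, k
        x = x - np.linalg.solve(hessian(x), grad(x))  # x_{k+1} = x_k - (hess f)^{-1} grad f(x_k)
    return x, max_iter

x0 = np.array([1.0, 0.8])
print(gradient_descent(x0))   # many small steps: the error shrinks by a constant factor each time
print(newton(x0))             # a single step: the Newton step is exact for a quadratic
```

For a quadratic objective the Newton step lands on the optimum immediately, while gradient descent only contracts the error by a constant factor per step, which is the linear-versus-quadratic behavior described above.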

Figure 1: Numerical solution of the problems \(\min_x x^2: x \in [-1,1]\) and \(\min_{x,y} x^2+y^2: [x,y]\in [0,1]^2\) using three methods. The number of required optimization steps decreases as more information about the objective is available.

Conic optimization

The more that is known about the function \(f\) to be minimized, the faster and more reliably the optimization works. It has been shown that such information is best formulated in the form of cone conditions, so that all nonlinearities are embedded in the constraint \(x \in \mathcal{C}\), where \(\mathcal{C}\) is a convex cone. The optimization problem

\begin{align} \min_x & \langle x, c \rangle \\ \text{s.t.}& Ax = b \\ & x \in \mathcal{C} \end{align}

is called a cone program, and it has been proven that every convex optimization problem can be written in this form [2, p. 60].
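As an illustration of how such a cone program is written down in practice (a sketch using the CVXPY framework discussed further below; the data \(A\), \(b\), \(c\) are random placeholders and \(\mathcal{C}\) is chosen as the non-negative orthant), the mathematical form carries over almost verbatim:

```python
# Sketch of the cone program  min <x,c>  s.t.  Ax = b, x in C  with C = R^n_+
# (random illustrative data; any other supported cone could be used instead).
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, m = 5, 3
A = np.random.randn(m, n)
b = A @ np.abs(np.random.randn(n))   # built so that a feasible non-negative x exists
c = np.abs(np.random.randn(n))       # non-negative costs keep the problem bounded

x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(c @ x), [A @ x == b, x >= 0])
problem.solve()
print("status:", problem.status, " optimal value:", problem.value)
```

Replacing `x >= 0` with a second-order cone or a semidefiniteness constraint changes only the cone \(\mathcal{C}\); the objective and the linear equations keep the same form.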

Figure 2: Three different cones that commonly occur in optimization: the non-negative orthant, the second-order cone, and the cone of positive semidefinite matrices.

Solution procedures

The class of cone programs is extremely expressive. Another advantage is that many cone programs (including LP, QP, QCQP, SOCP, SDP) can be solved efficiently.

This is done by replacing the constraints with logarithmic barrier functions, which tend towards infinity at the boundary of the feasible set; a sequence of unconstrained problems converging to the original constrained problem is then solved. The number of computational steps in the resulting methods depends polynomially on the dimension \(n\) of the problem, no longer exponentially [2, pp. 288–297].
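The effect can be seen on a one-dimensional toy problem (an illustrative sketch, not from the source): for \(\min_x x\) subject to \(x \ge 0\), the barrier subproblem \(\min_x\, x - \mu \log x\) has the closed-form minimizer \(x(\mu) = \mu\), so driving \(\mu\) towards zero traces a path of unconstrained minimizers that converges to the constrained optimum \(x^* = 0\):

```python
# Barrier idea on  min x  s.t.  x >= 0: the subproblem  min x - mu*log(x)  is solved
# by x(mu) = mu (set the derivative 1 - mu/x to zero), and mu is shrunk step by step.
mu = 1.0
for _ in range(8):
    x_mu = mu                              # unconstrained minimizer for the current mu
    print(f"mu = {mu:9.2e}   x(mu) = {x_mu:9.2e}")
    mu *= 0.1                              # decrease the barrier parameter
```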

To generate certifiably correct solutions, the original and the dual problem are solved simultaneously as the primal-dual pair of optimization problems

\begin{align}
P ~~~ \min_X & \langle C,X\rangle_F \hspace{5cm} D ~~~ &\max_{y,S} & \langle b,y\rangle \\
\text{s.t.}& \langle A_k^T,X\rangle_F = b_k, ~~~ k=1, …, m_1 &\text{s.t.}& \sum_{k=1}^{m_1} y_k A_k + S = C \\
& X \succeq 0 && S \succeq 0
\end{align}

Here, \(\langle C,X\rangle_F= \sum_{k=1}^n\sum_{l=1}^n C_{kl}X_{kl}\) denotes the Frobenius inner product. The minimization of the duality gap \(\langle C,X \rangle_F - \langle b,y\rangle=\langle X,S\rangle_F\) is then carried out using the logarithmic barrier function \(-(\log \det X + \log \det S)=-\log \det (XS)\), which tends towards infinity at the boundary of the positive semidefinite cones, thereby leading to the optimization problem [3, p. 13]:

\begin{align}
\min_{X,y,S} ~~~&\langle X, S\rangle_F - \mu \log \det(XS) \\
\text{s.t.} ~~~ & \langle A_k^T, X\rangle_F = b_k ~~~~~ k=1, …, m_1 \\
& \sum_{k=1}^{m_1} y_k A_k + S = C
\end{align}

The resulting problem must be solved for a decreasing sequence of values of \(\mu\) to ensure convergence to the original problem's solution.
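The duality-gap identity used above, \(\langle C,X \rangle_F - \langle b,y\rangle = \langle X,S\rangle_F\), holds for any points satisfying the linear primal and dual constraints and can be checked numerically. The following sketch (random symmetric data, for illustration only; positive semidefiniteness is not enforced because the identity does not need it) constructs such a pair and verifies the equality:

```python
# Check that <C,X>_F - <b,y> = <X,S>_F whenever <A_k,X>_F = b_k and sum_k y_k A_k + S = C.
import numpy as np

rng = np.random.default_rng(1)
n, m1 = 4, 3
sym = lambda M: (M + M.T) / 2                       # symmetrize random matrices

A = [sym(rng.standard_normal((n, n))) for _ in range(m1)]
C = sym(rng.standard_normal((n, n)))
X = sym(rng.standard_normal((n, n)))                # primal point
b = np.array([np.trace(Ak @ X) for Ak in A])        # chosen so X satisfies <A_k, X>_F = b_k
y = rng.standard_normal(m1)                         # dual multipliers
S = C - sum(yk * Ak for yk, Ak in zip(y, A))        # chosen so sum_k y_k A_k + S = C

gap_objectives      = np.trace(C @ X) - b @ y       # <C,X>_F - <b,y>
gap_complementarity = np.trace(X @ S)               # <X,S>_F
print(np.isclose(gap_objectives, gap_complementarity))   # True
```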

Algorithms

The specific algorithm proceeds from feasible primal and dual variables \(X^0, S^0, y^0\) through the following steps until convergence [3, p. 118]:

\[
\begin{align}
X_{k+1} &= X_k + \Delta X, ~~~~~~ \Delta X = \mu_k S_k^{-1} - X_k - D\Delta S D \\
S_{k+1} &= S_k + \Delta S, ~~~~~~ \Delta S = \sum_{i=1}^{m_1} \Delta y_i A_i \\
y_{k+1} &= y_k + \Delta y, ~~~~~~ \Delta y = Q^{-1} g \\
\mu_{k+1} &= \alpha \mu_k, ~~~~~~ 0 < \alpha < 1
\end{align}
\]

where \((Q)_{il} = \langle D^T A_i^T, A_l D \rangle_F\), \((g)_i = \mu \langle A_i^T, S_k^{-1} \rangle_F - b_i\), and \(D = S_k^{-1/2} (S_k^{1/2} X_k S_k^{1/2})^{1/2} S_k^{-1/2}\). As these equations show, forming and inverting large matrices is part of the computation. While computationally intensive, this approach is still feasible for thousands of variables with modern computational capabilities. Since LP, QP, etc., are merely special instances of SDP, the optimization algorithm presented here is representative of the approaches used for other problem classes besides semidefinite programming.
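To give a feeling for the linear algebra inside one such step, the following sketch (small random positive definite iterates, not a complete solver; only the quantities \(D\), \(Q\), \(g\), and \(\Delta y\) defined above are formed) shows where the matrix factorizations and inversions arise:

```python
# Forming D, Q, g and solving Q * delta_y = g for one interior-point step
# (illustrative data; a real solver adds step-size control and feasibility handling).
import numpy as np

rng = np.random.default_rng(0)
n, m1, mu = 4, 3, 0.5

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

def random_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

X, S = random_spd(n), random_spd(n)                       # current primal/dual iterates
A = [(lambda M: (M + M.T) / 2)(rng.standard_normal((n, n))) for _ in range(m1)]
b = np.array([np.trace(Ak @ X) for Ak in A])

S_half     = sym_sqrt(S)
S_inv_half = np.linalg.inv(S_half)
D = S_inv_half @ sym_sqrt(S_half @ X @ S_half) @ S_inv_half    # scaling matrix

Q = np.array([[np.trace(A[i] @ D @ A[l] @ D) for l in range(m1)]   # (Q)_il = <D^T A_i^T, A_l D>_F
              for i in range(m1)])
g = np.array([mu * np.trace(Ai @ np.linalg.inv(S)) - bi            # (g)_i = mu <A_i^T, S^-1>_F - b_i
              for Ai, bi in zip(A, b)])

delta_y = np.linalg.solve(Q, g)                                    # dy = Q^{-1} g
print("delta_y =", delta_y)
```

Forming \(Q\) alone already requires on the order of \(m_1^2\) matrix products of size \(n \times n\), which is why the cost of these methods grows quickly with problem size even though it remains polynomial.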

Solvers

In addition to a number of commercial solvers, there are also freely available open-source solvers. For performance reasons, they are often written in the C programming language. We would particularly like to highlight the modeling framework CVXPY [4] for convex optimization problems. Although it is not a solver itself, it provides a unified interface to a variety of solvers and an automated rule set (disciplined convex programming) for formulating convex problems.

Solver name | Licence | Capabilities | Organization / Reference
CVXOPT | GNU GPL | LP, QP, SOCP, SDP | Stanford CO Group
ECOS | GNU GPL-3.0 | SOCP | Embotech GmbH
GLPK | GNU GPL | LP, MILP | Moscow Aviation Institute
MDPtoolbox | BSD-3 | DP | Steven Cordwell
OR-Tools | Apache 2.0 | LP, MILP | Google
SCIP | ZIB Academic License | LP, MILP, MINLP, CIP | Zuse Institut Berlin
scipy | BSD | LP, Blackbox | scipy Python package
SCS | MIT | LP, SOCP, SDP, ECP, PCP | Stanford CO Group
CPLEX | commercial | LP, QP, nonconvex QP, SOCP, MIQP, MISOCP | IBM
Gurobi | commercial | LP, MILP, QP, MIQP, QCP, MIQCP | Gurobi
Mosek | commercial | LP, QP, SOCP, SDP, MINLP | Mosek

LP = Linear program, QP = Quadratic program, QCP = Quadratically constrained program, SOCP = Second-order cone program, SDP = Semidefinite program, ECP = Exponential cone program, PCP = Power cone program, CIP = Constraint integer program, DP = Dynamic programming, MI = Mixed integer, NLP = Nonlinear program
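The unified interface mentioned above can be seen in a short sketch (the small fitting problem is illustrative; which back ends from the table are actually callable depends on what is installed alongside CVXPY):

```python
# One model, several solvers: CVXPY passes the same problem to different back ends.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([1.0, 2.0])

x = cp.Variable(2)
# A small second-order-cone-representable problem (least squares with an l1 penalty).
problem = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + cp.norm(x, 1)))

for solver in (cp.SCS, cp.ECOS):      # any installed conic solver from the table works here
    problem.solve(solver=solver)
    print(f"{solver}: optimal value = {problem.value:.6f}")
```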

Sources

[1] Nesterov, Y. (2013). Introductory Lectures on Convex Optimization: A Basic Course. Berlin, Heidelberg: Springer Science & Business Media.

[2] Nesterov, Y., & Nemirovskii, A. (1994). Interior-Point Polynomial Algorithms in Convex Programming. Philadelphia: SIAM.

[3] De Klerk, E. (2006). Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications. Berlin, Heidelberg: Springer Science & Business Media.

[4] Diamond, S., & Boyd, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83), 1–5.