Theory: Convexity

Definition

A set \(D\) is called convex if any two of its elements can be connected by a straight line segment that lies entirely within \(D\). Specifically, this means that for all \(x_1, x_2 \in D\) and \(y = \theta x_1 + (1-\theta)x_2\) with \(\theta \in [0,1]\), \(y\) is an element of \(D\).

Both \(x_1, x_2\) and \(y\) are vectors of arbitrary dimension. Convex sets are simple objects in the sense that any point can be reached from any other point in a straightforward manner. A function \(f:D \rightarrow \mathbb{R}\) is called convex if \(D\) is convex and every chord, i.e. every line segment connecting two points on the graph of \(f\), lies entirely above \(f\). Formally, this means that for all \(x_1, x_2 \in D\) and all \(\theta \in [0,1]\):

$$ \theta f(x_1) + (1-\theta)f(x_2) \ge f(\theta x_1 + (1-\theta) x_2) $$

The fact that the value of a convex \(f\) at any point between \(x_1\) and \(x_2\) always lies below the corresponding weighted average of \(f(x_1)\) and \(f(x_2)\) implies the absence of hill-like shapes in the function landscape of \(f\) and a consistently upward (= non-negative) curvature. A comprehensive source on the convexity of sets and functions is [1, pp. 23–90].
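The defining inequality can be illustrated with a quick numerical spot check. The following sketch, which uses the exponential function and random sample points as purely illustrative choices, verifies the inequality at many random pairs:

```python
import numpy as np

# Minimal sketch: numerically spot-check the defining inequality
#   theta*f(x1) + (1-theta)*f(x2) >= f(theta*x1 + (1-theta)*x2)
# for the convex exponential function. The choice of f and the random
# sampling are illustrative assumptions, not part of the text.
f = np.exp
rng = np.random.default_rng(0)

for _ in range(1000):
    x1, x2 = rng.normal(size=2)
    theta = rng.uniform()
    chord = theta * f(x1) + (1 - theta) * f(x2)   # value on the connecting segment
    value = f(theta * x1 + (1 - theta) * x2)      # function value at the same point
    assert chord >= value - 1e-12                 # inequality holds up to rounding
```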

Figure 1: Examples of convex sets and convex functions. Note that a function with two separate local minima is not convex.

Relevance

Convex sets and convex functions guarantee that minimization is well behaved and practically feasible. This is significant, as general optimization problems are typically unsolvable in practice [2, p. 4]. As shown in the figure above, the presence of multiple separate local minima implies that a function is not convex. Conversely, every local minimum of a convex \(f\) is also its global minimum.

Additionally, due to the convexity of \(f\), global properties can be inferred from local characteristics. Assuming \(f\) is differentiable, the following statements hold:

$$ f(x_2) \ge f(x_1) + \nabla f(x_1)^T[x_2-x_1] $$
$$ \Delta f(x) \succeq 0 $$

for all \(x_1\) and \(x_2\), where \(\nabla f\) is the gradient (vector of first derivatives) and \(\Delta f\) is the Hessian matrix (matrix of second derivatives) of \(f\). A convex differentiable function is always non-negatively curved and is bounded below by its linear approximation. Consequently, the global minimum is at the point \(x^*\) where \(\nabla f(x^*)=0\). This follows from \(f(x_1)\ge f(x^*) + \underbrace{\nabla f(x^*)^T[x_1-x^*]}_{=0} = f(x^*)\) for all \(x_1\).
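To make the first statement concrete, the following sketch checks the linear lower bound for a convex quadratic; the matrix \(A\), the sample points, and the tolerance are illustrative assumptions rather than part of the text:

```python
import numpy as np

# Minimal sketch: check the global lower bound
#   f(x2) >= f(x1) + grad_f(x1)^T (x2 - x1)
# for the convex quadratic f(x) = 0.5 * x^T A x with A positive definite.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                  # positive definite, hence f is convex

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x                            # gradient of the quadratic

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2 = rng.normal(size=(2, 2))
    bound = f(x1) + grad_f(x1) @ (x2 - x1)  # linear approximation evaluated at x2
    assert f(x2) >= bound - 1e-12           # the bound never exceeds f(x2)

# The Hessian of f is A itself; its eigenvalues are positive,
# matching the curvature condition from the text.
print(np.linalg.eigvalsh(A))
```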

Figure 2: Illustration of the statements regarding global lower bounds through linear approximation (a) and positive curvature leading to increasing gradient values (b). Figure (c) shows a potential solution approach through an algorithmic search for zeros of \(\nabla f\).

Solution approach

The search for zeros of \(\nabla f\) leads directly to the method illustrated on the right in the above figure: start at an arbitrary point \(x_0\) and use the value \(\nabla f(x_0)\) of the gradient and its rate of change \(\Delta f(x_0)\) to estimate the location of the zero. With the resulting estimate \(x_1\) of the zero, repeat the procedure until the actual zero \(x^*\) is found.

From the linearization \(\nabla f(x_0) + \Delta f(x_0)[x_1 - x_0] \stackrel{!}{=} 0\), the sequence of estimates derived from \(x_0\) is given by $$ x_{k+1} = x_k - \Delta f^{-1}(x_k) \nabla f(x_k) ~~~~ k=0,1, \ldots $$

This method is also known as Newton's method. Due to the positive curvature, each point \(x_{k+1}\) has a lower function value \(f(x_{k+1})\) than its predecessor \(x_k\), and the method converges to the optimum \(x^*\) [1, p. 484]. It can also be modified for optimization problems with constraints.
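A minimal sketch of this iteration is given below. The test function, the starting point, and the stopping tolerance are illustrative assumptions; in practice the raw step is usually combined with a damping or line-search strategy (the damped Newton method discussed in [1]), which is omitted here.

```python
import numpy as np

# Minimal sketch of the iteration from the text,
#   x_{k+1} = x_k - Hessian(x_k)^{-1} * gradient(x_k),
# applied to the smooth, strongly convex test function
#   f(x) = log(exp(a^T x) + exp(-a^T x)) + 0.5 * ||x||^2.
# The starting point is chosen close enough to the optimum for the
# undamped iteration to converge.
a = np.array([1.0, -2.0])

def grad_f(x):
    return np.tanh(a @ x) * a + x             # gradient of f

def hess_f(x):
    s = 1.0 - np.tanh(a @ x) ** 2             # sech^2 term
    return s * np.outer(a, a) + np.eye(2)     # positive definite Hessian

x = np.array([0.5, 0.5])                      # starting point x_0
for k in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:             # gradient (almost) zero: optimum found
        break
    x = x - np.linalg.solve(hess_f(x), g)     # Newton step

print(k, x)                                   # x is close to the minimizer (0, 0)
```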

Example

Many important functions are convex and can therefore be optimized efficiently. The examples in the following non-exhaustive list are taken from [1, pp. 71–73].

| Name | Function | Domain | Application in modelling |
| --- | --- | --- | --- |
| Exponential function | \(e^{ax}\) | \(x\in \mathbb{R}, a \in \mathbb{R}\) | Growth processes and failure probabilities |
| Neg. logarithm | \(-\log x\) | \(x > 0\) | Probability distributions |
| Linear functional | \(c^T x\) | \(x\in \mathbb{R}^n, c \in \mathbb{R}^n\) | Combination of effects |
| Norm | \(\lVert x\rVert_p\) | \(x \in \mathbb{R}^n, p \ge 1\) | Discrepancies and distances |
| Neg. entropy | \(\sum_{k=1}^n p(x_k)\log p(x_k)\) | \(p(x_k) > 0\) | Comparison of probability distributions |
| Neg. geom. mean | \(-\left(\prod_{k=1}^n x_k\right)^{1/n}\) | \(x\in \mathbb{R}^n, x_k > 0\) | Growth processes and indicator functions |
| Maximum | \(\max(x_1, \ldots, x_n)\) | \(x\in \mathbb{R}^n\) | Disproportionate impacts |
| Softmax | \(\log\left(\sum_{k=1}^n e^{x_k}\right)\) | \(x\in \mathbb{R}^n\) | Machine learning |
| Neg. logdet | \(-\log\det X\) | \(X\) positive definite | Covariance matrices and energies |
| Sum of eigenvalues | \(\sum_{k=1}^n \lambda_k(X)\) | \(X\) positive semidefinite | Uncertainties and distributional properties |
| Matrix fractional | \(x^T P^{-1}x\) | \(x\in \mathbb{R}^n, P\) positive definite | Normal distributions and anisotropic distances |
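Several of the tabulated functions are available directly as atoms in CVXPY [3], whose curvature analysis can confirm their convexity. The snippet below is a sketch under the assumption that CVXPY and NumPy are installed; the dimension and the parameter matrix are arbitrary choices:

```python
import cvxpy as cp
import numpy as np

# Minimal sketch: some of the tabulated functions expressed as CVXPY atoms;
# is_convex() queries CVXPY's curvature analysis.
n = 3
x = cp.Variable(n)
X = cp.Variable((n, n), PSD=True)          # symmetric positive semidefinite variable
P = np.eye(n)                              # a positive definite parameter

examples = {
    "norm":              cp.norm(x, 2),
    "softmax":           cp.log_sum_exp(x),
    "maximum":           cp.max(x),
    "neg. logdet":       -cp.log_det(X),
    "matrix fractional": cp.matrix_frac(x, P),
}

for name, expr in examples.items():
    print(name, expr.is_convex())          # expected output: True for every entry
```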

Construction rules

The above examples can be combined to generate new, more expressive convex functions. Positively weighted linear combinations \(w_1 f_1(x) + w_2 f_2(x)\) of convex functions \(f_1(x), f_2(x)\) are convex; the same applies to the maximum \(\max(f_1(x), f_2(x))\) and to the composition \(f_1(Ax+b)\) with a linear transformation. Modern modeling frameworks like CVXPY [3] leverage these relationships: starting from demonstrably convex example functions and convexity-preserving composition rules, even seemingly complex functions can often be reduced to combinations of elementary convex functions and then solved [4].
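As a sketch of how these rules play out in practice, the following CVXPY snippet builds a convex objective from a weighted sum, a pointwise maximum, and an affine composition of atoms from the table, and then minimizes it; the problem data and the constraint are random and purely illustrative:

```python
import cvxpy as cp
import numpy as np

# Minimal sketch of the construction rules: a positively weighted sum,
# a pointwise maximum, and composition with an affine map Ax + b all
# preserve convexity.
rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

x = cp.Variable(3)
f1 = cp.norm(A @ x + b, 2)                 # norm composed with an affine map
f2 = cp.log_sum_exp(x)                     # softmax atom from the table
f = 2.0 * f1 + 1.0 * f2                    # positively weighted combination
g = cp.maximum(f1, f2)                     # pointwise maximum, also convex

problem = cp.Problem(cp.Minimize(f), [cp.sum(x) == 1])
problem.solve()                            # handled by one of CVXPY's conic solvers
print(f.is_convex(), g.is_convex(), problem.value, x.value)
```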

Sources

[1] Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.

[2] Nesterov, Y. (2013). Introductory Lectures on Convex Optimization: A Basic Course. Berlin Heidelberg: Springer Science & Business Media.

[3] Diamond, S., & Boyd, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83), 1–5.