Optimal estimation: Estimation of failure probabilities

Problem statement

When production, formation, or measurement processes are characterized by uncertainties with a known probability distribution, the probabilities \(P(X \in C)\) can be directly calculated. Here, \(P(X \in C)\) describes the probability that a product, quantity, or measurement \(X\) takes values within the range \(C \subset \mathbb{R}\).

Analyzing such probabilities is an important diagnostic tool for systematically assessing decisions under uncertainty, when the impact of wrong decisions must be bounded even though some critical information is unknown or subject to randomness.

Problem illustration

For example, if some machine components are manufactured whose lengths must lie within the interval \([a, b]\) for safe use, then the variations in component length caused by randomness in the manufacturing process pose a problem. It is necessary to examine the probabilities that:

  1. A component is manufactured with an incorrect length \(X \in C\), \(C = \mathbb{R} \setminus [a, b]\)
  2. Due to measurement uncertainty, a correctly manufactured component is declared faulty (Type I error)
  3. Due to measurement uncertainty, a faultily manufactured component is declared correct (Type II error).
Figure 1: The area of the shaded regions quantifies the probability of the occurrence of undesirable events. These include the production of defective components (a), the Type I error, and the Type II error (b).

If \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\), then the probability \(P(X \in C)\) can be calculated as

$$P(X \le a) + P(X \ge b) = \int_{-\infty}^a p(x) \, dx + \int_b^{\infty} p(x) \, dx = 1 - \int_a^b p(x) \, dx = 1 - (\Phi(b) - \Phi(a)).$$

Here, \(\Phi(x)\) is the cumulative distribution function of \(X\), whose values can be found in mathematical reference books. Thanks to the one-dimensionality of this exemplary error-estimation problem and the precise knowledge of the probability distribution, the problem can thus be solved with simple methods.
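
In code, this amounts to two evaluations of the normal CDF. The following is a minimal sketch using SciPy; the numbers for \(a\), \(b\), \(\mu\), and \(\sigma\) are hypothetical stand-ins.

```python
from scipy.stats import norm

# Hypothetical tolerance interval and process parameters.
a, b = 9.9, 10.1        # admissible length interval [a, b]
mu, sigma = 10.0, 0.05  # mean and standard deviation of the length X

# P(X in C) = P(X <= a) + P(X >= b) = 1 - (Phi(b) - Phi(a))
p_fail = norm.cdf(a, loc=mu, scale=sigma) + norm.sf(b, loc=mu, scale=sigma)
print(f"P(X in C) = {p_fail:.4f}")  # ~0.0455 here: a and b lie 2 sigma from mu
```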

Chebychev inequality: Idea

In real applications, probability distributions are typically both high-dimensional and unknown, described only by descriptive statistical metrics such as empirical means and variances.

Nevertheless, even in these cases, worst-case probabilities \(P(X \in C)\) can be calculated by maximizing over all probability distributions consistent with the observations \(E[X] = \mu \in \mathbb{R}^n\) and \(E[XX^T] = \Sigma \in \mathbb{R}^{n \times n}\). This allows for quantification of the probability that components do not meet quality requirements, naturally occurring objects exhibit atypical characteristics, or measurements produce grossly incorrect observations.

Chebychev inequality: Optimization formulation

The problem can be formulated as [1, p. 377]

$$ \begin{align} \min_{P,q,r} ~~~& \operatorname{tr} (\Sigma P) + 2 \mu^T q + r \\ \text{s.t.} ~~~& x^T P x + 2 q^T x + r \ge 0 ~~ \forall x \in \mathbb{R}^n \\ ~~~& x^T P x + 2 q^T x + r - 1 \ge 0 ~~ \forall x \in C \end{align}$$

where \(\Sigma\) and \(\mu\) are the problem data and the two inequalities ensure that the value of the optimization problem is an upper bound for \(P(X \in C)\): the quadratic function \(x^T P x + 2 q^T x + r\) is nonnegative everywhere and at least \(1\) on \(C\), so it dominates the indicator function of \(C\), and its expectation \(\operatorname{tr}(\Sigma P) + 2\mu^T q + r\) therefore dominates \(P(X \in C)\). If \(C\) is the exterior of a polyhedron \(M\) defined by the \(n_c\) inequalities \(a_k^T z \le b_k\), \(k = 1, \ldots, n_c\), then the infinitely many inequalities in the optimization problem above can be transformed, via the S-procedure, into a requirement that certain matrices be positive semidefinite. The equivalent semidefinite program is then [1, p. 377]

$$ \begin{align} \min_{P,q,r,\lambda} ~~~& \operatorname{tr} (\Sigma P) + 2 \mu^T q + r \\ \text{s.t.} ~~~& \begin{bmatrix} P & q \\ q^T & r \end{bmatrix} \succeq 0 \\ ~~~& \begin{bmatrix} P & q \\ q^T & r - 1 \end{bmatrix} \succeq \lambda_k \begin{bmatrix} 0 & a_k/2 \\ a_k^T/2 & -b_k \end{bmatrix} ~~~~ k=1, \ldots, n_c \\ ~~~& \lambda_k \ge 0 ~~~~ k=1, \ldots, n_c \end{align}$$

Chebychev inequality: Application

The following figure illustrates an exemplary solution of the following problem. Suppose a machine component with random length characteristics \(X_1, \ldots, X_{10}\) is manufactured. The expected values and covariances of the length characteristics are known, e.g., from measuring a test batch of components. A component can only be installed in the parent machine if it satisfies the 20 conditions

$$ a_k^T x \le b_k ~~~~ k=1, \ldots, 20$$

on the lengths. This problem is easily solved with CVXPY [2].
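
A minimal CVXPY sketch of the semidefinite program above is shown below. The moment data \(\mu\), \(\Sigma\) and the constraint data \(a_k\), \(b_k\) are randomly generated stand-ins, not the values behind the figures; for the full example, see the script referenced under Code & Sources.

```python
import numpy as np
import cvxpy as cp

n, n_c = 10, 20  # 10 length characteristics, 20 usability conditions
rng = np.random.default_rng(0)

# Hypothetical moment data E[X] = mu, E[XX^T] = Sigma (PSD by construction).
mu = rng.normal(size=n)
F = rng.normal(size=(n, n))
Sigma = F @ F.T / n + np.outer(mu, mu)

# Hypothetical polyhedron a_k^T x <= b_k; the mean lies safely inside it.
A = rng.normal(size=(n_c, n))
b = A @ mu + 1.0

# Collect P, q, r in one symmetric variable Z = [[P, q], [q^T, r]].
Z = cp.Variable((n + 1, n + 1), symmetric=True)
P, q, r = Z[:n, :n], Z[:n, n], Z[n, n]
lam = cp.Variable(n_c, nonneg=True)

E = np.zeros((n + 1, n + 1))
E[n, n] = 1.0  # subtracting E replaces r by r - 1 in the second LMI

constraints = [Z >> 0]
for k in range(n_c):
    H_k = np.zeros((n + 1, n + 1))
    H_k[:n, n] = H_k[n, :n] = A[k] / 2
    H_k[n, n] = -b[k]
    constraints.append(Z - E >> lam[k] * H_k)

prob = cp.Problem(cp.Minimize(cp.trace(Sigma @ P) + 2 * mu @ q + r), constraints)
prob.solve(solver=cp.SCS)  # any SDP-capable solver works
print("Chebychev bound: P(X in C) <=", min(1.0, prob.value))
```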

Figure 2: Example of some of the conditions on the property pair \(x_1, x_2\) of the machine component (a) and an exemplary component with 10 length characteristics (b).

The following images show the optimal matrices \(P\), vectors \(q\), and scalars \(r\) such that \(\operatorname{tr}(\Sigma P) + 2 \mu^T q + r\) is the smallest upper bound for \(P(X \in C)\). With the \(P, q, r, \lambda\) shown there, \(P(X \in C) \le 0.4\). More information on this so-called moment problem can be found in [3, pp. 469–509].

Figure 3: The solution to the semidefinite optimization problem (a) and the interpretation of the solution (b).

Integration of additional information

If more is known about the underlying probability distribution than just the expected values \(E[X]\) and \(E[XX^T]\), this information can be used to improve error estimation and solve additional problems. Optimization over probability distributions can be enhanced through knowledge of:

  • Moments \(M_k = E[X^k]\), \(k = 1, \ldots, n\): Since the Hankel matrix $$\begin{bmatrix} M_0 & M_1 & M_2 & \cdots \\ M_1 & M_2 & M_3 & \cdots \\ M_2 & M_3 & M_4 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \succeq 0$$ of the moments \(M_k\) of a probability distribution must be positive semidefinite, an optimization over moments can be written as an SDP [1, p. 171] (see the first sketch after this list).

  • Marginal probability distributions \(\bar{p} = \{\bar{p}_1, \ldots, \bar{p}_n\}\) and \(\bar{q} = \{\bar{q}_1, \ldots, \bar{q}_m\}\) of the random variables \(X_1\) and \(X_2\): The multivariate distribution is defined by the matrix $$P=\begin{bmatrix} p_{11} & \cdots & p_{1n}\\ \vdots & \ddots & \vdots\\ p_{m1} & \cdots & p_{mn}\end{bmatrix}$$ of probabilities \(p_{ij} = P(X_2 = x_i, X_1 = x_j) \ge 0\), allowing optimizations over multivariate probability distributions to be written as LPs with constraints \(\sum_{j=1}^n p_{ij} = \bar{q}_i\) and \(\sum_{i=1}^m p_{ij} = \bar{p}_j\) (see the second sketch after this list).
  • The cumulant-generating function \(\log E\left[ \exp(\lambda^T X)\right]\) of the probability distribution of \(X\): Depending on the precise form of this function, the Chernoff bound [1, p. 380] $$ \begin{align} P(X\in C) \le e^q ~~~~&~~~~ C=\{x: Ax \le b\} \\ q= \min_{\lambda, u} ~~~& b^Tu+\log E[\exp(\lambda^T X)] \\ \text{s.t.} ~~~& u \ge 0, ~~~ A^Tu+\lambda=0 \end{align}$$ results in an LP, QP, SDP, or a convex optimization problem not solvable by standard methods, for bounding the probability of an event's occurrence (see the third sketch after this list).
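
To make the first item concrete, here is a minimal sketch under hypothetical moment data: given \(M_0, M_1, M_2\), it computes the smallest fourth moment \(E[X^4]\) that any real-valued distribution with these moments can have, by imposing positive semidefiniteness of the Hankel matrix.

```python
import cvxpy as cp

# Hypothetical known moments: M0 = 1 (normalization), zero mean, unit variance.
M0, M1, M2 = 1.0, 0.0, 1.0
M3, M4 = cp.Variable(), cp.Variable()  # unknown higher moments

# The Hankel matrix of moments must be PSD for any distribution on R.
H = cp.bmat([[M0, M1, M2],
             [M1, M2, M3],
             [M2, M3, M4]])

prob = cp.Problem(cp.Minimize(M4), [H >> 0])
prob.solve()
print("Smallest feasible E[X^4]:", prob.value)  # 1.0 = M2^2, cf. Jensen
```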
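
The second item, optimization over joint distributions with fixed marginals, is a linear program. Below is a minimal sketch with hypothetical marginals and a hypothetical event \(C\); it computes the largest probability that any joint distribution consistent with the marginals can assign to \(C\).

```python
import numpy as np
import cvxpy as cp

# Hypothetical marginals of the discrete random variables X1 and X2.
p_bar = np.array([0.2, 0.5, 0.3])       # marginal of X1 (n = 3 values)
q_bar = np.array([0.1, 0.4, 0.3, 0.2])  # marginal of X2 (m = 4 values)
m, n = len(q_bar), len(p_bar)

Pj = cp.Variable((m, n), nonneg=True)   # joint probabilities p_ij

# Hypothetical event C, encoded as a 0/1 indicator over the outcome grid.
in_C = np.zeros((m, n))
in_C[0, :] = in_C[:, 2] = 1.0

constraints = [cp.sum(Pj, axis=1) == q_bar,  # row sums: marginal of X2
               cp.sum(Pj, axis=0) == p_bar]  # column sums: marginal of X1
prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(in_C, Pj))), constraints)
prob.solve()
print("Worst-case P((X1, X2) in C):", prob.value)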
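
And for the third item, a sketch of the Chernoff bound under the additional assumption that \(X\) is Gaussian, so that \(\log E[\exp(\lambda^T X)] = \mu^T \lambda + \tfrac{1}{2} \lambda^T \Sigma \lambda\) and the minimization becomes a QP; all data are hypothetical.

```python
import numpy as np
import cvxpy as cp

# Hypothetical event C = {x : Ax <= b}, far in the tail of X ~ N(mu, Sigma).
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([-2.0, -2.0])
mu = np.zeros(2)
Sigma = np.eye(2)

u = cp.Variable(2, nonneg=True)
lam = cp.Variable(2)

# b^T u + log E[exp(lam^T X)] with the Gaussian cumulant-generating function.
objective = b @ u + mu @ lam + 0.5 * cp.quad_form(lam, Sigma)
prob = cp.Problem(cp.Minimize(objective), [A.T @ u + lam == 0])
prob.solve()
print("Chernoff bound: P(X in C) <=", np.exp(prob.value))  # exp(-4) ~ 0.018
```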

Applications and practical aspects

Error estimations of the type described above are particularly helpful for risk limitation. When statements must be made about quantities known only through their probability distribution, or when uncertain measurements must be translated into definitive decisions, these methods provide real added value.

The risk limits depend not on the precise probability distribution but on the overall configuration of the problem. This relationship can be used to find a problem configuration that reduces the risk limits to an acceptable level.

Figure 5: The choice of limits \(c_1, c_2\) for evaluating components, for example based on uncertain measurements, determines the probabilities for Type I and Type II errors. They must be chosen so that both errors remain within acceptable limits (a), (b). The 6 \(\sigma\) method requires specific relationships between manufacturing accuracy and measurement inaccuracies (c).

The established 6 \(\sigma\) criterion for defining statistical quality targets in production management is similar to the above methods, albeit mathematically less involved. The gain from the higher complexity and the increased stochastic modeling effort is sharper, less conservative probability estimates.

Code & Sources

Example code: OE_bounding_probabilities.py in our tutorial folder

[1] Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.

[2] Diamond, S., & Boyd, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83), 1–5.

[3] Wolkowicz, H., Saigal, R., & Vandenberghe, L. (2012). Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Berlin Heidelberg: Springer Science & Business Media.