Optimal design: Experiment design

Problem statement

The question of the optimal design of an experiment or a measurement setup arises wherever the collection of measurements is costly and fraught with uncertainties. This concerns any application where data analysis must be performed using statistical methods.

This includes the design of pharmaceutical trial series [1, p. 511], school exams [2], or measurement networks in geodesy [3]. In experimental design, the goal is to determine how often and which measurements should be conducted to maximize the information content of the data.

Example: Measurement distribution

A line is to be drawn through data points measured at two positions. On its own, this would be a trivial task from the field of parameter estimation. However, we are now dealing with the question of how a measurement budget of, for example, \(N=10\) noisy measurements can be allocated to these positions in such a way that the overall uncertainty of the estimated line is minimized. This is precisely an experiment design problem.

Figure 1: Schematic sketch of the experimental design (a). Measurements with uncertainty can be taken at two positions \(p_1\) and \(p_2\). It must be decided how many measurements at \(p_1\) and \(p_2\) are to be conducted in order to determine the derived parameters \(\theta_1, \theta_2\) as accurately as possible (b).

The relationship between measurements \(l_1\), \(l_2\) and positions \(p_1, p_2\) is given by the equation of a line as

$$ \underbrace{\begin{bmatrix} l_1 \\ l_2 \end{bmatrix}}_{l} \approx \underbrace{\begin{bmatrix} 1 & p_1 \\ 1 & p_2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}}_{\theta} ~~~~~\begin{matrix} \theta_1 : \text{ Intercept} \\ \theta_2 : \text{ Slope} \end{matrix}.$$

\(n_1\) measurements can be taken at \(p_1\) and \(n_2\) measurements at \(p_2\), where \(n_1 + n_2 = N = 10\) must hold.

Example: Optimization formulation

This measurement distribution is to be chosen such that the total uncertainty \(\sigma_{\theta_1}^2 + \sigma_{\theta_2}^2\) of the parameters is minimized, where \(\sigma_{\theta}^2\) is the variance of the parameter \(\theta\). The optimization problem is

$$\begin{align} \min_{n_1,n_2} ~~~ & \sigma_{\theta_1}^2(n_1,n_2) + \sigma_{\theta_2}^2 (n_1, n_2) \\ \text{s.t.} ~~~ & n_1+n_2=10 \end{align}$$

For simplicity, let us assume that the measurements are equally accurate with variance \(\sigma_0^2\) and choose the specific measurement positions \(p_1=0\) and \(p_2=1\). According to the usual rules for uncertainties, repeating a measurement \(n\) times reduces the variance to \(1/n\) of the single-measurement variance [4, p. 17]. Then, for the covariance matrices \(\Sigma_l, \Sigma_{\theta}\) of the observations and the parameters,

$$\begin{align} \Sigma_l & = \sigma_0^2 \begin{bmatrix} \frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2} \end{bmatrix} \\ \Sigma_{\theta} & =A^{-1} \Sigma_l A^{-1 T} \\ & = \sigma_0^2 \begin{bmatrix} 1 & 0 \\ -1 &1 \end{bmatrix} \begin{bmatrix} \frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2} \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \\ & = \sigma_0^2 \begin{bmatrix} \frac{1}{n_1} & -\frac{1}{n_1} \\ -\frac{1}{n_1} & \frac{1}{n_1}+\frac{1}{n_2} \end{bmatrix} \end{align} $$
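The covariance propagation above is easy to check numerically. A minimal sketch in NumPy, assuming the example positions \(p_1=0\), \(p_2=1\) and an illustrative split \(n_1=6\), \(n_2=4\):

```python
import numpy as np

# Design matrix for the line fit at positions p1 = 0, p2 = 1
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
n1, n2 = 6, 4          # example allocation of the N = 10 measurements
sigma0_sq = 1.0        # variance of a single measurement

# Covariance of the averaged observations: repeating a measurement
# n times divides its variance by n
Sigma_l = sigma0_sq * np.diag([1.0 / n1, 1.0 / n2])

# Propagate through l ~ A theta:  Sigma_theta = A^{-1} Sigma_l A^{-T}
A_inv = np.linalg.inv(A)
Sigma_theta = A_inv @ Sigma_l @ A_inv.T

print(Sigma_theta)
print(np.trace(Sigma_theta))   # equals 2/n1 + 1/n2
```

The printed trace reproduces the objective \(2/n_1 + 1/n_2\) derived below.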

Minimizing \(\sigma_{\theta_1}^2 + \sigma_{\theta_2}^2\) is equivalent to minimizing \(\operatorname{tr} \Sigma_{\theta}\), and thus the optimal experimental design is characterized by

$$\begin{align} \min_{n_1,n_2} ~~~& \frac{2}{n_1}+\frac{1}{n_2} \\ \text{s.t.} ~~~ & n_1+n_2=10 \end{align}$$

Example: Solution

Surprisingly, the measurement at location \(p_1\) is twice as important for the overall uncertainty as the measurement at location \(p_2\). This is because the measurement \(l_2\) is irrelevant for the parameter \(\theta_1\), a relationship that would not have been accounted for by an intuitive equal distribution of measurements.

The constrained optimization problem can be solved using the Lagrange function \(\mathcal{L}\) and the associated KKT equations.

$$ \begin{align} \mathcal{L}(n_1,n_2,\nu)&= \frac{2}{n_1}+\frac{1}{n_2}+\nu(n_1+n_2-10) \\ \partial_{n_1} \mathcal{L} & = -\frac{2}{n_1^2}+\nu\stackrel{!}{=}0 \\\partial_{n_2} \mathcal{L} &=-\frac{1}{n_2^2}+\nu \stackrel{!}{=}0 \\ \partial_{\nu} \mathcal{L} & = n_1+n_2-10 \stackrel{!}{=} 0 \end{align}$$

The optimal solution is \(n_1^*=(1+\sqrt{2})^{-1}\,10\sqrt{2}\approx 5.9 \) and \(n_2^*=(1+\sqrt{2})^{-1}\,10 \approx 4.1\). The minimal uncertainty of the parameters is \(0.583\,\sigma_0^2\), where \(\sigma_0^2\) is the variance of a single measurement.
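Since real measurement counts must be integers, it is instructive to compare the continuous optimum with the best integer split. A small enumeration sketch, assuming the cost \(2/n_1 + 1/n_2\) derived above:

```python
import numpy as np

N = 10
# Total parameter uncertainty 2/n1 + 1/n2 (in units of sigma_0^2)
# for every integer split n1 + n2 = N
costs = {n1: 2 / n1 + 1 / (N - n1) for n1 in range(1, N)}
best_n1 = min(costs, key=costs.get)
print(best_n1, N - best_n1, round(costs[best_n1], 4))  # 6 4 0.5833

# Continuous optimum from the KKT conditions, for comparison
n1_star = 10 * np.sqrt(2) / (1 + np.sqrt(2))
print(round(n1_star, 3), round(2 / n1_star + 1 / (10 - n1_star), 4))
```

The best integer split, 6 and 4 measurements, is only marginally worse than the continuous optimum.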

Figure 2: The individual uncertainties of the parameters \(\theta_1\) and \(\theta_2\) are shown in (a) and (b), respectively. The number \(n_1\) of measurements at point \(p_1\) has the greatest influence on the parameter \(\theta_1\), and the same applies to \(n_2\), \(p_2\), and \(\theta_2\). For \(n_1+n_2=10\), the total uncertainty is minimized when 6 measurements are conducted at \(p_1\) and 4 measurements at \(p_2\) (c).

Generalization

Various extensions of the problem are possible. Different costs can be associated with different measurements, and measurements can be interlinked. It is also possible to group measurements together and only allow decisions about the experimental design at the level of these groups. Typically, the optimization problem for determining the best experimental designs is formulated using the Fisher information matrix \(\mathcal{M}\) as follows [1, p. 514]:

$$\begin{align} \min_{\xi} ~~~ & \Psi[\mathcal{M}] \\ \text{s.t.} ~~~& \xi \in \Xi \\ & \xi = \begin{bmatrix} p_1 & \cdots & p_n \\ x_1 & \cdots & x_n \end{bmatrix} \end{align} .$$

Here, \(\xi\) is the design of the experiment, which determines the measurement points \(p_1, \dots, p_n\) and the relative frequencies \(x_1, \dots, x_n\) of the measurements. It is constrained to the set \(\Xi\), and the information matrix \(\mathcal{M}\) resulting from the design \(\xi\) is evaluated regarding its suitability through the function \(\Psi\). \(\Psi(\mathcal{M})\) usually relates directly to the covariance matrix \(\Sigma_{\theta}\) of the parameters derived from the experiment and can represent the total uncertainty \(\operatorname{tr} \Sigma_{\theta}\), the maximum uncertainty \(\lambda_{\max} (\Sigma_{\theta})\), or the volume of uncertainty \(\log \det \Sigma_{\theta}\) [1, pp. 511–532].

The above optimization problem generalizes the initially manually solved task concerning the optimal distribution of measurements for determining the parameters of a fitted line. There, \(n=2\), \(p_1\), and \(p_2\) were fixed, and \(x_1=n_1, x_2=n_2\) were to be chosen such that \(\Psi(\mathcal{M})=\operatorname{tr} \Sigma_{\theta} =\sigma_{\theta_1}^2+\sigma_{\theta_2}^2\) is minimized. The relationship between \(\Sigma_{\theta}\) and \(l \approx A \theta\) was \(\Sigma_{\theta} = A^{-1}\Sigma_l A^{-1 T}\), thus $$ \Sigma_{\theta} = A^{-1} \begin{bmatrix} \frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2} \end{bmatrix} A^{-1 T} = A^{-1} \begin{bmatrix} n_1 & 0 \\ 0 & n_2\end{bmatrix}^{-1} A^{-1 T} = \left[ \sum_{k=1}^2 n_k a_k a_k^T\right]^{-1} $$

where \(a_1, a_2\) are the rows of the matrix \(A\).
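The equivalence between the error-propagation form and the information-matrix form of \(\Sigma_\theta\) can be verified numerically. A sketch with the example's design matrix and an assumed split \(n_1=6\), \(n_2=4\):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0]])   # rows a_1, a_2 for positions p_1 = 0, p_2 = 1
n = np.array([6.0, 4.0])     # measurement counts n_1, n_2

# Error-propagation form: A^{-1} Sigma_l A^{-T}
lhs = np.linalg.inv(A) @ np.diag(1.0 / n) @ np.linalg.inv(A).T

# Information form: inverse of the weighted sum of outer products
M = sum(n_k * np.outer(a_k, a_k) for n_k, a_k in zip(n, A))
rhs = np.linalg.inv(M)

print(np.allclose(lhs, rhs))  # True
```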

Optimization formulation: total uncertainty

In practice, the total-uncertainty minimization carried out above in the simple two-dimensional case generalizes to a semidefinite program of the following form [5, p. 387]:

$$ \begin{align} \min_x ~~~& \operatorname{tr}\left[\sum_{k=1}^n x_k a_k a_k^T\right]^{-1} \\ \text{s.t.} ~~~& x \ge 0 ~~~ \vec{1}^Tx=1 \end{align}$$

Here, \(C=[\sum_{k=1}^n x_k a_k a_k^T]^{-1}\) is a covariance matrix. Minimizing its size via the demand \(\min_x \operatorname{tr} C\) yields the least possible uncertainties. In the optimization formulation, the entries \(x_k, k=1, \dots, n\) of the vector \(x\) are the decision variables. Their values indicate how large the share of experiment no. \(k\) is in the entire experiment budget. Therefore, the \(x_k\) must be non-negative and sum to 1. The \(a_k, k=1, \dots, n\) represent \(n\) different potential measurements.

The problem can be transformed into a semidefinite program in standard form using the Schur complement. In the form

$$ \begin{align} \min_{x, u} ~~~& \vec{1}^Tu \\ \text{s.t.} ~~~& x \ge 0 ~~~ \vec{1}^Tx=1 \\ & \begin{bmatrix} \sum_{k=1}^n x_k a_k a_k^T & e_j \\ e_j^T & u_j\end{bmatrix} \succeq 0 ~~~~~~j=1, \dots, m \end{align}$$

with \(m\) the number of parameters, \(e_j\) the \(j\)-th unit vector, and \(u \in \mathbb{R}^m\) an auxiliary variable,

it features linear matrix inequalities and can be solved, for example, with the help of the open-source solver SCS [6].
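The linear matrix inequality works because, for positive definite \(M\), the block matrix \(\begin{bmatrix} M & e \\ e^T & u \end{bmatrix}\) is positive semidefinite exactly when \(u \ge e^T M^{-1} e\) (the Schur complement condition). A numeric spot check, with an arbitrary positive definite matrix chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))
M = B @ B.T + np.eye(3)          # an arbitrary positive definite matrix
e = np.eye(3)[:, [0]]            # unit vector e_1 as a column

def is_psd(K):
    # A symmetric matrix is PSD iff its smallest eigenvalue is >= 0
    return np.linalg.eigvalsh(K).min() >= -1e-9

threshold = (e.T @ np.linalg.inv(M) @ e).item()   # e^T M^{-1} e

for u in (threshold + 0.1, threshold - 0.1):
    K = np.block([[M, e], [e.T, np.array([[u]])]])
    print(round(u - threshold, 1), is_psd(K))      # 0.1 True / -0.1 False
```

The block matrix flips from PSD to indefinite precisely at \(u = e^T M^{-1} e\), which is why minimizing \(\vec{1}^T u\) recovers \(\operatorname{tr} C\).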

Application

Automatic experimental design via an optimization algorithm is illustrated with an example. Again, we have

$$ l \approx A \theta $$

where \(A\) is a matrix, and \(\theta\) collects several unknown efficacy characteristics of a product (e.g., a chemical). For measurements with unit variance, the best estimator for \(\theta\) is \(\hat{\theta} = (A^TA)^{-1}A^Tl\) with covariance matrix \(C = (A^TA)^{-1}\). Now, choose the entries of \(A\) so that \(C\) becomes small. The experiments \(a_1, \dots, a_{10}\) are available for selection, whose frequencies are to be chosen.

$$ a_1=\begin{bmatrix} 1.62 \\ -0.61 \\ -0.53 \end{bmatrix}, ~~~~ … ~~~~ a_3=\begin{bmatrix} 1.74 \\ -0.76 \\ 0.32 \end{bmatrix}, ~~~~ … ~~~~ a_8=\begin{bmatrix} 1.14 \\ 0.90 \\ 0.50 \end{bmatrix}, ~~~~ … ~~~~ a_{10} =\begin{bmatrix} -0.93 \\ -0.26 \\ 0.53 \end{bmatrix} $$

Finding a distribution \(x\) of measurements such that \(\operatorname{tr} ( \sum_{k=1}^{10} x_k a_ka_k^T)^{-1} \) is minimized is a semidefinite program that can be solved within 0.1 seconds on an office computer.
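Since only a few of the vectors \(a_1, \dots, a_{10}\) are listed above, the sketch below uses randomly generated stand-in vectors (an assumption) and minimizes the same objective with a general-purpose constrained solver from SciPy rather than a dedicated SDP solver such as SCS:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m = 10, 3                     # 10 candidate experiments, 3 parameters
a = rng.normal(size=(n, m))      # stand-in experiment vectors a_k

def total_uncertainty(x):
    # Objective tr[(sum_k x_k a_k a_k^T)^{-1}]
    M = (a.T * x) @ a            # = sum_k x_k a_k a_k^T
    return np.trace(np.linalg.inv(M))

x0 = np.full(n, 1.0 / n)         # start from the uniform design
res = minimize(
    total_uncertainty, x0, method="SLSQP",
    bounds=[(1e-6, 1.0)] * n,                       # x >= 0
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
)
print(res.x.round(3))                               # optimal frequencies
print(total_uncertainty(res.x), "<=", total_uncertainty(x0))
```

As in Figure 3, the optimized frequencies are markedly non-uniform: some experiments receive a large share of the budget while others are nearly dropped.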

Figure 3: The optimal frequency distribution for the 10 different experimental variants (a). The sum of variances for various randomly chosen experimental configurations (b). The optimization solution identified by the algorithm is shown in orange.

The result allows for some conclusions. For example, experiment no. 3 contributes little to the accuracy of the efficacy estimation, and experiment no. 2 is particularly informative. The above solution to the optimization problem achieves significantly better (that is, smaller) variances than other, randomly chosen experimental configurations.

Practical aspects

Optimal experimental design is particularly important when data collection involves significant effort and high costs, and the characteristics of the object under investigation are unclear. Special difficulties arise when correlations or causalities complicate the stochastic model \(l \approx A\theta\). The same applies to nonlinear relationships that prevent formulation as a well-defined semidefinite program.

However, if certain standard prerequisites are met, then optimal experimental design is routinely possible even with hundreds of measurement decisions to be made, and the interpretation of the guaranteed quality criteria is relatively straightforward.

Code & Sources

Example code: OD_experiment_design_1.py, OD_experiment_design_2 in our tutorial folder

[1] Wolkowicz, H., Saigal, R., & Vandenberghe, L. (2012). Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Berlin, Heidelberg: Springer Science & Business Media.

[2] Fletcher, R. (1981). A Nonlinear Programming Problem in Statistics (Educational Testing). SIAM Journal on Scientific and Statistical Computing, 2(3), 257–267.

[3] Grafarend, E. W., & Sansò, F. (2012). Optimization and Design of Geodetic Networks. Berlin, Heidelberg: Springer Science & Business Media.

[4] Ash, R. B. (2011). Statistical Inference: A Concise Course. New York: Courier Corporation.

[5] Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge: Cambridge University Press.

[6] O'Donoghue, B., Chu, E., Parikh, N., & Boyd, S. (2016). Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding. Journal of Optimization Theory and Applications, 169(3), 1042–1068.