Optimal estimation: Estimation of functions

Definition

In parameter estimation, simplifying assumptions are made about the origin of the observed data \(l\): they are considered explainable by a function \(g(x,z)\) that depends, among other things, on parameters \(x\). Wrong model assumptions therefore impact estimation performance heavily.

In general function estimation, this simplifying assumption is abandoned, and the function \(f\) observed through the data \(l\) is not restricted to a parametric family. Instead, \(f\) is considered as a stochastic process: a set of correlated random variables associated with locations \(z \in T\).
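To make this concrete, the following minimal sketch (not part of the original article) draws random manifestations of such a process, as in Figure 1(b). It assumes a Gaussian process with a squared-exponential correlation structure; the kernel and its lengthscale are illustrative choices.

```python
import numpy as np

# Locations at which the process is considered
z = np.linspace(0, 1, 100)

# Squared-exponential correlation: nearby locations are strongly correlated
def k(z1, z2, ell=0.2):
    return np.exp(-(z1 - z2) ** 2 / (2 * ell ** 2))

K = k(z[:, None], z[None, :])            # correlations among f(z_1), ..., f(z_m)
jitter = 1e-10 * np.eye(len(z))          # numerical stabilization

# Each draw is one random manifestation of the process (cf. Figure 1b)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(z)), K + jitter, size=5)
```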

Relevance

Through this stochastic formulation, problems can be formalized and solved that are not accessible to ordinary parameter estimation. Additionally, stochastic processes provide a flexible functional model for \(f\) and allow the analysis of data for which no convincing parametric model of the form \(l = g(x, z)\) can be derived from external circumstances.

Figure 1: Illustration of various quadratic models \(x_1 + x_2 z + x_3 z^2\) with three different choices for the parameter vector (a). In (b), various random manifestations of the same stochastic process are shown; the range of behavior is clearly wider.

Some typical questions arise; a small numerical sketch addressing the first three follows the list:

  • Given measurements \(f(z_1), f(z_2)\), how large is \(f\) at other locations \(z\)?
  • How likely is it that \(f(z_1) \ge 1\)?
  • Given measurements \(f(z_1), f(z_2)\), what is \(\int_{0}^{1} f(z)\, dz\)?
  • Are the data explainable by a smooth or an irregular, sharp-edged process?
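A minimal sketch of how such questions can be answered once \(f\) is modeled as a Gaussian process: condition on the measurements, then read the answers off posterior samples. Locations, values, and the kernel are illustrative assumptions; the probability question is evaluated at an unmeasured location \(z = 0.5\).

```python
import numpy as np

def k(z1, z2, ell=0.2):
    return np.exp(-(z1 - z2) ** 2 / (2 * ell ** 2))

z_obs = np.array([0.2, 0.7])            # measurement locations z_1, z_2
l_obs = np.array([0.5, 1.3])            # measured values f(z_1), f(z_2)
z_new = np.linspace(0, 1, 101)          # locations at which f is unknown

K_oo = k(z_obs[:, None], z_obs[None, :])
K_no = k(z_new[:, None], z_obs[None, :])
K_nn = k(z_new[:, None], z_new[None, :])

# Condition the process on the data: the posterior mean answers question 1
mean = K_no @ np.linalg.solve(K_oo, l_obs)
cov = K_nn - K_no @ np.linalg.solve(K_oo, K_no.T)

# Monte Carlo over posterior realizations answers questions 2 and 3
rng = np.random.default_rng(1)
f_samp = rng.multivariate_normal(mean, cov + 1e-10 * np.eye(len(z_new)), size=2000)
prob = np.mean(f_samp[:, 50] >= 1.0)               # P(f(0.5) >= 1 | data)
integral = np.trapz(f_samp, z_new, axis=1).mean()  # E[ int_0^1 f dz | data ]
```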

Detailed explanation

For instance, if the data \(l_j, j=1, …, n\) are measurements of raw material deposits in the ground at locations \(z_j, j=1, …, n\), then all of these questions are important for estimating the economic viability of mining the resources.

Indeed, the problem of interpolation (estimating all function values \(f(z)\) from individual measurements \(l_j = f(z_j), j=1, …, n\)) was first systematically examined in the context of resource prospecting [1]. There are many functions \(f\) such that \(l_j = f(z_j), j=1, …, n\); thus the question arises as to which function \(f\) is most likely according to prior knowledge and the data.

Figure 2: Various potential functions \(f\) that all interpolate the observed data \(l_j, j=1, …, n\) but otherwise exhibit completely different behaviors.

Interpolation: Optimization problem

The optimization problem for deriving the most probable function \(f\) is expressed as:

$$ \begin{align} \min_{f \in \mathcal{H}_K} ~~~& \|f\|_{\mathcal{H}_K}^2 \\ \text{s.t.} ~~~& f(z_j) = l_j, ~~~~ j = 1, …, n \end{align}$$

where \(\mathcal{H}_K\) is a function space and \(-\|f\|^2_{\mathcal{H}_K}\) measures, up to constants, the log-probability of a function \(f\) within this space. Details of this formulation can be found in [2, p. 111]; particularly important is the reformulation as a quadratic program to determine weights \(\lambda \in \mathbb{R}^n\) with \(f(z) = \sum_{j=1}^n \lambda_j l_j\):

$$ \begin{align} \min_{\lambda} ~~~& \tfrac{1}{2}\lambda^T K_{II}\lambda - \lambda^T K_{I} \\ \text{s.t.} ~~~& \sum_{j=1}^n \lambda_j = 1 \end{align}$$

\(K_{II}\) is a matrix and \(K_{I}\) a vector containing the correlation structure of \(f\). They encode the underlying assumptions about, for example, the smoothness of \(f\). The optimization problem can be solved with solvers for quadratic programming or, since there is only a single equality constraint, in closed form via its KKT system.
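As a sketch of the closed-form route (assuming a squared-exponential correlation structure and illustrative data), the constraint is absorbed into a KKT system and solved with plain linear algebra:

```python
import numpy as np

def k(z1, z2, ell=0.2):
    return np.exp(-(z1 - z2) ** 2 / (2 * ell ** 2))

z_obs = np.array([0.1, 0.4, 0.8])          # measurement locations z_j
l_obs = np.array([1.0, 0.3, 0.7])          # measured values l_j = f(z_j)
z = 0.6                                     # location at which f is estimated

K_II = k(z_obs[:, None], z_obs[None, :])    # correlations among observations
K_I = k(z_obs, z)                           # correlations between observations and z

# KKT system of the equality-constrained QP:
#   [ K_II  1 ] [ lambda ]   [ K_I ]
#   [ 1^T   0 ] [   mu   ] = [  1  ]
n = len(z_obs)
KKT = np.block([[K_II, np.ones((n, 1))],
                [np.ones((1, n)), np.zeros((1, 1))]])
rhs = np.concatenate([K_I, [1.0]])
lam = np.linalg.solve(KKT, rhs)[:n]

f_hat = lam @ l_obs                         # estimate f(z) = sum_j lambda_j l_j
```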

Figure 3: The optimal estimate obtained by solving the optimization problem, and the underlying correlation structure.

Correlation structure

The correlation matrices indicate how strongly the values \(f(z_1), f(z_2)\) at different positions \((z_1, z_2)\) are correlated: a value of \(0\) for \(z_1 = 0\) and \(z_2 = 1\) indicates that there is no significant correlation between \(f(0)\) and \(f(1)\). If there are no reliable prior assumptions about the correlation structures, they can also be derived from the data; this, too, is an optimal estimation problem.
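One simple way to derive the correlation structure from the data, shown here as an illustrative sketch rather than the article's specific method, is to maximize the Gaussian log-likelihood over candidate lengthscales:

```python
import numpy as np

def k(z1, z2, ell):
    return np.exp(-(z1 - z2) ** 2 / (2 * ell ** 2))

# Synthetic data drawn with a "true" lengthscale of 0.15
z_obs = np.linspace(0, 1, 20)
rng = np.random.default_rng(2)
K_true = k(z_obs[:, None], z_obs[None, :], 0.15) + 1e-8 * np.eye(20)
l_obs = rng.multivariate_normal(np.zeros(20), K_true)

def log_likelihood(ell):
    K = k(z_obs[:, None], z_obs[None, :], ell) + 1e-8 * np.eye(20)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (logdet + l_obs @ np.linalg.solve(K, l_obs))

# Grid search: the maximizer is the correlation structure most
# consistent with the correlations actually present in the data
ells = np.linspace(0.05, 0.5, 46)
ell_hat = ells[np.argmax([log_likelihood(e) for e in ells])]
```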

Abstract splines

Interpolating data is often helpful. However, data do not always arise from point measurements, are not always error-free, and their correlation structure is not always known. The currently most general, still efficiently solvable estimation problem is formulated as follows [2, p. 117]:

$$ \begin{align} \min_f ~~~& \|Af - l\|^2_{\mathcal{H}_A} + \|Bf\|^2_{\mathcal{H}_B} & \\ & f : \text{ Function to be estimated} && l : \text{ Data} \\ & A : \text{ Measurement operator} && \mathcal{H}_A : \text{ Function space of potential measurements} \\ & B : \text{ Energy operator} && \mathcal{H}_B : \text{ Function space of potential energies} \end{align}$$

Solutions to these minimization problems are called abstract splines. They maximize the probability of the discrepancies \(Af - l\) between actual and hypothetical observations as well as the probability of \(f\) itself. The measurement operator \(A\) maps functions \(f\) to hypothetical observations \(Af\), and the energy operator \(B\) maps functions \(f\) to quantities \(Bf\) whose probability distribution is known.
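A finite-dimensional sketch of an abstract spline: \(f\) is discretized on a grid, \(A\) is a point-sampling matrix, and \(B\) is a second-difference operator penalizing curvature. The grid, data, and the weighting factor \(\mu\) are illustrative assumptions; the original formulation weights the two terms through the norms of \(\mathcal{H}_A\) and \(\mathcal{H}_B\).

```python
import numpy as np

m = 100                                   # grid size for the discretized f
z = np.linspace(0, 1, m)

# Measurement operator A: point evaluations at a few grid indices
idx = np.array([5, 30, 55, 90])
A = np.zeros((len(idx), m))
A[np.arange(len(idx)), idx] = 1.0
l = np.array([0.2, 1.0, 0.4, 0.8])        # observed data

# Energy operator B: second differences penalize curvature
B = np.diff(np.eye(m), n=2, axis=0)

# min ||A f - l||^2 + mu * ||B f||^2  =>  solve the normal equations
mu = 1e-3
f_hat = np.linalg.solve(A.T @ A + mu * B.T @ B, A.T @ l)
```

Because the data term only pins \(f\) down at four points, the curvature penalty determines the behavior in between; a larger \(\mu\) yields a smoother estimate.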

Applications

The solutions to a vast number of optimal estimation problems can be represented as abstract splines. For instance, if \(f\) is a two-dimensional function and \(Af\) are line integrals \((Af)_j = \int_{z_0}^{z_j} f(z)\, dz\), then the abstract splines are solutions for tomography problems, as shown in the figure.

Figure 4: In tomography, only the overall effects along propagation paths are measured, and the goal is to infer the distribution of individual effects.
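A full 2D tomography example exceeds a short sketch, but the 1D analogue with cumulative integrals \((Af)_j = \int_{z_0}^{z_j} f(z)\, dz\) already shows the mechanics: \(A\) becomes a Riemann-sum matrix, and the same curvature-penalized least squares applies. The paths, test function, and penalty weight below are illustrative assumptions.

```python
import numpy as np

m = 100
z = np.linspace(0, 1, m)
dz = z[1] - z[0]

# Measurement operator: cumulative integrals int_0^{z_j} f dz along "paths"
j_ends = np.array([20, 45, 70, 95])         # endpoints z_j of the paths
A = np.zeros((len(j_ends), m))
for row, j in enumerate(j_ends):
    A[row, :j + 1] = dz                     # Riemann-sum approximation

f_true = np.sin(2 * np.pi * z)              # unknown distribution of effects
l = A @ f_true                              # only aggregated effects observed

# Same abstract-spline machinery as above: curvature-penalized least squares
B = np.diff(np.eye(m), n=2, axis=0)
f_hat = np.linalg.solve(A.T @ A + 1e-6 * B.T @ B, A.T @ l)
```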

If, however, \(A\) is simply the identity operator and \(\mathcal{H}_A\) and \(\mathcal{H}_B\) are spaces of functions with different correlation structures, then the abstract splines represent solutions for signal separation problems.

Figure 5: An overlaid signal \(f_1 + f_2\) is to be separated into individual signal components \(f_1\) and \(f_2\). The different correlation structures of \(f_1\) and \(f_2\) are used for the distinction.
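A sketch of such a separation under the assumption that \(f_1\) and \(f_2\) are zero-mean Gaussian processes with known smooth and rough correlation structures; the kernels and lengthscales are illustrative choices.

```python
import numpy as np

def k(z1, z2, ell):
    return np.exp(-(z1 - z2) ** 2 / (2 * ell ** 2))

z = np.linspace(0, 1, 200)
K1 = k(z[:, None], z[None, :], 0.3)     # smooth, long-range component f_1
K2 = k(z[:, None], z[None, :], 0.02)    # rough, short-range component f_2

rng = np.random.default_rng(3)
jit = 1e-8 * np.eye(len(z))
f1 = rng.multivariate_normal(np.zeros(len(z)), K1 + jit)
f2 = rng.multivariate_normal(np.zeros(len(z)), K2 + jit)
y = f1 + f2                              # only the overlaid signal is observed

# Posterior means: each component receives the share of y that matches
# its own correlation structure
w = np.linalg.solve(K1 + K2 + jit, y)
f1_hat = K1 @ w
f2_hat = K2 @ w
```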

In addition to these two examples from signal processing, optimal function estimation is used for many other purposes. Applications include raw material prospecting, image processing, evaluation of measurement data, and modeling of environmental phenomena such as epidemiological spread processes, the distribution of atmospheric parameters, geological properties, and land use, as well as creating surrogate models, compressing and filtering video data, and much more.

Practical aspects

Formulating and solving real-world problems as abstract function estimation problems involves several steps. Firstly, there is the challenge of identifying a specific task as solvable by estimating a function, which is not always straightforward. Furthermore, precise formulation and transformation play a crucial role in ensuring the solvability of the optimization problem. Strictly speaking, abstract splines are solutions of optimization problems in infinite-dimensional spaces (function spaces typically have this property); hence, clever reformulations are required to reduce them to finite-dimensional computations.

Lastly, the correlation structures of the solutions must be prescribed, either on the basis of previously collected data or of prior assumptions. This requires experience in modeling with stochastic processes. If these three challenges are successfully mastered, the result is an estimate that is optimal from a stochastic perspective.

Code & Sources

Example code: OE_conditional_simulation.py, OE_random_quantities.py, OE_functional_signal_separation.py, OE_simulation_support_funs.py in our tutorial folder.

[1] Cressie, N. (1990). The origins of kriging. Mathematical Geology, 22, 239–252.

[2] Berlinet, A., & Thomas-Agnan, C. (2011). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Berlin Heidelberg: Springer Science & Business Media.