Theory: Stochasticity

Models and uncertainty

Uncertainty is an essential component of models that depict real-world phenomena. It can stem from various causes.

  • The model involves phenomena of a random nature, such as nuclear decay, component failure, or stock prices.
  • The model includes deterministic but unknown quantities, such as the length of a distance that has not been measured with sufficient accuracy or the not-yet-published price of a product.
  • The model is deliberately incomplete, and deviations between model behavior and reality are accounted for as uncertainty; for example, in problems of nonlinear continuum mechanics or feedback effects in complex systems.

In each of these cases, a part of the model is considered stochastic (i.e., random). Regardless of whether the random element arises from the nature of the phenomenon or from ignorance of system parameters and relationships, it is described using the same formalisms of probability theory.

Random variables

Random variables and their probability distributions play a central role. A random variable \(X^{\cdot}\) is a mapping $$X^{\cdot}:\Omega \ni \omega \mapsto X^{\omega}\in \mathbb{R}$$ which assigns a number to the outcome of an experiment [1, p. 46]. The experiment could be the roll of a die, its outcome the upper face, and the random variable maps this outcome to the number of pips shown. Before a die is actually rolled, the outcome of the experiment is not known, but it can be described in terms of the expected frequencies of the possible pip counts. These relative frequencies are called probabilities; the assignment of probabilities to the different numerical values is called a probability distribution.

Figure 1: Example of a random variable that assigns the sum of the pips when two dice are rolled. Some values occur more frequently on average than others.
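This mapping can be made concrete in a few lines of code. The following minimal sketch simulates the experiment of Figure 1 and estimates the distribution of the dice sum by relative frequencies; the sample size is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rolls = 100_000

# omega: pairs of die faces; X(omega): their sum
omega = rng.integers(1, 7, size=(n_rolls, 2))
x = omega.sum(axis=1)

# Relative frequencies approximate the probability distribution of X
values, counts = np.unique(x, return_counts=True)
for v, freq in zip(values, counts / n_rolls):
    print(f"sum {v:2d}: relative frequency {freq:.3f}")
```

As expected, sums near 7 occur most often, since more pairs of faces produce them.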

Relevance

When a random variable \(X^{\cdot}\) appears in an optimization problem, all possible values of \(X^{\cdot}\) and their individual probabilities must be considered. This makes optimization problems harder to interpret. As the objective function is now itself random and characterized by a probability distribution, a decision must be made about which aspect of this distribution is to be optimized.

For example, if the objective function is defined by costs \(c^Tx\) with the optimization variable \(x\in \mathbb{R}^n\) and random variables \([c_1^{\cdot}, …, c_n^{\cdot}]=c^T\), it may be sensible to minimize the expected value, the range of fluctuation, maximum values, or quantiles of the costs. Depending on the objective, this leads to linear programs (LP), semidefinite programs (SDP), or stochastic programs.
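As an illustration of the simplest case, the following sketch minimizes the expected cost: since expectation is linear, \(E[c^Tx]=E[c]^Tx\), and the stochastic objective reduces to an ordinary LP. The cost samples and the constraint set below are purely illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Illustrative samples of the random cost vector c (n = 3)
cost_samples = rng.normal(loc=[3.0, 1.0, 2.0], scale=0.5, size=(1000, 3))
c_expected = cost_samples.mean(axis=0)   # estimate of E[c]

# Assumed feasible set: x >= 0, sum(x) = 1 (a simple allocation problem)
res = linprog(c=c_expected,
              A_eq=np.ones((1, 3)), b_eq=[1.0],
              bounds=[(0, None)] * 3)
print("optimal x:", res.x, "expected cost:", res.fun)
```

Optimizing quantiles or worst-case values instead would lead to the stochastic or semidefinite formulations mentioned above.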

Typical probability distributions

The most suitable probability distribution for modeling a random phenomenon depends on the available background knowledge. Uniform distributions express complete ignorance beyond known bounds, while normal distributions are good approximations for random effects composed of many independent minor errors. The \(\chi^2\)-distribution quantifies uncertainties in sums of squares, such as squared lengths and distances, and the Poisson distribution describes the number of random events, such as failures, in a fixed interval. Various tailored probability distributions and their applications can be found in [2, p. 828].

Figure 2: Illustration of different probability distributions.
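Samples from the distributions just mentioned can be drawn directly with NumPy; all parameters below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = {
    "uniform": rng.uniform(0.0, 1.0, size=1000),  # complete ignorance on [0, 1]
    "normal":  rng.normal(0.0, 1.0, size=1000),   # many small independent errors
    "chi2":    rng.chisquare(3, size=1000),       # sums of squares, e.g. squared distances
    "poisson": rng.poisson(2.0, size=1000),       # event counts, e.g. failures
}
for name, s in samples.items():
    print(f"{name:8s} mean={s.mean():6.3f}  var={s.var():6.3f}")
```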

Stochastic processes

Multivariate probability distributions are especially relevant in practice. They quantify the probabilities of stochastic processes: collections of interrelated random variables associated with specific points in time, locations, or, more generally, any set of indices [1, p. 190]. Stochastic processes can be used to describe phenomena influenced by randomness in space or time, and even simple multivariate normal distributions cover a wide range of potential behaviors to be modeled [3, pp. 79–94].

Figure 3: The four sub-figures show simulations based on four different multivariate normal distributions. Each curve is one simulation and, like a dice roll, its outcome is a randomly generated function.
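Curves like those in Figure 3 can be generated by drawing from a multivariate normal distribution whose covariance matrix is built from a kernel. The squared-exponential kernel and the length scale below are assumed choices, not necessarily those used for the figure:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)

# Covariance C_kl = exp(-(t_k - t_l)^2 / (2 * length_scale^2))
length_scale = 0.1
C = np.exp(-(t[:, None] - t[None, :])**2 / (2 * length_scale**2))
C += 1e-10 * np.eye(len(t))   # small jitter for numerical stability

# Each row of `paths` is one randomly generated function evaluated on t
paths = rng.multivariate_normal(mean=np.zeros(len(t)), cov=C, size=5)
print(paths.shape)   # (5, 100): five simulated curves
```

Changing the kernel or its length scale produces the qualitatively different behaviors seen across the four sub-figures.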

The illustrations demonstrate that even randomly generated functions can exhibit functional relationships. In principle, the assumption of stochasticity is seldom a hindrance, as stochastic models include deterministic models as a subset.

Exemplary application

Let \(X^{\cdot}_{\cdot}:T\times \Omega \ni (t,\omega)\mapsto X^{\omega}_{t} \in \mathbb{R}\) be a stochastic process; that is, a family of random variables indexed by the variable \(t\in T\). If the process has been observed only at certain times and is to be estimated for all other times based on these observations, this can be formulated as an optimization problem. Assuming a known probability distribution and hence known covariances, the best linear unbiased estimator \(\hat{X}_{t_0}\) for the value at \(t_0\) based on the observations \(X_{t_1}, …, X_{t_n}\) is given by

$$\hat{X}_{t_0} = \sum_{k=1}^n \lambda_k X_{t_k}.$$

Here, \([\lambda_1, …, \lambda_n]=\lambda\) is the solution to the quadratic program

$$\begin{align}
\min_{\lambda} \quad & \lambda^T C \lambda - 2\lambda^T c + \sigma_{00} \\
\text{s.t.} \quad & \sum_{k=1}^n \lambda_k = 1
\end{align}$$

where \(c\in \mathbb{R}^n\) with \(c_k=\text{Covariance}(X_{t_0},X_{t_k})\), \(C\in \mathbb{R}^{n\times n}\) with \(C_{kl}= \text{Covariance}(X_{t_k},X_{t_l})\), and \(\sigma_{00} = \text{Covariance}(X_{t_0},X_{t_0})\). This problem can be solved algorithmically or by hand and leads to the method known as Kriging in geostatistics [4, pp. 163–164]. The optimal value of the minimization problem is the variance of the estimation error.

Figure 4: The optimal estimation of all process values based on a few observations of the process.
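The quadratic program above can be solved in closed form via its KKT conditions: introducing a multiplier \(\nu\) for the equality constraint, the optimal weights satisfy the linear system \(C\lambda + \nu \mathbf{1} = c\), \(\mathbf{1}^T\lambda = 1\). The following sketch solves this system; the covariance function and the observations are illustrative assumptions:

```python
import numpy as np

def cov(s, t, length_scale=0.3):
    """Assumed covariance function Covariance(X_s, X_t)."""
    return np.exp(-(s - t)**2 / (2 * length_scale**2))

t_obs = np.array([0.1, 0.4, 0.5, 0.9])   # observation times t_1, ..., t_n
x_obs = np.array([1.2, 0.4, 0.3, 1.5])   # observed values X_{t_k}
t0 = 0.7                                 # time at which to estimate

n = len(t_obs)
C = cov(t_obs[:, None], t_obs[None, :])  # C_kl = Cov(X_{t_k}, X_{t_l})
c = cov(t_obs, t0)                       # c_k  = Cov(X_{t_0}, X_{t_k})
sigma00 = cov(t0, t0)

# KKT system of the QP: [C 1; 1^T 0] [lambda; nu] = [c; 1]
A = np.block([[C, np.ones((n, 1))],
              [np.ones((1, n)), np.zeros((1, 1))]])
sol = np.linalg.solve(A, np.append(c, 1.0))
lam = sol[:n]

x0_hat = lam @ x_obs                               # estimator \hat{X}_{t_0}
err_var = lam @ C @ lam - 2 * lam @ c + sigma00    # value of the QP
print(f"estimate {x0_hat:.3f}, error variance {err_var:.3f}")
```

Evaluating this at many values of \(t_0\) yields curves such as those in Figure 4, with the error variance shrinking near the observations.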

Practical aspects

Uncertainties are part of all real-world phenomena and must be appropriately represented in optimization problems. This requires selecting probability distributions tailored to the phenomenon. Since models usually involve more than just one random variable, stochastic processes are used for modeling. These have high-dimensional probability distributions and must be incorporated into the optimization problem in a way that allows meaningful solutions to be derived. This works well with multivariate normal distributions and uniformly distributed data but is challenging for less thoroughly studied probability distributions.

Code & Sources

Example code: Theory_stochastic_processes.py in our tutorial folder

[1] Melsa, J. L., & Sage, A. P. (2013). An Introduction to Probability and Stochastic Processes. New York: Courier Corporation.

[2] Bronstein, I. N., Mühlig, H., Musiol, G., & Semendjajew, A. K. (2013). Taschenbuch der Mathematik. Haan-Gruiten: Verlag Europa-Lehrmittel.

[3] Rasmussen, C. E., & Williams, C. K. (2005). Gaussian Processes for Machine Learning. Cambridge: MIT Press.

[4] Chilès, J. P., & Delfiner, P. (2009). Geostatistics: Modeling Spatial Uncertainty. New York: John Wiley & Sons.