Overview: Machine learning

General

On the following subpages, you will find information about the types of problems that can be solved using Machine Learning (ML). The methods used and the basic theory are briefly outlined; code examples detail practical applications.

Relevance

Machine Learning (ML) is one of the newest, most active, and application-oriented fields of modern mathematics. It has achieved practical success especially in tasks that involve human contextual understanding. The processing of text, audio, image, and video data with the goal of deriving semantically relevant information for humans has made ML an important tool. It is used particularly in the entertainment industry, consumer research, finance, predictive analytics, medical technology, bioinformatics, industrial image interpretation, the automation of vehicles and warehouse operations, and in other areas that benefit from the automated evaluation and interpretation of high-dimensional data.

Definition

Unlike mathematical optimization, the task of machine learning (ML) is less clearly defined: it does not just involve solving an equation. Instead, ML deals with the development and analysis of algorithms whose performance improves with increasing amounts of data. Mathematically, this can be formalized as the minimization of a loss function \(f_S(x)\) over parameters \(x\) constrained to a set \(D\); determining these parameters is the goal of learning.

$$ \begin{align} \min_x ~~~&f_S(x) \\ \text{subject to} ~~~&x \in D \end{align}$$

However, compared to classical optimization, \(f_S(x)\) is an a priori unknown function that only takes shape through real-world data \(S\). For example, for an ML program learning to play chess, it only becomes clear which game situations are desirable after it has played several rounds and evaluated the experience from those rounds. This combination of data-dependent models and mathematical optimization is typical of ML.
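
As a minimal illustration of this formulation, the sketch below builds a least-squares loss \(f_S(x)\) that only takes shape once synthetic data \(S\) has been generated, and then minimizes it with plain gradient descent. The data, the choice of loss, and the step size are illustrative assumptions rather than a prescribed method.

```python
# Minimal sketch: empirical risk minimization with plain NumPy.
# The loss f_S(x) only takes shape once the data S = (A, b) is observed;
# the synthetic data, least-squares loss, and step size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                 # features contained in the data set S
x_true = np.array([1.0, -2.0, 0.5])           # unknown "ground truth" parameters
b = A @ x_true + 0.1 * rng.normal(size=100)   # noisy observations contained in S

def f_S(x):
    """Data-dependent loss: mean squared prediction error over S."""
    return np.mean((A @ x - b) ** 2)

def grad_f_S(x):
    """Gradient of the mean squared error with respect to the parameters x."""
    return 2.0 * A.T @ (A @ x - b) / len(b)

x = np.zeros(3)                               # initial parameters
for _ in range(500):                          # simple gradient descent on f_S
    x -= 0.1 * grad_f_S(x)

print("learned parameters:", x)
print("final loss f_S(x):", f_S(x))
```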

Context

In this respect, ML algorithms differ from classical software, which is a closed system consisting of a fixed sequence of commands and thus is not capable of altering its own functionality by integrating new data. The ability to learn from new experiences, along with the ubiquity of certain types of data, makes ML flexible and versatile, but also complex in its overall array of methods. Typically, ML is divided into three classes of tasks: supervised learning, reinforcement learning, and unsupervised learning. The following image provides a rough overview.

Figure: The different task classes (supervised learning, reinforcement learning, unsupervised learning) differ in the degree of supervision, i.e., the directness with which the desired behavior is communicated to the algorithm.

Examples

Depending on which loss function \(f_S(x)\) is chosen, an algorithm can be directed to solve various tasks: \(f_S(x)\) can measure prediction errors, misclassification rates, penalties for suboptimal system controls, intra-group variances, or reconstruction errors. Optimally chosen parameters \(x_1, x_2, \ldots\) determine the behavior of the algorithm such that the loss function (as a measure of the impact of undesirable behavior) takes on the smallest possible values. Different meanings of \(f_S(x)\) and associated applications are noted in the table below.

Example              | \(f_S(x)\)                     | \(S\)                      | \(V(x_1, x_2, \ldots)\)
Price prediction     | Prediction error               | Product features, prices   | Price
Machine translation  | Negative sentence probability  | Pairs of sentences         | Translated sentence
Image classification | Misclassification rate         | Images, object classes     | Class probabilities
Cancer diagnosis     | Misclassification rate         | Medical data, diagnoses    | Cancer probability
Machine control      | Ineffective system dynamics    | Previous control cycles    | Control signals
Game AI              | Probability of defeat          | Previous games             | Game strategy
Data compression     | Reconstruction error           | Example data               | Compressed object
Fraud identification | Behavioral consistency         | Metadata, transactions     | Transaction irregularity

Table with examples of machine learning applications. The term \(f_S(x)\) is the loss function to be minimized, \(S\) represents the data, and \(V(x_1, x_2, \ldots)\) is the output of the algorithm trained with ML.
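
To give a rough feel for the losses listed in the table, the following sketch writes a few of them (prediction error, misclassification rate, reconstruction error) as plain NumPy functions and evaluates them on random stand-in data. The array shapes and the exact formulas are illustrative assumptions.

```python
# Sketch of a few loss choices f_S(x) from the table above, written as
# plain NumPy functions; the random data standing in for S is illustrative.
import numpy as np

def prediction_error(y_pred, y_true):
    """Price prediction: mean squared prediction error."""
    return np.mean((y_pred - y_true) ** 2)

def misclassification_rate(class_probs, labels):
    """Image classification / cancer diagnosis: share of wrongly predicted classes."""
    return np.mean(np.argmax(class_probs, axis=1) != labels)

def reconstruction_error(x_original, x_reconstructed):
    """Data compression: distance between original and reconstructed data."""
    return np.mean((x_original - x_reconstructed) ** 2)

# Toy usage with random arrays standing in for the data S.
rng = np.random.default_rng(1)
print(prediction_error(rng.normal(size=10), rng.normal(size=10)))
print(misclassification_rate(rng.random(size=(10, 3)), rng.integers(0, 3, size=10)))
print(reconstruction_error(rng.normal(size=(5, 4)), rng.normal(size=(5, 4))))
```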

Applications

Many practical tasks from disciplines such as finance, marketing, medicine, image processing, game theory, and data analysis can be formulated as an ML problem \(\min_x f_S(x), \; x \in D\). Regardless of the specific application, problems from these disciplines can be grouped into three classes of tasks:

Supervised learning

Supervised learning refers to ML tasks where the desired behavior of the algorithm can be directly specified in the form of data, particularly through regression and classification. A model (e.g., a neural network) should then be adjusted in terms of its parameters so that the model behavior represents the exemplary data as accurately as possible. If the model is well chosen, it not only replicates the input-output relationships set by the data but also acts plausibly in new situations not covered by the training data. This class of tasks includes creating statistical models, predictive analytics, classification of text, audio, images, and videos, text translation, automatic generation of subtitles, and much more.
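
As a small supervised-learning sketch, the code below fits a tiny neural network in PyTorch so that its input-output behavior matches noisy example data. The network architecture, synthetic data, loss, and optimizer settings are illustrative assumptions.

```python
# Minimal supervised-learning sketch in PyTorch: adjust the parameters of a
# small neural network so that it reproduces example input-output pairs.
import torch

torch.manual_seed(0)
X = torch.rand(200, 1) * 6 - 3                 # inputs from the training data
y = torch.sin(X) + 0.1 * torch.randn(200, 1)   # noisy target outputs

model = torch.nn.Sequential(                   # small feed-forward network
    torch.nn.Linear(1, 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
loss_fn = torch.nn.MSELoss()                   # f_S(x): mean squared error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):                       # gradient-based minimization
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print("final training loss:", loss.item())
```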

Reinforcement learning

Reinforcement learning refers to ML tasks where there is positive and negative feedback to assess the behavior exhibited by the algorithm, but there are no direct hints on which behavior is exemplary and therefore to be imitated. The algorithm interacts with a system that it can modify with control signals, upon which the system responds with a change and reinforcing or punitive feedback. Reinforcement learning thus mimics the learning behavior in real and uncertain contexts, similar to a person playing chess for the first time. The goal is to derive optimal sequences of decisions under uncertainty and in competitive situations. This class of tasks includes optimal machine control, training AIs in games, active portfolio management, traffic flow management, warehouse management, and procurement planning.
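
A compact way to see this interaction loop is tabular Q-learning on a toy "chain" environment, sketched below: the agent only receives a reward at the right end of the chain and never sees example moves. The environment, reward structure, and hyperparameters are illustrative assumptions.

```python
# Sketch of reinforcement learning as tabular Q-learning on a toy chain:
# the agent learns purely from reward feedback, not from example behavior.
import numpy as np

n_states, n_actions = 6, 2        # states 0..5, actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                         # start at the left end of the chain
    for _ in range(50):
        # epsilon-greedy: mostly exploit current Q-values, occasionally explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal state
        # Q-learning update: move Q[s, a] toward reward plus discounted future value
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r > 0:
            break

print("greedy policy (0 = left, 1 = right):", np.argmax(Q, axis=1))
```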

Unsupervised learning

Unsupervised learning involves the algorithm receiving no immediate guidelines on the desired behavior. The algorithm must independently discover patterns in the data without prior learning experience. These patterns are then used to cluster the data or reduce it to its most important components. This process generates a structure on a dataset that can subsequently be used to, for example, identify atypical financial transactions, cluster related genes, identify ecologically connected plant communities, operate recommender systems, segment markets into groups, or analyze social networks.
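
As an unsupervised-learning sketch, the code below clusters unlabeled synthetic data with k-means and reduces it to its two main components with PCA, both from scikit-learn. The data and the choices of three clusters and two components are illustrative assumptions.

```python
# Unsupervised-learning sketch: cluster unlabeled data with k-means and
# reduce it to its main components with PCA; no labels are ever provided.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Unlabeled data: three blobs in 5 dimensions, no class information given.
centers = rng.normal(scale=5.0, size=(3, 5))
data = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
components = PCA(n_components=2).fit_transform(data)

print("cluster sizes:", np.bincount(labels))
print("shape after reduction to 2 components:", components.shape)
```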

Outlook

The aforementioned tasks are standard machine learning (ML) tasks with specially designed algorithms that have been successfully tested in practice. We present some example problems below and illustrate their solutions with code, sketches, and descriptions. We emphasize explaining the relationship between the behavior of an ML algorithm, the loss functions, and the real-world implications. Ultimately, we again formalize these as optimization problems. Compared to formulations in classical mathematical optimization, however, data-driven nonlinear terms such as cross-entropy, Kullback-Leibler divergences, and parameters in neural networks appear, which do not allow for reliable optimization. This leads to experimental numerical methods; for neural networks, however, there is good publicly available software such as PyTorch.
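
For readers who want to see what such data-driven terms look like in code, the short sketch below evaluates a cross-entropy and a Kullback-Leibler divergence in PyTorch on random tensors; the tensors themselves are illustrative assumptions.

```python
# Sketch of the data-driven loss terms named above, evaluated in PyTorch:
# a cross-entropy between predicted logits and labels, and a KL divergence
# between two probability distributions; all tensors here are random stand-ins.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                  # raw network outputs for 4 samples
labels = torch.tensor([0, 2, 1, 2])         # ground-truth classes
cross_entropy = F.cross_entropy(logits, labels)

p = torch.softmax(torch.randn(4, 3), dim=1)           # "true" distribution
q_log = torch.log_softmax(torch.randn(4, 3), dim=1)   # model log-probabilities
kl = F.kl_div(q_log, p, reduction="batchmean")        # KL(p || q) in PyTorch's convention

print("cross-entropy:", cross_entropy.item())
print("KL divergence:", kl.item())
```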

Practical applications, methods, and theory can be found in the sections named accordingly. We hope the material inspires you to identify or search for applications of machine learning in your own business.