Machine learning

Algorithms for converting experience (training data) into expertise/knowledge (prediction function or program). Can be categorized along different axes:

Formal Model - Statistical Learning Framework

Formally, given a distribution D over data space X, ground truth target function f:XY, training data S=((x1y1),(xm,ym)), a learning algorithm outputs a prediction function hS:XY.

The generalization error of h (for a classification problem) is the probability of randomly choosing an example x for which h(x)f(x), i.e.,

LD,f(h):=PxD[h(x)f(x)]

The goal of the algorithm is to find the predictor (which depends on the training set S) that minimizes the error with respect to the unknown D and f. The training error or empirical risk/error LS(h) is the error that the classifier incurs over the training samples.

Empirical risk minimization (ERM) is the simple learning paradigm that aims to minimize the training error, but one needs to watch out for overfitting, wherein the model performs excellently on the training data but poorly on the true distribution. A common remedy is to restrict the hypothesis class H of the predictor functions h choosing in advance the set of predictors we will optimize over based on some prior knowledge we have about the learning problem. This biases the algorithm to a particular set of predictors (inductive bias). Formally,

ERMH(S)argminhHLS(h)

LD,f(hS) depends on the training set S which arises from a random process (usually i.i.d sampling). This means there is randomness in the choice of the predictor hS and, consequently, in the risk LD,f(hS). The sample could be unrepresentative with some probability, and even if not, there could be errors due to the finiteness of the sample. This can be formalized by the notion of PAC Learnability.


Links

Sources