A hypothesis class is Probably Approximately Correct (PAC) learnable with respect to a set and a loss function , if there exists a function and a learning algorithm with the following property: For every and for every distribution over , running the learning algorithm on i.i.d. examples generated by returns a hypothesis such that, with probability of at least (over the choice of the training examples),
where . The function determines the sample complexity of learning .
Under a realizability assumption (), we can guarantee an absolute level of error instead of just a relative one w.r.t. some minimizer within the hypothesis class. For example, under the realizability assumption, every finite hypothesis class for a binary classification task is PAC learnable ( with probability ) with sample complexity:
Uniform convergence
A sufficient condition for PAC learnability is the uniform convergence condition, which essentially states that a sample training set is large enough that the empirical risks for all hypotheses in are good approximations of their true risks. Formally a training set is said to be representative if
We say that the hypothesis class has the uniform convergence property (w.r.t. domain , hypothesis class , loss function , and distribution ) if there exists a function such that for every and for every distribution over , if is a sample of i.i.d. examples from , then, with probability of at least , is representative.
If a training set is representative, then any output of satisfies . This means that if a class has the uniform convergence property with a function , then the class is PAC learnable with sample complexity . This can be used to prove that any finite hypothesis class is PAC learnable using the ERM algorithm with sample complexity:
dropping the realizability assumption from above (now for general loss functions).