No-Free-Lunch theorem

The NFL theorem is essentially a statement that no learning algorithm is perfect for every possible dataset out there. For any learner, there exists a distribution on which it fails, i.e., upon receiving an i.i.d training sample, it outputs a hypothesis that is likely to have a large risk all the while there is some other learner that would output a hypothesis with a small risk. In other words, every learner has tasks on which it fails while other learners succeed. A formal statement for binary classification can be stated as below:

For any learning algorithm $A$ for binary classification w.r.t the discrete binarized 0-1 loss over a domain $X$ , if the training set size $m \leq \frac{| X |}{k}$ , then there exists a distribution $D$ over $X \times {0, 1}$ and a function $f : X \to {0, 1}$ such that $L_{D} (f) = 0$ and:

E_{S \sim D^{m}} [L_{D} (A (S))] \geq \frac{1}{2} - \frac{1}{2 k} .

The proof formalises that for a given finite set of finite samples some of which are used for training, once can always adversarially select a distribution that matches the output of the learner for the training set but maximally disagrees outside it (still restricted to the bigger finite sample set), since we assume no restriction on the distribution or the labeling function. This argument is developed further into the VC-dimension and the concept of shattering.

Links

Sources

Understanding Machine Learning From Theory to Algorithms