A hypothesis class is said to shatter a finite set if the restriction of to C is the set of all functions from to (for binary classification). In other words, and every possible binary function on is part of some hypothesis in and is, in that sense, not restrictive on the possible solutions.
Whenever some set is shattered by , an adversary is not restricted by as they can construct a distribution over based on any target function from to , while still maintaining realizability (on ), yielding the No-Free-Lunch theorem.
Intuitively, if a set is shattered by , and we receive a sample containing half the instances of , the labels of these instances give us no information about the labels of the rest of the instances in – every possible labeling of the rest of the instances can be explained by some hypothesis in .
Vapnik-Chervonenkis-dimension
The VC-dimension of a hypothesis class is the maximal size of a set that can be shattered by
If can shatter sets of arbitrarily large size, then is said to have infinite VC-dimension, and is consequently not PAC learnable. On the other hand, the following holds and is termed The Fundamental Theorem of Statistical Learning:
Let be a hypothesis class of functions from to under the binary loss function. Then the uniform convergence property, PAC learnability and finite VC-dimension are all equivalent. Quantitatively, if , then there are absolute constants such that:
has the uniform convergence property with sample complexity:
is (agnostic) PAC learnable with sample complexity:
is PAC learnable (i.e, under strict realizability) with sample complexity:
This theorem holds for some other learning problems such as regression with the absolute or squared error loss, but not for all learning tasks.
Example
Let be the class of threshold functions over , where is defined by (i.e., if and otherwise). Although is of infinite size, it is PAC learnable, as its VC dimension is finite:
shatters sets of size 1. Take . Choosing gives , while gives . Hence is the set of all functions from to , and shatters .
shatters no set of size 2. Take with . No can realize the labeling : any threshold assigning label to must assign label to as well. Thus not all functions from to are in , so is not shattered.
It follows that .