VC-dimension

Shattering

A hypothesis class H is said to shatter a finite set CX if the restriction of H to C is the set of all functions from C to {0,1} (for binary classification). In other words, |HC|=2|C| and every possible binary function on C is part of some hypothesis in H and is, in that sense, not restrictive on the possible solutions.

Whenever some set CX is shattered by H, an adversary is not restricted by H as they can construct a distribution over C based on any target function from C to {0,1}, while still maintaining realizability (on C), yielding the No-Free-Lunch theorem.

Intuitively, if a set C is shattered by H, and we receive a sample containing half the instances of C, the labels of these instances give us no information about the labels of the rest of the instances in C – every possible labeling of the rest of the instances can be explained by some hypothesis in H.

Vapnik-Chervonenkis-dimension

The VC-dimension of a hypothesis class H is the maximal size of a set CX that can be shattered by H.
If H can shatter sets of arbitrarily large size, then H is said to have infinite VC-dimension, and is consequently not PAC learnable. On the other hand, the following holds and is termed The Fundamental Theorem of Statistical Learning:

Let H be a hypothesis class of functions from X to {0,1} under the binary loss function. Then the uniform convergence property, PAC learnability and finite VC-dimension are all equivalent. Quantitatively, if VCdim(H)=d<, then there are absolute constants C1,C2 such that:

  1. H has the uniform convergence property with sample complexity:
C1d+log(1δ)ϵ2mHUC(ϵ,δ)C2d+log(1δ)ϵ2
  1. H is (agnostic) PAC learnable with sample complexity:
C1d+log(1δ)ϵ2mH(ϵ,δ)C2d+log(1δ)ϵ2
  1. H is PAC learnable (i.e, under strict realizability) with sample complexity:
C1d+log(1δ)ϵ2mH(ϵ,δ)C2dlog(1ϵ)+log(1δ)ϵ2

This theorem holds for some other learning problems such as regression with the absolute or squared error loss, but not for all learning tasks.

Example

Let H={ha:aR} be the class of threshold functions over R, where ha:R{0,1} is defined by ha(x)=1[x<a] (i.e., 1 if x<a and 0 otherwise). Although H is of infinite size, it is PAC learnable, as its VC dimension is finite:


Links

Sources