Measure concentration

Markov's Inequality

The expectation of a non-negative random variable $Z$ can be written (by the tail-sum formula) as

E [Z] = \int_{x = 0}^{\infty} P [Z \geq x] d x

Since the integrand is non-negative and $P [Z \geq x]$ is nonincreasing in $x$, we have

\forall a \geq 0, E [Z] \geq \int_{x = 0}^{a} P [Z \geq x] d x \geq \int_{x = 0}^{a} P [Z \geq a] d x = a P [Z \geq a] .

Dividing by $a$ (which requires $a > 0$), we get Markov's inequality:

\forall a > 0, P [Z \geq a] \leq \frac{E [Z]}{a}
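As a small numerical sanity check (a sketch, not part of the derivation), the Python below evaluates the tail-sum formula exactly for a fair six-sided die and compares Markov's bound with exponential samples. The helper names `tail_sum_expectation` and `markov_bound`, the die, the Exp(1) distribution and the seed are illustrative choices, not from the source; the tail-sum helper assumes $Z$ takes non-negative integer values given as dictionary keys.

```python
import random
from fractions import Fraction

def tail_sum_expectation(pmf):
    """E[Z] via the tail-sum formula, for a non-negative integer-valued Z.

    For integer-valued Z, P[Z >= x] is constant on each interval (k - 1, k],
    so the integral over [0, inf) reduces to the sum over k >= 1 of P[Z >= k].
    """
    return sum(
        sum(prob for value, prob in pmf.items() if value >= k)
        for k in range(1, max(pmf) + 1)
    )

def markov_bound(mean, a):
    """Markov's upper bound E[Z] / a on P[Z >= a], for non-negative Z and a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return mean / a

# Tail-sum formula on a fair six-sided die (exact arithmetic with Fraction).
die = {value: Fraction(1, 6) for value in range(1, 7)}
direct = sum(value * prob for value, prob in die.items())
print(direct, tail_sum_expectation(die))  # 7/2 7/2

# Markov's inequality on Exp(1) samples. It holds exactly for the empirical
# distribution, so comparing against the sample mean cannot fail by chance.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
for a in (0.5, 1.0, 2.0, 4.0):
    freq = sum(z >= a for z in samples) / len(samples)
    print(a, freq, markov_bound(sample_mean, a))
    assert freq <= markov_bound(sample_mean, a)
```

The printed pairs show how loose Markov's bound can be: it uses only the mean, so for an exponential tail it stays far above the observed frequency.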

Chebyshev's Inequality

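Applying Markov's inequality to the non-negative random variable $(Z - E [Z])^{2}$ with threshold $a^{2}$, and noting that $| Z - E [Z] | \geq a$ if and only if $(Z - E [Z])^{2} \geq a^{2}$, we get Chebyshev's inequality:

\forall a > 0, P [| Z - E [Z] | \geq a] \leq \frac{Var [Z]}{a^{2}}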

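Chebyshev's inequality yields a bound on the estimation error of the mean. Let $Z_{1}, Z_{2}, \dots, Z_{m}$ be i.i.d. random variables with $E [Z_{i}] = μ$ and $Var [Z_{i}] \leq 1$. By independence, the sample mean $\frac{1}{m} \sum_{i = 1}^{m} Z_{i}$ has expectation $μ$ and variance at most $\frac{1}{m}$, so applying Chebyshev's inequality to it with $a = \sqrt{\frac{1}{δ m}}$ (which makes the right-hand side at most $δ$) shows that for any $δ \in (0, 1)$, with probability at least $1 - δ$, we have: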

| \frac{1}{m} \sum_{i = 1}^{m} Z_{i} - μ | \leq \sqrt{\frac{1}{δ m}}

(a polynomial bound: the estimation error grows as $\sqrt{1 / δ}$ when the failure probability $δ$ shrinks)
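A sketch of the Chebyshev-based estimate in Python, assuming Uniform(0, 1) samples (variance $\frac{1}{12}$, so the condition $Var [Z_{i}] \leq 1$ holds) and a fixed seed. `chebyshev_radius` and `chebyshev_sample_size` are hypothetical helper names; the `var_bound` parameter is an assumed generalization of the text's variance bound of 1.

```python
import math
import random
import statistics

def chebyshev_radius(m, delta, var_bound=1.0):
    """r such that |mean of m i.i.d. samples - mu| <= r with probability >= 1 - delta."""
    if m < 1 or not 0 < delta < 1:
        raise ValueError("need m >= 1 and 0 < delta < 1")
    # Var[sample mean] <= var_bound / m; solve (var_bound / m) / r^2 = delta for r.
    return math.sqrt(var_bound / (delta * m))

def chebyshev_sample_size(eps, delta, var_bound=1.0):
    """Smallest m with chebyshev_radius(m, delta, var_bound) <= eps."""
    return math.ceil(var_bound / (delta * eps ** 2))

rng = random.Random(1)

# Chebyshev itself holds exactly for the empirical distribution when the mean
# and the population variance are computed from the same samples.
xs = [rng.random() for _ in range(20_000)]  # Uniform(0, 1): Var = 1/12 <= 1
mu_hat, var_hat = statistics.fmean(xs), statistics.pvariance(xs)
for a in (0.3, 0.4, 0.45):
    freq = sum(abs(x - mu_hat) >= a for x in xs) / len(xs)
    assert freq <= var_hat / a ** 2

# The radius for the sample mean: the observed failure rate should stay below delta.
m, delta, trials = 100, 0.1, 2_000
r = chebyshev_radius(m, delta)  # sqrt(1 / (0.1 * 100)), about 0.316
failures = sum(
    abs(statistics.fmean([rng.random() for _ in range(m)]) - 0.5) > r
    for _ in range(trials)
)
print(r, failures / trials)
assert failures / trials <= delta
```

With the text's variance bound of 1, $ϵ = 0.1$ and $δ = 0.05$ would require $\frac{1}{0.05 \cdot 0.1^{2}} = 2000$ samples. The observed failure rate is typically far below $δ$, since Chebyshev uses nothing about the distribution beyond its variance.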

Chernoff's bounds

For independent Bernoulli random variables $Z_{1}, Z_{2}, \dots, Z_{m}$ where $\forall i, P [Z_{i} = 1] = p_{i}$, denoting $Z = \sum_{i = 1}^{m} Z_{i}$ and $p = \sum_{i = 1}^{m} p_{i} = E [Z]$, we can show that for any $δ > 0$:

P [Z > (1 + δ) p] \leq \exp (- p \frac{δ^{2}}{2 + \frac{2 δ}{3}})

and, for $δ \in (0, 1)$,

P [Z < (1 - δ) p] \leq \exp (- p \frac{δ^{2}}{2 + \frac{2 δ}{3}})
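To see how the Chernoff bounds compare with exact tail probabilities, the sketch below takes the special case where all $p_{i}$ equal a common $q$, so that $Z$ is Binomial$(m, q)$ and its tails can be summed exactly. The parameter triples and the helper names (`chernoff_bound`, `binomial_pmf`, `exact_tails`) are illustrative assumptions.

```python
import math

def chernoff_bound(p, delta):
    """exp(-p delta^2 / (2 + 2 delta / 3)), the form used above for both tails."""
    return math.exp(-p * delta ** 2 / (2 + 2 * delta / 3))

def binomial_pmf(m, q, k):
    """P[Z = k] for Z ~ Binomial(m, q)."""
    return math.comb(m, k) * q ** k * (1 - q) ** (m - k)

def exact_tails(m, q, delta):
    """Exact P[Z > (1 + delta) p] and P[Z < (1 - delta) p] with p = m q.

    This is the special case of the text where every p_i equals q.
    """
    p = m * q
    upper = sum(binomial_pmf(m, q, k) for k in range(m + 1) if k > (1 + delta) * p)
    lower = sum(binomial_pmf(m, q, k) for k in range(m + 1) if k < (1 - delta) * p)
    return upper, lower

for m, q, delta in [(100, 0.5, 0.2), (400, 0.1, 0.3), (50, 0.3, 0.5)]:
    upper, lower = exact_tails(m, q, delta)
    bound = chernoff_bound(m * q, delta)
    print(f"m={m} q={q} delta={delta}: upper={upper:.4g} lower={lower:.4g} bound={bound:.4g}")
    assert upper <= bound and lower <= bound
```

In these examples the exact tails sit well below the bound; what the bound offers is exponential decay in $p$ without requiring the $p_{i}$ to be equal.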

Hoeffding's Inequality

For a sequence of i.i.d. random variables $Z_{1}, Z_{2}, \dots, Z_{m}$ with $E [Z_{i}] = μ$ and $P [a \leq Z_{i} \leq b] = 1$, we have for any $ϵ > 0$,

P [| \frac{1}{m} \sum_{i = 1}^{m} Z_{i} - μ | > ϵ] \leq 2 \exp (- \frac{2 m ϵ^{2}}{(b - a)^{2}})
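Setting the right-hand side equal to $δ$ and solving gives the radius $(b - a) \sqrt{\ln(2 / δ) / (2 m)}$ and the sample size $m \geq (b - a)^{2} \ln(2 / δ) / (2 ϵ^{2})$. A minimal Python sketch, with hypothetical helper names and $[a, b] = [0, 1]$ assumed as the default range:

```python
import math

def hoeffding_bound(m, eps, a=0.0, b=1.0):
    """Right-hand side 2 exp(-2 m eps^2 / (b - a)^2) of Hoeffding's inequality."""
    return 2 * math.exp(-2 * m * eps ** 2 / (b - a) ** 2)

def hoeffding_radius(m, delta, a=0.0, b=1.0):
    """eps at which the bound equals delta: (b - a) sqrt(ln(2 / delta) / (2 m))."""
    return (b - a) * math.sqrt(math.log(2 / delta) / (2 * m))

def hoeffding_sample_size(eps, delta, a=0.0, b=1.0):
    """Smallest integer m with hoeffding_bound(m, eps, a, b) <= delta."""
    return math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * eps ** 2))

m = hoeffding_sample_size(0.1, 0.05)
print(m, hoeffding_bound(m, 0.1), hoeffding_radius(m, 0.05))  # m = 185
```

For $[0, 1]$-valued variables with $ϵ = 0.1$ and $δ = 0.05$ this gives 185 samples. Chebyshev with the variance bound $\frac{1}{4}$, which holds for any $[0, 1]$-valued variable, would ask for $\frac{1}{4 \cdot 0.05 \cdot 0.1^{2}} = 500$; the difference comes from the $\ln(2 / δ)$ versus $1 / δ$ dependence.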

Proof:
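Let $X_{i} = Z_{i} - μ$, $\bar{Z} = \frac{1}{m} \sum_{i = 1}^{m} Z_{i}$ and $\bar{X} = \frac{1}{m} \sum_{i = 1}^{m} X_{i} = \bar{Z} - μ$. Since $x \mapsto e^{λ x}$ is increasing for $λ > 0$, Markov's inequality applied to the non-negative variable $e^{λ \bar{X}}$ gives, for any $λ > 0$ and $ϵ > 0$,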

P [\bar{X} \geq ϵ] = P [e^{λ \bar{X}} \geq e^{λ ϵ}] \leq e^{- λ ϵ} E [e^{λ \bar{X}}]

Hoeffding's lemma: For a random variable $X$ bounded between $a$ and $b$ with mean zero, for every $λ > 0$ we have $E [e^{λ X}] \leq \exp (\frac{λ^{2} (b - a)^{2}}{8})$.
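Hoeffding's lemma can be checked numerically on a centered Bernoulli variable $X = B - q$ with $B \sim$ Bernoulli$(q)$, which has mean zero and lies in $[-q, 1 - q]$, so $b - a = 1$. This Python sketch (illustrative values of $q$ and $λ$, hypothetical helper names) compares $E [e^{λ X}]$ with $\exp(λ^{2} / 8)$:

```python
import math

def centered_bernoulli_mgf(q, lam):
    """E[exp(lam X)] for X = B - q with B ~ Bernoulli(q)."""
    return q * math.exp(lam * (1 - q)) + (1 - q) * math.exp(-lam * q)

def hoeffding_lemma_bound(lam, a, b):
    """exp(lam^2 (b - a)^2 / 8) from Hoeffding's lemma."""
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)

for q in (0.1, 0.3, 0.5, 0.9):
    for lam in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        mgf = centered_bernoulli_mgf(q, lam)
        bound = hoeffding_lemma_bound(lam, -q, 1 - q)  # X lies in [-q, 1 - q]
        # Small relative tolerance only to absorb floating-point rounding.
        assert mgf <= bound * (1 + 1e-12), (q, lam, mgf, bound)

# q = 0.5 is the tightest case: the mgf equals cosh(lam / 2).
for lam in (0.1, 1.0, 4.0):
    print(lam, centered_bernoulli_mgf(0.5, lam), hoeffding_lemma_bound(lam, -0.5, 0.5))
```

For $q = 0.5$ the expansion $\cosh(λ / 2) = 1 + λ^{2} / 8 + \dots$ agrees with the bound to second order, so the constant $\frac{1}{8}$ in the lemma cannot be improved in general.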

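Each $X_{i}$ has mean zero and lies in $[a - μ, b - μ]$, an interval of length $b - a$. Using independence, and then Hoeffding's lemma with $λ / m$ in place of $λ$, we obtain: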

\begin{aligned} E [e^{λ \bar{X}}] & = E [\prod_{i = 1}^{m} e^{λ X_{i} / m}] = \prod_{i = 1}^{m} E [e^{λ X_{i} / m}] \\ ⟹ P [\bar{X} \geq ϵ] & \leq e^{- λ ϵ} \prod_{i = 1}^{m} e^{\frac{λ^{2} (b - a)^{2}}{8 m^{2}}} = e^{- λ ϵ + \frac{λ^{2} (b - a)^{2}}{8 m}} \end{aligned}

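The exponent $- λ ϵ + \frac{λ^{2} (b - a)^{2}}{8 m}$ is a convex quadratic in $λ$; setting its derivative $- ϵ + \frac{λ (b - a)^{2}}{4 m}$ to zero gives the minimizer $λ = \frac{4 m ϵ}{(b - a)^{2}}$, at which the exponent equals $- \frac{2 m ϵ^{2}}{(b - a)^{2}}$, yielding: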

P [\bar{X} \geq ϵ] \leq \exp (- \frac{2 m ϵ^{2}}{(b - a)^{2}})
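A numerical check of the optimization step, as a sketch with illustrative values $m = 50$, $ϵ = 0.1$ and $[a, b] = [0, 1]$: the closed-form $λ$ gives the exponent $- 2 m ϵ^{2} / (b - a)^{2} = -1$, and no $λ$ on a grid around it does better. The function names are hypothetical.

```python
import math

def exponent(lam, m, eps, a, b):
    """-lam eps + lam^2 (b - a)^2 / (8 m), the exponent in the bound on P[X_bar >= eps]."""
    return -lam * eps + lam ** 2 * (b - a) ** 2 / (8 * m)

def optimal_lambda(m, eps, a, b):
    """Closed-form minimizer 4 m eps / (b - a)^2."""
    return 4 * m * eps / (b - a) ** 2

m, eps, a, b = 50, 0.1, 0.0, 1.0
lam_star = optimal_lambda(m, eps, a, b)
best = exponent(lam_star, m, eps, a, b)
print(lam_star, best, -2 * m * eps ** 2 / (b - a) ** 2)  # both exponents are -1 up to rounding

# Scan lambda from 1% to 300% of lam_star: nothing beats the closed form.
grid = [lam_star * k / 100 for k in range(1, 301)]
assert all(exponent(lam, m, eps, a, b) >= best - 1e-12 for lam in grid)
assert math.isclose(best, -2 * m * eps ** 2 / (b - a) ** 2)
```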

For the other side, we can apply the same argument to $- \bar{X}$ (the variables $- X_{i}$ also have mean zero and lie in an interval of length $b - a$) to obtain:

P [\bar{X} \leq - ϵ] \leq \exp (- \frac{2 m ϵ^{2}}{(b - a)^{2}})

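The event $| \bar{X} | > ϵ$ is contained in the union of the events $\bar{X} \geq ϵ$ and $\bar{X} \leq - ϵ$, so a union bound over the two cases yields the desired inequality, including the factor 2.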
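Finally, a Monte Carlo sketch comparing the observed frequency of large deviations with Hoeffding's bound, assuming Uniform(0, 1) samples (so $a = 0$, $b = 1$, $μ = \frac{1}{2}$), $ϵ = 0.1$ and a fixed seed; all of these, and the helper names, are illustrative choices.

```python
import math
import random

def hoeffding_bound(m, eps, a, b):
    """2 exp(-2 m eps^2 / (b - a)^2)."""
    return 2 * math.exp(-2 * m * eps ** 2 / (b - a) ** 2)

def deviation_frequency(m, eps, trials, rng):
    """Fraction of trials where the mean of m Uniform(0, 1) draws is more than eps from 1/2."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(m)) / m
        hits += abs(mean - 0.5) > eps
    return hits / trials

rng = random.Random(42)
for m in (10, 50, 200):
    freq = deviation_frequency(m, 0.1, 2_000, rng)
    bound = hoeffding_bound(m, 0.1, 0.0, 1.0)
    print(m, freq, bound)
    assert freq <= bound
```

For $m = 10$ the bound exceeds 1 and says nothing; it drops below 1 only once $m > (b - a)^{2} \ln 2 / (2 ϵ^{2})$, about 35 here.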

Sources

Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David