Lagrange multipliers

Suppose a function $f(\vec{x}) : \mathbb{R}^{n} \to \mathbb{R}$ is to be optimized subject to $m$ equality constraints:

$$g_{i}(\vec{x}) = 0, \quad i = 1, 2, \dots, m, \quad \text{with } m < n$$

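This can be solved by introducing one new variable $λ_{i}$ per constraint (the Lagrange multipliers) and forming the Lagrangian

$$\tilde{f}(\vec{x}, \vec{λ}) = f(\vec{x}) + \sum_{i=1}^{m} λ_{i} \, g_{i}(\vec{x})$$

whose stationary points are then sought with respect to the $x$'s (now unconstrained) and the $λ$'s together. Since $\tilde{f}$ is linear in each $λ_{i}$, these stationary points are generally saddle points of $\tilde{f}$, not extrema: the goal is to solve $\nabla \tilde{f} = 0$, not to minimize or maximize $\tilde{f}$ itself.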
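As a small worked example (the functions are chosen purely for illustration and are not from the text), take $f(x, y) = x^2 + y^2$ with the single constraint $g(x, y) = x + y - 1 = 0$. Setting every partial derivative of $\tilde{f}$ to zero gives a linear system in $x$, $y$, $λ$. The sketch below solves it with a small hand-written Gaussian elimination; `solve_linear` is a hypothetical helper, used only to keep the example dependency-free.

```python
# Illustrative problem (not from the text):
#   minimize f(x, y) = x^2 + y^2  subject to  g(x, y) = x + y - 1 = 0
# Lagrangian: f~(x, y, lam) = x^2 + y^2 + lam * (x + y - 1)
# Setting each partial derivative of f~ to zero:
#   d f~/dx   = 2x + lam   = 0
#   d f~/dy   = 2y + lam   = 0
#   d f~/dlam = x + y - 1  = 0   (the constraint itself)


def solve_linear(a, b):
    """Solve a @ v = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (m[r][n] - s) / m[r][r]
    return v


# Rows: coefficients of (x, y, lam) in the three stationarity equations.
A = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
rhs = [0.0, 0.0, 1.0]

x, y, lam = solve_linear(A, rhs)
print(x, y, lam)  # 0.5 0.5 -1.0
```

The solution $x = y = 1/2$, $λ = -1$ lies on the line $x + y = 1$, and the last row of the system is just the constraint, as the derivative of $\tilde{f}$ with respect to a multiplier always is.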

Why does this work? Intuitively, the constraints carve out an $(n - m)$-dimensional submanifold of $\mathbb{R}^{n}$: the set of allowed configurations of $\vec{x}$. At each point of this submanifold, assuming the constraint gradients are linearly independent there, the normal space (the directions off the surface) is spanned by the gradients $\nabla g_{1}, \dots, \nabla g_{m}$, and the tangent space (the directions along the surface) is its orthogonal complement.
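A concrete picture of the normal and tangent spaces, under an example chosen only for illustration: $n = 3$, $m = 2$, with the unit sphere $g_1 = x^2 + y^2 + z^2 - 1$ and the plane $g_2 = z$, which together cut out the unit circle in the $xy$-plane (a 1-dimensional manifold). The cross product of the two gradients spans the tangent space; that shortcut works only for two constraints in $\mathbb{R}^3$.

```python
# Illustrative constraints in R^3 (n = 3, m = 2), chosen for this sketch:
#   g1(x, y, z) = x^2 + y^2 + z^2 - 1 = 0   (unit sphere)
#   g2(x, y, z) = z = 0                     (the xy-plane)
# Their intersection is the unit circle in the xy-plane: an (n - m) = 1-dimensional manifold.


def grad_g1(p):
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)


def grad_g2(p):
    return (0.0, 0.0, 1.0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


p = (0.6, 0.8, 0.0)              # a point on the circle
n1, n2 = grad_g1(p), grad_g2(p)  # these span the normal space at p
t = cross(n1, n2)                # orthogonal to both normals, so it spans the tangent space
print(t)                         # (1.6, -1.2, 0.0): along the circle, proportional to (-y, x, 0)
print(dot(t, n1), dot(t, n2))    # both zero: the tangent is orthogonal to every normal
```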

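Furthermore, at a constrained optimum, the gradient $\nabla_{x} f$ is necessarily normal to the manifold: if it had a nonzero tangential component, moving a small step along the manifold in that direction would change $f$ to first order, contradicting optimality. Being normal, $\nabla_{x} f$ is a linear combination of the constraint gradients $\nabla_{x} g_{i}$. In other words: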

$$\nabla_{x} f = - \sum_{i=1}^{m} λ_{i} \nabla_{x} g_{i}$$
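A numerical check of this condition, on an example chosen for illustration: $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$. The sketch measures the component of $\nabla f$ along the circle's tangent $(-y, x)$ at a few points; it should vanish only at the extrema ($45°$ and $225°$). At the maximum, the ratio of the two gradients then gives $λ$.

```python
import math

# Illustrative problem (not from the text): f(x, y) = x + y on the unit circle
# g(x, y) = x^2 + y^2 - 1 = 0. The maximum is at (1/sqrt(2), 1/sqrt(2)).


def grad_f(p):
    return (1.0, 1.0)


def grad_g(p):
    return (2 * p[0], 2 * p[1])


def tangent_component(p):
    """Component of grad f along the circle's unit tangent direction (-y, x) at p."""
    fx, fy = grad_f(p)
    return -p[1] * fx + p[0] * fy


# Walk around the circle: the tangent component is zero only at the extrema of f.
for deg in (0, 45, 90, 180, 225):
    theta = math.radians(deg)
    p = (math.cos(theta), math.sin(theta))
    print(f"{deg:3d} deg  f = {p[0] + p[1]:+.4f}  tangent component = {tangent_component(p):+.4f}")

# At the maximum, grad f is purely normal, so grad f = -lam * grad g for a single lam.
p_star = (1 / math.sqrt(2), 1 / math.sqrt(2))
lam = -grad_f(p_star)[0] / grad_g(p_star)[0]
print(f"lam = {lam:.4f}")
```

With the sign convention of $\tilde{f}$, the recovered multiplier is $λ = -1/\sqrt{2}$, and indeed $\nabla f = (1, 1) = -λ \, \nabla g$ at $(1/\sqrt{2}, 1/\sqrt{2})$.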

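Setting $\nabla_{x} \tilde{f} = 0$ gives $\nabla_{x} f + \sum_{i} λ_{i} \nabla_{x} g_{i} = 0$, which is exactly the equation above. Setting $\partial \tilde{f} / \partial λ_{i} = g_{i}(\vec{x}) = 0$, on the other hand, yields back the original constraints. So every constrained optimum (at which the constraint gradients are linearly independent) is a stationary point of $\tilde{f}$, and the unconstrained stationarity problem for $\tilde{f}$ replaces the constrained one we set out to solve. Each stationary point found is only a candidate, though: like an ordinary stationary point, it may be a constrained minimum, a maximum, or neither.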
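For nonlinear problems the equations $\nabla \tilde{f} = 0$ usually have to be solved numerically. A minimal sketch, assuming plain Newton's method (one common choice among several) on the same illustrative problem, $f(x, y) = x + y$ on the unit circle: the residual is $\nabla \tilde{f}$ and its Jacobian is the Hessian of $\tilde{f}$ with respect to $(x, y, λ)$. The helper names (`residual`, `jacobian`, `solve_linear`, `newton`) are hypothetical.

```python
import math

# Illustrative problem (not from the text): stationary points of
#   f~(x, y, lam) = x + y + lam * (x^2 + y^2 - 1),
# i.e. f(x, y) = x + y restricted to the unit circle, found with Newton's method.


def residual(z):
    """grad f~ at z = (x, y, lam)."""
    x, y, lam = z
    return [1 + 2 * lam * x,       # d f~/dx
            1 + 2 * lam * y,       # d f~/dy
            x * x + y * y - 1]     # d f~/dlam = g(x, y), the constraint


def jacobian(z):
    """Hessian of f~ with respect to (x, y, lam)."""
    x, y, lam = z
    return [[2 * lam, 0.0, 2 * x],
            [0.0, 2 * lam, 2 * y],
            [2 * x, 2 * y, 0.0]]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (m[r][n] - s) / m[r][r]
    return v


def newton(z, steps=50, tol=1e-12):
    """Newton iteration z <- z - J(z)^{-1} grad f~(z) until the residual is tiny."""
    z = list(z)
    for _ in range(steps):
        r = residual(z)
        if max(abs(v) for v in r) < tol:
            break
        dz = solve_linear(jacobian(z), r)
        z = [zi - di for zi, di in zip(z, dz)]
    return z


# Two starting guesses reach two different stationary points of f~.
for start in ([0.8, 0.6, -0.5], [-0.8, -0.6, 0.5]):
    x, y, lam = newton(start)
    print(f"x = {x:+.6f}, y = {y:+.6f}, lam = {lam:+.6f}, f = {x + y:+.6f}")
```

Starting near $(0.8, 0.6)$ the iteration should reach the maximum $(1/\sqrt{2}, 1/\sqrt{2})$ with $λ = -1/\sqrt{2}$, and starting near $(-0.8, -0.6)$ the minimum with $λ = 1/\sqrt{2}$. Both are stationary points of $\tilde{f}$, which is why each candidate still has to be classified.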

Links

Sources