Lagrange multipliers

Suppose a function f(x):RnR is to be optimized under constraints:

gi(x)=0,i=1,2,mwithm<n

This can be solved by introducing new variables (termed Lagrange variables or slack variables) λi, one for each constraint and optimizing the function:

f~(x)=f(x)+iλigi(x)

wrt x's (now unconstrained) and λ's.

Why does this work? The intuition is that the constraints carve out a nm subdimensional manifold of Rn, corresponding to allowed configurations of x. At each point of the submanifold, the normal space (the directions off the surface) is spanned by the gradients g1,,gm​, and the tangent space (the directions along the surface) is the orthogonal complement.

Furthermore, at the optimum on this manifold, the gradient xf is necessarily normal to the manifold, meaning it is a linear combination of the gradients xgi of all the constraints. In other words:

xf=iλixgi

These are exactly the equations (upto a sign change) produced by unconstrained optimization of f~ above wrt x. λif~ on the other hand yields back the original constraints. So solving this new unconstrained optimization problem is equivalent to the constrained one we set out to solve.


Links

Sources