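Suppose a function $f(x)$ of $x = (x_1, \dots, x_n)$ is to be optimized under $m$ equality constraints:
$$g_i(x) = 0, \quad i = 1, \dots, m.$$
This can be solved by introducing new variables $\lambda_i$ (termed Lagrange multipliers), one for each constraint, and optimizing (that is, finding the stationary points of) the function:
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)$$
with respect to the $x_j$'s (now unconstrained) and the $\lambda_i$'s.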
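
As a minimal worked sketch of this recipe, assume the toy problem of minimizing $x^2 + y^2$ subject to the single constraint $x + y - 1 = 0$. The objective, the constraint, and the helper name `solve_linear` are all illustrative choices, not part of the text above. Adding the multiplier times the constraint to the objective and setting every partial derivative to zero gives a linear system in $(x, y, \lambda)$, which the code solves by Gaussian elimination.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations act on both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Stationarity of L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1):
#   dL/dx   = 2x + lam   = 0
#   dL/dy   = 2y + lam   = 0
#   dL/dlam = x + y - 1  = 0   (this row is just the original constraint)
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
rhs = [0, 0, 1]

x_opt, y_opt, lam = solve_linear(A, rhs)
print(x_opt, y_opt, lam)
```

For this toy problem the stationary point is $x = y = 1/2$ with $\lambda = -1$, the point on the line $x + y = 1$ closest to the origin. Most problems give a nonlinear system instead, which would need a root finder rather than a linear solve.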
Why does this work? The intuition is that the constraints carve out a lower-dimensional submanifold of $\mathbb{R}^n$, corresponding to the allowed configurations of $x$. At each point of the submanifold, the normal space (the directions off the surface) is spanned by the gradients $\nabla g_i$ of the constraint functions, and the tangent space (the directions along the surface) is its orthogonal complement.
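Furthermore, at the optimum on this manifold, the gradient $\nabla f$ of the objective is necessarily normal to the manifold (otherwise $f$ could be improved by moving along a tangent direction), meaning it is a linear combination of the gradients of all the constraints. In other words, for some coefficients $\mu_i$:
$$\nabla f(x) = \sum_i \mu_i \nabla g_i(x).$$
These are exactly the equations (up to a sign change, $\mu_i = -\lambda_i$) produced by unconstrained optimization of $L = f + \sum_i \lambda_i g_i$ above with respect to $x$, namely $\nabla_x L = \nabla f + \sum_i \lambda_i \nabla g_i = 0$. Setting $\partial L / \partial \lambda_i = 0$, on the other hand, yields back the original constraints $g_i(x) = 0$. So solving this new unconstrained optimization problem is equivalent to the constrained one we set out to solve.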
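
The geometric claim can be checked numerically on a small example. The sketch below assumes an illustrative problem that is not taken from the text: maximize $x + y$ on the unit circle $x^2 + y^2 - 1 = 0$. It finds the constrained maximum by brute-force sampling of the circle, then tests that the objective gradient has no component along the tangent and is a multiple of the constraint gradient. All function and variable names are hypothetical.

```python
import math


def f(x, y):
    return x + y


def grad_f(x, y):
    return (1.0, 1.0)


def grad_g(x, y):
    # Gradient of the constraint function g(x, y) = x^2 + y^2 - 1.
    return (2.0 * x, 2.0 * y)


# Brute-force search: parametrise the circle by angle and keep the best sample.
n = 100000
best_t = max((2.0 * math.pi * k / n for k in range(n)),
             key=lambda t: f(math.cos(t), math.sin(t)))
x, y = math.cos(best_t), math.sin(best_t)

gf = grad_f(x, y)
gg = grad_g(x, y)

# The tangent direction at (x, y) is perpendicular to grad g.
tangent = (-y, x)
tangent_component = gf[0] * tangent[0] + gf[1] * tangent[1]

# Coefficient mu from projecting grad f onto grad g, and the leftover part
# of grad f that the projection does not explain.
mu = (gf[0] * gg[0] + gf[1] * gg[1]) / (gg[0] ** 2 + gg[1] ** 2)
residual = math.hypot(gf[0] - mu * gg[0], gf[1] - mu * gg[1])

print(x, y, tangent_component, mu, residual)
```

Up to sampling and rounding error, the tangent component and the residual both vanish at the maximum, and `mu` comes out near $1/\sqrt{2}$. With the sign convention $L = f + \lambda g$, this corresponds to $\lambda = -\mu$.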