General optimization problem definition

    The general optimization problem is to find

    $\min f(x)$ subject to $x \in S$

    where $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function and $S \subseteq \mathbb{R}^n$ is the feasible set (ground set).

    Classification

    Linear programming (LP)

    • Linear objective function
    • Affine constraint functions
    • Ground set defined by affine equalities or inequalities.

    Nonlinear programming (NLP)

    • At least one of the objective or constraint functions is nonlinear.

    Unconstrained optimization

    • The feasible set is the whole space, $S = \mathbb{R}^n$.

    Constrained optimization

    • The feasible set is a proper subset, $S \subset \mathbb{R}^n$.

    Integer programming (IP)

    • Decision variables are restricted to integer values: $x_j \in \mathbb{Z}$ or $x_j \in \{0, 1\}$.

    Convex programming (CP)

    • The objective $f$ and the inequality constraint functions $g_i$ are convex functions
    • The equality constraint functions $h_j$ are affine
    • The ground set $X$ is closed and convex

    Modeling

    Formulating the problem

    1. Define what sets the problem requires
    2. Identify the parameters of the problem and the sets they belong to
    3. Decide what type of decision variables is suitable for the problem (a small example follows below)
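
    As a small illustration of these steps (a made-up knapsack example, not from the course material): the set is the items $J = \{1, \dots, n\}$; the parameters are values $c_j \ge 0$, weights $a_j \ge 0$ and a capacity $b$; the decision variables are binary, $x_j \in \{0, 1\}$, since an item is either packed or not:

        \max_{x} \; \sum_{j \in J} c_j x_j
        \quad \text{subject to} \quad \sum_{j \in J} a_j x_j \le b, \quad x_j \in \{0, 1\} \; \forall j \in J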

    Convexity

    Convex set

    A set $S \subseteq \mathbb{R}^n$ is convex if $\lambda x^1 + (1 - \lambda) x^2 \in S$ for all $x^1, x^2 \in S$ and all $\lambda \in [0, 1]$.

    Affine hull

    The affine hull of a finite set $V = \{v^1, \dots, v^k\} \subseteq \mathbb{R}^n$ is defined as $\operatorname{aff} V = \{\sum_{i=1}^{k} \lambda_i v^i : \sum_{i=1}^{k} \lambda_i = 1\}$, i.e. the set of all affine combinations of the points.

    Convex hull

    The convex hull of a finite set $V = \{v^1, \dots, v^k\} \subseteq \mathbb{R}^n$ is defined as $\operatorname{conv} V = \{\sum_{i=1}^{k} \lambda_i v^i : \sum_{i=1}^{k} \lambda_i = 1, \; \lambda_i \ge 0\}$, i.e. the set of all convex combinations of the points.

    Affine combination

    An affine combination of the points $v^1, \dots, v^k$ is a vector $x$ satisfying

    $x = \sum_{i=1}^{k} \lambda_i v^i$

    where $\sum_{i=1}^{k} \lambda_i = 1$.

    Convex combination

    A convex combination of the points $v^1, \dots, v^k$ is a vector $x$ satisfying

    $x = \sum_{i=1}^{k} \lambda_i v^i$

    where $\sum_{i=1}^{k} \lambda_i = 1$ and $\lambda_i \ge 0$ for every $i$.

    Polytope

    A set $P$ is a polytope if it is the convex hull of finitely many points in $\mathbb{R}^n$.

    Polyhedron

    A set $P$ is a polyhedron if there exists a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$ such that

    $P = \{x \in \mathbb{R}^n : Ax \le b\}$

    The set is the intersection of $m$ half-spaces. Polyhedra may be unbounded.

    Cone

    A set $C$ is a cone if $\lambda x \in C$ whenever $x \in C$ and $\lambda \ge 0$.

    Polyhedral cone

    The set $C = \{x \in \mathbb{R}^n : Ax \le 0\}$, where $A \in \mathbb{R}^{m \times n}$, is a cone but also a polyhedron, which is why it is usually called a polyhedral cone.

    Half space

    A half-space is a set that cuts the space in two parts, $\{x \in \mathbb{R}^n : a^T x \le b\}$ for some $a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$.

    Convex function

    Suppose that $S \subseteq \mathbb{R}^n$ is a convex set, then a function $f : S \to \mathbb{R}$ is convex on $S$ if

    $f(\lambda x^1 + (1 - \lambda) x^2) \le \lambda f(x^1) + (1 - \lambda) f(x^2)$ for all $x^1, x^2 \in S$ and $\lambda \in (0, 1)$

    • A function is strictly convex if the inequality is strict whenever $x^1 \ne x^2$.
    • A function $f$ is concave if $-f$ is convex.

    Note that linear functions are both convex and concave.
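
    As a quick check of the definition (a worked example added for illustration), take $f(x) = x^2$ on $\mathbb{R}$:

        \lambda x_1^2 + (1 - \lambda) x_2^2 - (\lambda x_1 + (1 - \lambda) x_2)^2 = \lambda (1 - \lambda) (x_1 - x_2)^2 \ge 0

    so the defining inequality holds; it is strict whenever $x_1 \ne x_2$ and $\lambda \in (0, 1)$, so $f$ is even strictly convex.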

    Carathéodory's Theorem

    Let $x \in \operatorname{conv} V$, where $V \subseteq \mathbb{R}^n$. Then $x$ can be expressed as a convex combination of $n + 1$ or fewer points of $V$.

    Representation Theorem

    The Representation Theorem states that every polyhedron $P$ that has at least one extreme point is the (Minkowski) sum of a polytope and a polyhedral cone: $P = \operatorname{conv}\{v^1, \dots, v^k\} + C$, where $v^1, \dots, v^k$ are the extreme points of $P$ and $C = \{p : Ap \le 0\}$ is its recession cone.

    Farkas' Lemma

    Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then exactly one of the systems has a feasible solution

    $Ax = b, \quad x \ge 0$

    and

    $A^T y \le 0, \quad b^T y > 0$

    and the other one is inconsistent.

    Existence of optimal solutions

    Notation

    • We say that a set $S$ is open if for every $x \in S$ there exists an $\varepsilon > 0$ such that $\{y : \|y - x\| < \varepsilon\} \subseteq S$.
    • A set $S$ is closed if its complement $\mathbb{R}^n \setminus S$ is open.
    • A limit point of a set $S$ is a point $x$ such that there exists a sequence $\{x_k\} \subseteq S$ fulfilling $x_k \to x$.
    • We can then define a closed set as a set which contains all its limit points.
    • We say that a set $S$ is bounded if there exists a constant $M > 0$ such that $\|x\| \le M$ for all $x \in S$.
    • If a set is both closed and bounded, we call it compact.

    Definition

    Weakly coercive function

    A function $f$ is said to be weakly coercive with respect to the set $S$ if either $S$ is bounded or $\lim_{\|x\| \to \infty, \, x \in S} f(x) = +\infty$.

    Lower semi-continuity

    A function $f$ is said to be lower semi-continuous at $\bar{x}$ if the value $f(\bar{x})$ is less than or equal to every limit of $f(x_k)$ as $x_k \to \bar{x}$, i.e. $f(\bar{x}) \le \liminf_{k \to \infty} f(x_k)$.

    Feasible direction

    Let $x \in S$. A vector $p \in \mathbb{R}^n$ defines a feasible direction at $x$ if

    there exists $\delta > 0$ such that $x + \alpha p \in S$ for all $\alpha \in [0, \delta]$

    A feasible direction is therefore a direction at a point in which we can move (a small step) without becoming infeasible.

    Descent direction

    Let $x \in \mathbb{R}^n$. A vector $p$ defines a descent direction with respect to $f$ at $x$ if there exists $\delta > 0$ such that $f(x + \alpha p) < f(x)$ for all $\alpha \in (0, \delta]$. If $f \in C^1$ and $\nabla f(x)^T p < 0$, then $p$ is a descent direction at $x$.

    Normal cone

    Suppose the set $S \subseteq \mathbb{R}^n$ is closed and convex. Let $\bar{x} \in S$. Then the normal cone to $S$ at $\bar{x}$ is the set

    $N_S(\bar{x}) = \{v \in \mathbb{R}^n : v^T (x - \bar{x}) \le 0 \text{ for all } x \in S\}$

    Weierstrass' Theorem

    Consider the problem $\min_{x \in S} f(x)$, where $S \subseteq \mathbb{R}^n$ is a nonempty and closed set and $f$ is lower semi-continuous on $S$. If $f$ is weakly coercive with respect to $S$, then there exists a nonempty, closed and bounded (compact) set of optimal solutions to the problem.

    Optimality conditions ($S = \mathbb{R}^n$)

    When $S = \mathbb{R}^n$ (unconstrained optimization problem), the following theorems hold.

    Necessary condition for optimality 1

    If $f \in C^1$ on $\mathbb{R}^n$ then

    $x^*$ is a local minimum of $f$ $\Rightarrow$ $\nabla f(x^*) = 0$

    Necessary condition for optimality 2

    If $f \in C^2$ on $\mathbb{R}^n$ then

    $x^*$ is a local minimum of $f$ $\Rightarrow$ $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite

    Sufficient condition for optimality 2

    If $f \in C^2$ on $\mathbb{R}^n$ then

    $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ positive definite $\Rightarrow$ $x^*$ is a strict local minimum of $f$

    Necessary and sufficient condition for optimality 1

    If $f \in C^1$ is convex on $\mathbb{R}^n$ then

    $x^*$ is a global minimum of $f$ $\iff$ $\nabla f(x^*) = 0$
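
    A small worked example (added for illustration): for $f(x) = x_1^2 + 2 x_2^2$,

        \nabla f(x) = (2 x_1, 4 x_2)^T = 0 \iff x^* = (0, 0)^T,
        \qquad
        \nabla^2 f(x) = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix} \text{ is positive definite,}

    so $x^*$ satisfies both the necessary and the sufficient conditions, and since $f$ is convex it is the global minimum.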

    Optimality conditions ($S \subseteq \mathbb{R}^n$)

    Necessary condition for optimality 1

    Suppose that $S \subseteq \mathbb{R}^n$ and that $f \in C^1$ on $S$.

    a) If $x^*$ is a local minimum of $f$ over $S$ then

    $\nabla f(x^*)^T p \ge 0$

    holds for all feasible directions $p$ at $x^*$.

    b) Suppose that $S$ is convex. If $x^*$ is a local minimum of $f$ over $S$ then

    $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in S$

    Necessary and sufficient conditions for optimality 1

    Suppose that $S$ is a convex nonempty set and that $f \in C^1$ is a convex function on $S$, then

    $x^*$ is a global minimum of $f$ over $S$ $\iff$ $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in S$

    When $x^*$ is an interior point of $S$ the expression can be reduced to $\nabla f(x^*) = 0$, because then we don't need to worry about boundary points.

    Stationary point in optimality condition

    If $S$ is convex and $f \in C^1$, a point $x^*$ fulfilling the four equivalent statements a)-d) is called a stationary point.

    a) $\nabla f(x^*)^T (x - x^*) \ge 0$ for all $x \in S$

    b) $\min_{x \in S} \nabla f(x^*)^T (x - x^*) = 0$

    c) $x^* = \operatorname{Proj}_S(x^* - \nabla f(x^*))$

    d) $-\nabla f(x^*) \in N_S(x^*)$

    where $\operatorname{Proj}_S$ is the projection onto the set $S$, and $N_S(x^*)$ is the normal cone.

    Unconstrained optimization

    1. Begin by finding a descent direction. The vector $p$ is a descent direction if $f(x + \alpha p) < f(x)$ for all $\alpha \in (0, \delta]$ for some $\delta > 0$; if $f \in C^1$ it suffices that $\nabla f(x)^T p < 0$. Common choices:

    Steepest descent direction

    $p = -\nabla f(x)$

    Newton's search direction

    $p = -[\nabla^2 f(x)]^{-1} \nabla f(x)$, i.e. solve $\nabla^2 f(x) p = -\nabla f(x)$

    Levenberg-Marquardt

    $p = -[\nabla^2 f(x) + \gamma I]^{-1} \nabla f(x)$, where $\gamma > 0$ is chosen so that $\nabla^2 f(x) + \gamma I$ is positive definite

    2. Then choose a step length $\alpha$

    Newton's method

    Apply Newton's method in one dimension to $\varphi(\alpha) = f(x + \alpha p)$, i.e. iterate $\alpha := \alpha - \varphi'(\alpha) / \varphi''(\alpha)$.

    Armijo rule

    Start with $\alpha = 1$, then keep decreasing $\alpha$ (e.g. $\alpha := \alpha / 2$) until the following holds

    $f(x + \alpha p) \le f(x) + \mu \alpha \nabla f(x)^T p$ for a fixed $\mu \in (0, 1)$

    3. Stop the algorithm when at least two of the following hold, for small tolerances $\varepsilon_i > 0$ (a Python sketch of the complete method follows below)

    $\|\nabla f(x_k)\| \le \varepsilon_1, \qquad |f(x_{k+1}) - f(x_k)| \le \varepsilon_2, \qquad \|x_{k+1} - x_k\| \le \varepsilon_3$
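
    A minimal Python sketch of steps 1-3, using steepest descent with the Armijo rule (the quadratic test function, $\mu = 0.1$ and the single gradient-norm stopping test are illustrative assumptions):

        import numpy as np

        def steepest_descent(f, grad, x0, mu=0.1, eps=1e-6, max_iter=1000):
            """Steepest descent with the Armijo step length rule."""
            x = np.asarray(x0, dtype=float)
            for _ in range(max_iter):
                g = grad(x)
                if np.linalg.norm(g) <= eps:    # termination: small gradient
                    break
                p = -g                          # steepest descent direction
                alpha = 1.0
                # Armijo rule: halve alpha until sufficient decrease holds
                while f(x + alpha * p) > f(x) + mu * alpha * g @ p:
                    alpha /= 2.0
                x = x + alpha * p
            return x

        # Toy problem: f(x) = x1^2 + 2*x2^2, minimum at the origin
        f = lambda x: x[0]**2 + 2 * x[1]**2
        grad = lambda x: np.array([2 * x[0], 4 * x[1]])
        print(steepest_descent(f, grad, [3.0, -2.0]))   # approx. [0, 0]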

    Different cone sets

    Cone of feasible directions

    $R_S(x) = \{p \in \mathbb{R}^n : \exists \delta > 0 \text{ such that } x + \alpha p \in S \text{ for all } \alpha \in [0, \delta]\}$

    Tangent cone

    $T_S(x) = \{p \in \mathbb{R}^n : \exists \{x^k\} \subseteq S \text{ and } \lambda_k \downarrow 0 \text{ with } x^k \to x \text{ and } (x^k - x)/\lambda_k \to p\}$

    Cone of descent directions

    $\overset{\circ}{F}(x) = \{p \in \mathbb{R}^n : \nabla f(x)^T p < 0\}$

    Active constraints

    The active constraints at $x$ are the inequality constraints that hold with equality there, $\mathcal{I}(x) = \{i : g_i(x) = 0\}$.

    Inner gradient cone

    $\overset{\circ}{G}(x) = \{p \in \mathbb{R}^n : \nabla g_i(x)^T p < 0 \text{ for all } i \in \mathcal{I}(x)\}$

    Gradient cone

    $G(x) = \{p \in \mathbb{R}^n : \nabla g_i(x)^T p \le 0 \text{ for all } i \in \mathcal{I}(x)\}$

    Nicely behaving set

    The feasible set behaves nicely at $x$ when the tangent cone coincides with the gradient cone, $T_S(x) = G(x)$, so that the gradients of the active constraints give a complete local description of the set.

    Fritz John conditions

    If $x^*$ is a local minimum of $f$ over $S = \{x : g_i(x) \le 0, \; i = 1, \dots, m\}$, then there exist multipliers $\mu_0 \ge 0$ and $\mu \ge 0$, not all zero, such that

    $\mu_0 \nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla g_i(x^*) = 0, \qquad \mu_i g_i(x^*) = 0 \text{ for all } i$

    Constraint qualification (CQ)

    A constraint qualification defines some regularity of the feasible set at a point; when one holds, the KKT conditions below are necessary for local optimality.

    Abadie's CQ

    Holds at a point $x$ if $T_S(x) = G(x)$.

    Linear independence CQ (LICQ)

    We say that LICQ holds at $x$ if the gradients $\nabla g_i(x)$, $i \in \mathcal{I}(x)$, of the active constraints are linearly independent.

    Affine CQ

    We say that the Affine CQ holds if all the constraints are affine.

    Slater CQ

    We say that Slater's CQ holds if all $g_i$ are convex and an interior point exists, i.e. some $\bar{x}$ with $g_i(\bar{x}) < 0$ for all $i$.

    Karush-Kuhn-Tucker conditions (KKT)

    Assume that Abadie's CQ holds at a point $x^*$ which is feasible in (P). Then, if $x^*$ is a local minimum, there exists $\mu \in \mathbb{R}^m$ such that

    $\nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla g_i(x^*) = 0, \qquad \mu_i \ge 0, \qquad \mu_i g_i(x^*) = 0 \text{ for all } i$

    Sufficiency of KKT conditions

    If the objective function $f$ is convex and all constraint functions $g_i$ are convex, then the following holds: every point satisfying the KKT conditions is a global minimum of (P).
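
    A short worked illustration (an example added here, not from the notes): minimize $f(x) = x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \le 0$. The KKT system reads

        (2 x_1^*, 2 x_2^*)^T = \mu (1, 1)^T, \qquad \mu \ge 0, \qquad \mu (1 - x_1^* - x_2^*) = 0

    Taking $\mu > 0$ forces $x_1^* + x_2^* = 1$, giving $x^* = (1/2, 1/2)$ and $\mu = 1$; since $f$ and $g$ are convex, this KKT point is the global minimum.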

    Lagrangian duality/relaxation

    A relaxation to

    $\min f(x)$ subject to $x \in S$

    is

    $\min f_R(x)$ subject to $x \in S_R$

    where $S \subseteq S_R$ and $f_R(x) \le f(x)$ for all $x \in S$.

    Relaxation Theorem

    The optimal value of the relaxation is a lower bound on the optimal value of the original problem. Moreover, if an optimal solution $x_R^*$ of the relaxation is feasible in the original problem and $f_R(x_R^*) = f(x_R^*)$, then $x_R^*$ is optimal in the original problem as well.

    Lagrangian relaxation

    This is the primal problem

    $f^* = \inf f(x)$ subject to $g(x) \le 0, \quad x \in X$

    and this is the dual problem

    $q^* = \sup_{\mu \ge 0} q(\mu)$, where $q$ is the Lagrangian dual function below

    Lagrangian dual function

    $q(\mu) = \inf_{x \in X} L(x, \mu) = \inf_{x \in X} \{f(x) + \mu^T g(x)\}$

    Weak duality

    For any $\mu \ge 0$ and any $x$ feasible to the primal problem, it holds that $q(\mu) \le f(x)$; in particular $q^* \le f^*$.

    Lagrangian dual problem

    $\sup_{\mu \ge 0} q(\mu)$

    The dual function $q$ is concave and its effective domain $D_q = \{\mu : q(\mu) > -\infty\}$ is convex.

    Lagrange multiplier

    $\mu^*$ is a Lagrange multiplier if $\mu^* \ge 0$ and $q(\mu^*) = f^*$.

    Strong duality

    Assume that there exists an interior point to the primal problem (Slater's CQ: some $\bar{x} \in X$ with $g_i(\bar{x}) < 0$ for all $i$), that $f^* > -\infty$, that $f$ is a convex function, that all $g_i$ are convex functions and that $X$ is a convex set. Then the following holds: there is no duality gap, $q^* = f^*$, and a Lagrange multiplier $\mu^*$ exists.
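
    A small worked illustration of strong duality (an example added here, not from the notes): take $\min x^2$ subject to $1 - x \le 0$ with $X = \mathbb{R}$. Then

        q(\mu) = \inf_{x \in \mathbb{R}} \{x^2 + \mu (1 - x)\} = \mu - \frac{\mu^2}{4} \quad (\text{attained at } x = \mu / 2)

    Maximizing the concave $q$ over $\mu \ge 0$ gives $\mu^* = 2$ and $q^* = 1$, which equals $f^* = 1$ at $x^* = 1$, just as the theorem promises.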

    Linear programming (LP)

    For linear problems of the sort

    $\min c^T x$ subject to $Ax \le b, \quad x \ge 0$

    we can use the Simplex method to solve for an optimal solution. We first convert to standard form:

    $\min c^T x$ subject to $Ax = b, \quad x \ge 0$

    where $b \ge 0$ (slack variables are added to turn the inequalities into equalities).

    Basic solution

    A point $x$ is a basic solution if $Ax = b$ and the columns of $A$ corresponding to non-zero elements of $x$ are linearly independent.

    Basic feasible solution (BFS)

    A point $x$ is a BFS if $Ax = b$, $x \ge 0$, and the columns of $A$ corresponding to non-zero elements in $x$ are linearly independent.

    Degenerate BFS

    Consider a BFS $x^T = (x_B^T, x_N^T)$ with basis matrix $B$. By definition $x_B = B^{-1} b \ge 0$ and $x_N = 0$. If some elements of $x_B$ are zero the BFS is called degenerate.

    Simplex method

    Consider

    $\min c^T x$ subject to $Ax \le b, \quad x \ge 0$

    Convert this problem to standard form. To find an initial BFS solve the Phase I problem. That is, add as many artificial variables as you need to form the initial basis matrix as the identity matrix, and minimize the sum of the artificial variables. Use the Simplex method to move all artificial variables out of the basis. When this is achieved we have solved the Phase I problem. If we can form the identity matrix without artificial variables, go directly to the Phase II problem with this as the basis. When we have an initial BFS we solve the Phase II problem, that is the original problem. For each iteration begin by determining the basic partition $(x_B, x_N)$, e.g. we could have $x_B = (x_1, x_2)^T$ and $x_N = (x_3, x_4)^T$. We also calculate $c_B$ and $c_N$, which correspond to the same variables as the partition in $x$, as well as $B$ and $N$ (the corresponding column blocks of $A$) in a similar manner. Then we calculate the incoming variable (entering $x_B$) with the following formula:

    $j^* = \arg\min_j \bar{c}_j$, where $\bar{c}_N^T = c_N^T - c_B^T B^{-1} N$ are the reduced costs

    Then we need to calculate the outgoing variable (leaving $x_B$). We do this by the minimum ratio test

    $i^* = \arg\min_{i : (B^{-1} N_{j^*})_i > 0} \; \frac{(B^{-1} b)_i}{(B^{-1} N_{j^*})_i}$

    We update the variables and go back and start over. We need to check that $B^{-1} b \ge 0$ for a feasible solution, and that $B^{-1} N_{j^*}$ has at least one positive element so that we do not have an unbounded solution. We terminate the algorithm when the reduced cost vector $\bar{c}_N \ge 0$.
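
    For checking small instances numerically one can use SciPy's linprog (the LP below is a made-up toy example, and SciPy's default solver is used rather than the textbook tableau method):

        from scipy.optimize import linprog

        # Toy LP:  min -x1 - 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0
        c = [-1, -2]
        A_ub = [[1, 1], [1, 3]]
        b_ub = [4, 6]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
        print(res.x, res.fun)   # optimal solution [3, 1] with value -5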

    LP duality

    This problem is called the primal (P)

    $\min c^T x$ subject to $Ax \ge b, \quad x \ge 0$

    and this is the corresponding dual problem (D)

    $\max b^T y$ subject to $A^T y \le c, \quad y \ge 0$

    The following relation holds for the primal and dual

    Primal                                        Dual
    Objective    min $c^T x$                      max $b^T y$
    Variables    $x_j \ge 0$ (canonical)          Constraints  $(A^T y)_j \le c_j$
                 $x_j \le 0$ (non-canonical)                   $(A^T y)_j \ge c_j$
                 $x_j$ free                                    $(A^T y)_j = c_j$
    Constraints  $a_i^T x \ge b_i$ (canonical)       Variables $y_i \ge 0$
                 $a_i^T x \le b_i$ (non-canonical)             $y_i \le 0$
                 $a_i^T x = b_i$                               $y_i$ free
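
    As a quick illustration of the table (a made-up example): for the primal $\min 2 x_1 + 3 x_2$ subject to $x_1 + x_2 \ge 4$, $x \ge 0$, the canonical constraint gives a dual variable $y \ge 0$ and the two nonnegative variables give the dual constraints $y \le 2$ and $y \le 3$:

        \text{(D)} \quad \max 4 y \quad \text{subject to} \quad y \le 2, \; y \le 3, \; y \ge 0

    Both optimal values equal $8$ ($x^* = (4, 0)$, $y^* = 2$), consistent with the strong duality theorem below.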

    Weak duality theorem

    If $x$ is a feasible point in (P) and $y$ is a feasible point in (D) then $b^T y \le c^T x$.

    Strong duality theorem

    If both (P) and (D) are feasible then

    1. There exists an optimal $x^*$ in (P) and an optimal $y^*$ in (D)
    2. Their optimal values are equal (meaning $c^T x^* = b^T y^*$)

    Possibilities in (P) and (D)

    P\D             Finite optima   Unbounded   Infeasible
    Finite optima   X
    Unbounded                                   X
    Infeasible                      X           X

    Perturbation

    A perturbation is a small change to either the right-hand side $b$ or the cost vector $c$. For a small change $\Delta b$ that leaves the optimal basis unchanged, the optimal value changes by $(y^*)^T \Delta b$, where $y^*$ is the optimal dual solution (the shadow prices).

    Subgradient method

    For convex problems we can relax the differentiability assumption and use something like the subgradient method. Let $S \subseteq \mathbb{R}^n$ be a nonempty convex set and let $f : S \to \mathbb{R}$ be a convex function. Then $g \in \mathbb{R}^n$ is called a subgradient of $f$ at $\bar{x} \in S$ if

    $f(x) \ge f(\bar{x}) + g^T (x - \bar{x})$ for all $x \in S$

    The set of all subgradients to $f$ at $\bar{x}$ is called the subdifferential of $f$ at $\bar{x}$ and is defined as

    $\partial f(\bar{x}) = \{g \in \mathbb{R}^n : f(x) \ge f(\bar{x}) + g^T (x - \bar{x}) \text{ for all } x \in S\}$
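
    A minimal Python sketch of the subgradient method (illustrative assumptions: $f(x) = \|x\|_1$, for which the sign vector is a subgradient, and the diminishing step rule $\alpha_k = 1/(k+1)$):

        import numpy as np

        def subgradient_method(f, subgrad, x0, iters=200):
            """Subgradient method with diminishing steps; tracks the best
            point found, since individual steps need not decrease f."""
            x = np.asarray(x0, dtype=float)
            x_best, f_best = x.copy(), f(x)
            for k in range(iters):
                x = x - (1.0 / (k + 1)) * subgrad(x)   # diminishing step
                if f(x) < f_best:
                    x_best, f_best = x.copy(), f(x)
            return x_best

        # Example: f(x) = ||x||_1, for which sign(x) is a subgradient
        f = lambda x: np.sum(np.abs(x))
        subgrad = lambda x: np.sign(x)
        print(subgradient_method(f, subgrad, [2.0, -1.5]))   # near [0, 0]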

    Integer Linear Programming (ILP)

    Logical constraints

    With binary variables $x_j \in \{0, 1\}$:

    • if $x_1$ then $x_2$: $x_1 \le x_2$
    • $x_1$ xor $x_2$: $x_1 + x_2 = 1$
    • $x_1$ or $x_2$: $x_1 + x_2 \ge 1$
    • exactly one of $x_1, \dots, x_n$: $\sum_{j=1}^{n} x_j = 1$
    • at least one of $x_1, \dots, x_n$: $\sum_{j=1}^{n} x_j \ge 1$

    Disjoint feasible sets

    To require that $x$ lies in one of two disjoint sets, say $A_1 x \le b_1$ or $A_2 x \le b_2$, introduce a binary variable $y \in \{0, 1\}$ and a sufficiently large constant $M$: $A_1 x \le b_1 + M y$ and $A_2 x \le b_2 + M (1 - y)$.

    Feasible direction methods

    These methods are just as local as unconstrained methods, but we find the search direction in different ways, and the termination criteria are often based on the KKT conditions. For general sets it can be tricky to find search directions and step lengths. Three well-known methods are the Frank-Wolfe method, simplicial decomposition and the gradient projection algorithm. These methods apply to polyhedral feasible sets.

    Frank-Wolfe method

    The idea of the Frank-Wolfe method is to linearize $f$ at the current point $x_k$, find an extreme point $y_k$ of the feasible polyhedron that minimizes the linearized objective $\nabla f(x_k)^T y$ with the Simplex method, and search in the direction $p_k = y_k - x_k$ toward that extreme point (a sketch follows below).
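
    A minimal Python sketch of the method (the toy objective and the classical step rule $\alpha_k = 2/(k+2)$ are illustrative assumptions; SciPy's linprog stands in for the Simplex solver):

        import numpy as np
        from scipy.optimize import linprog

        def frank_wolfe(grad, A_ub, b_ub, x0, iters=100):
            """Frank-Wolfe on the polyhedron {x : A_ub x <= b_ub, x >= 0}."""
            x = np.asarray(x0, dtype=float)
            for k in range(iters):
                # LP subproblem: minimize the linearized objective
                y = linprog(grad(x), A_ub=A_ub, b_ub=b_ub).x
                x = x + (2.0 / (k + 2)) * (y - x)   # step toward the vertex
            return x

        # Toy problem: min (x1-2)^2 + (x2-2)^2 over x1 + x2 <= 2, x >= 0
        grad = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 2)])
        print(frank_wolfe(grad, [[1, 1]], [2], [0.0, 0.0]))   # approx. [1, 1]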

    Simplicial decomposition

    It works like the Frank-Wolfe algorithm, but it remembers previously generated extreme points and in each iteration searches over the convex hull of these points instead of only along one line segment.

    Gradient projection algorithm

    Based on the idea that this holds at a local min:

    $x^* = \operatorname{Proj}_S(x^* - \alpha \nabla f(x^*))$ for any $\alpha > 0$

    For every iteration choose the next iterate as follows

    $x_{k+1} = x_k + \alpha_k p_k$

    where $p_k = \operatorname{Proj}_S(x_k - \nabla f(x_k)) - x_k$ is a feasible descent direction and $\alpha_k$ is a step length.

    Penalty methods

    The idea of penalty methods is to transform a constrained problem into an unconstrained problem

    $\min_x f(x) + \chi_S(x)$

    where $\chi_S$ is the indicator function of the feasible set: $\chi_S(x) = 0$ if $x \in S$ and $\chi_S(x) = +\infty$ otherwise.

    Exterior penalty method

    Suppose that

    $S = \{x \in \mathbb{R}^n : g_i(x) \le 0, \; i = 1, \dots, m\}$

    Choose a penalty function, e.g.

    $\psi(x) = \sum_{i=1}^{m} \max\{0, g_i(x)\}^2$

    Approximate the indicator function by $\nu \psi(x)$ and solve $\min_x f(x) + \nu \psi(x)$ for a sequence of increasing penalty parameters $\nu > 0$; the minimizers approach $S$ from outside (a Python sketch is given at the end of this section).

    Interior penalty method

    Suppose that

    $S = \{x \in \mathbb{R}^n : g_i(x) \le 0, \; i = 1, \dots, m\}$

    Assume that there exists $\bar{x}$ with $g_i(\bar{x}) < 0$ for all $i$ (an interior point should exist)

    Choose a barrier function, e.g.

    $\phi(x) = -\sum_{i=1}^{m} \log(-g_i(x)) \quad \text{or} \quad \phi(x) = -\sum_{i=1}^{m} \frac{1}{g_i(x)}$

    Approximate the indicator function by $\nu \phi(x)$ and solve $\min_x f(x) + \nu \phi(x)$ for a sequence of decreasing parameters $\nu \downarrow 0$; the iterates stay in the interior of $S$.
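
    A minimal Python sketch of the exterior penalty method (the toy problem, the quadratic penalty and the schedule for $\nu$ are illustrative assumptions):

        import numpy as np
        from scipy.optimize import minimize

        # Toy problem: min (x1-2)^2 + (x2-2)^2  s.t.  g(x) = x1 + x2 - 2 <= 0
        f = lambda x: (x[0] - 2)**2 + (x[1] - 2)**2
        g = lambda x: x[0] + x[1] - 2

        x = np.zeros(2)
        for nu in [1.0, 10.0, 100.0, 1000.0]:   # increasing penalty parameter
            pen = lambda x, nu=nu: f(x) + nu * max(0.0, g(x))**2
            x = minimize(pen, x).x              # warm start from previous nu
        print(x)                                # approaches the optimum (1, 1)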