# MINOS

## Introduction

TOMLAB /MINOS (hereafter referred to as MINOS) is a linear and nonlinear programming system, designed to solve large-scale constrained optimization problems of the following form:

$\begin{array}{cccc} \min\limits_{x, y} & & F(x) + c^{T} x + d^{T} y & \\ s/t & b_{1} & \le f(x) + A_{1} y \le & b_{2}, \\ & b_{3} & \le A_2x + A_{3} y \le & b_{4}, \\ & l & \le (x, y) \le & u, \\ \end{array}$

where the vectors bi , c, d, l, u and the matrices Ai are constant, F (x) is a nonlinear function of some of the variables, and f (x) is a vector of nonlinear functions. The nonlinearities (if present) may be of a general nature but must be smooth and preferably "almost linear", in the sense that they should not change radically with small changes in the variables. We make the following definitions:

 x the nonlinear variables y the linear variables (x, y) the vector (x y)T (1.1) the objective function (1.2) the nonlinear constraints (1.3) the linear constraints (1.4) the bounds on the variables m the total number of general constraints in (2) and (3) n the total number of variables in x and y m1 the number of nonlinear constraints (the dimension of f (x)) n1 the number of nonlinear variables (the dimension of x) n´1 the number of nonlinear objective variables (in F (x)) n´´1 the number of nonlinear Jacobian variables (in f (x))

A large-scale problem is one in which m and n are several hundred or several thousand. MINOS takes advantage of the fact that the constraint matrices Ai and the partial derivatives $\partial f_i(x) / \partial x_j$ are typically sparse (contain many zeros).

The dimensions n´1 and n´´1 allow for the fact that F (x) and f (x) may involve different sets of nonlinear variables "x". The two sets of variables always overlap, in the sense that the shorter "x" is always the same as the beginning of the other. If x is the same in both cases, we have n1 = n´l= n´´1. Otherwise, we define the number of nonlinear variables to be n1 = max(n´l , n´´l).

In the following sections we introduce more terminology and give an overview of the MINOS optimization algorithms and the main system features.

### Linear Programming

When F (x) and f (x) are absent, the problem becomes a linear program. Since there is no need to distinguish between linear and nonlinear variables, we use x rather than y. We also convert all general constraints into equalities with the aid of slack variables s, so that the only inequalities are simple bounds on the variables. Thus, we write linear programs in the form

$\min_{x,\ s}\ \ c^Tx \text{ subject to } Ax + s = b, \quad l \le (x,\ s) \le u.$

When the constraints are linear, the bounds on the slacks are defined so that b = 0. When there are nonlinear constraints, some elements of b are nonzero.

In the mathematical programming world, x and s are sometimes called structural variables and logical variables. Their upper and lower bounds are fundamental to problem formulations and solution algorithms. Some of the components of l may be $-\infty$ and those of u may be $+\infty$. If lj = uj , a variable is said to be fixed, and if its bounds are $-\infty$ and $+\infty$, the variable is called free.

Within MINOS, a point (x, s) is said to be feasible if the following are true:

• The constraints Ax + s = b are satisfied to within machine precision ≈ 10-15.
• The bounds l <= (x, s) <= u are satisfied to within a feasibility tolerance δfea ≈ 10-6.
• The nonlinear constraints (32) are satisfied to within a row tolerance δrow ≈ 10-6.

Tolerances such as δfea and δrow may be specified by setting Feasibility tolerance and Row tolerance.

MINOS solves linear programs using a reliable implementation of the primal simplex method, in which the constraints Ax + s = b are partitioned into the form

BxB + NxN = b,

where the basis matrix B is a square and nonsingular submatrix of ( A I ). The elements of xB and xN are called the basic and nonbasic variables respectively. Together, they are a permutation of the vector (x, s). Certain dual variables π and reduced costs dN are defined by the equations

$B^T\pi = c_B, \qquad d_N = c_N - N^T\pi,$

where (cB , cN) is a permutation of the objective vector (c, 0).

At a feasible point, nonbasic variables are typically equal to one of their bounds, and basic variables are somewhere between their bounds. To reach an optimal solution, the simplex method performs a sequence of iterations of the following general nature. With guidance from dN , a nonbasic variable is chosen to move from its current value, and the basic variables are adjusted to satisfy the constraints. Usually one of the basic variables reaches a bound. The basis partition is then redefined with a column of B being replaced by a column of N . When no such interchange can be found to reduce the value of cTx, the current solution is optimal.

#### The simplex method

For convenience, let x denote the variables (x, s). The main steps in a simplex iteration are as follows:

Compute dual variables: Solve BTπ = cB.

Price: Compute some or all of the reduced costs dN = cN - NTπ to determine if a favorable nonbasic column aq exists.

Compute search direction: Solve $Bp_B = \pm a_q$ to determine the basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are pN = 0, except for ±1 for the element corresponding to aq .)

Find maximum steplength: Find the largest steplength amax such that x + αmaxp continues to satisfy the bounds on the variables. The steplength may be determined by the new nonbasic variable reaching its opposite bound, but normally some basic variable will reach a bound first.

Update: Take the step αmax . If this was determined by a basic variable, interchange the corresponding column of B with column aq from N .

When a starting basis is chosen and the basic variables xB are first computed, if any components of xB lie significantly outside their bounds, we say that the current point is infeasible. In this case, the simplex method uses a "Phase 1" procedure to reduce the sum of infeasibilities. This is similar to the subsequent "Phase 2" procedure just described.

The feasibility tolerance δfea is used to determine which Phase is in effect. A similar optimality tolerance δopt is used during pricing to judge whether any reduced costs are significantly large. (This tolerance is scaled by | | π | | , a measure of the size of the current π.)

If the solution procedures are interrupted, some of the nonbasic variables may lie strictly between their bounds:

lj < xj < uj. In addition, at a "feasible" or "optimal" solution, some of the basic variables may lie slightly outside their bounds: $l_j - \delta_{fea} \le x_j \le u_j + \delta_{fea}$. In rare cases, even a few nonbasic variables might lie outside their bounds by as much as δfea.

MINOS maintains a sparse LU factorization of the basis matrix B, using a Markowitz ordering scheme and Bartels-Golub updates, as implemented in the Fortran package LUSOL. The basis factorization is central to the efficient handling of sparse linear and nonlinear constraints.

### Problems with a Nonlinear Objective

When nonlinearities are confined to the term F (x) in the objective function, the problem is a linearly constrained nonlinear program. MINOS solves such problems using a reduced-gradient method combined with a quasi-Newton method that generally leads to superlinear convergence. The implementation follows that described in Murtagh and Saunders.

As a slight generalization of the constraints Ax + s = b are partitioned into the form

BxB + SxS + NxN = b,

where xS is a set of superbasic variables. As before, the nonbasic variables are normally equal to one of their bounds, while the basic and superbasic variables lie somewhere between their bounds (to within δfea). Let the number of superbasic variables be nS , the number of columns in S. At a solution, nS will be no more than n1, the number of nonlinear variables, and it is often much smaller than this. In many real-life cases we have found that nS remains reasonably small, say 200 or less, regardless of the size of the problem. This is one reason why MINOS has proved to be a practical tool.

In the reduced-gradient method, xS is regarded as a set of "independent variables" that are allowed to move in any desirable direction to reduce the objective function (or the sum of infeasibilities). The basic variables are then adjusted in order to continue satisfying the linear constraints. If it appears that no improvement can be made with the current definition of B, S and N , one of the nonbasic variables is selected to be added to S, and the process is repeated with an increased value of nS. At all stages, if a basic or superbasic variable encounters one of its bounds, that variable is made nonbasic and the value of nS is reduced by one.

For linear programs, we may interpret the simplex method as being the same as the reduced-gradient method, with the number of superbasic variables oscillating between 0 and 1. (In general, a step of the simplex method or the reduced-gradient method is called a minor iteration.)

A certain matrix Z is needed for descriptive purposes. It takes the form

$Z = \left( {\begin{array}{ccc} -B^{-1}S \\ I \\ 0 \\ \end{array} } \right),$

though it is never computed explicitly. Given LU factors of the basis matrix B, it is possible to compute products of the form Zq and ZTg by solving linear equations involving B or BT. This in turn allows optimization to be performed on the superbasic variables, while the basic variables are adjusted to satisfy the general linear constraints. (In the description below, the reduced-gradient vector satisfies dS = ZTg, and the search direction satisfies p = ZpS .)

An important part of MINOS is the quasi-Newton method used to optimize the superbasic variables. This can achieve superlinear convergence during any sequence of iterations for which the B, S, N partition remains constant. It requires a dense upper-triangular matrix R of dimension nS , which is updated in various ways to approximate the reduced Hessian :

$R^TR \approx Z^T H Z,$

where H is the Hessian of the objective function, i.e. the matrix of second derivatives of F (x). As for unconstrained optimization, the storage required for R is sometimes a limiting factor.

Let g be the gradient of the nonlinear objective. The main steps in a reduced-gradient iteration are as follows:

Compute dual variables and reduced gradient: Solve BTπ = gB and compute the reduced-gradient vector dS = gS - STπ.

Price: If ||dS|| is sufficiently small, compute some or all of the reduced costs dN = gN - NTπ to determine if a favorable nonbasic column aq exists. If so, move that column from N into S, expanding R accordingly.

Compute search direction: Solve RTRpS = -dS and BpB = -SpS to determine the superbasic and basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are pN = 0.)

Find maximum steplength: Find the largest steplength αmax such that x + αmax p continues to satisfy the bounds on the variables.

Perform linesearch: Find an approximate solution to the one-dimensional problem

$\min_{\alpha}\ \ F(x+\alpha p) \text{ subject to } 0 \le \alpha \le \alpha_{max}.$

Update (quasi-Newton): Take the step α. Apply a quasi-Newton update to R to account for this step.

Update (basis change): If a superbasic variable reached a bound, move it from S into N . If a basic variable reached a bound, find a suitable superbasic variable to move from S into B, and move the basic variable into N . Update R if necessary.

At an optimum, the reduced gradient dS should be zero. MINOS terminates when ||dS|| <= δopt||π|| and the reduced costs (component of dN ) are all sufficiently positive or negative, as judged by the same quantity deltaopt||π||.

In the linesearch, F (x + αp) really means the objective function evaluated at the point (x, y, s) + αp. This steplength procedure is another important part of MINOS. Two different procedures are used, depending on whether or not all gradients are known analytically. The number of nonlinear function evaluations required may be influenced by setting the Linesearch tolerance in the SPECS file.

Normally, the objective function F (x) will never be evaluated at a point x unless that point satisfies the linear constraints and the bounds on the variables. An exception is during a finite-difference check on the calculation of gradient elements. This check is performed at the starting point x0 , which takes default values or may be specified. MINOS ensures that the bounds l <= x0 <= u are satisfied, but in general the starting point will not satisfy the general linear constraints. If F (x0 ) is undefined, the gradient check should be suppressed (Verify level -1), or the starting point should be redefined.

### Problems with Nonlinear Constraints

If any of the constraints are nonlinear, MINOS employs a projected Lagrangian algorithm, based on a method due to Robinson. This involves a sequence of major iterations, each of which requires the solution of a linearly constrained subproblem. Each subproblem contains linearized versions of the nonlinear constraints, as well as the original linear constraints and bounds.

At the start of the k-th major iteration, let (xk, yk) be an estimate of the variables, and let γk be an estimate of the Lagrange multipliers (dual variables) associated with the nonlinear constraints. The constraints are linearized by changing f (x) to its linear approximation:

$(x, x_k) = f(x_k) + J(x_k)(x - x_k)\,$

or more briefly $\bar{f} = f_k + J_k(x - x_k)$, where J (xk ) is the Jacobian matrix evaluated at xk . (The i-th row of the Jacobian is the gradient vector for the i-th nonlinear constraint function.) The subproblem to be solved during the k-th major iteration is then

$\begin{array}{cccc} \min_{x, y} && F(x) + c^T x + d^T y - \lambda^T_k f_d + \frac{1}{2} \rho_k \| f_d \|^2 \\ s/t & b_1 & \le \bar{f} + A_1y \le & b_2 \\ & b_3 & \le A_2x + A_3y \le & b_4, \\ & l & \le (x,y) \le & u, \end{array}$

where $f_d = f - \bar{f}$ is the difference between f (x) and its linearization. The objective function is called an augmented Lagrangian. The scalar ρk is a penalty parameter, and the term involving ρk is a modified quadratic penalty function. MINOS uses the reduced-gradient method to solve each subproblem. As before, slack variables are introduced and the vectors bi are incorporated into the bounds on the slacks. The linearized constraints take the form

$\left( {\begin{array}{cc} J_k & A_1 \\ A_2 & A_3 \\ \end{array} } \right) \left( {\begin{array}{c} x\\ y\\ \end{array} } \right) + \left( {\begin{array}{c} S_1\\ S_2\\ \end{array} } \right) = \left( {\begin{array}{c} J_k x_k - f_k\\ 0 \end{array} } \right).$

We refer to this system as Ax + s = b as in the linear case. The Jacobian Jk is treated as a sparse matrix, the same as the matrices A1 , A2 and A3 . The quantities Jk , b, λk and ρk change each major iteration.

#### The projected Lagrangian method

For convenience, suppose that all variables and constraints are nonlinear. The main steps in a major iteration are as follows:

Solve subproblem: Find an approximate solution $(\bar{x}, \bar{\lambda})$ to the kth subproblem.

Compute search direction: Adjust the elements of $\bar{\lambda}$ if necessary (if they have the wrong sign).

Define a search direction $(\Delta x,\ \Delta\lambda) = (\bar{x}- x_k, \bar{\lambda} - \lambda_k)$.

Find steplength: Choose a steplength σ such that some merit function M(x,λ) has a suitable value at the point $(x_k + \sigma\Delta x,\ \lambda_k + \sigma\Delta\lambda)$.

Update: Take the step σ to obtain $(x_{k+1},\ \lambda_{k+1})$. In some cases, adjust ρk.

For the first major iteration, the nonlinear constraints are ignored and minor iterations are performed until the original linear constraints are satisfied.

The initial Lagrange multiplier estimate is typically λk = 0 (though it can be provided by the user). If a subproblem terminates early, some elements of the new estimate $\bar{\lambda}$ may be changed to zero.

The penalty parameter initially takes a certain default value ρk = 100.0 / m1 , where m1 is the number of nonlinear constraints. (A value r times as big is obtained by specifying Penalty parameter r.) For later major iterations, ρk is reduced in stages when it appears that the sequence {xkk}is converging. In many cases it is safe to specify Penalty parameter 0.0 at the beginning, particularly if a problem is only mildly nonlinear. This may improve the overall efficiency.

In the output from MINOS, the term Feasible subproblem indicates that the linearized constraints have been satisfied. In general, the nonlinear constraints are satisfied only in the limit, so that feasibility and optimality occur at essentially the same time. The nonlinear constraint violation is printed every major iteration. Even if it is zero early on (say at the initial point), it may increase and perhaps fluctuate before tending to zero. On "well behaved" problems, the constraint violation will decrease quadratically (i.e., very quickly) during the final few major iterations.

For certain rare classes of problem it is safe to request the values λk = 0 and ρk = 0 for all subproblems by specifying Lagrangian = No (in which case the nonlinear constraint functions are evaluated only once per major iteration). However for general problems, convergence is much more likely with the default setting, Lagrangian = Yes.

#### The merit function

Unfortunately, it is not known how to define a merit function M(x,λ) that can be reduced at every major iteration. As a result, there is no guarantee that the projected Lagrangian method described above will converge from an arbitrary starting point. This has been the principal theoretical gap in MINOS, finally resolved by the PhD research of Michael Friedlander. The main features needed to stabilize MINOS are:

• To relax the linearized constraints via an $\ell_1$ penalty function.
• To repeat a major iteration with increased ρk (and more relaxed linearized constraints) if the nonlinear constraint violation would increase too much.

In practice, the method of MINOS 5.51 often does converge and a good rate of convergence is often achieved in the final major iterations, particularly if the constraint functions are "nearly linear". As a precaution, MINOS prevents radical changes from one major iteration to the next. Where possible, the steplength is chosen to be σ = 1, so that each new estimate of the solution is $(x_{k+1},\ \lambda_{k+1}) = (\bar{x}, \bar{\lambda})$, the solution of the subproblem. If this point is "too different", a shorter steplength σ < 1 is chosen.

If the major iterations for a particular model do not appear to be converging, some of the following actions may help:

1. Specify initial activity levels for the nonlinear variables as carefully as possible.
2. Include sensible upper and lower bounds on all variables.
3. Specify a Major damping parameter that is lower than the default value. This tends to make s smaller.
4. Specify a Penalty parameter that is higher than the default value. This tends to prevent excessive departures from the constraint linearization.

### Problem Formulation

In general, it is worthwhile expending considerable prior analysis to make the constraints completely linear if at all possible. Sometimes a simple transformation will suffice. For example, a pipeline optimization problem has pressure drop constraints of the form

${K_1 \over d_1^{4.814}} + {K_2 \over d_2^{4.814}} + \cdots\, \le P_T^2 - P_0^2,$

where di are the design variables (pipe diameters) and the other terms are constant. These constraints are highly nonlinear, but by redefining the decision variables to be $x_i = 1/d_i^{4.814}$ we can make the constraints linear. Even if the objective function becomes more nonlinear by such a transformation (and this usually happens), the advantages of having linear constraints greatly outweigh this.

Similarly, it is usually best not to move nonlinearities from the objective function into the constraints. For example, we should not replace

$\min\ F(x)$

by

$\min\ \ z \quad \hbox{subject to} \quad F(x) - z = 0.$

Scaling is a very important matter during problem formulation. A general rule is to scale both the data and the variables to be as close to 1.0 as possible. In general we suggest the range 1.0 to 10.0. When conflicts arise, one should sacrifice the objective function in favor of the constraints. Real-world problems tend to have a natural scaling within each constraint, as long as the variables are expressed in consistent physical units. Hence it is often sufficient to apply a scale factor to each row. MINOS has options to scale the rows and columns of the constraint matrix automatically. By default, only the linear rows and columns are scaled, and the procedure is reliable. If you request that the nonlinear constraints and variables be scaled, bear in mind that the scale factors are determined by the initial Jacobian J (x0), which may differ considerably from J (x) at a solution.

Finally, upper and lower bounds on the variables (and on the constraints) are extremely useful for confining the region over which optimization has to be performed. If sensible values are known, they should always be used. They are also important for avoiding singularities in the nonlinear functions. Note that bounds may be violated slightly by as much as the feasibility tolerance δfea . Hence, if $\sqrt{x_2}$ or log x2 appear (for example) and if δfea = 10-6, the lower bound on x2 would normally have to be at least 10-5. If it is known that x2 will be at least 0.5 (say) at a solution, then its lower bound should be 0.5.

For a detailed discussion of many aspects of numerical optimization, see Gill, Murray and Wright; in particular, see Chapter 8 for much invaluable advice on problem formulation and assessment of results.

### Restrictions

MINOS is designed to find solutions that are locally optimal. The nonlinear functions in a problem must be smooth (i.e., their first derivatives must exist), especially near the desired solution. The functions need not be separable.

A certain "feasible" region is defined by the general constraints and the bounds on the variables. If the objective is convex within this region and if the feasible region itself is convex, any optimal solution obtained will be a global optimum. Otherwise there may be several local optima, and some of these may not be global. In such cases the chances of finding a global optimum are usually increased by choosing a starting point that is "sufficiently close", but there is no general procedure for determining what "close" means, or for verifying that a given local optimum is indeed global.

Integer restrictions cannot be imposed directly. If a variable xj is required to be 0 or 1, a common ploy is to include a quadratic term xj (1 - xj ) in the objective function. MINOS will indeed terminate with xj = 0 or 1, but inevitably the final solution will just be a local optimum. (Note that the quadratic is negative definite. MINOS will find a global minimum for quadratic functions that are positive definite or positive semidefinite, assuming the constraints are linear.)