MINOS
From TomWiki
Introduction
TOMLAB /MINOS (hereafter referred to as MINOS) is a linear and nonlinear programming system, designed to solve large-scale constrained optimization problems of the following form:

    minimize    F(x) + c^{T}x + d^{T}y        (1.1)
    subject to  f(x) + A_{1}y  >=  b_{1}      (1.2)
                A_{2}x + A_{3}y >= b_{2}      (1.3)
                l <= (x, y) <= u              (1.4)

where the vectors b_{1}, b_{2}, c, d, l, u and the matrices A_{1}, A_{2}, A_{3} are constant, F(x) is a nonlinear function of some of the variables, and f(x) is a vector of nonlinear functions. The nonlinearities (if present) may be of a general nature but must be smooth and preferably "almost linear", in the sense that they should not change radically with small changes in the variables. We make the following definitions:
    x          the nonlinear variables
    y          the linear variables
    (x, y)     the vector (x y)^{T}
    (1.1)      the objective function
    (1.2)      the nonlinear constraints
    (1.3)      the linear constraints
    (1.4)      the bounds on the variables
    m          the total number of general constraints in (1.2) and (1.3)
    n          the total number of variables in x and y
    m_{1}      the number of nonlinear constraints (the dimension of f(x))
    n_{1}      the number of nonlinear variables (the dimension of x)
    n'_{1}     the number of nonlinear objective variables (in F(x))
    n''_{1}    the number of nonlinear Jacobian variables (in f(x))
A large-scale problem is one in which m and n are several hundred or several thousand. MINOS takes advantage of the fact that the constraint matrices A_{i} and the partial derivatives are typically sparse (contain many zeros).

The dimensions n'_{1} and n''_{1} allow for the fact that F(x) and f(x) may involve different sets of nonlinear variables "x". The two sets of variables always overlap, in the sense that the shorter "x" is always the same as the beginning of the other. If x is the same in both cases, we have n_{1} = n'_{1} = n''_{1}. Otherwise, we define the number of nonlinear variables to be n_{1} = max(n'_{1}, n''_{1}).
In the following sections we introduce more terminology and give an overview of the MINOS optimization algorithms and the main system features.
Linear Programming
When F(x) and f(x) are absent, the problem becomes a linear program. Since there is no need to distinguish between linear and nonlinear variables, we use x rather than y. We also convert all general constraints into equalities with the aid of slack variables s, so that the only inequalities are simple bounds on the variables. Thus, we write linear programs in the form

    minimize  c^{T}x    subject to    Ax + s = b,    l <= (x, s) <= u.
When the constraints are linear, the bounds on the slacks are defined so that b = 0. When there are nonlinear constraints, some elements of b are nonzero.
In the mathematical programming world, x and s are sometimes called structural variables and logical variables. Their upper and lower bounds are fundamental to problem formulations and solution algorithms. Some of the components of l may be -∞ and those of u may be +∞. If l_{j} = u_{j}, a variable is said to be fixed, and if its bounds are -∞ and +∞, the variable is called free.
Within MINOS, a point (x, s) is said to be feasible if the following are true:
- The constraints Ax + s = b are satisfied to within machine precision ≈ 10^{-15}.
- The bounds l <= (x, s) <= u are satisfied to within a feasibility tolerance δ_{fea} ≈ 10^{-6}.
- The nonlinear constraints (1.2) are satisfied to within a row tolerance δ_{row} ≈ 10^{-6}.
Tolerances such as δ_{fea} and δ_{row} may be specified by setting Feasibility tolerance and Row tolerance.
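The three feasibility tests can be sketched as a small pure-Python check. The function name and dense list-of-rows matrix representation are illustrative only (not part of the TOMLAB API), and the equality tolerance is loosened from machine precision for the sketch:

```python
# Sketch of the feasibility tests above (illustrative only, not the
# TOMLAB API). (x, s) is tested against Ax + s = b, and the combined
# vector (x, s) against its bounds.

def is_feasible(A, b, x, s, l, u, tol_feas=1e-6, tol_eq=1e-12):
    xs = x + s                                  # the combined vector (x, s)
    # 1. Ax + s = b, ideally to machine precision (loosened here).
    residual = max(abs(sum(Ai[j] * x[j] for j in range(len(x))) + s[i] - b[i])
                   for i, Ai in enumerate(A))
    # 2. l <= (x, s) <= u to within the feasibility tolerance.
    bounds_ok = all(l[j] - tol_feas <= xs[j] <= u[j] + tol_feas
                    for j in range(len(xs)))
    return residual <= tol_eq and bounds_ok
```

For nonlinear constraints, an analogous test of f(x) against the row tolerance would be added.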
MINOS solves linear programs using a reliable implementation of the primal simplex method, in which the constraints Ax + s = b are partitioned into the form
Bx_{B} + Nx_{N} = b,
where the basis matrix B is a square and nonsingular submatrix of ( A I ). The elements of x_{B} and x_{N} are called the basic and nonbasic variables respectively. Together, they are a permutation of the vector (x, s). Certain dual variables π and reduced costs d_{N} are defined by the equations

    B^{T}π = c_{B},        d_{N} = c_{N} - N^{T}π,

where (c_{B}, c_{N}) is a permutation of the objective vector (c, 0).
At a feasible point, nonbasic variables are typically equal to one of their bounds, and basic variables are somewhere between their bounds. To reach an optimal solution, the simplex method performs a sequence of iterations of the following general nature. With guidance from d_{N} , a nonbasic variable is chosen to move from its current value, and the basic variables are adjusted to satisfy the constraints. Usually one of the basic variables reaches a bound. The basis partition is then redefined with a column of B being replaced by a column of N . When no such interchange can be found to reduce the value of c^{T}x, the current solution is optimal.
The simplex method
For convenience, let x denote the variables (x, s). The main steps in a simplex iteration are as follows:
Compute dual variables: Solve B^{T}π = c_{B}.
Price: Compute some or all of the reduced costs d_{N} = c_{N} - N^{T}π to determine if a favorable nonbasic column a_{q} exists.
Compute search direction: Solve Bp_{B} = ±a_{q} to determine the basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p_{N} = 0, except for ±1 for the element corresponding to a_{q}.)
Find maximum steplength: Find the largest steplength α_{max} such that x + α_{max}p continues to satisfy the bounds on the variables. The steplength may be determined by the new nonbasic variable reaching its opposite bound, but normally some basic variable will reach a bound first.
Update: Take the step α_{max} . If this was determined by a basic variable, interchange the corresponding column of B with column a_{q} from N .
When a starting basis is chosen and the basic variables x_{B} are first computed, if any components of x_{B} lie significantly outside their bounds, we say that the current point is infeasible. In this case, the simplex method uses a "Phase 1" procedure to reduce the sum of infeasibilities. This is similar to the subsequent "Phase 2" procedure just described.
The feasibility tolerance δ_{fea} is used to determine which Phase is in effect. A similar optimality tolerance δ_{opt} is used during pricing to judge whether any reduced costs are significantly large. (This tolerance is scaled by | | π | | , a measure of the size of the current π.)
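The Phase 2 iteration described above can be sketched as a dense tableau simplex. This is a minimal pure-Python illustration, assuming the special form minimize c^{T}x subject to Ax <= b, x >= 0 with b >= 0 (so the all-slack basis is feasible and no Phase 1 is needed); MINOS itself works with sparse LU factors of B and general bounds:

```python
def simplex(c, A, b):
    """Minimize c^T x subject to A x <= b, x >= 0, assuming b >= 0,
    using the dense tableau form of the primal simplex method."""
    m, n = len(A), len(c)
    # Tableau rows: [A | I | b]; cost row: [c | 0 | 0].
    T = [list(map(float, A[i])) + [float(i == j) for j in range(m)] + [float(b[i])]
         for i in range(m)]
    T.append(list(map(float, c)) + [0.0] * (m + 1))
    basis = list(range(n, n + m))               # slacks are basic initially
    while True:
        # Price: most negative reduced cost (Dantzig rule).
        q = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][q] >= -1e-9:
            break                               # optimal
        # Ratio test: largest step before a basic variable hits its bound.
        rows = [i for i in range(m) if T[i][q] > 1e-9]
        if not rows:
            raise ValueError("problem is unbounded")
        r = min(rows, key=lambda i: T[i][-1] / T[i][q])
        # Pivot: column q enters the basis, basis[r] leaves.
        piv = T[r][q]
        T[r] = [v / piv for v in T[r]]
        for i in range(m + 1):
            if i != r:
                f = T[i][q]
                T[i] = [a - f * v for a, v in zip(T[i], T[r])]
        basis[r] = q
    x = [0.0] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, -T[-1][-1]                        # objective value c^T x
```

Here pricing is the Dantzig rule (most negative reduced cost), and the ratio test plays the role of the maximum-steplength computation.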
If the solution procedures are interrupted, some of the nonbasic variables may lie strictly between their bounds: l_{j} < x_{j} < u_{j}. In addition, at a "feasible" or "optimal" solution, some of the basic variables may lie slightly outside their bounds: l_{j} - δ_{fea} <= x_{j} <= u_{j} + δ_{fea}. In rare cases, even a few nonbasic variables might lie outside their bounds by as much as δ_{fea}.
MINOS maintains a sparse LU factorization of the basis matrix B, using a Markowitz ordering scheme and Bartels-Golub updates, as implemented in the Fortran package LUSOL. The basis factorization is central to the efficient handling of sparse linear and nonlinear constraints.
Problems with a Nonlinear Objective
When nonlinearities are confined to the term F (x) in the objective function, the problem is a linearly constrained nonlinear program. MINOS solves such problems using a reduced-gradient method combined with a quasi-Newton method that generally leads to superlinear convergence. The implementation follows that described in Murtagh and Saunders.
As a slight generalization, the constraints Ax + s = b are partitioned into the form
Bx_{B} + Sx_{S} + Nx_{N} = b,
where x_{S} is a set of superbasic variables. As before, the nonbasic variables are normally equal to one of their bounds, while the basic and superbasic variables lie somewhere between their bounds (to within δ_{fea}). Let the number of superbasic variables be n_{S}, the number of columns in S. At a solution, n_{S} will be no more than n_{1}, the number of nonlinear variables, and it is often much smaller than this. In many real-life cases we have found that n_{S} remains reasonably small, say 200 or less, regardless of the size of the problem. This is one reason why MINOS has proved to be a practical tool.
In the reduced-gradient method, x_{S} is regarded as a set of "independent variables" that are allowed to move in any desirable direction to reduce the objective function (or the sum of infeasibilities). The basic variables are then adjusted in order to continue satisfying the linear constraints. If it appears that no improvement can be made with the current definition of B, S and N , one of the nonbasic variables is selected to be added to S, and the process is repeated with an increased value of n_{S}. At all stages, if a basic or superbasic variable encounters one of its bounds, that variable is made nonbasic and the value of n_{S} is reduced by one.
For linear programs, we may interpret the simplex method as being the same as the reduced-gradient method, with the number of superbasic variables oscillating between 0 and 1. (In general, a step of the simplex method or the reduced-gradient method is called a minor iteration.)
A certain matrix Z is needed for descriptive purposes. It takes the form

    Z = ( -B^{-1}S )
        (     I    )
        (     0    )

though it is never computed explicitly. Given LU factors of the basis matrix B, it is possible to compute products of the form Zq and Z^{T}g by solving linear equations involving B or B^{T}. This in turn allows optimization to be performed on the superbasic variables, while the basic variables are adjusted to satisfy the general linear constraints. (In the description below, the reduced-gradient vector satisfies d_{S} = Z^{T}g, and the search direction satisfies p = Zp_{S}.)
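The claim that products with Z require only solves with B can be checked directly. In the sketch below a dense Gaussian elimination stands in for LUSOL's sparse LU factors, and the matrices are made up for the illustration:

```python
def solve(M, rhs):
    """Solve M v = rhs by Gaussian elimination with partial pivoting
    (a dense stand-in for the sparse LU solves with B)."""
    n = len(M)
    A = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (A[i][n] - sum(A[i][j] * v[j] for j in range(i + 1, n))) / A[i][i]
    return v

def null_space_Z(B, S, n_N):
    """Columns of Z = [-B^{-1}S; I; 0]; each column costs one solve with B."""
    m, nS = len(B), len(S[0])
    cols = []
    for j in range(nS):
        v = solve(B, [S[i][j] for i in range(m)])
        cols.append([-vi for vi in v] + [float(k == j) for k in range(nS)]
                    + [0.0] * n_N)
    return cols                                 # list of columns of Z
```

By construction ( B S N ) Z = 0, so moves of the form p = Zp_{S} keep the constraints Ax + s = b satisfied.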
An important part of MINOS is the quasi-Newton method used to optimize the superbasic variables. This can achieve superlinear convergence during any sequence of iterations for which the B, S, N partition remains constant. It requires a dense upper-triangular matrix R of dimension n_{S}, which is updated in various ways to approximate the reduced Hessian:

    R^{T}R ≈ Z^{T}HZ,

where H is the Hessian of the objective function, i.e. the matrix of second derivatives of F(x). As for unconstrained optimization, the storage required for R is sometimes a limiting factor.
The reduced-gradient method
Let g be the gradient of the nonlinear objective. The main steps in a reduced-gradient iteration are as follows:
Compute dual variables and reduced gradient: Solve B^{T}π = g_{B} and compute the reduced-gradient vector d_{S} = g_{S} - S^{T}π.
Price: If ||d_{S}|| is sufficiently small, compute some or all of the reduced costs d_{N} = g_{N} - N^{T}π to determine if a favorable nonbasic column a_{q} exists. If so, move that column from N into S, expanding R accordingly.
Compute search direction: Solve R^{T}Rp_{S} = -d_{S} and Bp_{B} = -Sp_{S} to determine the superbasic and basic components of a search direction p along which the objective is improved. (The nonbasic elements of p are p_{N} = 0.)
Find maximum steplength: Find the largest steplength α_{max} such that x + α_{max} p continues to satisfy the bounds on the variables.
Perform linesearch: Find an approximate solution to the one-dimensional problem

    minimize over α:  F(x + αp)    subject to  0 < α <= α_{max}.
Update (quasi-Newton): Take the step α. Apply a quasi-Newton update to R to account for this step.
Update (basis change): If a superbasic variable reached a bound, move it from S into N . If a basic variable reached a bound, find a suitable superbasic variable to move from S into B, and move the basic variable into N . Update R if necessary.
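The steps above can be sketched on a tiny made-up problem. This sketch replaces the quasi-Newton step with plain steepest descent in the superbasics and uses an exact linesearch (valid here because the objective is quadratic); the objective, constraint, and B, S partition are all invented for the illustration:

```python
def F(x):
    """Made-up linearly constrained objective: squared distance to (1, 2, 3)."""
    return (x[0] - 1.0)**2 + (x[1] - 2.0)**2 + (x[2] - 3.0)**2

def gradF(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0), 2.0 * (x[2] - 3.0)]

def reduced_gradient(x, iters=200, tol=1e-10):
    """Reduced-gradient iterations for the single linear constraint
    x1 + x2 + x3 = 3, taking x1 as the basic variable (B = [1]) and
    x2, x3 as superbasics (S = [1 1]); there are no nonbasics here."""
    for _ in range(iters):
        g = gradF(x)
        pi = g[0]                                # solve B^T pi = g_B (B = [1])
        dS = [g[1] - pi, g[2] - pi]              # reduced gradient g_S - S^T pi
        if max(abs(d) for d in dS) < tol:
            break                                # optimal: d_S ~ 0
        pS = [-dS[0], -dS[1]]                    # steepest-descent step in x_S
        pB = -(pS[0] + pS[1])                    # solve B p_B = -S p_S
        p = [pB, pS[0], pS[1]]                   # feasible direction: sum(p) = 0
        # Exact linesearch: F(x + a*p) is quadratic in a, so recover it from
        # F(x), the slope g.p at a = 0, and F(x + p), then minimize over a.
        slope = sum(gi * pj for gi, pj in zip(g, p))
        curv = F([xi + pj for xi, pj in zip(x, p)]) - F(x) - slope
        a = -slope / (2.0 * curv)
        x = [xi + a * pj for xi, pj in zip(x, p)]
    return x
```

Starting from the feasible point (1, 1, 1), every iterate stays on the constraint plane and the sequence converges to (0, 1, 2), the closest feasible point to (1, 2, 3).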
At an optimum, the reduced gradient d_{S} should be zero. MINOS terminates when ||d_{S}|| <= δ_{opt}||π|| and the reduced costs (the components of d_{N}) are all sufficiently positive or negative, as judged by the same quantity δ_{opt}||π||.
In the linesearch, F (x + αp) really means the objective function evaluated at the point (x, y, s) + αp. This steplength procedure is another important part of MINOS. Two different procedures are used, depending on whether or not all gradients are known analytically. The number of nonlinear function evaluations required may be influenced by setting the Linesearch tolerance in the SPECS file.
Normally, the objective function F (x) will never be evaluated at a point x unless that point satisfies the linear constraints and the bounds on the variables. An exception is during a finite-difference check on the calculation of gradient elements. This check is performed at the starting point x_{0} , which takes default values or may be specified. MINOS ensures that the bounds l <= x_{0} <= u are satisfied, but in general the starting point will not satisfy the general linear constraints. If F (x_{0} ) is undefined, the gradient check should be suppressed (Verify level -1), or the starting point should be redefined.
Problems with Nonlinear Constraints
If any of the constraints are nonlinear, MINOS employs a projected Lagrangian algorithm, based on a method due to Robinson. This involves a sequence of major iterations, each of which requires the solution of a linearly constrained subproblem. Each subproblem contains linearized versions of the nonlinear constraints, as well as the original linear constraints and bounds.
At the start of the k-th major iteration, let (x_{k}, y_{k}) be an estimate of the variables, and let λ_{k} be an estimate of the Lagrange multipliers (dual variables) associated with the nonlinear constraints. The constraints are linearized by changing f(x) to its linear approximation:

    f̄(x, x_{k}) = f(x_{k}) + J(x_{k})(x - x_{k}),

or more briefly f̄ = f_{k} + J_{k}(x - x_{k}), where J(x_{k}) is the Jacobian matrix evaluated at x_{k}. (The i-th row of the Jacobian is the gradient vector for the i-th nonlinear constraint function.) The subproblem to be solved during the k-th major iteration is then

    minimize    F(x) + c^{T}x + d^{T}y - λ_{k}^{T}(f - f̄) + (ρ_{k}/2)(f - f̄)^{T}(f - f̄)
    subject to  f̄ + A_{1}y >= b_{1},
                A_{2}x + A_{3}y >= b_{2},
                l <= (x, y) <= u,

where (f - f̄) is the difference between f(x) and its linearization. The objective function is called an augmented Lagrangian. The scalar ρ_{k} is a penalty parameter, and the term involving ρ_{k} is a modified quadratic penalty function. MINOS uses the reduced-gradient method to solve each subproblem. As before, slack variables are introduced and the vectors b_{i} are incorporated into the bounds on the slacks. The linearized constraints take the form

    ( J_{k}  A_{1} ) ( x )   ( s_{1} )   ( J_{k}x_{k} - f_{k} )
    ( A_{2}  A_{3} ) ( y ) + ( s_{2} ) = (          0         )

We refer to this system as Ax + s = b as in the linear case. The Jacobian J_{k} is treated as a sparse matrix, the same as the matrices A_{1}, A_{2} and A_{3}. The quantities J_{k}, b, λ_{k} and ρ_{k} change at each major iteration.
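The linearization f̄ is easy to illustrate. The constraint functions below are invented for the example; the point is that f̄ agrees with f in value and first derivatives at x_{k}, so the difference f - f̄ entering the augmented Lagrangian is second order in x - x_{k}:

```python
def f(x):
    """Sample nonlinear constraint functions (invented for illustration):
    f1 = x1^2 + x2^2 - 4,   f2 = x1*x2 - 1."""
    return [x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0]

def J(x):
    """Jacobian of f: row i is the gradient of the i-th constraint."""
    return [[2.0 * x[0], 2.0 * x[1]],
            [x[1], x[0]]]

def f_bar(x, xk):
    """Linearization f(xk) + J(xk)(x - xk) used in the k-th subproblem."""
    fk, Jk = f(xk), J(xk)
    return [fk[i] + sum(Jk[i][j] * (x[j] - xk[j]) for j in range(len(xk)))
            for i in range(len(fk))]
```

At x = x_{k} the linearization reproduces f(x_{k}) exactly, and for nearby points the discrepancy f - f̄ shrinks quadratically with the distance from x_{k}.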
The projected Lagrangian method
For convenience, suppose that all variables and constraints are nonlinear. The main steps in a major iteration are as follows:
Solve subproblem: Find an approximate solution (x̂, λ̂) to the k-th subproblem.

Compute search direction: Adjust the elements of λ̂ if necessary (if they have the wrong sign). Define a search direction Δ = (x̂ - x_{k}, λ̂ - λ_{k}).

Find steplength: Choose a steplength σ such that some merit function M(x, λ) has a suitable value at the point (x_{k}, λ_{k}) + σΔ.

Update: Take the step σ to obtain (x_{k+1}, λ_{k+1}). In some cases, adjust ρ_{k}.
For the first major iteration, the nonlinear constraints are ignored and minor iterations are performed until the original linear constraints are satisfied.
The initial Lagrange multiplier estimate is typically λ_{1} = 0 (though it can be provided by the user). If a subproblem terminates early, some elements of the new estimate may be changed to zero.
The penalty parameter initially takes a certain default value ρ_{k} = 100.0/m_{1}, where m_{1} is the number of nonlinear constraints. (A value r times as big is obtained by specifying Penalty parameter r.) For later major iterations, ρ_{k} is reduced in stages when it appears that the sequence {(x_{k}, λ_{k})} is converging. In many cases it is safe to specify Penalty parameter 0.0 at the beginning, particularly if a problem is only mildly nonlinear. This may improve the overall efficiency.
In the output from MINOS, the term Feasible subproblem indicates that the linearized constraints have been satisfied. In general, the nonlinear constraints are satisfied only in the limit, so that feasibility and optimality occur at essentially the same time. The nonlinear constraint violation is printed every major iteration. Even if it is zero early on (say at the initial point), it may increase and perhaps fluctuate before tending to zero. On "well behaved" problems, the constraint violation will decrease quadratically (i.e., very quickly) during the final few major iterations.
For certain rare classes of problem it is safe to request the values λ_{k} = 0 and ρ_{k} = 0 for all subproblems by specifying Lagrangian = No (in which case the nonlinear constraint functions are evaluated only once per major iteration). However for general problems, convergence is much more likely with the default setting, Lagrangian = Yes.
The merit function
Unfortunately, it is not known how to define a merit function M(x,λ) that can be reduced at every major iteration. As a result, there is no guarantee that the projected Lagrangian method described above will converge from an arbitrary starting point. This has been the principal theoretical gap in MINOS, finally resolved by the PhD research of Michael Friedlander. The main features needed to stabilize MINOS are:
- To relax the linearized constraints via an ℓ_{1} penalty function.
- To repeat a major iteration with increased ρ_{k} (and more relaxed linearized constraints) if the nonlinear constraint violation would increase too much.
In practice, the method of MINOS 5.51 often does converge, and a good rate of convergence is often achieved in the final major iterations, particularly if the constraint functions are "nearly linear". As a precaution, MINOS prevents radical changes from one major iteration to the next. Where possible, the steplength is chosen to be σ = 1, so that each new estimate of the solution is simply (x̂, λ̂), the solution of the subproblem. If this point is "too different", a shorter steplength σ < 1 is chosen.
If the major iterations for a particular model do not appear to be converging, some of the following actions may help:
- Specify initial activity levels for the nonlinear variables as carefully as possible.
- Include sensible upper and lower bounds on all variables.
- Specify a Major damping parameter that is lower than the default value. This tends to make σ smaller.
- Specify a Penalty parameter that is higher than the default value. This tends to prevent excessive departures from the constraint linearization.
Problem Formulation
In general, it is worthwhile expending considerable prior analysis to make the constraints completely linear if at all possible. Sometimes a simple transformation will suffice. For example, a pipeline optimization problem has pressure drop constraints of the form

    K_{1}/d_{1}^{α} + K_{2}/d_{2}^{α} + ... <= P,

where the d_{i} are the design variables (pipe diameters) and the other terms are constant. These constraints are highly nonlinear, but by redefining the decision variables to be t_{i} = 1/d_{i}^{α} we can make the constraints linear. Even if the objective function becomes more nonlinear by such a transformation (and this usually happens), the advantages of having linear constraints greatly outweigh this.
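A quick numeric check of this kind of transformation (the coefficients K_i and the exponent alpha below are made up for the demonstration):

```python
# In the original pipe diameters d the pressure-drop function is highly
# nonlinear; in the transformed variables t_i = 1/d_i**ALPHA it is linear.
# ALPHA and K are invented values for the illustration.

ALPHA = 4.0
K = [2.0, 3.0]

def pressure_drop_d(d):
    """Pressure drop as a (nonlinear) function of the diameters d."""
    return sum(Ki / di**ALPHA for Ki, di in zip(K, d))

def pressure_drop_t(t):
    """The same quantity as a linear function of t_i = 1/d_i**ALPHA."""
    return sum(Ki * ti for Ki, ti in zip(K, t))

# The two parameterizations agree, and the second is affine in t.
d = [1.5, 2.0]
t = [1.0 / di**ALPHA for di in d]
assert abs(pressure_drop_d(d) - pressure_drop_t(t)) < 1e-12
```

Because pressure_drop_t is linear in t, the constraint becomes a row of the linear constraint matrix after the substitution.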
Similarly, it is usually best not to move nonlinearities from the objective function into the constraints. For example, we should not replace

    minimize  F(x)

by

    minimize  z    subject to    F(x) - z = 0.
Scaling is a very important matter during problem formulation. A general rule is to scale both the data and the variables to be as close to 1.0 as possible. In general we suggest the range 1.0 to 10.0. When conflicts arise, one should sacrifice the objective function in favor of the constraints. Real-world problems tend to have a natural scaling within each constraint, as long as the variables are expressed in consistent physical units. Hence it is often sufficient to apply a scale factor to each row. MINOS has options to scale the rows and columns of the constraint matrix automatically. By default, only the linear rows and columns are scaled, and the procedure is reliable. If you request that the nonlinear constraints and variables be scaled, bear in mind that the scale factors are determined by the initial Jacobian J (x_{0}), which may differ considerably from J (x) at a solution.
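As an illustration of row scaling, one common single-pass scheme is geometric-mean scaling. This sketch is not MINOS's actual (iterative) procedure, but it shows the idea of pulling each row's nonzeros toward magnitude 1.0:

```python
import math

def row_scale_factors(A):
    """One pass of geometric-mean row scaling: each row is scaled by
    1/sqrt(min|a_ij| * max|a_ij|) over its nonzeros, so that afterwards
    the smallest and largest magnitudes in the row are reciprocals.
    (A simple sketch; MINOS iterates a similar idea over rows and columns.)"""
    scales = []
    for row in A:
        nz = [abs(a) for a in row if a != 0.0]
        scales.append(1.0 / math.sqrt(min(nz) * max(nz)) if nz else 1.0)
    return scales
```

For a row with entries spanning 1e-2 to 1e4, the factor 0.1 maps them to 1e-3 and 1e3, centering the row's magnitudes around 1.0.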
Finally, upper and lower bounds on the variables (and on the constraints) are extremely useful for confining the region over which optimization has to be performed. If sensible values are known, they should always be used. They are also important for avoiding singularities in the nonlinear functions. Note that bounds may be violated by as much as the feasibility tolerance δ_{fea}. Hence, if sqrt(x_{2}) or log x_{2} appears (for example) and if δ_{fea} = 10^{-6}, the lower bound on x_{2} would normally have to be at least 10^{-5}. If it is known that x_{2} will be at least 0.5 (say) at a solution, then its lower bound should be 0.5.
For a detailed discussion of many aspects of numerical optimization, see Gill, Murray and Wright; in particular, see Chapter 8 for much invaluable advice on problem formulation and assessment of results.
Restrictions
MINOS is designed to find solutions that are locally optimal. The nonlinear functions in a problem must be smooth (i.e., their first derivatives must exist), especially near the desired solution. The functions need not be separable.
A certain "feasible" region is defined by the general constraints and the bounds on the variables. If the objective is convex within this region and if the feasible region itself is convex, any optimal solution obtained will be a global optimum. Otherwise there may be several local optima, and some of these may not be global. In such cases the chances of finding a global optimum are usually increased by choosing a starting point that is "sufficiently close", but there is no general procedure for determining what "close" means, or for verifying that a given local optimum is indeed global.
Integer restrictions cannot be imposed directly. If a variable x_{j} is required to be 0 or 1, a common ploy is to include a quadratic term x_{j} (1 - x_{j} ) in the objective function. MINOS will indeed terminate with x_{j} = 0 or 1, but inevitably the final solution will just be a local optimum. (Note that the quadratic is negative definite. MINOS will find a global minimum for quadratic functions that are positive definite or positive semidefinite, assuming the constraints are linear.)
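The effect of the x_{j}(1 - x_{j}) ploy can be seen numerically. The smooth objective below is made up, and a crude grid search stands in for the optimizer so that the global picture is visible (MINOS itself would return one local optimum):

```python
def objective(x, rho):
    """Made-up smooth objective (x - 0.4)^2 plus the penalty term
    rho * x * (1 - x), which vanishes exactly at x = 0 and x = 1."""
    return (x - 0.4)**2 + rho * x * (1.0 - x)

def grid_min(rho, steps=1000):
    """Crude global grid search over [0, 1], for illustration only."""
    pts = [i / steps for i in range(steps + 1)]
    return min(pts, key=lambda x: objective(x, rho))

# Without the penalty the minimizer is the interior point 0.4; with a
# sufficiently large rho the combined objective is concave in x, so the
# minimizer is pushed to the nearer of the integer points 0 and 1.
```

With rho = 0 the grid search returns 0.4; with rho = 10 the quadratic coefficient becomes negative, the function is concave on [0, 1], and the minimizer jumps to the endpoint 0.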