MINOS Solver Options

The following sections describe the solver options available for each problem type.

Options for Linear Programming

The following options apply specifically to linear programs.

Crash option i Default = 3
Crash tolerance t Default = 0.1

Except on restarts, a Crash procedure is used to select an initial basis from certain rows and columns of the constraint matrix ( A I ). The Crash option i determines which rows and columns of A are eligible initially, and how many times Crash is called. Columns of I are used to pad the basis where necessary.

i = 0 The initial basis contains only slack variables: B = I.

i = 1 Crash is called once, looking for a triangular basis in all rows and columns of A.

i = 2 Crash is called twice (if there are nonlinear constraints). The first call looks for a triangular basis in linear rows, and the first major iteration proceeds with simplex iterations until the linear constraints are satisfied. The Jacobian is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).

i = 3 Crash is called up to three times (if there are nonlinear constraints). The first two calls treat linear equalities and linear inequalities separately. As before, the last call treats nonlinear rows at the start of the second major iteration.

If i >= 1, certain slacks on inequality rows are selected for the basis first. (If i >= 2, numerical values are used to exclude slacks that are close to a bound.) Crash then makes several passes through the columns of A, searching for a basis matrix that is essentially triangular. A column is assigned to "pivot" on a particular row if the column contains a suitably large element in a row that has not yet been assigned. (The pivot elements ultimately form the diagonals of the triangular basis.) For remaining unassigned rows, slack variables are inserted to complete the basis.

The Crash tolerance allows Crash to ignore certain "small" nonzeros in each column of A. If amax is the largest element in column j, other nonzeros aij in the column are ignored if |aij| < amax × t. (To be meaningful, t should be in the range 0.0 ≤ t < 1.0.)

When t > 0.0, the bases obtained by Crash may not be strictly triangular, but are likely to be nonsingular and almost triangular. The intention is to choose a basis containing more columns of A and fewer (arbitrary) slacks. A feasible solution may be reached sooner on some problems.

For example, suppose the first m columns of A are the matrix shown under LU factor tolerance; i.e., a tridiagonal matrix with entries -1, 4, -1. To help Crash choose all m columns for the initial basis, we would specify Crash tolerance t for some value of t > 1/4.
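To make the Crash tolerance concrete, here is a toy Python sketch (illustrative only, not MINOS source; the acceptance rule is simplified) of one triangular Crash pass applied to the tridiagonal example above. Entries below t times the column maximum are treated as zero, so any t > 0.25 hides the off-diagonal -1 entries and all m columns are accepted.

 import numpy as np
 
 def crash_pass(A, t=0.1):
     """Toy triangular Crash pass (illustration only).
 
     Each column may pivot on its largest eligible entry; entries with
     |a_ij| < t * amax are ignored, as described under Crash tolerance.
     Rows left unassigned would be completed with slack variables.
     """
     m, n = A.shape
     assigned = set()
     basis_cols = []
     for j in range(n):
         col = A[:, j]
         amax = np.max(np.abs(col))
         visible = [i for i in range(m) if abs(col[i]) >= t * amax]
         free = [i for i in visible if i not in assigned]
         if not free:
             continue
         pivot = max(free, key=lambda i: abs(col[i]))
         # Keep the basis (almost) triangular: accept the column only if
         # it has no visible nonzero in a row already assigned.
         if all(i not in assigned for i in visible):
             assigned.add(pivot)
             basis_cols.append(j)
     return basis_cols
 
 m = 5   # tridiagonal matrix with entries -1, 4, -1
 A = (np.diag([4.0] * m) + np.diag([-1.0] * (m - 1), 1)
      + np.diag([-1.0] * (m - 1), -1))
 print(crash_pass(A, t=0.1))   # off-diagonals visible: only some columns chosen
 print(crash_pass(A, t=0.3))   # t > 1/4: all m columns chosen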

Options for All Problems

The following options have the same purpose for all problems, whether they are linear or nonlinear.

Check frequency k Default = 60

Every k-th minor iteration after the most recent basis factorization, a numerical test is made to see if the current solution x satisfies the general linear constraints (including linearized nonlinear constraints, if any). The constraints are of the form Ax + s = b, where s is the set of slack variables. To perform the numerical test, the residual vector r = b - Ax - s is computed. If the largest component of r is judged to be too large, the current basis is refactorized and the basic variables are recomputed to satisfy the general constraints more accurately.

Check frequency 1 is useful for debugging purposes, but otherwise this option should not be needed.
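As a minimal sketch (NumPy, hypothetical names), the row check computes the residual of the general constraints and flags a refactorization when it is too large:

 import numpy as np
 
 def row_check(A, x, s, b, tol_row=1e-6):
     """Test whether Ax + s = b still holds numerically.
 
     Returns True when the largest component of r = b - Ax - s is
     judged too large, in which case the basis is refactorized and
     the basic variables are recomputed.
     """
     r = b - A @ x - s
     return np.max(np.abs(r)) > tol_row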

Cycle limit l Default = 1
Cycle print p Default = 1
Cycle tolerance t Default = 0.0
Phantom columns c Default = 0
Phantom elements e Default = 0
Debug level l Default = 0

This causes various amounts of information to be output to the Print file. Most debug levels are not helpful to normal users, but they are listed here for completeness.

l = 0 No debug output.
l = 2 (or more) Output from m5setx showing the maximum residual after a row check.
l = 40 Output from lu8rpc (which updates the LU factors of the basis matrix), showing the position of the last nonzero in the transformed incoming column.
l = 50 Output from lu1mar (which computes the LU factors each refactorization), showing each pivot row and column and the dimensions of the dense matrix involved in the associated elimination.
l = 100 Output from m2bfac and m5log listing the basic and superbasic variables and their values at every iteration.
Expand frequency k Default = 10000

This option is part of an anti-cycling procedure designed to guarantee progress even on highly degenerate problems.

"Cycling" can occur only if zero steplengths are allowed. Here, the strategy is to force a positive step at every iteration, at the expense of violating the bounds on the variables by a small amount. Suppose that the Feasibility tolerance is d. Over a period of k iterations, the tolerance actually used by MINOS increases from 0.5δ to δ (in steps of 0.5δ/k).

Every k iterations, or when feasibility and optimality are first detected, a resetting procedure eliminates any infeasible nonbasic variables. Some additional iterations may be needed to restore feasibility and optimality. Increasing k reduces that likelihood, but it gives less freedom to choose large pivot elements during basis changes. (See Pivot tolerance.)
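A minimal sketch of the tolerance ramp (the function name and reset rule are assumptions for illustration):

 def working_feasibility_tol(iteration, k=10000, delta=1e-6):
     """EXPAND working tolerance at the given minor iteration.
 
     Increases from 0.5*delta to delta over a period of k iterations,
     in steps of 0.5*delta/k, then restarts after the resetting
     procedure described above.
     """
     step = iteration % k
     return 0.5 * delta + (0.5 * delta / k) * step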

Factorization frequency k Default = 100 (LP) or 50 (NLP)

With linear programs, most iterations cause a basis change, in which one column of the basis matrix B is replaced by another. The LU factors of B must be updated accordingly. At most k updates are performed before the current B is factorized directly.

Each update tends to add nonzeros to the LU factors. Since the updating method is stable, k mainly affects the efficiency of minor iterations, rather than stability.

High values of k (such as 100 or 200) may be more efficient on "dense" problems, when the factors of B tend to have two or three times as many nonzeros as B itself. Lower values of k may be more efficient on problems that are very sparse.

Feasibility tolerance t Default = 1.0e-6

This sets the feasibility tolerance δfea = t (see #Options for Linear Programming). A variable or constraint is considered feasible if it does not lie outside its bounds by more than δfea.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible. Let sinf be the corresponding sum of infeasibilities. If sinf is quite small, it may be appropriate to raise t by a factor of 10 or 100. Otherwise, some error in the data should be suspected. If sinf is not small, there may be other points that have a significantly smaller sum of infeasibilities. MINOS does not attempt to find a solution that minimizes the sum.

For Scale option 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful). The final unscaled solution can therefore be infeasible by an unpredictable amount, depending on the size of the scale factors. Precautions are taken so that in a "feasible solution" the original variables will never be infeasible by more than 0.1. Values that large are very unlikely.

Iterations limit k Default = 3m

MINOS stops after k iterations even if the simplex method has not yet reached a solution. If k = 0, no iterations are performed, but the starting point is tested for both feasibility and optimality.

LU factor tolerance t1 Default = 100.0 (LP) or 5.0 (NLP)
LU update tolerance t2 Default = 10.0 (LP) or 5.0 (NLP)

These tolerances affect the stability and sparsity of the basis factorization B = LU during refactorization and updating, respectively. They must satisfy t1 , t2 >= 1.0. The matrix L is a product of matrices of the form

<math>\left(\begin{array}{cc} 1 & \\ \mu & 1 \end{array}\right),</math>

where the multipliers µ satisfy |µ| ≤ ti. Values near 1.0 favor stability, while larger values favor sparsity. The default values usually strike a good compromise. For large and relatively dense problems, t1 = 10.0 or 5.0 (say) may give a useful improvement in stability without impairing sparsity to a serious degree.

For certain very regular structures (e.g., band matrices) it may be necessary to reduce t1 and/or t2 in order to achieve stability. For example, if the columns of A include a submatrix of the form

<math>\left(\begin{array}{ccccc} 4 & -1 & & & \\ -1 & 4 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 4 & -1 \\ & & & -1 & 4 \end{array}\right),</math>

one should set both t1 and t2 to values in the range 1.0 ≤ ti < 4.0.
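As a brief justification of that range (added reasoning, not from the original manual text): eliminating along the diagonal pivots on the entries 4 and gives multipliers of size

<math>|\mu| = \tfrac{1}{4},</math>

whereas pivoting on an off-diagonal -1 would produce <math>|\mu| = 4</math>. Keeping <math>t_i < 4.0</math> therefore forces the factorization to take the stable diagonal pivots.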

LU density tolerance t3 Default = 0.5
LU singularity tolerance t4 Default = ε^0.67 ≈ 3.25e-11

The density tolerance t3 is used during LU factorization of the basis matrix. Columns of L and rows of U are formed one at a time, and the remaining rows and columns of the basis are altered appropriately. At any stage, if the density of the remaining matrix exceeds t3, the Markowitz strategy for choosing pivots is altered to reduce the time spent searching for each remaining pivot. Raising the density tolerance towards 1.0 may give slightly sparser LU factors, with a slight increase in factorization time.

The singularity tolerance t4 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if <math>|U_{jj}| \le t_4</math> or <math>|U_{jj}| < t_4 \max_i |U_{ij}|</math>, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t4 = 1.0e-5, say, may help cause a judicious change of basis.
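A sketch of the diagonal test, assuming U is held as a dense NumPy array (MINOS of course works with sparse factors):

 import numpy as np
 
 def singular_columns(U, t4=3.25e-11):
     """Indices j failing the singularity test on the diagonals of U.
 
     Column j of the basis is replaced by its slack variable if
     |U[j,j]| <= t4  or  |U[j,j]| < t4 * max_i |U[i,j]|.
     """
     bad = []
     for j in range(U.shape[0]):
         ujj = abs(U[j, j])
         if ujj <= t4 or ujj < t4 * np.max(np.abs(U[:, j])):
             bad.append(j)
     return bad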

Maximize
Minimize Default

This specifies the required direction of optimization.

Multiple price k Default = 1

It is not normal to set k > 1 for linear programs, as it causes MINOS to use the reduced-gradient method rather than the simplex method. The number of iterations, and the total work, are likely to increase.

The reduced-gradient iterations do not correspond to the very efficient multiple pricing "minor iterations" carried out by certain commercial linear programming systems. Such systems require storage for k dense vectors of dimension m, so that k is usually limited to 5 or 6. In MINOS, the total storage requirements increase only slightly with k. (The Superbasics limit must be at least k.)

Optimality tolerance t Default = 1.0e-6

This is used to judge the size of the reduced gradients <math>d_j = g_j - \pi^T a_j</math>, where gj is the gradient of the objective function corresponding to the j-th variable, aj is the associated column of the constraint matrix (or Jacobian), and π is the set of dual variables.

By construction, the reduced gradients for basic variables are always zero. Optimality is declared if the reduced gradients for nonbasic variables at their lower or upper bounds satisfy <math>d_j / \|\pi\| \ge -t</math> or <math>d_j / \|\pi\| \le t</math> respectively, and if <math>|d_j| / \|\pi\| \le t</math> for superbasic variables.

In those tests, <math>\|\pi\|</math> is a measure of the size of the dual variables. It is included to make the tests independent of a large scale factor on the objective function. The quantity actually used is defined by

<math>\|\pi\| = \max\{\sigma / \sqrt{m},\; 1\}, \qquad \sigma = \sum_{i=1}^{m} |\pi_i|,</math>

so that only scale factors larger than 1.0 are allowed for. If the objective is scaled down to be very small, the optimality test reduces to comparing dj against t.
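The whole test can be sketched as follows (hypothetical names and dense data; MINOS uses sparse structures):

 import numpy as np
 
 def optimal(g, A, pi, status, t=1.0e-6):
     """Reduced-gradient optimality test.
 
     status[j] is one of "basic", "superbasic", "lower", "upper".
     d_j = g_j - pi'a_j is scaled by ||pi|| = max(sigma/sqrt(m), 1),
     sigma = sum|pi_i|, so a large objective scale factor drops out.
     """
     m = A.shape[0]
     pinorm = max(np.sum(np.abs(pi)) / np.sqrt(m), 1.0)
     d = (g - A.T @ pi) / pinorm
     for j, st in enumerate(status):
         if st == "basic":
             continue                      # d_j is zero by construction
         if st == "superbasic" and abs(d[j]) > t:
             return False
         if st == "lower" and d[j] < -t:   # would pay to increase x_j
             return False
         if st == "upper" and d[j] > t:    # would pay to decrease x_j
             return False
     return True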

Partial price p Default = 10 (LP) or 1 (NLP)

This parameter is recommended for large problems that have significantly more variables than constraints. It reduces the work required for each "pricing" operation (when a nonbasic variable is selected to become basic or superbasic).

When p = 1, all columns of the constraint matrix ( A I ) are searched. Otherwise, A and I are partitioned to give p roughly equal segments Aj , Ij (j = 1 to p). If the previous pricing search was successful on Aj , Ij, the next search begins on the segments Aj+1 , Ij+1 . (Subscripts are modulo p.)

If a reduced gradient is found that is larger than some dynamic tolerance, the variable with the largest such reduced gradient (of appropriate sign) is selected to become superbasic. (Several may be selected if multiple pricing has been specified.) If nothing is found, the search continues on the next segments Aj+2 , Ij+2 , and so on.

Partial price t (or t/2 or t/3) may be appropriate for time-stage models having t time periods.
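Schematically, the segment rotation works like this sketch (hypothetical names; the dynamic pricing tolerance is reduced to a fixed tol):

 import numpy as np
 
 def partial_price(d, p, start_seg, tol):
     """Search segments of the reduced-gradient vector d cyclically.
 
     The columns of ( A I ) are split into p roughly equal segments;
     the search stops at the first segment holding a reduced gradient
     larger than tol and returns the best column plus the segment
     where the next search should begin.
     """
     n = len(d)
     cuts = np.linspace(0, n, p + 1).astype(int)
     for k in range(p):
         seg = (start_seg + k) % p
         lo, hi = cuts[seg], cuts[seg + 1]
         found = [j for j in range(lo, hi) if abs(d[j]) > tol]
         if found:
             best = max(found, key=lambda j: abs(d[j]))
             return best, (seg + 1) % p
     return None, start_seg            # nothing priced out anywhere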

Pivot tolerance t Default = ε^(2/3) ≈ 10^-11

Broadly speaking, the pivot tolerance is used to prevent columns entering the basis if they would cause the basis to become almost singular. When x changes to x + αp for some search direction p, a "ratio test" is used to determine which component of x reaches an upper or lower bound first. The corresponding element of p is called the pivot element.

For linear problems, elements of p are ignored (and therefore cannot be pivot elements) if they are smaller than the pivot tolerance t. For nonlinear problems, elements smaller than <math>t \, \|p\|</math> are ignored.

It is common for two or more variables to reach a bound at essentially the same time. In such cases, the Feasibility tolerance provides some freedom to maximize the pivot element and thereby improve numerical stability. Excessively small Feasibility tolerances should therefore not be specified.

To a lesser extent, the Expand frequency also provides some freedom to maximize the pivot element. Excessively large Expand frequencies should therefore not be specified.
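A bare-bones ratio test with the pivot-tolerance filter might read as follows (illustrative; the Feasibility tolerance and EXPAND refinements are omitted):

 import numpy as np
 
 def ratio_test(x, p, lower, upper, t=1e-11):
     """Blocking component when x moves to x + alpha*p.
 
     Elements of p no larger than the pivot tolerance t are skipped,
     so they can never become pivot elements.
     """
     best_alpha, pivot = np.inf, None
     for j in range(len(x)):
         if abs(p[j]) <= t:
             continue                      # too small to pivot on
         bound = upper[j] if p[j] > 0 else lower[j]
         alpha = (bound - x[j]) / p[j]     # step to that bound
         if alpha < best_alpha:
             best_alpha, pivot = alpha, j
     return best_alpha, pivot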

Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables Same as Scale option 1
Scale nonlinear variables Same as Scale option 2
Scale all variables Same as Scale option 2
Scale tolerance t Default = 0.9
Scale, Print
Scale, Print, Tolerance t

Three scale options are available as follows:

l = 0 No scaling. If storage is at a premium, this option saves m + n words of workspace.
l = 1 The constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0. This sometimes improves the performance of the solution procedures.
l = 2 The constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.

Scale Yes sets the default scaling. (Caution : If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

If nonlinear constraints are present, Scale option 1 or 0 should generally be tried at first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

Scale, Print causes the row-scales r(i) and column-scales c(j) to be printed. The scaled matrix coefficients are <math>\bar{a}_{ij} = a_{ij} \, r(i) \, c(j)</math>, and the scaled bounds on the variables and slacks are <math>\bar{l}_j = l_j / c(j)</math>, <math>\bar{u}_j = u_j / c(j)</math>, where <math>c(j) \equiv r(j - n)</math> if <math>j > n</math>.

All forms except Scale option may specify a tolerance t, where 0 < t < 1 (for example: Scale, Print, Tolerance = 0.99). This affects how many passes might be needed through the constraint matrix. On each pass, the scaling procedure computes the ratio of the largest and smallest nonzero coefficients in each column:

<math>\rho_j = \max_i |a_{ij}| \,/\, \min_i |a_{ij}| \qquad (a_{ij} \neq 0).</math>

If <math>\rho = \max_j \rho_j</math> is less than t times its previous value, another scaling pass is performed to adjust the row and column scales. Raising t from 0.9 to 0.99 (say) usually increases the number of scaling passes through A. At most 10 passes are made.
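The procedure is essentially iterative geometric-mean scaling; a rough sketch (not the exact algorithm used by MINOS) is:

 import numpy as np
 
 def scale_passes(A, t=0.9, max_passes=10):
     """Row/column scaling repeated while it still helps.
 
     After each pass, rho = max_j rho_j is compared with its previous
     value; another pass is made only if rho < t * previous rho.
     Returns the row scales r(i) and column scales c(j).
     """
     A = A.astype(float).copy()
     r, c = np.ones(A.shape[0]), np.ones(A.shape[1])
 
     def rho(M):
         worst = 1.0
         for j in range(M.shape[1]):
             nz = np.abs(M[:, j])
             nz = nz[nz > 0]
             if nz.size:
                 worst = max(worst, nz.max() / nz.min())
         return worst
 
     prev = rho(A)
     for _ in range(max_passes):
         for i in range(A.shape[0]):       # rows: divide by the geometric
             nz = np.abs(A[i])[A[i] != 0]  # mean of max and min nonzeros
             if nz.size:
                 s = 1.0 / np.sqrt(nz.max() * nz.min())
                 A[i] *= s
                 r[i] *= s
         for j in range(A.shape[1]):       # then columns
             nz = np.abs(A[:, j])[A[:, j] != 0]
             if nz.size:
                 s = 1.0 / np.sqrt(nz.max() * nz.min())
                 A[:, j] *= s
                 c[j] *= s
         cur = rho(A)
         if cur >= t * prev:               # too little improvement: stop
             break
         prev = cur
     return r, c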

If a Scale option has not already been specified, Scale, Print or Scale tolerance both set the default scaling.

Weight on linear objective w Default = 0.0

This keyword invokes the so-called composite objective technique, if the first solution obtained is infeasible, and if the objective function contains linear terms. While trying to reduce the sum of infeasibilities, the method also attempts to optimize the linear objective. At each infeasible iteration, the objective function is defined to be

<math>\text{minimize} \quad \sigma w (c^T x) + (\text{sum of infeasibilities}),</math>

where σ = 1 for minimization, σ = -1 for maximization, and c is the linear objective. If an "optimal" solution is reached while still infeasible, w is reduced by a factor of 10. This helps to allow for the possibility that the initial w is too large. It also provides dynamic allowance for the fact that the sum of infeasibilities is tending towards zero.

The effect of w is disabled after 5 such reductions, or if a feasible solution is obtained.

The Weight option is intended mainly for linear programs. It is unlikely to be helpful on nonlinear problems.

Options for Nonlinear Objectives

The following options apply to nonlinear programs whose constraints are linear.

Crash option l Default = 3
Crash tolerance t Default = 0.1

These options are the same as for linear programs.

Options for All Nonlinear problems

Expand frequency k Default = 10000

This option has the same effect as for linear programs, but takes effect only when there is just one superbasic variable. (Cycling can occur only when the current solution is at a vertex of the feasible region. Thus, zero steps are allowed if there is more than one superbasic variable, but otherwise positive steps are enforced.) Increasing k helps reduce the number of slightly infeasible nonbasic variables (most of which are eliminated during a resetting procedure). However, it also diminishes the freedom to choose a large pivot element (see Pivot tolerance).

Feasibility tolerance t Default = 1.0e-6

When the constraints are linear, a feasible solution is one in which all variables, including slacks, satisfy their upper and lower bounds to within the absolute tolerance t. (Since slacks are included, this means that the general linear constraints are also satisfied to within t.)

When nonlinear constraints are present, a feasible subproblem is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within the tolerance t.

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function <math>F(x) = \sqrt{x_1} + \log x_2</math>, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10^-6, the bounds x1 ≥ 10^-5 and x2 ≥ 10^-4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.

Note : Feasibility with respect to the nonlinear constraints is measured against the Row tolerance, not the Feasibility tolerance.

Hessian dimension r Default = 50

This specifies that an r × r triangular matrix R is to be available for use by the quasi-Newton algorithm (to approximate the reduced Hessian matrix according to <math>Z^T H Z \approx R^T R</math>). Suppose there are s superbasic variables at a particular iteration. Whenever possible, r should be greater than s.

If r >= s, the first s columns of R are used to approximate the reduced Hessian in the normal manner. If there are no further changes to the set of superbasic variables, the rate of convergence is usually superlinear. If r < s, a matrix of the form

<math>\left(\begin{array}{cc} R_r & 0 \\ & D \end{array}\right)</math>

is used to approximate the reduced Hessian, where Rr is an r × r upper triangular matrix and D is a diagonal matrix of order s - r. The rate of convergence is no longer superlinear (and may be arbitrarily slow).

The storage required is of order r^2/2, which is substantial if r is as large as 1000 (say). In general, r should be a slight over-estimate of the final number of superbasic variables, whenever storage permits. It need not be larger than n1 + 1, where n1 is the number of nonlinear variables. For many problems it can be much smaller than n1.

Iterations limit k Default = 3m + 10n1

If the constraints are linear, this is the maximum number of iterations allowed for the simplex method or the reduced-gradient method. Otherwise, it is the maximum number of minor iterations, summed over all major iterations.

If k = 0, no minor iterations are performed, but the starting point is tested for both feasibility and optimality.


Linesearch tolerance t Default = 0.1

For nonlinear problems, this controls the accuracy with which a steplength α is located during one-dimensional searches of the form

<math>\min_\alpha \; F(x + \alpha p).</math>

A linesearch occurs on most minor iterations for which x is feasible. (If the constraints are nonlinear, the function being minimized is the augmented Lagrangian (5).)

The value of t must satisfy 0.0 ≤ t < 1.0. The default value t = 0.1 requests a moderately accurate search, and should be satisfactory in most cases. If the nonlinear functions are cheap to evaluate, a more accurate search may be appropriate; try t = 0.01 or t = 0.001. The number of iterations should decrease, and this will reduce total run time if there are many linear or nonlinear constraints. If the nonlinear functions are expensive to evaluate, a less accurate search may be appropriate; try t = 0.5 or perhaps t = 0.9. (The number of iterations will probably increase, but the total number of function evaluations may decrease enough to compensate.)

LU singularity tolerance t3 Default = ε^0.67 ≈ 3.25e-11
LU swap tolerance t4 Default = ε^(1/4) ≈ 10^-4

The singularity tolerance t3 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if <math>|U_{jj}| \le t_3</math> or <math>|U_{jj}| < t_3 \max_i |U_{ij}|</math>, the j-th column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting t3 = 1.0e-5, say, may help cause a judicious change of basis.

The LU swap tolerance is somewhat similar but can take effect more easily. It is again used only after a basis factorization, and normally just at the start of a major iteration. If a diagonal of U seems to be rather small (as measured by t4 ) relative to the biggest diagonal of U , a basis change is made in which the basic variable associated with the small diagonal of U is swapped with a carefully chosen superbasic variable (if there are any). The number of superbasic variables stays the same. A message is printed to advise that a swap has occurred.

In practice this tends to help problems whose basis is becoming ill-conditioned. If the number of swaps becomes excessive, set LU swap tolerance 1.0e-6, say, or perhaps smaller.

Minor damping parameter d Default = 2.0

This parameter limits the change in x during a linesearch. It applies to all nonlinear problems, once a "feasible solution" or "feasible subproblem" has been found.

A linesearch of the form <math>\min_\alpha F(x + \alpha p)</math> is performed over the range <math>0 < \alpha \le \beta</math>, where β is the step to the nearest upper or lower bound on x. Normally, the first steplength tried is <math>\alpha_1 = \min(1, \beta)</math>, but in some cases, such as <math>F(x) = a e^{bx}</math> or <math>F(x) = a x^b</math>, even a moderate change in the components of x can lead to floating-point overflow.

The parameter d is therefore used to define a limit <math>\bar{\beta} = d(1 + \|x\|) / \|p\|</math>, and the first evaluation of F(x) is at the potentially smaller steplength <math>\alpha_1 = \min(1, \bar{\beta}, \beta)</math>.
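In code, the first trial steplength is simply (a minimal sketch):

 import numpy as np
 
 def first_steplength(x, p, beta, d=2.0):
     """First trial step under the Minor damping parameter.
 
     beta is the step to the nearest bound; the damped limit is
     beta_bar = d * (1 + ||x||) / ||p||.
     """
     beta_bar = d * (1.0 + np.linalg.norm(x)) / np.linalg.norm(p)
     return min(1.0, beta_bar, beta)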

Wherever possible, upper and lower bounds on x should be used to prevent evaluation of nonlinear functions at meaningless points. The Minor damping parameter provides an additional safeguard. The default value d = 2.0 should not affect progress on well behaved problems, but setting d = 0.1 or 0.01 may be helpful when rapidly varying functions are present. A "good" starting point may be required. An important application is to the class of nonlinear least-squares problems.

In cases where several local optima exist, specifying a small value for d may help locate an optimum near the starting point.

Multiple price k Default = 1

"Pricing" refers to a scan of the current nonbasic variables to determine if any should be changed from their current value (by allowing them to become superbasic or basic).

If multiple pricing is in effect, the k best nonbasic variables are selected for admission to the superbasic set. ("Best" means the variables with largest reduced gradients of appropriate sign.) If partial pricing is also in effect, the k best variables are selected from the current partition of A and I .

On large nonlinear problems it may help to set k > 1 if there are not many superbasic variables at the starting point but many at the optimal solution.

Optimality tolerance t Default = 1.0e-6
Partial price p Default = 10 (LP) or 1 (NLP)

This parameter may be useful for large problems that have significantly more variables than constraints. Larger values reduce the work required for each "pricing" operation (when a nonbasic variable is selected to become basic or superbasic).

Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables Same as Scale option 1
Scale nonlinear variables Same as Scale option 2
Scale all variables Same as Scale option 2
Scale tolerance t Default = 0.9
Scale, Print
Scale, Print, Tolerance t

Three scale options are available as follows:

l = 0 No scaling. If storage is at a premium, this option saves m + n words of workspace.
l = 1 If some of the variables are linear, the constraints and linear variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2 The constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.

Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default. (Caution : If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

The Scale tolerance and Scale, Print options are the same as for linear programs.

Subspace tolerance t Default = 0.5

This controls the extent to which optimization is confined to the current set of basic and superbasic variables (Phase 4 iterations), before one or more nonbasic variables are added to the superbasic set (Phase 3). The value specified must satisfy 0 < t ≤ 1.

When a nonbasic variable xj is made superbasic, the norm of the reduced-gradient vector (for all superbasics) is recorded. Let this be <math>\|Z^T g_0\|</math>. (In fact, the norm is <math>|d_j|</math>, the size of the reduced gradient for the new superbasic variable xj.)

Subsequent Phase 4 iterations continue at least until the norm of the reduced-gradient vector satisfies <math>\|Z^T g\| \le t \, \|Z^T g_0\|</math>. (<math>\|Z^T g\|</math> is the size of the largest reduced gradient among the superbasic variables.)
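As a one-line sketch of the test (names assumed):

 def phase4_continues(ztg, ztg0, t=0.5):
     """Stay in Phase 4 while ||Z'g|| > t * ||Z'g0||.
 
     ztg  : current largest reduced gradient among the superbasics
     ztg0 : norm recorded when the newest superbasic was added
     """
     return ztg > t * ztg0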

A smaller value of t is likely to increase the total number of iterations, but may reduce the number of basis changes. A larger value such as t = 0.9 may sometimes lead to improved overall efficiency, if the number of superbasic variables is substantially larger at the optimal solution than at the starting point.

Other convergence tests on the change in the function being minimized and the change in the variables may prolong Phase 4 iterations. This helps to make the overall performance insensitive to larger values of t.

Superbasics limit s Default = 50

This places a limit on the storage allocated for superbasic variables. Ideally, s should be set slightly larger than the "number of degrees of freedom" expected at an optimal solution.

For nonlinear problems, the number of degrees of freedom is often called the "number of independent variables". Normally, s need not be greater than n1 + 1, where n1 is the number of nonlinear variables. For many problems, s may be considerably smaller than n1 . This saves storage if n1 is very large.

If Hessian dimension r is specified, the default value of s is the same number (and conversely). This is a safeguard to ensure superlinear convergence wherever possible. Otherwise, the default for both r and s is 50.

Unbounded objective value r1 Default = 1.0e+20
Unbounded step size r2 Default = 1.0e+10

These parameters are intended to detect unboundedness in nonlinear problems. During a linesearch of the form <math>\min_\alpha F(x + \alpha p)</math>, if |F| exceeds r1 or if α exceeds r2, iterations are terminated with the exit message Problem is unbounded (or badly scaled).

If singularities are present, unboundedness in F (x) may be manifested by a floating-point overflow (during the evaluation of F (x + αp)), before the test against r1 can be made.

Unboundedness in x is best avoided by placing finite upper and lower bounds on the variables. See also the Minor damping parameter.

Verify level l Default = 0
Verify objective gradients Same as Verify level 1
Verify constraint gradients Same as Verify level 2
Verify Same as Verify level 3
Verify gradients Same as Verify level 3
Verify Yes Same as Verify level 3
Verify No Same as Verify level 0

These options refer to a check on the gradients computed by your nonlinear function routines funobj and funcon at the starting point (the initial value of the nonlinear variables x(*)). Values output in the gradient array g(*) are compared with estimates obtained by finite differences.

l = 0 Only a "cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1 A more reliable check is made on each component of the objective gradient.
l = 2 A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3 A detailed check is made on both the objective and the Jacobian.
l = -1 No checking is performed. This may be necessary if the functions are undefined at the starting point.

Verify level 3 is recommended for a new function routine, particularly if the "cheap" test indicates error. Missing gradients are not checked (so there is no overhead). If there are many nonlinear variables, the Start and Stop keywords may be used to limit the check to a subset.

As noted, gradient verification occurs at the starting point, before a problem is scaled, and before the first basis is factorized. The bounds on x will be satisfied, but the general linear constraints may not. If the nonlinear objective or constraint functions are undefined, you could initially specify

Objective = NONE
Nonlinear objective  variables	 0
Major  iterations		 1
Verify level			-1

to obtain a point that satisfies the linear constraints, and then restart with the correct linear and nonlinear objective, along with

Verify level			 3

Options for Nonlinear Constraints

The following options apply to problems with nonlinear constraints.

Completion Partial Default
Completion Full

When there are nonlinear constraints, this determines whether subproblems should be solved to moderate accuracy (partial completion), or to full accuracy (full completion). MINOS implements the option by using two sets of convergence tolerances for the subproblems.

Use of partial completion may reduce the work during early major iterations, unless the Minor iterations limit is active. The optimal set of basic and superbasic variables will probably be determined for any given subproblem, but the reduced gradient may be larger than it would have been with full completion.

An automatic switch to full completion occurs when it appears that the sequence of major iterations is converging. The switch is made when the nonlinear constraint error is reduced below 100*(Row tolerance), the relative change in λk is 0.1 or less, and the previous subproblem was solved to optimality.

Full completion tends to give better Lagrange-multiplier estimates. It may lead to fewer major iterations, but may result in more minor iterations.


Crash option l Default = 3
Crash tolerance t Default = 0.1

Let A refer to the linearized constraint matrix.

l = 0 The initial basis contains only slack variables: B = I .
l = 1 A is evaluated at the starting point. Crash is called once, looking for a triangular basis in all rows and columns of A.
l = 2 A is evaluated only after the linear constraints are satisfied. Crash is called twice. The first call looks for a triangular basis in linear rows. The first major iteration proceeds with simplex-type iterations until the linear constraints are satisfied. A is then evaluated for the second major iteration and Crash is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).
l = 3 Crash is called three times, treating linear equalities and linear inequalities separately, with simplex-type iterations in between. As before, the last call treats nonlinear rows at the start of the second major iteration.

Feasibility tolerance t Default = 1.0e-6

A "feasible subproblem" is one in which the linear constraints and bounds, as well as the current linearization of the nonlinear constraints, are satisfied to within t.

Note that feasibility with respect to the nonlinear constraints is determined by the Row tolerance (not the Feasibility tolerance).

MINOS first attempts to satisfy the linear constraints and bounds. If the sum of infeasibilities cannot be reduced to zero, the problem is declared infeasible.

If Scale option = 1 or 2, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful).

Normally, the nonlinear functions F(x) and f(x) are evaluated only at points x that satisfy the linear constraints and bounds. If the functions are undefined in certain regions, every attempt should be made to eliminate these regions from the problem. For example, for a function <math>F(x) = \sqrt{x_1} + \log x_2</math>, it would be essential to place lower bounds on both variables. If Feasibility tolerance = 10^-6, the bounds x1 ≥ 10^-5 and x2 ≥ 10^-4 might be appropriate. (The log singularity is more serious; in general, keep variables as far away from singularities as possible.)

An exception is during optional gradient checking (see Verify), which occurs before any optimization takes place. The points at which the functions are evaluated satisfy the bounds but not necessarily the general constraints. If this causes difficulty, gradient checking should be suppressed by setting Verify level -1.

If a subproblem is infeasible, the bounds on the linearized constraints are relaxed in several stages until the subproblem appears feasible. (The true bounds are restored for the next subproblem.) This approach sometimes allows the optimization to proceed successfully. In general, infeasible subproblems are a symptom of difficulty and it may be necessary to increase the Penalty parameter or alter the starting point.

Lagrangian Yes Default
Lagrangian No

This determines the form of the objective function used for the linearized subproblems. The default value Yes is highly recommended. The Penalty parameter value is then also relevant.

If No is specified, the nonlinear constraint functions are evaluated only twice per major iteration. Hence this option may be useful if the nonlinear constraints are very expensive to evaluate. However, in general there is a great risk that convergence may not be achieved.

Major damping parameter d Default = 2.0

This parameter may assist convergence on problems that have highly nonlinear constraints. It is intended to prevent large relative changes between subproblem solutions <math>(x_k, \lambda_k)</math> and <math>(x_{k+1}, \lambda_{k+1})</math>. For example, the default value 2.0 prevents the relative change in either xk or λk from exceeding 200 per cent. It will not be active on well behaved problems. (If all components of xk or λk are small, the norms of those vectors will not be allowed to increase beyond about 2.0.)

The parameter is used to interpolate between the solutions at the beginning and end of each major iteration. Thus, <math>x_{k+1}</math> and <math>\lambda_{k+1}</math> are changed to

<math>x_k + \sigma(x_{k+1} - x_k) \quad \text{and} \quad \lambda_k + \sigma(\lambda_{k+1} - \lambda_k)</math>

for some steplength <math>\sigma < 1</math>. In the case of nonlinear equations (where the number of constraints is the same as the number of variables) this gives a damped Newton method.

This is a very crude control. If the sequence of major iterations does not appear to be converging, one should first re-run the problem with a higher Penalty parameter (say 2, 4 or 10). (Skip this re-run in the case of a square system of nonlinear equations: there are no degrees of freedom and the Penalty parameter value has essentially no effect.)

If the subproblem solutions continue to change violently, try reducing d to 0.2 or 0.1 (say).


Major iterations k Default = 50

This is the maximum number of major iterations allowed. It is intended to guard against an excessive number of linearizations of the nonlinear constraints, since in some cases the sequence of major iterations may not converge.

The progress of the major iterations can be best monitored using Print level 0 (the default).

Minor iterations k Default = 40

This is the maximum number of minor iterations allowed during a major iteration, after the linearized constraints for that subproblem have been satisfied. (An arbitrary number of minor iterations may be needed to find a feasible point for each subproblem.) The Iterations limit provides an independent limit on the total minor iterations (across all subproblems).

A moderate value (e.g., k = 30 or 40) prevents excessive effort being expended on early major iterations, but allows later subproblems to be solved to completion.

The first major iteration is special: it terminates as soon as the linear constraints and bounds are satisfied (if possible), ignoring the nonlinear constraints.

In general it is unsafe to specify a value as small as k = 1 or 2. (Even when an optimal solution has been reached, a few minor iterations may be needed for the corresponding subproblem to be recognized as optimal.)

Optimality tolerance t Default = 1.0e-6
Penalty parameter r Default = 1.0

This specifies that the initial value of the penalty parameter ρk in the augmented Lagrangian (5) should be r times a certain default value 100/m1, where m1 is the number of nonlinear constraints. It is used when Lagrangian = Yes (the default setting).

For early runs on a problem with unknown characteristics, the default value should be acceptable. If the problem is highly nonlinear and the major iterations do not converge, a larger value such as 2 or 5 may help. In general, a positive r may be necessary to ensure convergence, even for convex programs.

On the other hand, if r is too large, the rate of convergence may be unnecessarily slow. If the functions are not highly nonlinear or a good starting point is known, it is often safe to specify Penalty parameter 0.0.

If several related problems are to be solved, the following strategy for setting the Penalty parameter may be useful:

  1. Initially, use a moderate value for r (such as the default) and a reasonably low Iterations and/or Major iterations limit.
  2. If successive major iterations appear to terminate with radically different solutions, try increasing the penalty parameter. (See also the Major damping parameter.)
  3. If there appears to be little progress between major iterations, it may help to reduce the penalty parameter.

Radius of convergence r Default = 0.01

This determines when the penalty parameter ρk is reduced (if initialized to a positive value). Both the nonlinear constraint violation (see rowerr below) and the relative change in consecutive Lagrange multiplier estimates must be less than r at the start of a major iteration before ρk is reduced or set to zero.

A few major iterations later, full completion is requested if not already set, and the remaining sequence of major iterations should converge quadratically to an optimum.

Row tolerance t Default = 1.0e-6

This specifies how accurately the nonlinear constraints should be satisfied at a solution. The default value is usually small enough, since model data is often specified to about that accuracy.

Let viol be the maximum violation of the nonlinear constraints, and let <math>\text{rowerr} = \text{viol} / (1 + \text{xnorm})</math>, where xnorm is a measure of the size of the current solution (x, y). The solution is regarded as acceptably feasible if rowerr <= t.

If the problem functions involve data that is known to be of low accuracy, a larger Row tolerance may be appropriate. On the other hand, nonlinear constraints are often satisfied with rapidly increasing accuracy during the last few major iterations. It is common for the final solution to satisfy rowerr <= t.
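In code the acceptance test is simply (a sketch; how viol and xnorm are obtained inside MINOS is not shown here):

 import numpy as np
 
 def acceptably_feasible(viol, x, t=1.0e-6):
     """rowerr = viol / (1 + xnorm), accepted when rowerr <= t.
 
     viol is the maximum violation of the nonlinear constraints and
     xnorm a measure of the size of the current solution.
     """
     xnorm = np.linalg.norm(x)
     rowerr = viol / (1.0 + xnorm)
     return rowerr <= t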


Scale option l Default = 2 (LP) or 1 (NLP)
Scale Yes
Scale No
Scale linear variables Same as Scale option 1
Scale nonlinear variables Same as Scale option 2
Scale all variables Same as Scale option 2
Scale tolerance r Default = 0.9
Scale, Print
Scale, Print, Tolerance r

Three scale options are available as follows:

l = 0 No scaling. If storage is at a premium, this option saves m + n words of workspace.
l = 1 Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0.
l = 2 All constraints and variables are scaled by the iterative procedure. Also, a certain additional scaling is performed that may be helpful if the right-hand side b or the solution x is large. This takes into account columns of ( A I ) that are fixed or have positive lower bounds or negative upper bounds.

Scale option 1 is the default for nonlinear problems. (Only linear variables are scaled.)

Scale Yes sets the default scaling. (Caution: If all variables are nonlinear, Scale Yes unexpectedly does nothing, because there are no linear variables to scale.) Scale No suppresses scaling (equivalent to Scale option 0).

With nonlinear constraints, Scale option 1 or 0 should generally be tried first. Scale option 2 gives scales that depend on the initial Jacobian, and should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

Verify level l Default = 0
Verify objective gradients Same as Verify level 1
Verify constraint gradients Same as Verify level 2
Verify Same as Verify level 3
Verify gradients Same as Verify level 3
Verify Yes Same as Verify level 3
Verify No Same as Verify level 0

This option refers to a finite-difference check on the gradients (first derivatives) of each nonlinear function. It occurs before a problem is scaled, and before the first basis is factorized. (Hence, the variables may not yet satisfy the general linear constraints.)

l = 0 Only a "cheap" test is performed, requiring three evaluations of the nonlinear objective (if any) and two evaluations of the nonlinear constraints.
l = 1 A more reliable check is made on each component of the objective gradient.
l = 2 A check is made on each column of the Jacobian matrix associated with the nonlinear constraints.
l = 3 A detailed check is made on both the objective and the Jacobian.
l = -1 No checking is performed. This may be necessary if the functions are undefined at the starting point.

Options for Input and Output

The following options specify various files to be used, and the amount of printed output required.

Print level l Default = 0
Print frequency k Default = 100

The PRINT file provides more complete information than the SUMMARY file. It includes a listing of the main options, statistics about the problem, scaling information, the iteration log, the exit condition, and (optionally) the final solution. It also includes error messages. The files are specified in Prob.SOL.PrintFile and Prob.SOL.SummFile.

For problems with linear constraints, Print level 0 gives most of the normal output. Print level 1 produces statistics for the basis matrix B and its LU factors each time the basis is factorized. Print frequency k produces one line of the iteration log every k minor iterations. Print frequency 1 causes every minor iteration to be logged. Print frequency 0 is shorthand for k = 99999.

For problems with nonlinear constraints, Print level 0 produces just one line of output per major iteration. This provides a short summary of the progress of the optimization. The Print frequency is ignored. If Print level > 0, certain quantities are printed at the start of each major iteration, and minor iterations are logged according to the Print frequency. In the latter case, the value of l is best thought of as a binary number of the form

Print level	JFLXB

where each letter stands for a digit that is either 0 or 1. The quantities referred to are:

B Basis statistics, as mentioned above.
X xk, the nonlinear variables involved in the objective function or the constraints.
L λk, the Lagrange-multiplier estimates for the nonlinear constraints. (Suppressed if Lagrangian = No, since then λk = 0.)
F f(xk), the values of the nonlinear constraint functions.
J J(xk), the Jacobian matrix.

To obtain output of any item, set the corresponding digit to 1, otherwise to 0. For example, Print level 10 sets X = 1 and the other digits equal to zero. The nonlinear variables will be printed each major iteration.
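Decoding the digits is straightforward; a sketch (hypothetical helper, not part of TOMLAB):

 def decode_print_level(l):
     """Interpret Print level as the binary number JFLXB.
 
     Example: Print level 10 sets X = 1 and all other digits to 0,
     so the nonlinear variables are printed each major iteration.
     """
     digits = str(l).zfill(5)              # pad to five digits J F L X B
     return dict(zip("JFLXB", (int(d) for d in digits)))
 
 print(decode_print_level(10))   # {'J': 0, 'F': 0, 'L': 0, 'X': 1, 'B': 0}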

If J = 1, the Jacobian is output column-wise. Column j is preceded by the value of the corresponding variable xj and a key to indicate whether the variable is basic, superbasic or nonbasic. (Hence if J = 1, there is no reason to specify X = 1 unless the objective contains more nonlinear variables than the Jacobian.) A typical line of output is

3   1.250000D+01 BS		1   1.00000D+00		4   2.00000D+00

which means that x3 is basic at value 12.5, and the third column of the Jacobian has elements of 1.0 and 2.0 in rows 1 and 4.

Solution Yes Default
Solution No

The Yes and No options control whether the final solution is output to the PRINT file. Numerical values are printed in f16.5 format where possible.

The special values 0, 1 and -1 are printed as ., 1.0 and -1.0. Bounds outside the range (-10^20, 10^20) appear as the word None.

Summary file f Default = 6 (typically)
Summary level l Default = 0
Summary frequency k Default = 100

The SUMMARY file provides a brief form of the iteration log and the exit condition. It also includes error messages. In an interactive environment, the output normally appears at the screen and allows a run to be monitored.

For problems with linear constraints, Summary level 0 produces brief output. Summary level 1 gives a few additional messages. Summary frequency k produces one line of the iteration log every k minor iterations. Summary frequency 1 causes every minor iteration to be logged. Summary frequency 0 is shorthand for k = 99999.

For problems with nonlinear constraints, Summary level 0 produces one line of output per major iteration. This provides a short summary of the progress of the optimization. The Summary frequency is ignored. If Summary level > 0, certain quantities are printed at the start of each major iteration, and minor iterations are logged according to the Summary frequency.