SNOPT Optional parameters

From TomWiki

Jump to: navigation, search


This page is part of the SNOPT Manual. See SNOPT.

The performance of each SNOPT interface is controlled by a number of parameters or "options". Each option has a default value that should be appropriate for most problems. The options are normally set in optPar or Prob.SOL.optPar before calling the solver. For special situations it is possible to specify non-standard values for some or all of the options. This options may be defined in a file called a SPECS file.

The SPECS file

The specs file contains a list of option definitions, using data in the following general form:

Begin options
   Iterations limit		500
   Minor feasibility tolerance	1.0e-7
   Solution			Yes
End options

We call such data a SPECS file because it specifies various options. The file starts with the keyword Begin and ends with End. Each line specifies a single option in free format, using one or more items as follows:

  1. A keyword (required for all options).
  2. A phrase (one or more words) that qualifies the keyword (only for some options).
  3. A number that specifies an integer or real value (only for some options). Such numbers may be up to 16 contiguous characters in Fortran 77's I, F, E or D formats, terminated by a space.

The items may be entered in upper or lower case or a mixture of both. Some of the keywords have synonyms, and certain abbreviations are allowed, as long as there is no ambiguity. Blank lines and comments may be used to improve readability. A comment begins with an asterisk (*), which may appear anywhere on a line. All subsequent characters on the line are ignored.

It may be useful to include a comment on the first (Begin) line of the file. This line is echoed to the SUMMARY file.

Most of the options described in the next section should be left at their default values for any given model. If experimentation is necessary, we recommend changing just one option at a time.

SPECS file checklist and defaults

The following example SPECS file shows all valid keywords and their default values. The keywords are grouped according to the function they perform.

Some of the default values depend on E, the relative precision of the machine being used. The values given here correspond to double-precision arithmetic on most current machines (\epsilon \approx 2.22\times 10^{-16}). Similar values would apply to any machine having about 15 decimal digits of precision.

BEGIN checklist of SPECS file parameters and their default values
* Printing
  Major print level			1 	* one-line major iteration log
  Minor print level 			1 	* one-line minor iteration log
  Print file 				9 	*
  Summary file 				6 	* typically the screen
  Print frequency 			100 	* minor iterations log on PRINT file
  Summary frequency 			100 	* minor iterations log on SUMMARY file
  Solution 				Yes 	* on the PRINT file
* Suppress options listing 			* default: options are listed
  System information 			No 	* prints more system information

* Problem specification
  Minimize 					* (opposite of Maximize)
* Feasible point 				* (alternative to Max or Min)
  Infinite Bound size 			1.0e+20 *

* Convergence Tolerances
  Major feasibility tolerance 		1.0e-6 	* target nonlinear constraint violation
Major optimality tolerance 		1.0e-6 	* target complementarity gap
Minor feasibility tolerance 		1.0e-6 	* for satisfying the QP bounds

* Derivative checking
Verify level 				0 	* cheap check on gradients
Start objective check at col 		1 	*
Stop objective check at col 		n 	*
Start constraint check at col 		1 	*
Stop constraint check at col 		n 	*

* Scaling
Scale option 				1 	* linear constraints and variables
Scale tolerance 			0.9 	*
* Scale Print 					* default: scales are not printed

* Other Tolerances
  Crash tolerance 			0.1 	*
  Linesearch tolerance 			0.9 	* smaller for more accurate search
  Pivot tolerance 			3.7e-11 * e^2/3

* QP subproblems
  QPSolver 			       Cholesky * default
  Crash option 				3 	* first basis is essentially triangular
  Elastic mode 				No 	* until it seems necessary
  Elastic weight 			1.0e+4 	* used only during elastic mode
  Iterations limit 			10000 	* or 20m if that is more
  Partial price 			1 	* 10 for large LPs

* SQP method
  Major iterations limit 		1000 	* or m if that is more
  Minor iterations limit 		500 	*
  Major step limit 			2.0 	*
  Superbasics limit 			n+1
  Hessian dimension 			750 	* or Superbasics limit if that is less
  Derivative level 			3 	*
  Derivative linesearch 			*
* Nonderivative linesearch 			*
  Function precision 			3.0e-13 * ?0.8 (almost full accuracy)
  Difference interval 			5.5e-7 	* (Function precision)^1/2
  Central difference interval 		6.7e-5 	* (Function precision)^1/3
  New superbasics limit 		99 	* controls early termination of QPs
  Objective row 			ObjRow 	* row number of objective in F(x)
  Penalty parameter 			0.0 	* initial penalty parameter
  Proximal point method 		1 	* satisfies linear constraints near x0
  Violation limit 			10.0 	* unscaled constraint violation limit
  Unbounded step size 			1.0e+18 *
  Unbounded objective 			1.0e+15 *

* Hessian approximation
  Hessian 			    full memory * default if n <= 75
  Hessian 			 limited memory * default if n > 75
  Hessian frequency 			999999 	* for full Hessian (never reset)
  Hessian updates 			20 	* for limited memory Hessian
  Hessian flush 			999999 	* no flushing

* Frequencies
  Check frequency 			60 	* test row residuals ||Ax - s||
  Expand frequency 			10000 	* for anti-cycling procedure
  Factorization frequency 		50 	* 100 for LPs
  Save frequency 			100 	* save basis map

* LU options
  LU factor tolerance 			10.0 	* limits size of multipliers in L
  LU update tolerance 			10.0 	* the same during updates
  LU singularity tolerance 	       3.25e-11 *
  LU partial pivoting 				* default pivot strategy
  LU rook pivoting 				* use rook pivoting for the LU
  LU complete pivoting 				* use complete pivoting for the LU

* Partitions of cw, iw, rw
  Total character workspace 		lencw 	*
  Total integer workspace 		leniw 	*
  Total real workspace 			lenrw 	*
  User character workspace 		500 	*
  User integer workspace 		500 	*
  User real workspace 			500 	*

* Miscellaneous
  Debug level 				0 	* for developers
  Timing level 				3 	* prints cpu times
  End of SPECS file checklist 

Description of the optional parameters

The following is an alphabetical list of the options that may appear in the SPECS file, and a description of their effect. In the description of the options we use the notation of the problem format SparseNP to refer to the objective and constraint functions.

Central difference intervalrDefault = ε1/3 ≈ 6.0e-6

When Derivative level < 3), the central-difference interval r is used near an optimal solution to obtain more accurate (but more expensive) estimates of gradients. Twice as many function evaluations are required compared to forward differencing. The interval used for the jth variable is hj = r(1 + |xj|). The resulting derivative estimates should be accurate to O(r2 ), unless the functions are badly scaled.

Check frequencyiDefault = 60

Every ith minor iteration after the most recent basis factorization, a numerical test is made to see if the current solution x satisfies the general linear constraints (including linearized nonlinear constraints, if any). The constraints are of the form Ax - s = b, where s is the set of slack variables. To perform the numerical test, the residual vector r = b - Ax + s is computed. If the largest component of r is judged to be too large, the current basis is refactorized and the basic variables are recomputed to satisfy the general constraints more accurately.

Check frequency 1 is useful for debugging purposes, but otherwise this option should not be needed.

Crash optioniDefault = 3
Crash tolerancerDefault = 0.1

Except on restarts, a CRASH procedure is used to select an initial basis from certain rows and columns of the constraint matrix ( A - I ). The Crash option i determines which rows and columns of A are eligible initially, and how many times CRASH is called. Columns of -I are used to pad the basis where necessary.

0The initial basis contains only slack variables: B = I .
1CRASH is called once, looking for a triangular basis in all rows and columns of the matrix A.
2CRASH is called twice (if there are nonlinear constraints). The first call looks for a triangular basis in linear rows, and the iteration proceeds with simplex iterations until the linear constraints are satisfied. The Jacobian is then evaluated for the first major iteration and CRASH is called again to find a triangular basis in the nonlinear rows (retaining the current basis for linear rows).
3CRASH is called up to three times (if there are nonlinear constraints). The first two calls treat linear equalities and linear inequalities separately. As before, the last call treats nonlinear rows before the first major iteration.

If i >= 1, certain slacks on inequality rows are selected for the basis first. (If i >= 2, numerical values are used to exclude slacks that are close to a bound.) CRASH then makes several passes through the columns of A, searching for a basis matrix that is essentially triangular. A column is assigned to "pivot" on a particular row if the column contains a suitably large element in a row that has not yet been assigned. (The pivot elements ultimately form the diagonals of the triangular basis.) For remaining unassigned rows, slack variables are inserted to complete the basis.

The Crash tolerance r allows the starting procedure CRASH to ignore certain "small" nonzeros in each column of A. If amax is the largest element in column j, other nonzeros aij in the column are ignored if |aij| = amax × r. (To be meaningful, r should be in the range 0 <= r < 1.)

When r > 0.0, the basis obtained by CRASH may not be strictly triangular, but it is likely to be nonsingular and almost triangular. The intention is to obtain a starting basis containing more columns of A and fewer (arbitrary) slacks. A feasible solution may be reached sooner on some problems.

For example, suppose the first m columns of A are the matrix shown under LU factor tolerance; i.e., a tridiag- onal matrix with entries -1, 4, -1. To help CRASH choose all m columns for the initial basis, we would specify Crash tolerance r for some value of r > 1/4.

Derivative leveliDefault = 3

The keyword Derivative level specifies which nonlinear function gradients are known analytically and will be supplied to SNOPT. This is normally automatically set by TOMLAB.

3All objective and constraint gradients are known.
2All constraint gradients are known, but some or all components of the objective gradient are unknown.
1The objective gradient is known, but some or all of the constraint gradients are unknown.
0Some components of the objective gradient are unknown and some of the constraint gradients are unknown.

The value i = 3 should be used whenever possible. It is the most reliable and will usually be the most efficient.

If i = 0 or 2, SNOPT will estimate the missing components of the objective gradient, using finite differences. However, it could increase the total run-time substantially, and there is less assurance that an acceptable solution will be located. If the nonlinear variables are not well scaled, it may be necessary to specify a nonstandard Difference interval (see below).

If i = 0 or 1, SNOPT will estimate missing elements of the Jacobian. For each column of the Jacobian, one call is needed to estimate all missing elements in that column, if any. If Jacobian = sparse and the sparsity pattern of the Jacobian happens to be

\ast & \ast  &       & \ast \\
& \text{?} & \text{?} &      \\
\ast &       & \text{?} &      \\
& \ast  &       & \ast

where * indicates known gradients and ? indicates unknown elements, SNOPT will use one call to estimate the missing element in column 2, and another call to estimate both missing elements in column 3. No calls are needed for columns 1 and 4.

At times, central differences are used rather than forward differences.

Derivative linesearchDefault
Nonderivative linesearch

At each major iteration a line search is used to improve the merit function. A Derivative linesearch uses safeguarded cubic interpolation and requires both function and gradient values to compute estimates of the step ak . If some analytic derivatives are not provided, or a Nonderivative linesearch is specified, SNOPT employs a line search based upon safeguarded quadratic interpolation, which does not require gradient evaluations.

A nonderivative line search can be slightly less robust on difficult problems, and it is recommended that the default be used if the functions and derivatives can be computed at approximately the same cost. If the gradients are very expensive relative to the functions, a nonderivative line search may give a significant decrease in computation time.

Comment: Derivative linesearch is only default if analytic derivatives are provided.

Difference interval h1Default = ε1/2 ≈ 1.5e-8

This alters the interval h1 that is used to estimate gradients by forward differences in the following circumstances:

  • In the initial ("cheap") phase of verifying the problem derivatives.
  • For verifying the problem derivatives.
  • For estimating missing derivatives.

In all cases, a derivative with respect to xj is estimated by perturbing that component of x to the value xj + h1(1 + |xj|), and then evaluating f0 (x) or f (x) at the perturbed point. The resulting gradient estimates should be accurate to O(h1 ) unless the functions are badly scaled. Judicious alteration of h1 may sometimes lead to greater accuracy.

Elastic modeNoDefault
Elastic modeYes

Normally SNOPT initiates elastic mode only when it seems necessary. Option Yes causes elastic mode to be entered from the beginning.

Elastic weightωDefault = 104

This keyword determines the initial weight γ associated with problem NP(γ).

At major iteration k, if elastic mode has not yet started, a scale factor \sigma_k = 1+||g(x_k)||_\infty is defined from the current objective gradient. Elastic mode is then started if the QP subproblem is infeasible, or the QP dual variables are larger in magnitude than σkω. The QP is re-solved in elastic mode with γ = σkω.

Thereafter, major iterations continue in elastic mode until they converge to a point that is optimal for problem NP(γ). If the point is feasible for SparseNP (v = w = 0), it is declared locally optimal. Otherwise, γ is increased by a factor of 10 and major iterations continue. If γ has already reached a maximum allowable value, SparseNP is declared locally infeasible.

Expand frequencyiDefault = 10000

This option is part of the EXPAND anti-cycling procedure designed to make progress even on highly degenerate problems.

For linear models, the strategy is to force a positive step at every iteration, at the expense of violating the bounds on the variables by a small amount. Suppose that the Minor feasibility tolerance is δ. Over a period of i iterations, the tolerance actually used by SNOPT increases from 0.5δ to δ(in steps of 0.5δ/i).

For nonlinear models, the same procedure is used for iterations in which there is only one superbasic variable. (Cycling can occur only when the current solution is at a vertex of the feasible region.) Thus, zero steps are allowed if there is more than one superbasic variable, but otherwise positive steps are enforced.

Increasing i helps reduce the number of slightly infeasible nonbasic variables (most of which are eliminated during a resetting procedure). However, it also diminishes the freedom to choose a large pivot element (see Pivot tolerance).

Factorization frequencykDefault = 50

At most k basis changes will occur between factorizations of the basis matrix.

  • With linear programs, the basis factors are usually updated every iteration. The default k is reasonable for typical problems. Higher values up to k = 100 (say) may be more efficient on problems that are extremely sparse and well scaled.
  • When the objective function is nonlinear, fewer basis updates will occur as an optimum is approached. The number of iterations between basis factorizations will therefore increase. During these iterations a test is made regularly (according to the Check frequency) to ensure that the general constraints are satisfied. If necessary the basis will be refactorized before the limit of k updates is reached.
Feasibility tolerancetDefault = 1.0e-6
see Minor feasibility tolerance
Feasibility point
see Minimize
Function precisionεRDefault = ε0.8 ≈ 3.7e-11
see Minimize

The relative function precision εR is intended to be a measure of the relative accuracy with which the nonlinear functions can be computed. For example, if f (x) is computed as 1000.56789 for some relevant x and if the first 6 significant digits are known to be correct, the appropriate value for ER would be 1.0e-6.

(Ideally the functions f (x) or Fi (x) should have magnitude of order 1. If all functions are substantially less than 1 in magnitude, ER should be the absolute precision. For example, if f (x) = 1.23456789e-4 at some point and if the first 6 significant digits are known to be correct, the appropriate value for ER would be 1.0e-10.)

  • The default value of ER is appropriate for simple analytic functions.
  • In some cases the function values will be the result of extensive computation, possibly involving an iterative procedure that can provide rather few digits of precision at reasonable cost. Specifying an appropriate Function precision may lead to savings, by allowing the line search procedure to terminate when the difference between function values along the search direction becomes as small as the absolute error in the values.
Hessian dimensionrDefault = Superbasics limit or 750

This specifies that an r × r triangular matrix R is to be available for use by the Cholesky QP solver (to define the reduced Hessian according to RTR = ZTHZ ).

Hessian full memoryDefault = Full if n1 <= 75
Hessian limited memory

These options select the method for storing and updating the approximate Hessian. (SNOPT uses a quasi-Newton approximation to the Hessian of the Lagrangian. A BFGS update is applied after each major iteration.)

If Hessian full memory is specified, the approximate Hessian is treated as a dense matrix and the BFGS updates are applied explicitly. This option is most efficient when the number of nonlinear variables n1 is not too large (say, less than 75). In this case, the storage requirement is fixed and one can expect 1-step Q-superlinear convergence to the solution.

Hessian limited memory should be used on problems where n1 is very large. In this case a limited-memory procedure is used to update a diagonal Hessian approximation Hr a limited number of times. (Updates are accumulated as a list of vector pairs. They are discarded at regular intervals after Hr has been reset to their diagonal.)

Hessian frequencyiDefault = 999999

If Hessian Full is selected and i BFGS updates have already been carried out, the Hessian approximation is reset to the identity matrix. (For certain problems, occasional resets may improve convergence, but in general they should not be necessary.)

Hessian Full memory and Hessian frequency = 20 have a similar effect to Hessian Limited memory and Hessian updates = 20 (except that the latter retains the current diagonal during resets).

Hessian updatesiDefault = 20

If Hessian Limited memory is selected and i BFGS updates have already been carried out, all but the diagonal elements of the accumulated updates are discarded and the updating process starts again.

Broadly speaking, the more updates stored, the better the quality of the approximate Hessian. However, the more vectors stored, the greater the cost of each QP iteration. The default value is likely to give a robust algorithm without significant expense, but faster convergence can sometimes be obtained with significantly fewer updates (e.g., i = 5).

Iterations limitkDefault = max{10000, 20m}

This is the maximum number of minor iterations allowed (i.e., iterations of the simplex method or the QP algo- rithm), summed over all major iterations.

Infinite Bound sizerDefault = 1.0e+20

If r > 0, r defines the "infinite" bound BigBnd in the definition of the problem constraints. Any upper bound greater than or equal to BigBnd will be regarded as plus infinity (and similarly for a lower bound less than or equal to -BigBnd). If r <= 0, the default value is used.

Linesearch tolerancetDefault = 0.9

This controls the accuracy with which a steplength will be located along the direction of search each iteration. At the start of each line search a target directional derivative for the merit function is identified. This parameter determines the accuracy to which this target value is approximated.

  • t must be a real value in the range 0.0 <= t <= 1.0.
  • The default value t = 0.9 requests just moderate accuracy in the line search.
  • If the nonlinear functions are cheap to evaluate, a more accurate search may be appropriate; try t = 0.1, 0.01 or 0.001. The number of major iterations might decrease.
  • If the nonlinear functions are expensive to evaluate, a less accurate search may be appropriate. If all gradients are known, try t = 0.99. (The number of major iterations might increase, but the total number of function evaluations may decrease enough to compensate.)
  • If not all gradients are known, a moderately accurate search remains appropriate. Each search will require only 1-5 function values (typically), but many function calls will then be needed to estimate missing gradients for the next iteration.
Log frequencykDefault = 100

see Print frequency

LU factor tolerancer1Default = 100.0 (LP) or 3.99 (NLP)
LU factor tolerancer1Default = 10.0 (LP) or 3.99 (NLP)

These tolerances affect the stability and sparsity of the basis factorization B = LU during refactorization and updating, respectively. They must satisfy r1, r2 >= 1.0. The matrix L is a product of matrices of the form

1 \\ \mu & 1

where the multipliers µ satisfy |µ| <= ri . Smaller values of ri favor stability, while larger values favor sparsity. The default values usually strike a good compromise.

  • For large and relatively dense problems, r1 = 5.0 (say) may give a useful improvement in stability without impairing sparsity to a serious degree.
  • For certain very regular structures (e.g., band matrices) it may be necessary to reduce r1 and/or r2 in order to achieve stability. For example, if the columns of A include a submatrix of the form

4 & -1 \\
-1 &  4 &  -1 \\
& -1 &   4   &  -1 \\
&    & \ddots& \ddots& \ddots\\
&    &       &  -1   &    4  & -1 \\
&    &       &       &   -1  &  4

both r1 and r2 should be in the range 1.0 \le r_i < 4.0.

LU Partial PivotingDefault
LU Rook Pivoting
LU Complete Pivoting

The LU factorization implements a Markowitz-type search for a pivot that locally minimizes the fill-in subject to a threshold pivoting stability criterion. The default option is to use threshold partial pivoting. The options LU rook pivoting and LU complete pivoting are more expensive than partial pivoting but are more stable and better at revealing rank.

LU density tolerancer1Default = 0.5
LU singularity tolerancer2Default = ε0.67 ≈ 3.25e-11

The density tolerance r1 is used during LU factorization of the basis matrix. Columns of L and rows of U are formed one at a time, and the remaining rows and columns of the basis are altered appropriately. At any stage, if the density of the remaining matrix exceeds r1 , the Markowitz strategy for choosing pivots is terminated. The remaining matrix is factored by a dense LU procedure. Raising the density tolerance towards 1.0 may give slightly sparser LU factors, with a slight increase in factorization time.

The singularity tolerance r2 helps guard against ill-conditioned basis matrices. When the basis is refactorized, the diagonal elements of U are tested as follows: if |U_{jj}| \le r_2 or |U_{jj}| < r_2 \max_i |U_{ij}|\,, the jth column of the basis is replaced by the corresponding slack variable. (This is most likely to occur after a restart, or at the start of a major iteration.)

In some cases, the Jacobian matrix may converge to values that make the basis exactly singular. (For example, a whole row of the Jacobian could be zero at an optimal solution.) Before exact singularity occurs, the basis could become very ill-conditioned and the optimization could progress very slowly (if at all). Setting a larger tolerance r2 = 1.0e-5, say, may help cause a judicious change of basis.

Major feasibility toleranceεrDefault = 1.0e-6

This specifies how accurately the nonlinear constraints should be satisfied. The default value of 1.0e-6 is appropriate when the linear and nonlinear constraints contain data to about that accuracy.

Let rowerr be the maximum nonlinear constraint violation, normalized by the size of the solution. It is required to satisfy

\text{rowerr} = \max_i \text{viol}_i /||x|| \le \epsilon_r,

where vvioli is the violation of the ith nonlinear constraint (i = 1 : nnCon).

In the major iteration log, rowerr appears as the quantity labeled "Feasibl". If some of the problem functions are known to be of low accuracy, a larger Major feasibility tolerance may be appropriate.

Major optimality toleranceεdDefault = 1.0e-6

This specifies the final accuracy of the dual variables. On successful termination, SNOPT will have computed a solution (x, s, π) such that

\text{maxComp} = \max_j \text{Comp}_j /||\pi|| \le \epsilon_d,

where Compj is an estimate of the complementarity slackness for variable j (j = 1 : n + m). The values Compj are computed from the final QP solution using the reduced gradients dj = gj - πTaj (where gj is the jth component of the objective gradient, aj is the associated column of the constraint matrix (A - I ), and &pi is the set of QP dual variables):

\text{Comp}_j = \begin{array}{lcl}
d_j \min{\{ x_j - l_j,1 \}}  & & if d_j \ge 0\\
- d_j \min{\{ u_j - x_j,1 \}}  & & if d_j < 0.

In the major iteration log, maxComp appears as the quantity labeled "Optimal".

Major iterations limitkDefault = max{1000, m}

This is the maximum number of major iterations allowed. It is intended to guard against an excessive number of linearizations of the constraints.

Major print levelpDefault = 00001

This controls the amount of output to the PRINT and SUMMARY files each major iteration. Major print level 1 gives normal output for linear and nonlinear problems, and Major print level 11 gives addition details of the Jacobian factorization that commences each major iteration.

In general, the value being specified may be thought of as a binary number of the form

Major print level JFDXbs

where each letter stands for a digit that is either 0 or 1 as follows:

sa single line that gives a summary of each major iteration. (This entry in JFDXbs is not strictly binary since the summary line is printed whenever JFDXbs = 1).
bBASIS statistics, i.e., information relating to the basis matrix whenever it is refactorized. (This output is always provided if JFDXbs = 10).
Xxk , the nonlinear variables involved in the objective function or the constraints.
Dπk , the dual variables for the nonlinear constraints.
FF (xk ), the values of the nonlinear constraint functions.
JJ (xk ), the Jacobian matrix.

To obtain output of any items JFDXbs, set the corresponding digit to 1, otherwise to 0.

If J=1, the Jacobian matrix will be output column-wise at the start of each major iteration. Column j will be preceded by the value of the corresponding variable xj and a key to indicate whether the variable is basic, superbasic or nonbasic. (Hence if J=1, there is no reason to specify X=1 unless the objective contains more nonlinear variables than the Jacobian.) A typical line of output is

3	1.250000D+01 BS	1	1.00000E+00	4	2.00000E+00

which would mean that x3 is basic at value 12.5, and the third column of the Jacobian has elements of 1.0 and 2.0 in rows 1 and 4.

Major print level 0 suppresses most output, except for error messages.

Major step limitrDefault = 2.0

This parameter limits the change in x during a line search. It applies to all nonlinear problems, once a "feasible solution" or "feasible subproblem" has been found.

  1. A line search determines a step a over the range 0 < a = ß, where ß is 1 if there are nonlinear constraints, or the step to the nearest upper or lower bound on x if all the constraints are linear. Normally, the first steplength tried is a1 = min(1, ß).
  2. In some cases, such as f (x) = aebx or f (x) = axb , even a moderate change in the components of x can lead to floating-point overflow. The parameter r is therefore used to define a limit ߯ = r(1 + x )/ p (where p is the search direction), and the first evaluation of f (x) is at the potentially smaller steplength a1 = min(1, ߯, ß).
  3. Wherever possible, upper and lower bounds on x should be used to prevent evaluation of nonlinear functions at meaningless points. The Major step limit provides an additional safeguard. The default value r = 2.0 should not affect progress on well behaved problems, but setting r = 0.1 or 0.01 may be helpful when rapidly varying functions are present. A "good" starting point may be required. An important application is to the class of nonlinear least-squares problems.
  4. In cases where several local optima exist, specifying a small value for r may help locate an optimum near the starting point.
Feasible point

The keywords Minimize and Maximize specify the required direction of optimization. It applies to both linear and nonlinear terms in the objective.

The keyword feasible point means "Ignore the objective function" while finding a feasible point for the linear and nonlinear constraints. It can be used to check that the nonlinear constraints are feasible without altering the call to SNOPT.

Minor iterations limitkDefault = 500
Feasible point

If the number of minor iterations for the optimality phase of the QP subproblem exceeds k, then all nonbasic QP variables that have not yet moved are frozen at their current values and the reduced QP is solved to optimality.

Note that more than k minor iterations may be necessary to solve the reduced QP to optimality. These extra iterations are necessary to ensure that the terminated point gives a suitable direction for the line search.

In the major iteration log, a t at the end of a line indicates that the corresponding QP was artificially terminated using the limit k.

Note that Iterations limit defines an independent absolute limit on the total number of minor iterations (summed over all QP subproblems).

Minor feasibility tolerancetDefault = 1.0e-6

SNOPT tries to ensure that all variables eventually satisfy their upper and lower bounds to within the tolerance t. This includes slack variables. Hence, general linear constraints should also be satisfied to within t.

Feasibility with respect to nonlinear constraints is judged by the Major feasibility tolerance (not by t).

  • If the bounds and linear constraints cannot be satisfied to within t, the problem is declared infeasible. Let sInf be the corresponding sum of infeasibilities. If sInf is quite small, it may be appropriate to raise t by a factor of 10 or 100. Otherwise, some error in the data should be suspected.
  • Nonlinear functions will be evaluated only at points that satisfy the bounds and linear constraints. If there are regions where a function is undefined, every attempt should be made to eliminate these regions from the problem.

For example, if f(x) = \sqrt{x_1} + \log x_2, it is essential to place lower bounds on both variables. If t = 1.0e-6, the bounds x_1 \ge 10^{-5} and x_2 \ge 10^{-4} might be appropriate. (The log singularity is more serious. In general, keep x as far away from singularities as possible.)

  • If Scale option = 1, feasibility is defined in terms of the scaled problem (since it is then more likely to be meaningful).
  • In reality, SNOPT uses t as a feasibility tolerance for satisfying the bounds on x and s in each QP subproblem. If the sum of infeasibilities cannot be reduced to zero, the QP subproblem is declared infeasible. SNOPT is then in elastic mode thereafter (with only the linearized nonlinear constraints defined to be elastic). See the Elastic options.
Minor print levelkDefault = 1

This controls the amount of output to the PRINT and SUMMARY files during solution of the QP subproblems. The value of k has the following effect:

0No minor iteration output except error messages.
>= 1A single line of output each minor iteration (controlled by Print frequency and Summary frequency).
>= 10Basis factorization statistics generated during the periodic refactorization of the basis (see Factorization frequency). Statistics for the first factorization each major iteration are controlled by the Major print level.
New superbasics limitiDefault = 99

This option causes early termination of the QP subproblems if the number of free variables has increased signifi- cantly since the first feasible point. If the number of new superbasics is greater than i the nonbasic variables that have not yet moved are frozen and the resulting smaller QP is solved to optimality.

In the major iteration log, a "T" at the end of a line indicates that the QP was terminated early in this way.

Partial priceiDefault = 10 (LP) or 1 (NLP)

This parameter is recommended for large problems that have significantly more variables than constraints. It reduces the work required for each "pricing" operation (when a nonbasic variable is selected to become superbasic).

  • When i = 1, all columns of the constraint matrix ( A - I ) are searched.
  • Otherwise, A and I are partitioned to give i roughly equal segments Aj , Ij (j = 1 to i). If the previous pricing search was successful on Aj , Ij , the next search begins on the segments Aj+1 , Ij+1 . (All subscripts here are modulo i.)
  • If a reduced gradient is found that is larger than some dynamic tolerance, the variable with the largest such reduced gradient (of appropriate sign) is selected to become superbasic. If nothing is found, the search continues on the next segments Aj+2 , Ij+2 , and so on.
  • Partial price t (or t/2 or t/3) may be appropriate for time-stage models having t time periods.
Pivot tolerancerDefault = ε2/3 ≈ 3.7e-11

During solution of QP subproblems, the pivot tolerance is used to prevent columns entering the basis if they would cause the basis to become almost singular.

  • When x changes to x + αp for some search direction p, a "ratio test" is used to determine which component of x reaches an upper or lower bound first. The corresponding element of p is called the pivot element.
  • Elements of p are ignored (and therefore cannot be pivot elements) if they are smaller than the pivot tolerance r.
  • It is common for two or more variables to reach a bound at essentially the same time. In such cases, the Minor Feasibility tolerance (say t) provides some freedom to maximize the pivot element and thereby improve numerical stability. Excessively small values of t should therefore not be specified.
  • To a lesser extent, the Expand frequency (say f ) also provides some freedom to maximize the pivot element. Excessively large values of f should therefore not be specified.
Proximal point methodiDefault = 1

i = 1 or 2 specifies minimization of ||{x-x_0}||_1\, or \frac{1}{2}||{x-x_0}||_2^2 when the starting point x0 is changed to satisfy the linear constraints (where x0 refers to nonlinear variables).

QPSolver CholeskyDefault
QPSolver CGDefault = 1
QPSolver QNDefault = 1

Specifies the algorithm used to solve the QP subproblem. QPSolver Cholesky uses the active-set QP method of SQOPT, which holds the full Cholesky factor R of the reduced Hessian ZTHZ . As the QP iterations proceed, the dimension of R changes as the number of superbasic variables changes. If it is necessary that the number of superbasic variables increases beyond the value of Hessian dimension, the reduced Hessian cannot be stored and the solver switches to QPSolver CG. The Cholesky solver is reactivated if the number of superbasics stabilizes at a value less than Hessian dimension.

QPSolver QN solves the QP subproblem using a quasi-Newton method similar to MINOS. In this case, R is the factor of a quasi-Newton approximate Hessian.

QPSolver CG uses an active-set method similar to QPSolver QN, but uses the conjugate-gradient method to solve all systems involving the reduced Hessian.

  • The Cholesky QP solver is the most robust, but may require a significant amount of computation if the number of superbasics is large.
  • The quasi-Newton QP solver does not require the computation of the R at the start of each QP and may be appropriate when the number of superbasics is large, but each QP subproblem requires relatively few minor iterations.
  • The conjugate-gradient QP solver is appropriate for problems with large numbers of degrees of freedom.
Reduced Hessian dimensioniDefault = min{750, n1 + 1}
see Hessian dimension
Scale optioniDefault = 2 (LP) or 1 (NLP)
Scale tolerancerDefault = 0.9
Scale Print

Three scale options are available as follows:

0No scaling. This is recommended if it is known that x and the constraint matrix (and Jacobian) never have very large elements (say, larger than 1000).
1Linear constraints and variables are scaled by an iterative procedure that attempts to make the matrix coefficients as close as possible to 1.0 (see Fourer [11]). This will sometimes improve the performance of the solution procedures.
2All constraints and variables are scaled by the iterative procedure. Also, an additional scaling is performed that takes into account columns of ( A - I ) that are fixed or have positive lower bounds or negative upper bounds.

If nonlinear constraints are present, the scales depend on the Jacobian at the first point that satisfies the linear constraints. Scale option 2 should therefore be used only if (a) a good starting point is provided, and (b) the problem is not highly nonlinear.

Scale tolerance affects how many passes might be needed through the constraint matrix. On each pass, the scaling procedure computes the ratio of the largest and smallest nonzero coefficients in each column:

\rho_j = \max_{i} |a_{ij}| / \min_i |a_{ij}| \qquad (a_{ij}\ne 0).

If maxjρj is less than r times its previous value, another scaling pass is performed to adjust the row and column scales. Raising r from 0.9 to 0.99 (say) usually increases the number of scaling passes through A. At most 10 passes are made.

Scale Print causes the row-scales r(i) and column-scales c(j) to be printed. The scaled matrix coefficients are \bar{a}_{ij} = a_{ij} c(j) / r(i), and the scaled bounds on the variables and slacks are \bar{l}_j = l_j / c(j), \bar{u}_j = u_j / c(j)

QPSolver CholeskyDefault
QPSolver CGDefault = 1
QPSolver QNDefault = 1
SolutionIf Optimal, Infeasible, or Unbounded
Start Objective Check at ColumnkDefault = 1
Start Constraint Check at ColumnkDefault = 1
Stop Objective Check at ColumnlDefault = 'n'l
Stop Constraint Check at ColumnlDefault = 'n'l

If Verify level > 0, they may be used to abbreviate the verification of individual derivative elements. For example:

  • If the first 100 objective gradients appeared to be correct in an earlier run, and if you have just found a bug in that ought to fix up the 101-th component, then you might as well specify Start Objective Check at Column 101. Similarly for columns of the Jacobian.
  • If the first 100 variables occur nonlinearly in the constraints, and the remaining variables are nonlinear only in the objective, then one must set the first 100 components of g(*) to zero, but these hardly need to be verified. The above option would again be appropriate.
Summary filefDefault = 6
Summary frequencykDefault = 100

If f > 0 and Minor print level > 0, a line of the QP iteration log will be output to file f every kth minor iteration.

Superbasics limitiDefault = n1 + 1

This places a limit on the storage allocated for superbasic variables. Ideally, i should be set slightly larger than the "number of degrees of freedom" expected at an optimal solution.

For linear programs, an optimum is normally a basic solution with no degrees of freedom. (The number of variables lying strictly between their bounds is no more than m, the number of general constraints.) The default value of i is therefore 1.

For nonlinear problems, the number of degrees of freedom is often called the "number of independent variables".

Normally, i need not be greater than n1 + 1, where n1 is the number of nonlinear variables. For many problems, i may be considerably smaller than n1 . This will save storage if n1 is very large.

System InformationNoDefault
System InformationYes

This option allows the knowledgeable user to print some additional information on the progress of the major and minor iterations.

Timing leveliDefault = 3

i = 0 suppresses output of cpu times. (Intended for installations with dysfunctional timing routines.)

Unbounded objective valuefmaxDefault = 1.0e+15
Unbounded step sizeαmaxDefault = 1.0e+18

These parameters are intended to detect unboundedness in nonlinear problems. (They may not achieve that purpose!) During a line search, f0 is evaluated at points of the form x + αp, where x and p are fixed and α varies.

if |f0 | exceeds fmax or a exceeds amax , iterations are terminated with the exit message Problem is unbounded (or badly scaled).

If singularities are present, unboundedness in f0(x) may be manifested by a floating-point overflow (during the evaluation of f0(x + αp)), before the test against fmax can be made.

Unboundedness in x is best avoided by placing finite upper and lower bounds on the variables.

Verify levellDefault = 0

This option refers to finite-difference checks on the derivatives computed by the user-provided routines. Derivatives are checked at the first point that satisfies all bounds and linear constraints.

0Only a "cheap" test will be performed.
1Individual gradients will be checked (with a more reliable test). A key of the form "OK" or "Bad?" indicates whether or not each component appears to be correct.
2Individual columns of the problem Jacobian will be checked.
3Options 2 and 1 will both occur (in that order).

Verify level 3 should be specified whenever a new function routine is being developed. The Start and Stop keywords may be used to limit the number of nonlinear variables checked. Missing derivatives are not checked, so they result in no overhead.

Violation limittDefault = 10

This keyword defines an absolute limit on the magnitude of the maximum constraint violation after the line search. On completion of the line search, the new iterate xk+1 satisfies the condition

v_i(x_{k+1}) \le \tau\max\{ 1, v_i(x_0) \},

where x0 is the point at which the nonlinear constraints are first evaluated and vi(x) is the ith nonlinear constraint violation vi(x) = max(0,lifi(x),fi(x) − ui).

The effect of this violation limit is to restrict the iterates to lie in an expanded feasible region whose size depends on the magnitude of τ. This makes it possible to keep the iterates within a region where the objective is expected to be well-defined and bounded below. If the objective is bounded below for all values of the variables, then t may be any large positive value.

Personal tools