MAD High-Level Interfaces to Matlab’s ODE and Optimization Solvers
This page is part of the MAD Manual. See MAD Manual. |
In many cases users wish to use AD for calculating derivatives necessary for ODE or optimisation solvers. When using the TOMLAB optimisation solvers the use of AD is easily achieved by setting the flags, Prob.NumDiff and Prob.ConsDiff as noted in Section 3. When solving stiff ODE's using Matlab's ODE solvers or solving optimization problems using Matlab's Optimization Toolbox functions the user can either:
- Directly use calls to Mad's functions to calculate required derivatives as detailed in Sections 4-5.
- Or use the facilities of Mad's high-level interfaces [FK04] to the ODE and optimization solvers as detailed in this section.
For a vector-valued function F(x), , Mad currently provides 3 techniques for computing the Jacobian JF,
Forward mode AD with full storage (see Section 4) - this is most likely to be efficient when n is small.
Forward mode AD with sparse storage (see Section 5.3) - this is most likely to be efficient for larger n, particularly if the Jacobian is sparse. Even if the Jacobian is dense, then provided n is sufficiently large this method is likely to outperform full storage because there is always sparsity present in the early stages of a Jacobian calculation. For small n this technique is likely to be slower than full storage because of the overhead of manipulating a sparse data structure.
Forward mode AD with compressed storage (see Section 5.4) - this is likely to be most efficient for large n when the Jacobian is sparse and has a favourable sparsity pattern (e.g., diagonal, banded) in which case the number of directional derivatives required to compute the Jacobian may be reduced to p < n with p dependent on the sparsity pattern and bounded below by the maximum number of nonzeros in a row of the Jacobian. Note that sparse storage always uses fewer floating-point operations than compressed storage - but provided p is sufficiently small then compression outperforms sparse storage. This method requires the sparsity pattern of the Jacobian (or a safe over-estimate of the sparsity pattern) to be known. If the sparsity pattern is unknown but is known to be fixed then the sparsity pattern may be estimated using a Jacobian calculation with sparse storage and randomly perturbed x (to reduce the chance of unfortuitous cancellation giving a Jacobian entry of zero when in general it is non-zero). If the sparsity pattern changes between evaluations of F (perhaps because of branching) then either compression should not be used or the supplied Jacobian pattern must be an over-estimate to cover any possible pattern.
In general it is not possible to predict a priori which technique will be most efficient for any particular problem since this depends on many factors: sparsity pattern of JF; relative size of n, m and when compression is possible p; efficiency of sparse data access; performance of the computer being used. A rational approach is to try each technique (except default to full storage for small n and reject full storage for large n), timing its evaluation and choosing the most efficient. To make this easy for the user this approach has been implemented within the function MADJacInternalEval and interfaces to MADJacInternalEval for ODE or optimisation solvers have been provided. Consequently user's need never access MADJacInternalEval directly and needn't bother themselves with the exact AD technique being used confident that a reasonably efficient technique for their problem has been automatically selected. Of course, this selection process has an associated overhead so the interface allows the user to mandate a technique if they so wish, perhaps based on experiments conducted previously.
Stiff ODE Solution
When solving the ordinary differential equation (ODE)
for where are fixed parameters using Matlab's stiff 3 ODE solvers ode15s, ode23s, ode23t, and ode23tb then it is necessary to calculate, or at least approximate accurately, the Jacobian,
for the solvers to use a quasi-Newton solve within their implicit algorithms. By default such Jacobians are approximated by finite-differences. If the Jacobian is sparse and the sparsity pattern supplied then Jacobian compression is used [SR97] (c.f. Section 5.4). Using MADODEJacEval we may easily take advantage of AD.
Example MADEXvdpode: Jacobian driver for the Van der Pol ODE.
This example shows how to use MAD's ODE Jacobian function MADJacODE to solve the stiff Van der Pol ODE problem.
See also: vdpode_f, vdpode_J, vdpode, ode15s, MADODERegister, MADODEJacEval, MADODEReport
Contents
- Set up the ODE problem
- Integrate using finite-differencing for the Jacobian
- Use the analytic Jacobian
- Use the MAD Jacobian function MADODEJacEval
- Repeat of MADODEJacEval use
- MADODEReport
- Conclusions
Set up the ODE problem
First we define:
MU = 1000; % default stiffness parameter for Van der Pol problem
tspan = [0; max(20,3*MU)]; % integration timespan of several periods
y0 = [2; 0]; % initial conditions
options=odeset('Stats','on'); % display ODE statistics
Integrate using finite-differencing for the Jacobian
Using the default finite-differencing a solution is easily found.
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tfd=cputime-t1;
disp(['Jacobian by Finite-Differencing - CPU time = ',num2str(tfd)])
591 successful steps 225 failed attempts 1883 function evaluations 45 partial derivatives 289 LU decompositions 1747 solutions of linear systems Jacobian by Finite-Differencing - CPU time = 0.75108
Use the analytic Jacobian
Since the Jacobian is recalculated many times supplying an analytic Jacobian should improve performance.
options = odeset(options,'Jacobian',@vdpode_J);
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tjac=cputime-t1;
disp(['Jacobian by hand-coding - CPU time = ',num2str(tjac)])
591 successful steps 225 failed attempts 1749 function evaluations 45 partial derivatives 289 LU decompositions 1747 solutions of linear systems Jacobian by hand-coding - CPU time = 0.75108
Use the MAD Jacobian function MADODEJacEval
The automated MAD Jacobian calculation avoids the need for the user to supply the Jacobian. First we use odeset to specify that the Jacobian will be calculated by MADODEJacEval. We then use MADODEDelete to first clear any existing settings for vdpode f, then MADODERegister register the function and then integrate.
options = odeset(options,'Jacobian',@MADODEJacEval);
MADODEDelete; % deletes all previous settings for ODE evaluation
MADODERegister(@vdpode_f); % all we must do is supply the ODE function handle
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tmadjac=cputime-t1;
disp(['Jacobian by MADODEJacEval - CPU time = ',num2str(tmadjac)])
591 successful steps 225 failed attempts 1749 function evaluations 45 partial derivatives 289 LU decompositions 1747 solutions of linear systems Jacobian by MADODEJacEval - CPU time = 0.88127
Repeat of MADODEJacEval use
On a second use (''without ''calling MADODEDelete) MADODEJacEval should be slightly faster since it "remembers"
which technique to use to calculate the Jacobian.
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tmadjac2=cputime-t1;
disp(['Repeat of Jacobian by MADODEJacEval - CPU time = ',num2str(tmadjac2)])
591 successful steps 225 failed attempts 1749 function evaluations 45 partial derivatives 289 LU decompositions 1747 solutions of linear systems Repeat of Jacobian by MADODEJacEval - CPU time = 0.91131
MADODEReport
Using MADODEReport we see which technique is used and why - here full storage is used due to the small size (2 × 2) of the Jacobian required.
MADODEReport % report on which technique used
MADODEreport Function name vdpode_f Jacobian of size 2x2 Fwd AD - full 0.010014 s Fwd AD - compressed Inf s FWD AD - full selected MADJacInternalEval: n< MADMinSparseN=10 so using forward mode full
Conclusions
Using MAD for such small ODE problems gives similar performance to finite-differencing. MAD is more useful for large ODE problems, particularly those with sparse Jacobians in, for example the Brusselator problem of MADEXbrussode.
bar([tfd,tjac,tmadjac,tmadjac2])
colormap hsv
ylabel('CPU time (s)')
title('Solving the Van der Pol Problem with Different Jacobian Techniques')
text(1,tfd/2,'FD','Rotation',90)
text(2,tfd/2,'Analytic Jacobian','Rotation',90)
text(3,tfd/2,'MADJacODE','Rotation',90)
text(4,tfd/2,'Repeat of MADJacODE','Rotation',90)
Using Mad for such small ODE problems gives similar performance to finite-differencing. Mad is more useful for large ODE problems, particularly those with sparse Jacobians in, for example the Brusselator problem of MADEXbrussode.
Example MADEXbrussode: Jacobian driver for the Brusselator ODE.
This example of MADEXbrussode shows how to use MAD's high-level ODE Jacobian function MADODEJacEval to solve the stiff ODE Brusselator problem.
See also: brussode f, brussode S, brussode, ode15s, MADODERegister, MADODEFuncEval, MADODEJacEval, MADODEReport
Contents
- Set up the ODE problem
- Integrate Brusselator problem using finite-differencing for the Jacobian
- Integrate Brusselator problem using compressed finite-differencing
- Use the MAD ODE Jacobian function - MADODEJacEval
- Specifying that the sparsity pattern is fixed
- Specifying the sparsity pattern
- Difference in solutions
- Specifying a Jacobian technique
- Conclusions
Set up the ODE problem
First we define:
N = 80; % number of mesh points,
tspan = [0; 50]; % timespan of integration,
t=tspan(1); % initial time,
y0 = [1+sin((2*pi/(N+1))*(1:N)); repmat(3,1,N)];
y0=reshape(y0,[2*N 1]); % initial conditions. and sparsity pattern
S=brussode_S(N); % find sparsity pattern.
Integrate Brusselator problem using finite-differencing for the Jacobian
Initially we use finite-differences to approximate the Jacobian ignoring any sparsity, timing and running the cal- culation.
options = odeset('Vectorized','on','Stats','on');
t1=cputime;
[tfd,yfd] = ode15s(@brussode_f,tspan,y0,options,N);
cpufd=cputime-t1;
disp(['FD-Jacobian : cputime = ',num2str(cpufd)])
339 successful steps 54 failed attempts 2868 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems FD-Jacobian: cputime = 2.0329
Integrate Brusselator problem using compressed finite-differencing
Finite-differencing can take advantage of sparsity when approximating the Jacobian. We set options to: specify the
Jacobian sparsity pattern, print out statistics and to turn on vectorization. We then time and run the calculation.
options = odeset('Vectorized','on','JPattern',S,'Stats','on');
t1=cputime;
[tfd,yfd] = ode15s(@brussode_f,tspan,y0,options,N);
cpucompfd=cputime-t1;
disp(['Compressed FD-Jacobian : cputime = ',num2str(cpucompfd)])
339 successful steps 54 failed attempts 840 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems Compressed FD-Jacobian : cputime = 0.91131
Use the MAD ODE Jacobian function - MADODEJacEval
First delete any existing settings for MADODE using MADODEDelete. Then register function brussode f for MADODEJacEval using MADODERegister. Then use odeset to specify that MADODEJacEval will calculate the required Jacobian. Finally time and run the calculation and use MADODEReport to provide information on the Jacobian technique used.
MADODEDelete % deletes any internal settings
MADODERegister(@brussode_f) % supplies handle to ODE function
options=odeset(options,'Jacobian',@MADODEJacEval);
t1=cputime;
[tmadjac,ymadjac]=ode15s(@brussode_f,tspan,y0,options,N);
cpumadjac=cputime-t1;
disp(['MAD Jacobian : cputime = ',num2str(cpumadjac)])
MADODEReport
339 successful steps 54 failed attempts 776 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems MAD Jacobian : cputime = 1.1416 MADODEreport Function name brussode_f Jacobian of size 160x160 Fwd AD - full Inf s Fwd AD - sparse 0.020029 s Fwd AD - compressed Inf s FWD AD - sparse selected Sparsity not fixed so compression rejected. Large system reject full method.
Specifying that the sparsity pattern is fixed
For this problem the Jacobian is large and sparse but MAD does not "know" that the sparsity pattern is fixed and so can only use sparse forward mode and not compression. We first delete any internal settings, specify that the sparsity pattern is fixed, i.e., it does not change from one evaluation to the next due to branching; and then use the MAD Jacobian. The Jacobian sparsity pattern is calculated using fmad's sparse forward mode for a point randomly perturbed about y0.
MADODEDelete % delete any internal settings
MADODERegister(@brussode_f,'sparsity_fixed','true')% specify fixed sparsity
options=odeset(options,'Jacobian',@MADODEJacEval); % Use MAD Jacobian
t1=cputime;
[tmadjac,ymadjac]=ode15s(@brussode_f,tspan,y0,options,N);
cpumadjac_fixed=cputime-t1;
disp(['MAD Jacobian (sparsity fixed): cputime = ',num2str(cpumadjac_fixed)])
MADODEReport
% Compressed forward mode should now have been used.
339 successful steps 54 failed attempts 776 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems MAD Jacobian (sparsity fixed): cputime = 1.1416 MADODEreport Function name brussode_f Jacobian of size 160x160 Fwd AD - full Inf s Fwd AD - sparse 0.020029 s Fwd AD - compressed 0.020029 s Percentage Sparsity = 2.4844% Maximum non-zeros/row = 4 of 160 Maximum non-zeros/col = 4 of 160 Size of compressed seed = 160x4 Percentage compressed to full seed = 2.5% FWD AD - compressed selected Large system reject full method. Sparse is worse than compressed - discarding Sparse.
Specifying the sparsity pattern
If available the sparsity pattern can be specified directly within MADsetupODE.
MADODEDelete % delete any internal settings
MADODERegister(@brussode_f,'sparsity_pattern',S)
options=odeset(options,'Jacobian',@MADODEJacEval);
t1=cputime;
[tmadjac,ymadjac]=ode15s(@brussode_f,tspan,y0,options,N);
cpumadjac_S=cputime-t1;
disp(['MAD Jacobian (sparsity given) : cputime = ',num2str(cpumadjac_S)])
MADODEReport
339 successful steps 54 failed attempts 776 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems MAD Jacobian (sparsity given) : cputime = 1.1216 MADODEreport Function name brussode_f Jacobian of size 160x160 Fwd AD - full Inf s Fwd AD - sparse 0.030043 s Fwd AD - compressed 0.010014 s Percentage Sparsity = 2.4844% Maximum non-zeros/row = 4 of 160 Maximum non-zeros/col = 4 of 160 Size of compressed seed = 160x4 Percentage compressed to full seed = 2.5% FWD AD - compressed selected Large system reject full method. Sparse is worse than compressed - discarding Sparse.
Difference in solutions
Of course, there should be no discernable difference between the finite difference and AD-enabled solutions.
plot(tfd,yfd(:,1),'-',tmadjac,ymadjac(:,1),'o')
xlabel('t')
ylabel('y')
legend('FD Jacobian','MAD Jacobian')
title('Comparison of Brusselator solutions using FD and AD for Jacobians')
Specifying a Jacobian technique
We can specify from the outset that compression is to be used.
MADODEDelete % delete any internal settings
MADODERegister(@brussode_f,'sparsity_pattern',S,'ad_technique','ad_fwd_compressed')
options=odeset(options,'Jacobian',@MADODEJacEval);
t1=cputime;
[tmadjac,ymadjac]=ode15s(@brussode_f,tspan,y0,options,N);
cpumadjac_comp=cputime-t1;
disp(['MAD Jacobian (compressed specified) : cputime = ',num2str(cpumadjac_S)])
MADODEReport
MADJacInternalRegister: AD technique selected is ad_fwd_compressed 339 successful steps 54 failed attempts 776 function evaluations 13 partial derivatives 110 LU decompositions 774 solutions of linear systems MAD Jacobian (compressed specified) : cputime = 1.1216 MADODEreport Function name brussode_f Jacobian of size 160x160 Fwd AD - compressed 0.010014 s Percentage Sparsity = 2.4844% Maximum non-zeros/row = 4 of 160 Maximum non-zeros/col = 4 of 160 Size of compressed seed = 160x4 Percentage compressed to full seed = 2.5% FWD AD - compressed selected AD_FWD_COMPRESSED specified by call to MADJacInternalRegister
Conclusions
For this problem compressed finite-differencing is most efficient. If the sparsity pattern is not known then sparse forward mode AD outperforms finite-differencing. On supplying the sparsity pattern the performance of compressed forward mode AD approaches that of compressed finite-differencing. Simply using the MADODEJacEval interface yields acceptable performance.
bar([cpufd,cpucompfd,cpumadjac,cpumadjac_fixed,cpumadjac_S,cpumadjac_comp])
colormap hsv
ylabel('CPU time (s)')
title(['Solving the Brusselator Problem with Different Jacobian Techniques (N= ',num2str(N),')'])
text(1,cpucompfd/2,'FD','Rotation',90)
text(2,cpucompfd/2,'Compressed FD (sparsity supplied)','Rotation',90)
text(3,cpucompfd/2,'MAD','Rotation',90)
text(4,cpucompfd/2,'MAD(sparsity fixed)','Rotation',90)
text(5,cpucompfd/2,'MAD(sparsity supplied)','Rotation',90)
text(6,cpucompfd/2,'MAD(compression specified)','Rotation',90)
The example MADEXpolluode provides a third example for these techniques.
Unconstrained Optimization
High-level interfaces to Mad's functionality for Matlab's Optimization Solvers have also been developed. In this section we cover unconstrained optimisation, in Section 6.3 we cover the constrained case.
Example MADEXbanana: Unconstrained Optimization Using High-Level Interfaces
This example illustrates how the gradient for an unconstrained optimization problem, the banana problem of bandem, may be calculated using the high-level interface function MADOptObjEval and used within fminunc.
See also MADOptObjEval, MADOptObjDelete, MADOptObjRegister, MADOptObjReport, fminunc, banana f, ba- nana fg, bandem
Contents
- Default BFGS optimization without gradient
- Use of exact gradient
- Use of MAD's high-level interface MADOptObjEval
- Information on the AD techiques used
Default BFGS optimization without gradient
If the gradient is not supplied then fminunc will approximate it using finite-differences.
format compact
x=[0 1];
options=optimset('LargeScale','off');
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@banana_f,x,options);
xsol
disp(['Default with no gradient took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun. xsol = 1.0000 1.0000 Default with no gradient took 20 iterations and CPU time = 0.030043
Use of exact gradient
Since the function is just quadratic it may easily be differentiated by hand to give the analytic gradient in banana_fg.
options=optimset(options,'GradObj','on');
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@banana_fg,x,options);
xsol
disp(['Default with analytic gradient took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun. xsol = 1.0000 1.0000 Default with analytic gradient took 20 iterations and CPU time = 0.020029
Use of MAD's high-level interface MADOptObjEval
Using MADOptObjEval is easy, we first delete any previous settings, register the objective function banana f and then use MADOptObjEval as the objective function which will automatically calculate the gradient when needed.
MADOptObjDelete
MADOptObjRegister(@banana_f)
options=optimset(options,'GradObj','on');
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@MADOptObjEval,x,options);
xsol
disp(['Jacobian by MADFandGradObjOpt took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun. xsol = 1.0000 1.0000 Jacobian by MADFandGradObjOpt took 20 iterations and CPU time = 0.09013
Information on the AD techiques used
The function MADOptObjReport displays information on the techniques used. In this case since there are only 2 independent variables forward mode with full storage is used.
MADOptObjReport
MADOptObjReportODE Function name banana_f Jacobian of size 1x2 Fwd AD - full 0.010014 s Fwd AD - compressed Inf s FWD AD - full selected MADJacInternalEval: n< MADMinSparseN=10 so using forward mode full
Constrained Optimization
If we have a constrained optimization problem then we use MADOptObjEval for the objective, as in Example 6.3, and MADOptConstrEval for the constraint as Example 6.4 demonstrates.
Example MADEXconfun: Constrained Optimization Using High-Level
Interfaces.
This example shows how to use MAD's high-level interface functions to obtain both the objective function gradient and the constraint function Jacobian in a constrained optimisation problem solved using MATLAB's fmincon function.
See also MADEXbanana, MADOptConstrRegister, MADOptConstrEval, MADOptConstrReport, MADOptCon- strDelete, MADOptObjRegister, MADOptObjEval, MADOptObjReport, MADOptObjDelete
Contents
- Default optimization without gradients
- Using analytic gradients
- Using MAD's high-level interfaces
- Seeing which AD techniques are used
Default optimization without gradients
Here we simply use the medium-scale algorithm with gradients evaluated by default finite-differencing.
options = optimset('LargeScale','off');
x0 = [-1 1];
t1=cputime;
[x,fval,exitflag,output] = fmincon(@objfun,x0,[],[],[],[],[],[],@confun,options);
t2=cputime-t1;
disp(['Default fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less than options.TolFun and maximum constraint violation is less than options.TolCon. Active inequalities (to within options.TolCon = 1e-006): lower upper ineqlin ineqnonlin 1 2 Default fmincon took 8 iterations and cpu= 0.040058
Using analytic gradients
The Mathworks supply functions objfungrad and confungrad which return the objective's gradient and constraint's
Jacobian when needed can be used for analytic evaluation of the required derivatives.
options = optimset(options,'GradObj','on','GradConstr','on');
t1=cputime;
[x,fval,exitflag,output] = fmincon('objfungrad',x0,[],[],[],[],[],[],...
'confungrad',options);
t2=cputime-t1;
disp(['Hand-coded derivs in fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less than options.TolFun and maximum constraint violation is less than options.TolCon. Active inequalities (to within options.TolCon = 1e-006): lower upper ineqlin ineqnonlin 1 2 Hand-coded derivs in fmincon took 8 iterations and cpu= 0.040058
Using MAD's high-level interfaces
Calculating the gradients and Jacobians using MAD's high-level interfaces is easy, we first use MADOptObjDelete and MADOptConstrDelete to delete any old settings, then register the objective and constraint functions using MADOptObjRegister and MADOptConstrRegister, then optimize using MADOptObjEval and MADOptConstrEval to supply the objective and constraints, and when needed their derivatives.
MADOptObjDelete
MADOptConstrDelete
MADOptObjRegister(@objfun)
MADOptConstrRegister(@confun)
t1=cputime;
[x,fval,exitflag,output] = fmincon(@MADOptObjEval,x0,[],[],[],[],[],[],...
@MADOptConstrEval,options);
t2=cputime-t1;
disp(['MAD driver in fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less than options.TolFun and maximum constraint violation is less than options.TolCon. Active inequalities (to within options.TolCon = 1e-006): lower upper ineqlin ineqnonlin 1 2 MAD driver in fmincon took 8 iterations and cpu= 0.10014
Seeing which AD techniques are used
The functions MADOptObjReport and MADOptConstrReport report back which techniques have been used and why. Here full mode is used since the number of derivatives is small.
MADOptObjReport MADOptConstrReport
MADOptObjReportODE Function name objfun Jacobian of size 1x2 Fwd AD - compressed Inf s FWD AD - full selected MADJacInternalEval: n< MADMinSparseN=10 so using forward mode full MADOptConstrReport Function name confun Jacobian of size 2x2 Fwd AD - full 0.010014 s Fwd AD - compressed Inf s FWD AD - full selected MADJacInternalEval: n< MADMinSparseN=10 so using forward mode full