# MAD High-Level Interfaces to Matlab’s ODE and Optimization Solvers

In many cases users wish to use AD to calculate the derivatives required by ODE or optimisation solvers. When using the TOMLAB optimisation solvers, AD is easily enabled by setting the flags Prob.NumDiff and Prob.ConsDiff as noted in Section 3. When solving stiff ODEs using Matlab's ODE solvers, or solving optimization problems using Matlab's Optimization Toolbox functions, the user can either:

• Directly call Mad's functions to calculate the required derivatives, as detailed in Sections 4-5; or
• Use the facilities of Mad's high-level interfaces [FK04] to the ODE and optimization solvers, as detailed in this section.

For a vector-valued function F(x), $x \in R^n$, $F(x) \in R^m$, Mad currently provides three techniques for computing the Jacobian $J_F$:

• Forward mode AD with full storage (see Section 4) - this is most likely to be efficient when n is small.

• Forward mode AD with sparse storage (see Section 5.3) - this is most likely to be efficient for larger n, particularly if the Jacobian is sparse. Even if the Jacobian is dense, then provided n is sufficiently large this method is likely to outperform full storage because there is always sparsity present in the early stages of a Jacobian calculation. For small n this technique is likely to be slower than full storage because of the overhead of manipulating a sparse data structure.

• Forward mode AD with compressed storage (see Section 5.4) - this is likely to be most efficient for large n when the Jacobian is sparse and has a favourable sparsity pattern (e.g., diagonal, banded). In that case the number of directional derivatives required to compute the Jacobian may be reduced to p < n, with p dependent on the sparsity pattern and bounded below by the maximum number of nonzeros in a row of the Jacobian. Note that sparse storage always uses fewer floating-point operations than compressed storage, but provided p is sufficiently small compression outperforms sparse storage. This method requires the sparsity pattern of the Jacobian (or a safe over-estimate of it) to be known. If the sparsity pattern is unknown but known to be fixed, then it may be estimated using a Jacobian calculation with sparse storage and a randomly perturbed x (to reduce the chance of fortuitous cancellation giving a Jacobian entry of zero when in general it is non-zero). If the sparsity pattern changes between evaluations of F (perhaps because of branching), then either compression should not be used or the supplied Jacobian pattern must be an over-estimate covering any possible pattern.
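The compression idea above can be sketched independently of Mad. The following Python fragment (purely illustrative; the helper names `greedy_color` and `compressed_seed` are not part of Mad) builds a compressed seed matrix for a tridiagonal sparsity pattern by greedy column colouring: columns that share no nonzero row are summed into a single seed column, reducing the n directional derivatives to p.

```python
import numpy as np

def greedy_color(pattern):
    """Greedy column colouring: columns that overlap in some row get distinct colours."""
    n = pattern.shape[1]
    colors = [-1] * n
    for j in range(n):
        rows_j = pattern[:, j]
        # colours already taken by earlier columns overlapping column j
        taken = {colors[k] for k in range(j) if np.any(pattern[:, k] & rows_j)}
        c = 0
        while c in taken:
            c += 1
        colors[j] = c
    return colors

def compressed_seed(pattern):
    """n-by-p seed matrix: column j of the identity is merged into seed column colors[j]."""
    colors = greedy_color(pattern)
    p = max(colors) + 1
    seed = np.zeros((pattern.shape[1], p))
    for j, c in enumerate(colors):
        seed[j, c] = 1.0
    return seed

# tridiagonal pattern for n = 8: each row has at most 3 nonzeros
n = 8
pattern = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            pattern[i, j] = True

seed = compressed_seed(pattern)
print(seed.shape)  # p = 3 seed columns instead of n = 8
```

Here p = 3 equals the maximum number of nonzeros in a row, the lower bound mentioned above; the Jacobian is recovered by scattering the entries of J*seed back to their known positions.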

In general it is not possible to predict a priori which technique will be most efficient for a particular problem, since this depends on many factors: the sparsity pattern of $J_F$; the relative sizes of n, m and, when compression is possible, p; the efficiency of sparse data access; and the performance of the computer being used. A rational approach is to try each technique (defaulting to full storage for small n and rejecting full storage for large n), timing each evaluation and choosing the most efficient. To make this easy for the user, this approach has been implemented within the function MADJacInternalEval, and interfaces to MADJacInternalEval for ODE and optimisation solvers have been provided. Consequently users need never access MADJacInternalEval directly and need not concern themselves with the exact AD technique being used, confident that a reasonably efficient technique for their problem has been selected automatically. Of course, this selection process has an associated overhead, so the interfaces allow the user to mandate a technique if they so wish, perhaps based on previously conducted experiments.
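The try-each-and-time strategy can be sketched in a few lines. The Python fragment below is purely illustrative (MADJacInternalEval's actual implementation may differ, and the candidate callables are hypothetical stand-ins for the full and sparse techniques):

```python
import timeit

def pick_fastest(candidates, arg):
    """Time each candidate Jacobian routine once and return (best_name, timings).

    candidates: dict mapping a technique name to a callable evaluating the Jacobian.
    """
    timings = {name: timeit.timeit(lambda fn=fn: fn(arg), number=1)
               for name, fn in candidates.items()}
    return min(timings, key=timings.get), timings

# toy stand-ins for forward AD with full and sparse storage
candidates = {
    "fwd_full":   lambda x: [[xi * xj for xj in x] for xi in x],      # dense result
    "fwd_sparse": lambda x: {i: xi * xi for i, xi in enumerate(x)},   # dict-of-nonzeros
}
best, times = pick_fastest(candidates, list(range(50)))
```

In practice one would also remember `best` between calls, which is exactly what the interfaces below do so that only the first Jacobian evaluation pays the selection overhead.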

## Stiff ODE Solution

When solving the ordinary differential equation (ODE)

$\frac{dy}{dt} = F(t, y, p_1, p_2, \ldots), \qquad y(t=0) = y_0$

for $y \in R^m$, where p1, p2, ... are fixed parameters, using Matlab's stiff ODE solvers ode15s, ode23s, ode23t, and ode23tb, it is necessary to calculate, or at least accurately approximate, the Jacobian,

$J_F(t, y, p_1, p_2, \ldots) = \frac{\partial F}{\partial y}(t, y, p_1, p_2, \ldots),$

which the solvers use in the quasi-Newton solves within their implicit algorithms. By default such Jacobians are approximated by finite-differences. If the Jacobian is sparse and the sparsity pattern is supplied, then Jacobian compression is used [SR97] (cf. Section 5.4). Using MADODEJacEval we may easily take advantage of AD.
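What such an AD Jacobian computes can be illustrated outside Matlab. The Python sketch below (not part of Mad) implements a minimal forward-mode dual-number class supporting only +, - and *, and propagates the two Cartesian directional derivatives through the Van der Pol right-hand side to obtain the 2-by-2 Jacobian $\partial F / \partial y$ in one pass:

```python
import numpy as np

class Dual:
    """A value plus a vector of directional derivatives (forward-mode AD, full storage)."""
    def __init__(self, val, der):
        self.val, self.der = val, np.asarray(der, dtype=float)
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o, np.zeros_like(self.der))
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o, np.zeros_like(self.der))
        return Dual(self.val - o.val, self.der - o.der)
    def __rsub__(self, o):
        return Dual(o, np.zeros_like(self.der)) - self
    def __mul__(self, o):  # product rule
        o = o if isinstance(o, Dual) else Dual(o, np.zeros_like(self.der))
        return Dual(self.val * o.val, self.val * o.der + o.val * self.der)
    __rmul__ = __mul__

def vdp_rhs(t, y, mu):
    # Van der Pol: dy1/dt = y2, dy2/dt = mu*(1 - y1^2)*y2 - y1
    return [y[1], mu * (1 - y[0] * y[0]) * y[1] - y[0]]

def jacobian(f, t, y, *params):
    """dF/dy: seed each component of y with a Cartesian derivative direction."""
    n = len(y)
    ydual = [Dual(y[i], np.eye(n)[i]) for i in range(n)]
    return np.array([fi.der for fi in f(t, ydual, *params)])

mu, y = 1000.0, [2.0, 0.0]
J = jacobian(vdp_rhs, 0.0, y, mu)
# analytic Jacobian for comparison: [[0, 1], [-2*mu*y1*y2 - 1, mu*(1 - y1^2)]]
```

At the initial condition y = (2, 0) this yields J = [[0, 1], [-1, -3000]], matching the hand-derived Jacobian supplied by vdpode_J below.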

### Example MADEXvdpode: Jacobian driver for the Van der Pol ODE.

This example shows how to use MAD's high-level ODE Jacobian function MADODEJacEval to solve the stiff Van der Pol ODE problem.

#### Contents

• Set up the ODE problem
• Integrate using finite-differencing for the Jacobian
• Use the analytic Jacobian
• Conclusions

#### Set up the ODE problem

First we define:

MU = 1000;                    % default stiffness parameter for Van der Pol problem
tspan = [0; max(20,3*MU)];    % integration timespan of several periods
y0 = [2; 0];                  % initial conditions
options=odeset('Stats','on'); % display ODE statistics

#### Integrate using finite-differencing for the Jacobian

Using the default finite-differencing a solution is easily found.

t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tfd=cputime-t1;
disp(['Jacobian by Finite-Differencing - CPU time = ',num2str(tfd)])
591 successful steps
225 failed attempts
1883 function evaluations
45 partial derivatives
289 LU decompositions
1747 solutions of linear systems
Jacobian by Finite-Differencing - CPU time = 0.75108


#### Use the analytic Jacobian

Since the Jacobian is recalculated many times supplying an analytic Jacobian should improve performance.

options = odeset(options,'Jacobian',@vdpode_J);
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tjac=cputime-t1;
disp(['Jacobian by hand-coding - CPU time = ',num2str(tjac)])
591 successful steps
225 failed attempts
1749 function evaluations
45 partial derivatives
289 LU decompositions
1747 solutions of linear systems
Jacobian by hand-coding - CPU time = 0.75108


The automated MAD Jacobian calculation avoids the need for the user to supply the Jacobian. First we use odeset to specify that the Jacobian will be calculated by MADODEJacEval. We then use MADODEDelete to clear any existing settings for vdpode_f, register the function using MADODERegister, and then integrate.

options = odeset(options,'Jacobian',@MADODEJacEval);
MADODEDelete; % deletes all previous settings for ODE evaluation
MADODERegister(@vdpode_f); % all we must do is supply the ODE function handle
t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tmadjac=cputime-t1;
disp(['Jacobian by MADODEJacEval                      - CPU time = ',num2str(tmadjac)])
591 successful steps
225 failed attempts
1749 function evaluations
45 partial derivatives
289 LU decompositions
1747 solutions of linear systems
Jacobian by MADODEJacEval                      - CPU time = 0.88127


On a second use (without calling MADODEDelete) MADODEJacEval should be slightly faster since it "remembers" which technique to use to calculate the Jacobian.

t1=cputime;
[t,y] = ode15s(@vdpode_f,tspan,y0,options,MU);
tmadjac2=cputime-t1;
disp(['Repeat of Jacobian by MADODEJacEval            - CPU time = ',num2str(tmadjac2)])
591 successful steps
225 failed attempts
1749 function evaluations
45 partial derivatives
289 LU decompositions
1747 solutions of linear systems
Repeat of Jacobian by MADODEJacEval            - CPU time = 0.91131


Using MADODEReport we see which technique is used and why - here full storage is used due to the small size (2 × 2) of the Jacobian required.

MADODEReport % report on which technique used
MADODEreport
Function name vdpode_f
Jacobian of size 2x2
Fwd AD - full       0.010014 s
Fwd AD - compressed Inf s


#### Conclusions

Using MAD for such small ODE problems gives performance similar to finite-differencing. MAD is more useful for large ODE problems, particularly those with sparse Jacobians as in, for example, the Brusselator problem of MADEXbrussode.

bar([tfd,tjac,tmadjac,tmadjac2])
colormap hsv
ylabel('CPU time (s)')
title('Solving the Van der Pol Problem with Different Jacobian Techniques')
text(1,tfd/2,'FD','Rotation',90)
text(2,tfd/2,'Analytic Jacobian','Rotation',90)
text(4,tfd/2,'Repeat of MADJacODE','Rotation',90)


### Example MADEXbrussode: Jacobian driver for the Brusselator ODE.

This example shows how to use MAD's high-level ODE Jacobian function MADODEJacEval to solve the stiff Brusselator ODE problem.

#### Contents

• Set up the ODE problem
• Integrate Brusselator problem using finite-differencing for the Jacobian
• Integrate Brusselator problem using compressed finite-differencing
• Specifying that the sparsity pattern is fixed
• Specifying the sparsity pattern
• Difference in solutions
• Specifying a Jacobian technique
• Conclusions

#### Set up the ODE problem

First we define:

N = 80; % number of mesh points,
tspan = [0; 50]; % timespan of integration,
t=tspan(1);      % initial time,
y0 = [1+sin((2*pi/(N+1))*(1:N)); repmat(3,1,N)];
y0=reshape(y0,[2*N 1]); % initial conditions
S=brussode_S(N);        % sparsity pattern

#### Integrate Brusselator problem using finite-differencing for the Jacobian

Initially we use finite-differences to approximate the Jacobian, ignoring any sparsity, timing and running the calculation.

options = odeset('Vectorized','on','Stats','on');
t1=cputime;
[tfd,yfd] = ode15s(@brussode_f,tspan,y0,options,N);
cpufd=cputime-t1;
disp(['FD-Jacobian              : cputime = ',num2str(cpufd)])
339 successful steps
54 failed attempts
2868 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
FD-Jacobian              : cputime = 2.0329


#### Integrate Brusselator problem using compressed finite-differencing

Finite-differencing can take advantage of sparsity when approximating the Jacobian. We set options to specify the Jacobian sparsity pattern, print out statistics, and turn on vectorization. We then time and run the calculation.

options = odeset('Vectorized','on','JPattern',S,'Stats','on');
t1=cputime;
[tfd,yfd] = ode15s(@brussode_f,tspan,y0,options,N);
cpucompfd=cputime-t1;
disp(['Compressed FD-Jacobian   : cputime = ',num2str(cpucompfd)])
339 successful steps
54 failed attempts
840 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
Compressed FD-Jacobian   : cputime = 0.91131


First delete any existing settings for MADODE using MADODEDelete. Then register the function brussode_f for MADODEJacEval using MADODERegister, and use odeset to specify that MADODEJacEval will calculate the required Jacobian. Finally, time and run the calculation and use MADODEReport to provide information on the Jacobian technique used.

MADODEDelete % deletes any internal settings
MADODERegister(@brussode_f)               % supplies handle to ODE function
options = odeset('Vectorized','on','Stats','on','Jacobian',@MADODEJacEval);
t1=cputime;
[tmadjac,ymadjac] = ode15s(@brussode_f,tspan,y0,options,N);
cpumadjac=cputime-t1;
disp(['MAD Jacobian             : cputime = ',num2str(cpumadjac)])
MADODEReport
339 successful steps
54 failed attempts
776 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
MAD Jacobian       : cputime = 1.1416
Function name brussode_f
Jacobian of size 160x160
Fwd AD - full       Inf s
Fwd AD - sparse     0.020029 s
Fwd AD - compressed Inf s
Sparsity not fixed so compression rejected. Large system reject full method.


#### Specifying that the sparsity pattern is fixed

For this problem the Jacobian is large and sparse, but MAD does not "know" that the sparsity pattern is fixed and so can only use sparse forward mode, not compression. We first delete any internal settings, specify that the sparsity pattern is fixed, i.e., that it does not change from one evaluation to the next due to branching, and then use the MAD Jacobian. The Jacobian sparsity pattern is calculated using fmad's sparse forward mode at a point randomly perturbed about y0.

MADODEDelete % delete any internal settings
t1=cputime;
% Compressed forward mode should now have been used.
339 successful steps
54 failed attempts
776 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
MAD Jacobian (sparsity fixed): cputime = 1.1416
Function name brussode_f
Jacobian of size 160x160
Fwd AD - full       Inf s
Fwd AD - sparse     0.020029 s
Fwd AD - compressed 0.020029 s
Percentage Sparsity    = 2.4844%
Maximum non-zeros/row  = 4 of 160
Maximum non-zeros/col  = 4 of 160
Size of compressed seed = 160x4
Percentage compressed to full seed    = 2.5%
Large system reject full method. Sparse is worse than compressed - discarding Sparse.
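The pattern-estimation idea used above (evaluate at a randomly perturbed point so that accidental zeros are unlikely) can be sketched as follows. This Python fragment is illustrative only, with finite differences standing in for sparse forward AD and a simple 1-D diffusion stencil standing in for the Brusselator right-hand side; it recovers the stencil's tridiagonal pattern:

```python
import numpy as np

def estimate_pattern(f, y, tol=1e-12, seed=0):
    """Estimate the sparsity pattern of J = df/dy by one-column-at-a-time
    differences taken at a randomly perturbed base point."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float) + 1e-3 * rng.standard_normal(len(y))
    f0, h = f(y), 1e-7
    n = len(y)
    pattern = np.zeros((len(f0), n), dtype=bool)
    for j in range(n):
        yp = y.copy()
        yp[j] += h                         # perturb one component of y
        pattern[:, j] = np.abs(f(yp) - f0) > tol
    return pattern

# toy 1-D diffusion stencil: F_i = y_{i-1} - 2*y_i + y_{i+1} (zero boundaries)
def diffusion(y):
    F = -2.0 * y
    F[1:] += y[:-1]
    F[:-1] += y[1:]
    return F

P = estimate_pattern(diffusion, np.ones(8))
# P is tridiagonal: row i has nonzeros in columns i-1, i, i+1
```

The random perturbation plays the same role as here: it makes a coincidentally zero Jacobian entry (and hence an under-estimated pattern) unlikely, though, as noted above, a changing pattern due to branching would still require an over-estimate.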


#### Specifying the sparsity pattern

If available, the sparsity pattern can be specified directly within MADsetupODE.

MADODEDelete % delete any internal settings
t1=cputime;
MADODEReport
339 successful steps
54 failed attempts
776 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
MAD Jacobian (sparsity given) : cputime = 1.1216
Function name brussode_f
Jacobian of size 160x160
Fwd AD - full       Inf s
Fwd AD - sparse     0.030043 s
Fwd AD - compressed 0.010014 s
Percentage Sparsity    = 2.4844%
Maximum non-zeros/row  = 4 of 160
Maximum non-zeros/col  = 4 of 160
Size of compressed seed = 160x4
Percentage compressed to full seed    = 2.5%
Large system reject full method. Sparse is worse than compressed - discarding Sparse.


#### Difference in solutions

Of course, there should be no discernible difference between the finite-difference and AD-enabled solutions.

plot(tfd,yfd(:,1),'-',tmadjac,ymadjac(:,1),'o')
xlabel('t')
ylabel('y')
title('Comparison of Brusselator solutions using FD and AD for Jacobians')

#### Specifying a Jacobian technique

We can specify from the outset that compression is to be used.

MADODEDelete % delete any internal settings
t1=cputime;
MADODEReport
MADJacInternalRegister: AD technique selected is ad_fwd_compressed
339 successful steps
54 failed attempts
776 function evaluations
13 partial derivatives
110 LU decompositions
774 solutions of linear systems
MAD Jacobian (compressed specified) : cputime = 1.1216
Function name brussode_f
Jacobian of size 160x160
Fwd AD - compressed 0.010014 s
Percentage Sparsity    = 2.4844%
Maximum non-zeros/row  = 4 of 160
Maximum non-zeros/col  = 4 of 160
Size of compressed seed = 160x4
Percentage compressed to full seed    = 2.5%


#### Conclusions

For this problem compressed finite-differencing is most efficient. If the sparsity pattern is not known then sparse forward mode AD outperforms finite-differencing. On supplying the sparsity pattern the performance of compressed forward mode AD approaches that of compressed finite-differencing. Simply using the MADODEJacEval interface yields acceptable performance.

bar([cpufd,cpucompfd,cpumadjac,cpumadjac_fixed,cpumadjac_S,cpumadjac_comp])
colormap hsv
ylabel('CPU time (s)')
title(['Solving the Brusselator Problem with Different Jacobian Techniques (N= ',num2str(N),')'])
text(1,cpucompfd/2,'FD','Rotation',90)
text(2,cpucompfd/2,'Compressed FD (sparsity supplied)','Rotation',90)
text(6,cpucompfd/2,'MAD(compression specified)','Rotation',90)

The example MADEXpolluode provides a third example for these techniques.

## Unconstrained Optimization

High-level interfaces to Mad's functionality for Matlab's optimization solvers have also been developed. In this section we cover unconstrained optimisation; in Section 6.3 we cover the constrained case.

### Example MADEXbanana: Unconstrained Optimization Using High-Level Interfaces

This example illustrates how the gradient for an unconstrained optimization problem, the banana problem of bandem, may be calculated using the high-level interface function MADOptObjEval and used within fminunc.

#### Contents

• Default BFGS optimization without gradient
• Information on the AD techniques used

#### Default BFGS optimization without gradient

If the gradient is not supplied then fminunc will approximate it using finite-differences.

format compact
x=[0 1];
options=optimset('LargeScale','off');
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@banana_f,x,options);
xsol
disp(['Default with no gradient  took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun.
xsol =
1.0000    1.0000
Default with no gradient  took 20 iterations and CPU time = 0.030043


Since the function is a simple polynomial it may easily be differentiated by hand to give the analytic gradient in banana_fg.

options=optimset(options,'GradObj','on');
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@banana_fg,x,options);
xsol
disp(['Default with analytic gradient took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun.
xsol =
1.0000    1.0000
Default with analytic gradient took 20 iterations and CPU time = 0.020029
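For reference, the hand differentiation is straightforward. Assuming the classical Rosenbrock form f(x) = 100(x2 - x1^2)^2 + (1 - x1)^2 (the role banana_fg plays in Matlab), this Python sketch returns the function and its analytic gradient and checks the gradient by central differences:

```python
import numpy as np

def banana_fg(x):
    """Rosenbrock 'banana' function and its hand-derived gradient
    (classical form assumed: f = 100*(x2 - x1^2)^2 + (1 - x1)^2)."""
    x1, x2 = x
    f = 100.0 * (x2 - x1**2)**2 + (1.0 - x1)**2
    g = np.array([-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
                  200.0 * (x2 - x1**2)])
    return f, g

# central-difference check at the starting point x = (0, 1) used above
x = np.array([0.0, 1.0])
f, g = banana_fg(x)
h = 1e-6
g_fd = np.array([(banana_fg(x + h*e)[0] - banana_fg(x - h*e)[0]) / (2*h)
                 for e in np.eye(2)])
```

At x = (0, 1) the analytic gradient is (-2, 200), and the finite-difference check agrees to the expected truncation error.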


Using MADOptObjEval is easy: we first delete any previous settings, register the objective function banana_f, and then use MADOptObjEval as the objective function, which will automatically calculate the gradient when needed.

MADOptObjDelete
MADOptObjRegister(@banana_f)
t1=cputime;
[xsol,fval,exitflag,output] = fminunc(@MADOptObjEval,x,options);
xsol
disp(['MAD gradient took ',num2str(output.iterations),...
' iterations and CPU time = ',num2str(cputime-t1)])
Optimization terminated: relative infinity-norm of gradient less than options.TolFun.
xsol =
1.0000    1.0000


#### Information on the AD techniques used

The function MADOptObjReport displays information on the techniques used. In this case, since there are only 2 independent variables, forward mode with full storage is used.

MADOptObjReport

MADOptObjReport
Function name banana_f
Jacobian of size 1x2
Fwd AD - full       0.010014 s
Fwd AD - compressed Inf s


## Constrained Optimization

If we have a constrained optimization problem then we use MADOptObjEval for the objective, as in Example 6.3, and MADOptConstrEval for the constraints, as Example 6.4 demonstrates.

### Example MADEXconfun: Constrained Optimization Using High-Level Interfaces.

This example shows how to use MAD's high-level interface functions to obtain both the objective function gradient and the constraint function Jacobian in a constrained optimisation problem solved using MATLAB's fmincon function.

#### Contents

• Seeing which AD techniques are used

Here we simply use the medium-scale algorithm with gradients evaluated by default finite-differencing.

options = optimset('LargeScale','off');
x0 = [-1 1];
t1=cputime;
[x,fval,exitflag,output] = fmincon(@objfun,x0,[],[],[],[],[],[],@confun,options);
t2=cputime-t1;
disp(['Default fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less
than options.TolFun and maximum constraint violation is less
than options.TolCon.
Active inequalities (to within options.TolCon = 1e-006):
lower      upper     ineqlin   ineqnonlin
1
2
Default fmincon took 8 iterations and cpu= 0.040058


Hand-coded functions returning the objective gradient and constraint Jacobian when needed can be used for analytic evaluation of the required derivatives.

options = optimset(options,'GradObj','on','GradConstr','on');
t1=cputime;
t2=cputime-t1;
disp(['Hand-coded derivs in fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less
than options.TolFun and maximum constraint violation is less
than options.TolCon.
Active inequalities (to within options.TolCon = 1e-006):
lower      upper     ineqlin   ineqnonlin
1
2
Hand-coded derivs in fmincon took 8 iterations and cpu= 0.040058


Calculating the gradients and Jacobians using MAD's high-level interfaces is easy: we first use MADOptObjDelete and MADOptConstrDelete to delete any old settings, then register the objective and constraint functions using MADOptObjRegister and MADOptConstrRegister, and then optimize using MADOptObjEval and MADOptConstrEval to supply the objective and constraints and, when needed, their derivatives.

MADOptObjDelete
MADOptConstrDelete
MADOptObjRegister(@objfun)
MADOptConstrRegister(@confun)
t1=cputime;
[x,fval,exitflag,output] = fmincon(@MADOptObjEval,x0,[],[],[],[],[],[],@MADOptConstrEval,options);
t2=cputime-t1;
disp(['MAD driver in fmincon took ',num2str(output.iterations),...
' iterations and cpu= ',num2str(t2)])
Optimization terminated: first-order optimality measure less
than options.TolFun and maximum constraint violation is less
than options.TolCon.
Active inequalities (to within options.TolCon = 1e-006):
lower      upper     ineqlin   ineqnonlin
1
2
MAD driver in fmincon took 8 iterations and cpu= 0.10014


#### Seeing which AD techniques are used

The functions MADOptObjReport and MADOptConstrReport report back which techniques have been used and why. Here forward mode with full storage is used since the number of derivatives required is small.

MADOptObjReport

MADOptObjReport
Function name objfun
Jacobian of size 1x2
Fwd AD - compressed Inf s
Function name confun
Jacobian of size 2x2
Fwd AD - full       0.010014 s
Fwd AD - compressed Inf s