Symbolic derivatives with tomSym

From TomWiki
Jump to navigationJump to search

The main strength of tomSym is its ability to automatically and quickly compute symbolic derivatives of matrix expressions. These derivatives can then be converted into efficient Matlab code.


The matrix derivative of a matrix function is a fourth rank tensor - that is, a matrix where each of the entries is a matrix. Rather than using four-dimensional matrices to represent this, tomSym continues to work in two dimensions. This makes it possible to take advantage of the very efficient handling of sparse matrices in Matlab, which is not available for higher-dimensional matrices.

In order for the derivative to be two-dimensional, tomSym's derivative reduces its arguments to one-dimensional vectors before the derivative is computed. In the returned J, each row corresponds to an element of F, and each column corresponds to an element of X. As usual in Matlab, the elements of a matrix are taken in column-first order.

derivative(F,X) == derivative(F(:),X(:))

For vectors f and x, the resulting J is the well-known Jacobian matrix.

Difference from commonly used mathematical notation

The notation used in tomSym was chosen to minimize the amount of element reordering needed to compute gradients for common expressions in optimization problems. It needs to be pointed out that this is different from the commonly used mathematical notation, where the tensor

is flattened into a two-dimensional matrix as it is written. (There are actually two variations of this in common use — the indexing of the elements of X may or may not be transposed.)

For example, in common mathematical notation, the so-called self derivative matrix is a mn-by-mn square (or mm-by-nn rectangular in the non-transposed variation) matrix containing mn ones spread out in a random-looking manner. In tomSym notation, the self-derivative matrix is the mn-by-mn identity matrix.

The difference in notation only involves the ordering of the elements, and reordering the elements to a different notational convention should be trivial if tomSym is used to generate derivatives for applications other than TOMLAB optimization.

How derivatives are generated

TomSym's derivative and derivatives functions generates symbolic Jacobians for each subexpression, and then multiplies these together using the chain rule. (In tomSym notation, derivative(A,A) is the identity matrix, so the chain rule applies using normal matrix multiplication.) Some simplifications are performed along the way, in order to reduce memory requirements and improve speed.

The accumulation can be done in either "forwards" or "backwards" mode. The result is mathematically equivalent using either mode, but the memory requirements may be different.

Derivatives are represented using matrix operations on matrix symbols. This yields much faster code compared to other symbolic software which represents each element of a matrix with its own symbol. TomSym is routinely used to solve optimization problems involving thousands of variables, and matrices with millions of elements.


>> toms     y
>> toms 3x1 x
>> toms 2x3 A
>> f = (A*x).^(2*y)
f = tomSym(2x1):
>> g = derivative(f,A)
g = tomSym(2x6):

In the above example, the 2x1 symbol f is differentiated w.r.t the 2x3 symbol A. The result is a 2x6 matrix, representing d(vec(f))/d(vec(A)).

The displayed text is not necessarily identical to the m-code that will be generated from an expression. For example, the identity matrix is generated using speye in m-code, but displayed as eye. (Derivatives tend to involve many sparse matrices, which Matlab handles efficiently.) Also, subexpressions that occur multiple times are broken out and computed only once. The mcode command converts a tomSym object to a matlab code string.

>> mcode(g)
ans =
tempC2   = 2*y;
out      = scalerows(tempC2*(A*x).^(tempC2-1),kron(x',speye(2)));


There are two other forms of derivatives available in TOMLAB: Automatic Derivatives using MAD, and finite differences. It is also possible to mix the techniques by using tomSym as the top layer, and using MAD or finite differences to compute the derivatives of a subexpression using the wrap function.

The main advantage of tomSym over MAD and finite differeces is the execution speed once the code has been generated, but tomSym cannot handle all kinds of functions efficiently. With functions containing large for-loops, it is often better to use one of the other alternatives.