Symbolic derivatives with tomSym

Revision as of 02:31, 28 July 2011

The main strength of tomSym is its ability to automatically and quickly compute symbolic derivatives of matrix expressions. The derivatives can then be converted into efficient Matlab code.

Notation

The matrix derivative of a matrix function is a fourth-rank tensor, that is, a matrix in which each entry is itself a matrix. Rather than using four-dimensional arrays to represent this, tomSym continues to work in two dimensions. This makes it possible to take advantage of Matlab's very efficient handling of sparse matrices, which is not available for higher-dimensional arrays.

In order for the derivative to be two-dimensional, tomSym's derivative reduces its arguments to one-dimensional vectors before the derivative is computed. In the returned J, each row corresponds to an element of F, and each column corresponds to an element of X. As usual in Matlab, the elements of a matrix are taken in column-first order.

derivative(F,X) == derivative(F(:),X(:))

For vectors f and x, the resulting J is the well-known Jacobian matrix.
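
The column-first convention can be checked numerically without tomSym. The following is a minimal sketch in plain Matlab, assuming a linear function F(X) = A*X with arbitrarily chosen numeric values for A and X (both are illustrative and not taken from the text above). It builds J one column at a time and compares it with the standard identity vec(A*X) = kron(eye(n),A)*vec(X).

```matlab
% Illustrative data (assumed, not from the article): A is 2x3, X is 3x4.
A = [1 2 3; 4 5 6];
X = reshape(1:12, 3, 4);
F = @(X) A*X;                      % F(X) is 2x4

% Row i of J corresponds to F(i), column k to X(k), both in column-first order.
J = zeros(numel(F(X)), numel(X));
for k = 1:numel(X)
    E = zeros(size(X));
    E(k) = 1;                      % unit perturbation of the k-th element of X(:)
    D = F(X + E) - F(X);           % exact difference, because F is linear
    J(:, k) = D(:);                % flatten the change in F column-first
end

disp(size(J))                      % 8 12, i.e. numel(F) by numel(X)
disp(isequal(J, kron(eye(4), A)))  % 1
```

Because F is linear and all values are integers, the unit-step difference is exact, so the comparison needs no tolerance.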

Difference from commonly used mathematical notation

The notation used in tomSym was chosen to minimize the amount of element reordering needed to compute gradients for common expressions in optimization problems. Note that this is different from the commonly used mathematical notation, where the tensor

<math>\frac{\partial\mathbf{F}}{\partial\mathbf{X}}=
\begin{bmatrix}
\frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial\mathbf{F}}{\partial X_{n,1}}\\
\vdots & \ddots & \vdots
\end{bmatrix}</math>

is flattened into a two-dimensional matrix as it is written. (There are actually two variations of this in common use: the indexing of the elements of X may or may not be transposed.)

For example, in common mathematical notation, the so-called self-derivative matrix is an mn-by-mn square matrix (or an mm-by-nn rectangular matrix in the non-transposed variation) containing mn ones scattered in a seemingly random pattern. In tomSym notation, the self-derivative matrix is simply the mn-by-mn identity matrix.
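
The contrast between the two layouts can be made concrete with a small numeric sketch in plain Matlab. It assumes an m-by-n matrix X with m = 2 and n = 3, and it assumes one reading of the transposed block layout, in which block (r,c) holds the derivative of X with respect to X(c,r). Both the sizes and the variable names are illustrative.

```matlab
m = 2; n = 3;                      % assumed size of X (m-by-n)

% tomSym-style self-derivative: d(vec(X))/d(vec(X)) is the identity.
J_vec = eye(m*n);

% Block layout, transposed variation (assumed reading): an n-by-m grid of
% m-by-n blocks, where block (r,c) is dX/dX(c,r), a single 1 at position (c,r).
J_blk = zeros(n*m, m*n);
for r = 1:n
    for c = 1:m
        E = zeros(m, n);
        E(c, r) = 1;
        J_blk((r-1)*m + (1:m), (c-1)*n + (1:n)) = E;
    end
end

disp(size(J_blk))                  % 6 6, i.e. mn-by-mn
disp(sum(J_blk(:)))                % 6, i.e. mn ones
disp(isequal(J_blk, J_vec))        % 0, same entries in a different order
```

Both matrices contain exactly mn ones; only their positions differ, which is the reordering the text refers to.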

The difference in notation only involves the ordering of the elements, and reordering the elements to a different notational convention should be trivial if tomSym is used to generate derivatives for applications other than TOMLAB optimization.

Example

>> toms     y
>> toms 3x1 x
>> toms 2x3 A
>> f = (A*x).^(2*y)
 
f = tomSym(2x1):
 
   (A*x).^(2*y)
 
>> derivative(f,A)
 
ans = tomSym(2x6):
 
   (2*y)*setdiag((A*x).^(2*y-1))*kron(x',eye(2))

In the above example, the 2x1 symbol f is differentiated with respect to the 2x3 symbol A. The result is a 2x6 matrix, representing d(vec(f))/d(vec(A)).
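
The displayed expression can be sanity-checked against finite differences in plain Matlab. This is a sketch under stated assumptions: the numeric values of A, x and y are arbitrary, and setdiag is assumed to build a diagonal matrix from a vector, so the base Matlab function diag stands in for it here.

```matlab
% Illustrative numeric values (assumed, not from the article).
A = [1 2 3; 4 5 6];
x = [1; 2; 3];
y = 1.5;
f = @(A) (A*x).^(2*y);             % same expression as in the example, 2x1

% Derivative as displayed above, with diag used in place of setdiag (assumption).
J = (2*y) * diag((A*x).^(2*y-1)) * kron(x', eye(2));

% Central finite differences, one column per element of A(:).
h = 1e-6;
J_fd = zeros(2, numel(A));
for k = 1:numel(A)
    E = zeros(size(A));
    E(k) = h;
    J_fd(:, k) = (f(A + E) - f(A - E)) / (2*h);
end

rel_err = norm(J_fd - J) / norm(J);
disp(size(J))                      % 2 6
disp(rel_err < 1e-6)               % 1
```

The column order of J_fd follows A(:), so column k pairs with the k-th element of A taken column-first, matching the convention described under Notation.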

The displayed text is not necessarily identical to the m-code that will be generated from an expression. For example, the identity matrix is generated using speye in m-code, but displayed as eye. (Derivatives tend to involve many sparse matrices, which Matlab handles efficiently.) The mcodestr command converts a tomSym object to a Matlab code string.

>> mcodestr(ans)
ans =
(2*y)*setdiag((A*x).^(2*y-1))*kron(x',speye(2))
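
The practical effect of swapping eye for speye can be seen in plain Matlab. The sketch below assumes an arbitrary numeric x and compares only the Kronecker factor from the generated string; it does not call tomSym.

```matlab
x = [1; 2; 3];                     % illustrative value (assumed)

K_dense  = kron(x', eye(2));       % as displayed
K_sparse = kron(x', speye(2));     % as generated in m-code

disp(issparse(K_dense))            % 0
disp(issparse(K_sparse))           % 1
disp(isequal(full(K_sparse), K_dense))   % 1, same values
disp(nnz(K_sparse))                % 6 of the 12 entries are nonzero
```

The values are identical; only the storage differs, and the share of stored zeros avoided grows with the size of the identity factor.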