CGO rbfSolve description

This page is part of the CGO Manual. See CGO Manual.

{{#switch: | left =

{{#switch:{{#if: | {{{smallimage}}} | }} | none =

| #default =

}} {{#if:{{#if: | {{{smallimageright}}} | }} | {{#ifeq:{{#if: | {{{smallimageright}}} | }}|none | | }} }}

{{

 #switch:left
 | left =

| #default =

 }}
 {{#if:{{#if:
                | {{{smallimage}}}
                | 
                }}
 | {{#if:
                | {{{smallimage}}}
                | 
                }}
 | [[File:{{#switch:caution
   | critical   = Ambox speedy deletion.png
   | important  = Ambox deletion.png
   | warning    = Ambox content.png
   | caution    = Cleanup.png
   | move       = Ambox move.png
   | protection = Ambox protection.png
   | notice          
   | #default   = Cleanup.png
   }} | {{#switch:left 
     | left = 20x20px 
     | #default = 40x40px 
     }} |link=|alt=]]
 }}{{#switch:left
 | left =

| #default =

}}

{{#if:

                | {{{smalltext}}} 
                | Cleanup is needed
Cross-referencing of the equations.

}}

{{#switch:left

   | left = {{#if:
                | {{{smallimageright}}}
                | 
                }}

| #default =

{{#if:

                | {{{smallimageright}}}
                |

}}

| #default =

{{#switch: | none =

| #default =

}}

{{#if: | {{#ifeq:|none

| }} }}

{{

 #switch:
 | left =

| #default =

 }}
 {{#if:
 | 
 | [[File:{{#switch:caution
   | critical   = Ambox speedy deletion.png
   | important  = Ambox deletion.png
   | warning    = Ambox content.png
   | caution    = Cleanup.png
   | move       = Ambox move.png
   | protection = Ambox protection.png
   | notice          
   | #default   = Cleanup.png
   }} | {{#switch: 
     | left = 20x20px 
     | #default = 40x40px 
     }} |link=|alt=]]
 }}{{#switch:
 | left =

| #default =

}}

Cleanup is needed
Cross-referencing of the equations.

{{#switch:

   | left =

| #default =

}}

}} Following is a detailed description of the rbfSolve algorithm.

Summary

The manual considers global optimization of costly objective functions, i.e. the problem of finding the global minimum when there are several local minima and each function value takes considerable CPU time to compute. Such problems often arise in industrial and financial applications, where a function value could be a result of a time- consuming computer simulation or optimization. Derivatives are most often hard to obtain, and the algorithms presented make no use of such information.

The emphasis is on a new method by Gutmann and Powell, A radial basis function method for global optimization. This method is a response surface method, similar to the Efficient Global Optimization (EGO) method of Jones. The TOMLAB implementation of the Radial Basis Function (RBF) method is described in detail.

Introduction

The task of global optimization is to find the set of parameters x in the feasible region $\Omega \subset R^{d}$ for which the objective function f(x) obtains its smallest value. In other words, a point $x^{\ast }$ is a global optimizer to f (x) on $\Omega$ , if $f(x^{\ast })\leq f(x)$ . On the other hand, a point ${\hat {x}}$ is a local optimizer to f (x), if f (x) = f (x) for all x in some neighborhood around x. Obviously, when the objective function has several local minima, there could be solutions that are locally optimal but not globally optimal and standard local optimization techniques are likely to get stuck before the global minimum is reached. Therefore, some kind of global search is needed to find the global minimum with some reliability.

Previously a Matlab implementations of the DIRECT has been made, the new constrained DIRECT and the Efficient Global Optimization (EGO) algorithms. The implementations are part of the TOMLAB optimization environment. The implementation of the DIRECT algorithm is further discussed and analyzed in Bjorkman, Holmström. Since the objective functions in our applications often are expensive to compute, we have to focus on very efficient methods. At the IFIP TC7 Conference on System Modelling and Optimization in Cambridge 1999, Hans-Martin Gutmann presented his work on the RBF algorithm. The idea of the RBF algorithm is to use radial basis function interpolation to define a utility function (Powell). The next point, where the original objective function should be evaluated, is determined by optimizing on this utility function. The combination of our need for efficient global optimization software and the interesting ideas of Powell and Gutmann led to the development of an improved RBF algorithm implemented in Matlab.

The RBF Algorithm

Our RBF algorithm is based on the ideas presented by Gutmann, with some extensions and further development. The algorithm is implemented in the Matlab routine rbfSolve.

The RBF algorithm deals with problems of the form

${\begin{array}{ll}\min \limits _{x}&f(x)\\\mathrm {s.t.} &{\begin{array}{ccccc}x_{L}&\leq &x&\leq &x_{U}\end{array}},\end{array}}$

where $f\in R$ and $x,x_{L},x_{U}\in R^{d}$ . We assume that no derivative information is available and that each function evaluation is very expensive. For example, the function value could be the result of a time-consuming experiment or computer simulation.

Description of the Algorithm

We now consider the question of choosing the next point where the objective function should be evaluated. The idea of the RBF algorithm is to use radial basis function interpolation and a measure of 'bumpiness' of a radial function, σ say. A target value $f_{n}^{*}$ is chosen that is an estimate of the global minimum of f . For each $y\neq \{x_{1},...,x_{n}\}$ there exists a radial basis function that satisfies the interpolation conditions

$s_{y}(x_{i})=f(x_{i}),\quad i=1,...,n,$

$s_{y}(y)=f_{n}^{*}$

The next point x_n+1 is calculated as the value of y in the feasible region that minimizes $\sigma (s_{y})$ . It turns out that the function $y\mapsto \sigma (s_{y})$ is much cheaper to compute than the original function.

Here, the radial basis function interpolant s_n has the form

$s_{n}(x)=\sum \limits _{i=1}^{n}\lambda _{i}\phi \left(\left\|x-x_{i}\right\|_{2}\right)+b^{T}x+a,$

with $\lambda _{1},\ldots ,\lambda _{n}\in R$ , $b\in R^{d}$ , $a\in R$ and $\phi$ is either cubic with $\phi (r)=r^{3}$ or the thin plate spline $\phi (r)=r^{2}\log r$ . Gutmann considers other choices of $\phi$ and of the additional polynomial, but later concludes that the situation in the multiquadric and Gaussian cases is disappointing.

The unknown parameters $\lambda _{i}$ , b and a are obtained as the solution of the system of linear equations

$\left({\begin{array}{ll}\Phi &P\\P^{T}&0\end{array}}\right)\left({\begin{array}{c}\lambda \\c\end{array}}\right)=\left({\begin{array}{c}F\\0\end{array}}\right)$

where $\Phi$ is the n × n matrix with $\Phi _{ij}=\phi \left(\left\|x_{i}-x_{j}\right\|_{2}\right)$ and

$P=\left({\begin{array}{cc}x_{1}^{T}&1\\x_{2}^{T}&1\\.&.\\.&.\\x_{n}^{T}&1\end{array}}\right),\lambda =\left({\begin{array}{c}\lambda _{1}\\\lambda _{2}\\.\\.\\\lambda _{n}\end{array}}\right),c=\left({\begin{array}{c}b_{1}\\b_{2}\\.\\.\\b_{d}\\a\end{array}}\right),F=\left({\begin{array}{c}f(x_{1})\\f(x_{2})\\.\\.\\f(x_{n})\end{array}}\right).$

$s_{y}$ could be obtained accordingly, but there is no need to do that as one is only interested in $\sigma (s_{y})$ . Powell shows that if the rank of P is d + 1, then the matrix

$\left({\begin{array}{cc}\Phi &P\\P^{T}&0\end{array}}\right)$

is nonsingular and the linear system (4) has a unique solution.

For s_n it is

$\sigma (s_{y})=\sigma (s_{n})+\mu _{n}(y)\left[s_{n}(y)-f_{n}^{*}\right]^{2}$

$\sigma (s_{n})=\sum \limits _{i=1}^{n}\lambda _{i}s_{n}(x_{i}).$

Further, it is shown that $\sigma (s_{y})$ is

$y\notin \{x_{1},\ldots ,x_{n}\}$ .

Thus minimizing $\sigma (s_{y})$ subject to constraints is equivalent to minimizing g_n defined as

$g_{n}(y)=\mu _{n}(y)\left[s_{n}(y)-f_{n}^{*}\right]^{2},\ \ y\in \Omega \setminus \left\{x_{1},\ldots ,x_{n}\right\},$

where $\mu _{n}(y)$ is the coefficient corresponding to y of the Lagrangian function L that satisfies $L(x_{i})=0$ , $i=1,\ldots ,n$ and L(y) = 1. It can be computed as follows. F is extended to

$\Phi _{y}=\left({\begin{array}{cc}\Phi &\phi _{y}\\\phi _{y}^{T}&0\end{array}}\right),$

where $(\phi _{y})_{i}=\phi (\left\|y-x_{i}\right\|_{2})$ , $i=1,\ldots ,n$ and P is extended to

$P_{y}=\left({\begin{array}{c}P\\y^{T}\ \ 1\end{array}}\right).$

Then $\mu _{n}(y)$ is the (n + 1)-th component of $v\in R^{n+d+2}$ that solves the system

$\left({\begin{array}{cc}\Phi _{y}&P_{y}\\P_{y}^{T}&0\end{array}}\right)v=\left({\begin{array}{l}0_{n}\\1\\0_{d+1}\end{array}}\right).$

We use the notation 0_n and $0_{d+1}$ for column vectors with all entries equal to zero and with dimension n and (d + 1), respectively. The computation of $\mu _{n}(y)$ is done for many different y when minimizing $g_{n}(y)$ . This requires $O(n^{3})$ operations if not exploiting the structure of $\Phi _{y}$ and $P_{y}$ . Hence it does not make sense to solve the full system each time. A better alternative is to factorize the interpolation matrix and update the factorization for each y. An algorithm that requires $O(n^{2})$ operations is described in #Factorizations and Updates.

When there are large differences between function values, the interpolant has a tendency to oscillate strongly. It might also happen that $\min s_{n}(y)$ is much lower than the best known function value, which leads to a choice of $f_{n}^{*}$ that overemphasizes global search. To handle these problems, large function values are in each iteration replaced by the median of all computed function values. Note that $\mu _{n}$ and $g_{n}$ are not defined at $x_{1},\ldots ,x_{n}$ and

$\lim \limits _{y\rightarrow x_{i}}\mu _{n}(y)=\infty ,~~i=1,\ldots ,n.$

This will cause problems when $\mu _{n}$ is evaluated at a point close to one of the known points. The function $h_{n}(x)$ defined by

$h_{n}(x)~=~\left\{{\begin{array}{cl}{\frac {1}{g_{n}(x)}},&x\notin \left\{x_{1},\ldots ,x_{n}\right\}\\0,&x\in \left\{x_{1},\ldots ,x_{n}\right\}\end{array}}\right.$

is differentiable everywhere on $\Omega$ , and is thus a better choice as objective function. Instead of minimizing $g_{n}(y)$ one may minimize $-h_{n}(y)$ . In our implementation we instead minimize $-\log(h_{n}(y))$ . By this we avoid a flat minimum and numerical trouble when $h_{n}(y)$ is very small.

The Choice of f *

For the value of $\$f_{n}^{*}$ it should hold that

$f_{n}^{*}\in \left[-\infty ,\min \limits _{y\in \Omega }s_{n}(y)\right].$

The case $f_{n}^{*}=\min \limits _{y\in \Omega }s_{n}(y)$ is only admissible if $\min \limits _{y\in \Omega }s_{n}(y)<s_{n}(x_{i})$ , <mah>i=1, \ldots, n</math>. There are two special cases for the choice of $f_{n}^{*}$ . In the case when $f_{n}^{*}=\min \limits _{y\in \Omega }s_{n}(y)$ , then minimizing is equivalent to

$\min \limits _{y\in \Omega }s_{n}(y).$

In the case when $f_{n}^{*}=-\infty$ , then minimizing is equivalent to

$\min \limits _{y\in \Omega \setminus \left\{x_{1},\ldots ,x_{n}\right\}}\mu _{n}(y).$

So how should $f_{n}^{*}$ be chosen? If $f_{n}^{*}=-\infty$ , then the algorithm will choose the new point in an unexplored region, which is good from a global search point of view, but the objective function will not be exploited at all. If $f_{n}^{*}=\min \limits _{y\in \Omega }s_{n}(y)$ , the algorithm will show good local behaviour, but the global minimum might be missed. Therefore, there is a need for a mixture of values for $f_{n}^{*}$ close to and far away from $\min \limits _{y\in \Omega }s_{n}(y)$ . Gutmann describes two different strategies for the choice of $f_{n}^{*}$ .

The first strategy, denoted idea 1, is to perform a cycle of length N + 1 and choose $f_{n}^{*}$ as

$f_{n}^{*}=\min \limits _{y\in \Omega }s_{n}(y)-W\cdot \left(\max \limits _{i}f(x_{i})-\min \limits _{y\in \Omega }s_{n}(y)\right),$

with

$W=\left[{\frac {(N-(n-n_{init}))\mathrm {mod} (N+1)}{N}}\right]^{2},$

where $n_{init}$ is the number of initial points. Here, N = 5 is fixed and $\max \limits _{i}f(x_{i})$ is not taken over all points, except for the first step of the cycle. In each of the subsequent steps the $n-n_{max}$ points with largest function value are removed (not considered) when taking the maximum. Hence the quantity $\max \limits _{i}f(x_{i})$ is decreasing until the cycle is over. Then all points are considered again and the cycle starts from the beginning. More formally, if $(n-n_{init})\mathrm {mod} (N+1)=0$ , $n_{max}=n$ , otherwise

$n_{max}=\max \left\{2,n_{max}-\mathrm {floor} ((n-n_{init})/N)\right\}$

The second strategy, denoted idea 2, is to consider $f_{n}^{*}$ as the optimal value of

${\begin{array}{ll}\min &f^{*}(y)\\\mathrm {s.t.} &\mu _{n}(y)\left[s_{n}(y)-f^{*}(y)\right]^{2}\leq \alpha _{n}^{2}\\&y\in \Omega ,\end{array}}$

and then perform a cycle of length N + 1 on the choice of an . Here, N = 3 is fixed and

${\begin{array}{lll}\alpha _{n}&=&{\frac {1}{2}}\left(\max \limits _{i}f(x_{i})-\min \limits _{y\in \Omega }s_{n}(y)\right),\ \ n=n_{0},n_{0}+1\\\alpha _{n_{0}+2}&=&\min \left\{1,{\frac {1}{2}}\left(\max \limits _{i}f(x_{i})-\min \limits _{y\in \Omega }s_{n}(y)\right)\right\}\\\alpha _{n_{0}+3}&=&0,\end{array}}$

where $n_{0}$ is set to n at the beginning of each cycle. For this strategy, $\max \limits _{i}f(x_{i})$ is taken over all points in all parts of the cycle.

Note that for a fixed y the optimal $f^{*}(y)$ is the one for which

$\mu _{n}(y)\left[s_{n}(y)-f^{*}(y)\right]^{2}=\alpha _{n}^{2}.$

Substituting this equality constraint into the objective simplifies the problem to the minimization of

$f^{*}(y)=s_{n}(y)-\alpha _{n}/{\sqrt {\mu _{n}(y)}}.$

Denoting the minimizer by $y^{*}$ , and choosing $f_{n}^{*}=f^{*}(y^{*})$ , it is evident that $y^{*}$ minimizes $\mu _{n}(y)\left[s_{n}(y)-f_{n}^{*}\right]^{2}$ and hence $g_{n}(y)$ .

For both strategies (idea 1 and idea 2), a check is performed when $(n-n_{init})\mathrm {mod} (N+1)=N$ . This is the stage when a purely local search is performed, so it is important to make sure that the minimizer of sn is not one of the interpolation points or too close to one. The test used is

$f_{min}-\min \limits _{y\in \Omega }s_{n}(y)\leq 10^{-4}\max \left\{1,|f_{min}|\right\},$

where $f_{min}$ is the best function value found so far, i.e. $\min \limits _{i}f(x_{i})$ , $i=1,\ldots ,n$ . For the first strategy (idea 1), then

$f_{n}^{*}=\min \limits _{y\in \Omega }s_{n}(y)-10^{-2}\max \left\{1,|f_{min}|\right\},$

otherwise $f_{n}^{*}$ is set to 0. For the second strategy (idea 2), then an (or more correctly $\alpha _{n_{0}+3}$ ) is set

$\alpha _{n_{0}+3}=\min \left\{1,{\frac {1}{2}}\left(\max \limits _{i}f(x_{i})-\min \limits _{y\in \Omega }s_{n}(y)\right)\right\},$

otherwise $\alpha _{n_{0}+3}$ is set to 0.

Factorizations and Updates

In Powell, a factorization algorithm is presented for the solution. The algorithm makes use of the conditional definiteness of $\Phi$ , i.e. $\lambda ^{T}\Phi \lambda >0,~\lambda \neq 0$ and $P^{T}\lambda =0$ . If

$P=\left({\begin{array}{cc}Y&Z\end{array}}\right)\left({\begin{array}{c}R\\0\end{array}}\right)$

is the QR decomposition of P , then the columns of Z span the null space of $P^{T}$ , and every $\lambda$ with $P^{T}\lambda =0$ can be expressed as $\lambda =Zz$ for some vector z. Thus the conditional positive definiteness implies that

$z^{T}Z^{T}\Phi Zz>0,~z\in R^{n-d-1}~~\backslash ~\{0\}.$

This shows that $Z^{T}\Phi Z$ is positive definite, and thus its Cholesky factorization

$Z^{T}\Phi Z=LL^{T}$

exists. This property can be used to solve (4) as follows. Consider the interpolation condition $\Phi \lambda +Pc=F$ in (4). Multiply from left by $Z^{T}$ and replace $\lambda$ by $Zz$ . Because $Z^{T}P=0$ , the interpolation condition simplifies to

$Z^{T}\Phi Zz=Z^{T}F.$

Solving this system using the Cholesky factorization gives z. Then compute $\lambda =Zz$ and solve

$Pc=F-\Phi \lambda$

for c using the QR decomposition of P as

$Rc=Y^{T}\!(F-\Phi \lambda ).$

The same principle can be applied to solve (12) for a given y to get $\mu _{n}(y)$ . In analogy to the discussion above, if the extended matrices $\Phi _{y}$ and $P_{y}$ in (10) and (11), respectively, are given, and if

$Z_{y}^{T}P_{y}=0$

and

$Z_{y}^{T}\Phi _{y}Z_{y}=L_{y}L_{y}^{T}$

is the Cholesky factorization, then the vector

$v=Z_{y}z(y)$

yields $\mu _{n}(y)=v_{n+1}$ , where z(y) solves

$Z_{y}^{T}\Phi _{y}Z_{y}z=Z_{y}\left({\begin{array}{c}0_{n}\\1\end{array}}\right).$

The Cholesky factorization is the most expensive part of this procedure. It requires $(n^{3})$ operations. As $\mu _{n}(y)$ must be computed for many different y this is inacceptable. However, if one knows the QR factors of P and the Cholesky factor of Z T FZ , the QR factorization of P_y and the new Cholesky factor L_y can be computed in $O(n^{2})$ operations.

The new $\Phi (y)$ is

$\Phi _{y}=\left({\begin{array}{cc}\Phi &\phi _{y}\\\phi _{y}^{T}&0\end{array}}\right),$

where $\phi _{y})_{i}=\phi (\left\|y-x_{i}\right\|_{2})$ , $i=1,\ldots ,n$ . The new P (y) is

$P_{y}=\left({\begin{array}{c}P\\y^{T}\ \ 1\end{array}}\right).$

Compute the QR factorization of P_y, defined in (10). Given $P=QR$ , the QR factorization of $P_{y}$ may be written as

$P_{y}=Q_{y}R_{y}=\left({\begin{array}{cc}Q&0\\0&1\end{array}}\right)HR_{y},$

where $H$ is an orthogonal matrix obtained by d + 1 Givens rotations and for $i=d+2,\ldots ,n$ the i-th column of H is the i-th unit vector. Denote $B=Q^{T}\Phi Q$ . Using $\Phi _{y}$ as defined in (10) consider the expanded B matrix

${\begin{array}{lll}B_{y}&=&Q_{y}^{T}\Phi _{y}Q_{y}=H^{T}\left({\begin{array}{cc}Q^{T}&0\\0&1\end{array}}\right)\Phi _{y}\left({\begin{array}{cc}Q&0\\0&1\end{array}}\right)H=\\&=&H^{T}\left({\begin{array}{cc}B&Q^{T}\phi _{y}\\\phi _{y}^{T}Q&0\end{array}}\right)H.\end{array}}$

Multiplications from the right and left with H affects only the first (d + 1) rows and columns and the last row and the last columns of the matrix in the middle. (Remember, d is the dimension of the problem). Hence

$B_{y}=\left({\begin{array}{ccc}\ast &\ast &\ast \\\ast &Z^{T}\Phi Z&v\\\ast &v^{T}&\gamma \end{array}}\right),$

where * denotes entries not important for the moment. From the form of B_y it follows that

$Z_{y}^{T}\Phi Z_{y}=\left({\begin{array}{cc}Z^{T}\Phi Z&v\\v^{T}&\gamma \end{array}}\right)$

holds. The Cholesky factorization of $Z^{T}\Phi Z$ is already known. The new Cholesky $L_{y}$ factor is found by solving the lower triangular system $Ll=v$ for l, computing $\beta ={\sqrt {\gamma -l^{T}l}}$ , and setting

$L_{y}=\left({\begin{array}{cc}L&0\\l^{T}&\beta \end{array}}\right).$

It is easily seen that $L_{y}L_{y}^{T}=Z_{y}^{T}\Phi _{y}Z_{y}$ because

${\begin{array}{lll}L_{y}L_{y}^{T}&=&\left({\begin{array}{cc}L&0\\l^{T}&\beta \end{array}}\right)\left({\begin{array}{cc}L^{T}&l\\0&\beta \end{array}}\right)=\left({\begin{array}{cc}LL^{T}&Ll\\l^{T}L&l^{T}l+\beta ^{2}\end{array}}\right)=\\&=&\left({\begin{array}{cc}Z^{T}\Phi Z&v\\v^{T}&\gamma \end{array}}\right)=Z_{y}^{T}\Phi _{y}Z_{y}.\end{array}}$

Note that in practice we do the following: First compute the factorization of P , i.e. $P_{y}=Q_{y}R_{y}$ , using Givens rotations. Then, since we are only interested in v and $\gamma$ in (42), it is not necessary to compute the matrix B_y in (41). Setting v to the last column in Q_yand computing ${\tilde {v}}=\Phi _{y}^{T}{\hat {v}}=\Phi _{y}{\hat {v}}$ ( $\Phi _{y}$ is symmetric), gives v and $\gamma$ by multiplying the last (n - d) columns in Q_y by $tilde{v}$ , i.e.

$\left({\begin{array}{cc}v\\\gamma \end{array}}\right)={\begin{array}{ll}Q_{y_{\cdot i}}^{T}{\tilde {v}},&i=d+2,\ldots ,n+1.\end{array}}$

Using this algorithm, v and γ are computed using ((n + 1) + (n - d)) inner products instead of the two matrix multiplications in (41).

Note that the factorization algorithm is a normal 'null-space' method for solving an optimization problem involving linear equality constraints. The system of linear equations in (4) defines the necessary conditions for a stationary point to the unconstrained quadratic programming (QP) problem

$\min _{\lambda ,c}\quad {\frac {1}{2}}\lambda ^{T}\Phi \lambda +\lambda ^{T}(Pc-F).$

Viewing c as Lagrange multipliers for the linear equalities in (4), (47) is equivalent to the QP problem in λ defined as

$\min _{\lambda }\quad {\frac {1}{2}}\lambda ^{T}\Phi \lambda -F^{T}\lambda \quad {\text{subject to}}\quad P^{T}\lambda =0.$

The first condition in the conditional positive definiteness definition is the same as saying that the reduced Hessian must be positive definite at the solution of the QP problem if that solution is to be unique.

The type of update procedure described above is suitable each time an optimal point y = xn+1 is added. However, when evaluating all candidates y an even more efficient algorithm can be formulated. What is needed is a black-box procedure to solve linear systems with a general right-hand side:

$\left({\begin{array}{cc}\Phi &P\\P^{T}&0\end{array}}\right)\left({\begin{array}{c}\lambda \\c\end{array}}\right)=\left({\begin{array}{c}g\\r\end{array}}\right).$

Using the QR-factorization in (28) the steps

${\begin{array}{ccc}R^{T}v&=&r,\\Z^{T}\Phi Zw&=&Z^{T}(g-\Phi Yv),\\\lambda &=&Yv+Zw,\\Rc&=&Y^{T}(g-\Phi \lambda )\end{array}}$

simplify when r = 0 as in (4), but all steps are useful for solving the extended system (49); see next.

For each of many vectors y, the extended system takes the form

$\left({\begin{array}{cc|c}\Phi &\phi &P\\\phi ^{T}&0&p^{T}\\\hline P^{T}&p&0\end{array}}\right)\left({\begin{array}{c}{\bar {\lambda }}\\\mu \\{\bar {c}}\end{array}}\right)=\left({\begin{array}{c}0\\1\\0\end{array}}\right),$

where $p^{T}=(\,y^{T}\ \ 1\,)$ . This permutes to

$\left({\begin{array}{cc|c}\Phi &P&\phi \\P^{T}&0&p\\\hline \phi ^{T}&p^{T}&0\end{array}}\right)\left({\begin{array}{c}{\bar {\lambda }}\\{\bar {c}}\\\mu \end{array}}\right)=\left({\begin{array}{c}0\\0\\1\end{array}}\right),$

which may be solved by block-LU factorization (also known as the Schur-complement method). It helps that most of the right-hand side is zero. The solution is given by the steps

$\left({\begin{array}{cc}\Phi &P\\P^{T}&0\end{array}}\right)\left({\begin{array}{c}{\hat {\lambda }}\\{\hat {c}}\end{array}}\right)=\left({\begin{array}{c}\phi \\p\end{array}}\right),$

$\mu =-1/(\phi ^{T}{\hat {\lambda }}+p^{T}{\hat {c}}),$

$\left({\begin{array}{c}{\bar {\lambda }}\\{\bar {c}}\end{array}}\right)=-\mu \left({\begin{array}{c}{\hat {\lambda }}\\{\hat {c}}\end{array}}\right).$

Thus, each y requires little more than solving for $({\hat {\lambda }},{\hat {c}})$ using the current factorizations (two operations each with Q, R and L). This is cheaper than updating the factors for each y, and should be reliable unless the matrix in (4) is nearly singular. The updating procedure is best numerically, and it is still needed once when the final $y=x_{n+1}$ is chosen.

A Compact Algorithm Description

Section #Description of the Algorithm-#Factorizations and Updates described all the elements of the RBF algorithm as implemented in our Matlab routine rbfSolve, but our discussion has covered several pages. We now summarize everything in a compact step-by-step description. Steps 2, 6 and 7 are different in idea 1 and idea 2.

	idea 1	idea 2
1:	Choose n initial points x1 , . . . , xn (normally the 2d box corner points defined by the variable bounds). Compute Fi = f (xi ), i = 1, 2, . . . , n and set ninit = n.
2:	Start a cycle of length 6.	Start a cycle of length 4.
3:	If the maximum number of function evalua- tions reached, quit.
4:	Compute the radial basis function interpolant sn by solving the system of linear equations (4).
5:	Solve the minimization problem min sn (y). y??
6:	Compute f * in (18) corresponding to the current position in the cycle.	Compute an in (22) corresponding to the current position in the cycle.
7:	New point xn+1 is the value of y that mini- mizes gn (y) in (9).	New point x_n+1 is the value of y that min- imizes f *(y) in (24).
8:	Compute Fn+1 = f (xn+1 ) and set n = n + 1.
9:	If end of cycle, go to 2. Otherwise go to 4.

Some Implementation Details

The first question that arise is how to choose the points $x_{1},\ldots ,x_{n_{init}}$ to include in the initial set. We only consider box constrained problems, and choose the corners of the box as initial points, i.e. $n_{init}=2^{d}$ . Starting with other points is likely to lead to the corners during the iterations anyway. But as Gutmann suggests, having a "good" point beforehand, one can include it in the initial set.

The subproblem

${\begin{array}{ll}\min \limits _{y\in \Omega }&s_{n}(y)\end{array}},$

is itself a problem which could have more than one local minima. To solve (51) (at least approximately), we start from the interpolation point with the least function value, i.e. $mathrm{argmin}f(x_{i})$ , $i=1,\ldots ,n$ , and perform a local search. In many cases this leads to the minimum of s_n . Of course, there is no guarantee that it does. We use analytical expressions for the derivatives of sn and perform the local optimization using ucSolve in TOMLAB running the inverse BFGS algorithm.

To minimize $g_{n}(y)$ for the first strategy, or $f^{*}(y)$ for the second strategy, we use our Matlab routine glbSolve implementing the DIRECT algorithm (see the TOMLAB manual). We run glbSolve for 500 function evaluations and choose x_n+1 as the best point found by glbSolve. When $(n-n_{init})\mathrm {mod} (N+1)=N$ (when a purely local search is performed) and the minimizer of sn is not too close to any of the interpolation points, i.e. (25) is not true, glbSolve is not used to minimize $g_{n}(y)$ or $f^{*}(y)$ . Instead, we choose the minimizer of (51) as the new point x_n+1. The TOMLAB routine AppRowQR is used to update the QR decomposition.

Our experience so far with the RBF algorithm shows that for the second strategy (idea2), the minimum of (24) is very sensitive for the scaling of the box constraints. To overcome this problem we transform the search space to the unit hypercube. This algorithm improvement is necessary to avoid rank deficiency in the interpolation matrix for the train design problem.

In our implementation it is possible to restart the optimization with the final status of all parameters from the previous run.

CGO rbfSolve description

Contents

Summary

Introduction

The RBF Algorithm

Description of the Algorithm

The Choice of f *

Factorizations and Updates

A Compact Algorithm Description

Some Implementation Details

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools