Developer Information for Optimization Methods in Tensor Toolbox
This page contains information for developers who want to use the optimization method interfaces contained in Tensor Toolbox. These interfaces standardize access to packages that otherwise have very different calling conventions, parameter names, and so on. For optimization algorithm installation instructions and parameter options, see Documentation for Tensor Toolbox Optimization Methods.
The above-referenced document is for general users. The information below is for developers of Tensor Toolbox methods that want to use these standardized optimization algorithms.
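All of the deterministic solvers demonstrated below share the same calling sequence, [x,f,info] = tt_opt_xxx(xinit, fgh, 'name', value, ...); the stochastic Adam solver instead takes separate function and gradient handles. As a rough sketch of what this uniformity buys a developer, the following hypothetical helper (our own invention, not part of the toolbox) dispatches to a solver chosen by name and forwards any options untouched:

function [x,f,info] = dispatch_solver(solver, xinit, fgh, varargin)
%DISPATCH_SOLVER Hypothetical example, not part of Tensor Toolbox.
%   Selects one of the standardized optimization interfaces by name and
%   forwards all remaining options unchanged.
switch lower(solver)
    case 'lbfgs'
        [x,f,info] = tt_opt_lbfgs(xinit, fgh, varargin{:});
    case 'lbfgsb'
        [x,f,info] = tt_opt_lbfgsb(xinit, fgh, varargin{:});
    case 'compact'
        [x,f,info] = tt_opt_compact(xinit, fgh, varargin{:});
    case 'fminunc'
        [x,f,info] = tt_opt_fminunc(xinit, fgh, varargin{:});
    otherwise
        error('Unknown solver: %s', solver);
end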
Contents

Test optimization problem
Set up initial guess and optimum
Sandia Poblano Toolbox L-BFGS
Stephen Becker L-BFGS-B
Johannes Brust limited-memory compact representation
MATLAB Optimization Toolbox fminunc (Quasi-Newton Method)
Adam (internal to Tensor Toolbox)
Test optimization problem
We use a simple but notoriously difficult test function to demonstrate the optimization methods, a variation on Rosenbrock:

$$f(x) = (x_1 - 1)^2 + 4 \sum_{i=2}^{n} \left( x_i - x_{i-1}^2 \right)^2$$

This function is implemented in optfunc, which takes true as a second argument to return only the gradient. The number of parameters (n) is determined automatically from the length of the input vector. The optimum is the all-ones vector, where the function value is zero. The code for the function is as follows:
function [f,g] = optfunc(x, gradonly)
%OPTFUNC Example function for testing optimization software.

% Function value
n = length(x);
g = zeros(n,1);
f = .25*(x(1)-1.0)^2;
for i = 2:n
    f = f + ( x(i) - (x(i-1))^2 )^2;
end
f = 4*f;

% Gradient
t1 = x(2) - (x(1)^2);
g(1) = 2.0*( x(1) - 1.0 ) - 16*x(1)*t1;
for i = 2:(n-1)
    t2 = t1;
    t1 = x(i+1) - (x(i))^2;
    g(i) = 8*t2 - 16*x(i)*t1;
end
g(n) = 8*t1;

% If requested, return the gradient in place of the function value
if exist('gradonly','var')
    if gradonly == true
        f = g;
    end
end
We create function handles for use with the optimization solvers. Note that fh points to the same function as fgh; requesting a single output returns only the function value, while gh passes gradonly=true to obtain only the gradient.
% Create a function handle to evaluate both f and g
fgh = @optfunc;

% Create a function handle to evaluate just f
fh = @optfunc;

% Create a function handle to evaluate just g
gh = @(x) optfunc(x,true);
Set up initial guess and optimum
n = 10;
xinit = [2; ones(n-1,1)];
xopt = ones(n,1);

% Initial function value
finit = fh(xinit);
fprintf('Function value at xinit = %g\n', finit);

% Optimal function value
fopt = fh(xopt);
fprintf('Function value at xopt = %g\n', fopt);
Function value at xinit = 37 Function value at xopt = 0
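Since optfunc supplies an analytic gradient, it is worth sanity-checking it before handing it to a solver. The following is a minimal sketch (our own check, not part of the toolbox) that compares gh against central finite differences:

% Compare the analytic gradient to a central finite-difference estimate.
x0 = randn(n,1);            % random test point
g_exact = gh(x0);
g_fd = zeros(n,1);
h = 1e-6;                   % finite-difference step size
for i = 1:n
    e = zeros(n,1);
    e(i) = h;
    g_fd(i) = (fh(x0+e) - fh(x0-e)) / (2*h);
end
fprintf('Max abs gradient error: %g\n', max(abs(g_exact - g_fd)));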
Sandia Poblano Toolbox L-BFGS
POBLANO Version 1.1 by Evrim Acar, Daniel Dunlavy, and Tamara Kolda
[x,f,info] = tt_opt_lbfgs(xinit, fgh);
Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-10, gtol/N=1e-06, maxiters = 1000, maxsubiters=10000
Begin Main Loop
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 37.00000000 10.08959860
1 8 0.91828104 0.92158834
2 10 0.16611190 0.32713416
3 12 0.03552274 0.15994623
4 14 0.00563181 0.05761539
5 16 0.00135701 0.02431580
6 18 0.00039065 0.01439439
7 20 0.00007552 0.00602386
8 22 0.00002250 0.00320035
9 24 0.00000020 0.00041099
10 26 0.00000004 0.00015232
11 28 0.00000001 0.00004888
12 30 0.00000000 0.00001906
13 32 0.00000000 0.00000505
14 34 0.00000000 0.00000235
15 36 0.00000000 0.00000057
End Main Loop
Final f: 2.4551e-12
Setup time: 0.00018 seconds
Optimization time: 0.0024 seconds
Iterations: 15
Total iterations: 36
Exit condition: Successful termination based on StopTol
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.255961e-04
Tighten it up a bit by changing the tolerances. Also note that we change the printing frequency to once every 5 iterations rather than every iteration as above.
[x,f,info] = tt_opt_lbfgs(xinit, fgh, 'printitn', 5, 'gtol', 1e-20, 'ftol', 1e-25);
Poblano L-BFGS Optimization
Number of Variables: 10
Parameters: m=5, ftol=1e-25, gtol/N=1e-21, maxiters = 1000, maxsubiters=10000
Begin Main Loop
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 37.00000000 10.08959860
5 16 0.00135701 0.02431580
10 26 0.00000004 0.00015232
15 36 0.00000000 0.00000057
20 46 0.00000000 0.00000000
25 57 0.00000000 0.00000000
30 72 0.00000000 0.00000036
35 82 0.00000000 0.00000001
40 92 0.00000000 0.00000000
45 102 0.00000000 0.00000000
50 112 0.00000000 0.00000000
55 123 0.00000000 0.00000000
59 131 0.00000000 0.00000000
End Main Loop
Final f: 3.7133e-27
Setup time: 0.00016 seconds
Optimization time: 0.0064 seconds
Iterations: 59
Total iterations: 131
Exit condition: Relative change in F < RelFuncTol
With tighter tolerances, we get closer to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 6.296881e-12
Stephen Becker L-BFGS-B
[x,f,info] = tt_opt_lbfgsb(xinit, fgh);
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10
Begin Main Loop
Iter  1, f(x) = 1.263359e+00, ||grad||_infty = 1.20e+01
Iter  2, f(x) = 2.814196e-01, ||grad||_infty = 4.85e+00
Iter  3, f(x) = 6.099619e-02, ||grad||_infty = 1.11e+00
Iter  4, f(x) = 2.574744e-02, ||grad||_infty = 7.85e-01
Iter  5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01
Iter  6, f(x) = 1.585758e-03, ||grad||_infty = 2.84e-01
Iter  7, f(x) = 1.701942e-04, ||grad||_infty = 6.15e-02
Iter  8, f(x) = 5.353674e-05, ||grad||_infty = 1.69e-02
Iter  9, f(x) = 2.318400e-05, ||grad||_infty = 1.39e-02
Iter 10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03
Iter 11, f(x) = 3.586768e-07, ||grad||_infty = 3.46e-03
Iter 12, f(x) = 2.598817e-08, ||grad||_infty = 7.97e-04
Iter 13, f(x) = 2.466644e-09, ||grad||_infty = 2.50e-04
Iter 14, f(x) = 5.126655e-10, ||grad||_infty = 7.74e-05
Iter 15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05
Iter 16, f(x) = 9.642045e-12, ||grad||_infty = 1.23e-05
End Main Loop
Final f: 9.6420e-12
Setup time: 0.0002 seconds
Optimization time: 0.00087 seconds
Iterations: 16
Total iterations: 35
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.224764e-04
Tighten it up a bit as before
[x,f,info] = tt_opt_lbfgsb(xinit, fgh, 'printitn', 5, 'gtol', 1e-12, 'ftol', 1e-25);
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Number of Variables: 10
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-25, gtol=1e-12, maxiters = 1000, subiters = 10
Begin Main Loop
Iter  5, f(x) = 5.050682e-03, ||grad||_infty = 4.27e-01
Iter 10, f(x) = 2.569928e-06, ||grad||_infty = 9.27e-03
Iter 15, f(x) = 8.969330e-11, ||grad||_infty = 7.19e-05
Iter 20, f(x) = 1.343287e-12, ||grad||_infty = 9.30e-07
Iter 25, f(x) = 1.306474e-12, ||grad||_infty = 3.38e-09
Iter 30, f(x) = 1.306458e-12, ||grad||_infty = 1.11e-08
Iter 35, f(x) = 1.304805e-12, ||grad||_infty = 1.56e-07
Iter 40, f(x) = 1.037402e-12, ||grad||_infty = 1.33e-06
Iter 45, f(x) = 6.552398e-16, ||grad||_infty = 1.06e-07
Iter 50, f(x) = 1.720279e-19, ||grad||_infty = 2.17e-09
Iter 55, f(x) = 2.298904e-21, ||grad||_infty = 1.61e-11
Iter 60, f(x) = 1.714203e-21, ||grad||_infty = 7.02e-11
Iter 65, f(x) = 7.255157e-24, ||grad||_infty = 1.64e-11
Iter 67, f(x) = 2.032944e-26, ||grad||_infty = 4.58e-13
End Main Loop
Final f: 2.0329e-26
Setup time: 0.00022 seconds
Optimization time: 0.0018 seconds
Iterations: 67
Total iterations: 140
Exit condition: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
With tighter tolerances, we again get closer to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 3.479939e-14
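As the header above indicates, L-BFGS-B also supports bound constraints, which none of the other solvers here provide. The following is a hedged sketch, assuming the bounds are passed via 'lower' and 'upper' parameters; consult tt_opt_lbfgsb for the actual option names.

% Hedged sketch: constrain every variable to [0,2]. The 'lower' and
% 'upper' parameter names are assumptions based on the printed header.
[xb,fb,infob] = tt_opt_lbfgsb(xinit, fgh, 'lower', 0, 'upper', 2);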
Johannes Brust limited-memory compact representation
The CR compact quasi-Newton method by Johannes Brust, with a line search from Philip Gill.
[x,f,info] = tt_opt_compact(xinit, fgh);
CR solver (limited-memory) (https://github.com/johannesbrust/CR)
Number of Variables: 10
Parameters: m=5, gtol=1e-05, maxiters = 1000 subiters = 500
Begin Main Loop
Compact Representation (ALG0)
Line-search: wolfeG
V-strategy: s
n= 10
--------------------------------
k nf fk ||gk|| step iexit
0 1 3.70000e+01 1.00896e+02 1.000e+00 0
1 2 1.26336e+00 1.30409e+01 1.000e+00 1
2 3 2.81420e-01 4.97849e+00 1.000e+00 1
3 4 6.09962e-02 1.48008e+00 1.000e+00 1
4 5 2.57474e-02 9.61089e-01 1.000e+00 1
5 6 5.05068e-03 5.30090e-01 1.000e+00 1
6 7 1.58576e-03 4.16083e-01 1.000e+00 1
7 8 1.70194e-04 1.05632e-01 1.000e+00 1
8 9 5.35367e-05 3.69046e-02 1.000e+00 1
9 10 2.31840e-05 2.43251e-02 1.000e+00 1
10 11 2.56993e-06 1.74034e-02 1.000e+00 1
11 13 3.58677e-07 5.70151e-03 3.762e-01 1
12 14 2.59882e-08 1.33094e-03 1.000e+00 1
13 15 2.46662e-09 3.41827e-04 1.000e+00 1
14 16 5.12661e-10 1.43137e-04 1.000e+00 1
k nf fk ||gk|| step iexit
15 17 8.96881e-11 7.80651e-05 1.000e+00 1
16 18 9.64264e-12 1.84810e-05 1.000e+00 1
17 19 3.03762e-12 7.92967e-06 1.000e+00 1
End Main Loop
Final f: 3.0376e-12
Setup time: 0.00016 seconds
Optimization time: 0.001 seconds
Iterations: 17
Total iterations: 19
Exit condition: Convergence: Norm tolerance
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.224762e-04
Tighten it up a bit as before
[x,f,info] = tt_opt_compact(xinit, fgh, 'printitn', 5, 'gtol', 1e-12);
CR solver (limited-memory) (https://github.com/johannesbrust/CR)
Number of Variables: 10
Parameters: m=5, gtol=1e-12, maxiters = 1000 subiters = 500
Begin Main Loop
Compact Representation (ALG0)
Line-search: wolfeG
V-strategy: s
n= 10
--------------------------------
k nf fk ||gk|| step iexit
0 1 3.70000e+01 1.00896e+02 1.000e+00 0
5 6 5.05068e-03 5.30090e-01 1.000e+00 1
10 11 2.56993e-06 1.74034e-02 1.000e+00 1
15 17 8.96881e-11 7.80651e-05 1.000e+00 1
20 22 1.34326e-12 1.23048e-06 1.000e+00 1
25 28 1.30647e-12 5.66362e-09 1.000e+00 1
30 33 1.30646e-12 2.17995e-08 1.000e+00 1
35 38 1.30515e-12 2.10606e-07 1.000e+00 1
40 43 9.77235e-13 5.01499e-06 1.000e+00 1
45 49 8.34162e-16 2.11226e-07 1.000e+00 1
50 54 9.80967e-20 1.31896e-09 1.000e+00 1
55 60 5.26889e-22 1.03033e-10 1.000e+00 1
60 65 3.25913e-24 8.43911e-12 1.166e+00 1
65 71 9.76894e-26 1.81727e-12 1.000e+00 1
End Main Loop
Final f: 2.0997e-26
Setup time: 0.00016 seconds
Optimization time: 0.0027 seconds
Iterations: 68
Total iterations: 79
Exit condition: Convergence: Norm tolerance
With tighter tolerances, we again get closer to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 1.466027e-11
MATLAB Optimization Toolbox fminunc (Quasi-Newton Method)
Distributed with the MATLAB Optimization Toolbox.
[x,f,info] = tt_opt_fminunc(xinit, fgh);
Unconstrained Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000
Begin Main Loop
First-order
Iteration Func-count f(x) Step-size optimality
0 1 37 98
1 2 1.44889 0.0102041 12.9
2 3 0.288722 1 5.05
3 4 0.0604657 1 1.17
4 5 0.032238 1 0.793
5 6 0.0130798 1 0.564
6 7 0.00500013 1 0.368
7 8 0.00196856 1 0.256
8 9 0.000983737 1 0.126
9 10 0.000489883 1 0.0917
10 11 0.000215431 1 0.0573
11 12 8.95808e-05 1 0.0457
12 13 4.14183e-05 1 0.0258
13 14 2.05777e-05 1 0.0191
14 15 9.29022e-06 1 0.0104
15 16 3.31583e-06 1 0.00825
16 17 9.13586e-07 1 0.0046
17 18 2.13163e-07 1 0.00199
18 19 4.55807e-08 1 0.00121
19 20 8.72743e-09 1 0.000601
Local minimum found.
Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
End Main Loop
Final f: 8.7274e-09
Setup time: 0.00032 seconds
Optimization time: 0.0012 seconds
Iterations: 19
Total iterations: 20
Exit condition: Gradient norm smaller than tolerance
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.046793e-04
Tighten it up a bit as before. This interface does not give us control over the print frequency or the change-in-function tolerance, but we can pass through the `StepTolerance`.
[x,f,info] = tt_opt_fminunc(xinit, fgh, 'gtol', 1e-12, 'StepTolerance',1e-14);
Unconstrained Optimization (via Optimization Toolbox)
Number of Variables: 10
Parameters: gtol=1e-12, maxiters = 1000, maxsubiters=10000
Begin Main Loop
First-order
Iteration Func-count f(x) Step-size optimality
0 1 37 98
1 2 1.44889 0.0102041 12.9
2 3 0.288722 1 5.05
3 4 0.0604657 1 1.17
4 5 0.032238 1 0.793
5 6 0.0130798 1 0.564
6 7 0.00500013 1 0.368
7 8 0.00196856 1 0.256
8 9 0.000983737 1 0.126
9 10 0.000489883 1 0.0917
10 11 0.000215431 1 0.0573
11 12 8.95808e-05 1 0.0457
12 13 4.14183e-05 1 0.0258
13 14 2.05777e-05 1 0.0191
14 15 9.29022e-06 1 0.0104
15 16 3.31583e-06 1 0.00825
16 17 9.13586e-07 1 0.0046
17 18 2.13163e-07 1 0.00199
18 19 4.55807e-08 1 0.00121
19 20 8.72743e-09 1 0.000601
First-order
Iteration Func-count f(x) Step-size optimality
20 21 1.56598e-09 1 0.000224
21 22 2.81917e-10 1 9.2e-05
22 23 4.32264e-11 1 3.11e-05
23 24 5.20858e-12 1 1.08e-05
24 25 1.36374e-12 1 2.87e-06
25 26 1.11805e-12 1 7.95e-07
26 27 1.10532e-12 1 1.7e-07
27 28 1.10468e-12 1 3.72e-08
28 29 1.10465e-12 1 5.02e-09
29 30 1.10465e-12 1 2.99e-09
30 31 1.10465e-12 1 2.9e-09
31 32 1.10465e-12 1 2.7e-09
32 33 1.10465e-12 1 4.78e-09
33 34 1.10464e-12 1 9.07e-09
34 35 1.10464e-12 1 1.58e-08
35 36 1.10462e-12 1 2.7e-08
36 37 1.10457e-12 1 4.52e-08
37 38 1.10443e-12 1 7.47e-08
38 39 1.10408e-12 1 1.22e-07
39 40 1.10315e-12 1 2e-07
First-order
Iteration Func-count f(x) Step-size optimality
40 41 1.10074e-12 1 3.24e-07
41 42 1.09444e-12 1 5.24e-07
42 43 1.07818e-12 1 8.4e-07
43 44 1.03705e-12 1 1.32e-06
44 45 9.38556e-13 1 1.99e-06
45 46 7.32415e-13 1 2.68e-06
46 47 4.12063e-13 1 2.8e-06
47 48 1.21521e-13 1 1.8e-06
48 49 1.11669e-14 1 4.9e-07
49 50 5.69644e-16 1 6.41e-08
50 51 1.47514e-17 1 1.64e-08
51 52 7.76625e-20 1 1.58e-09
52 53 1.61415e-21 1 2.67e-10
53 54 8.82595e-23 1 5.87e-11
Local minimum found.
Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.
End Main Loop
Final f: 8.8259e-23
Setup time: 0.00038 seconds
Optimization time: 0.0017 seconds
Iterations: 53
Total iterations: 54
Exit condition: Gradient norm smaller than tolerance
With tighter tolerances, we again get closer to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 2.531170e-10
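Presumably, other Optimization Toolbox options can be passed through in the same manner as `StepTolerance` above; this pass-through behavior is an assumption to verify against tt_opt_fminunc, e.g.:

% Hedged sketch: forward another standard fminunc option
% (MaxFunctionEvaluations). Pass-through of arbitrary optimoptions names
% is an assumption, not confirmed by this document.
[x,f,info] = tt_opt_fminunc(xinit, fgh, 'gtol', 1e-12, ...
    'MaxFunctionEvaluations', 5000);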
Adam (internal to Tensor Toolbox)
Adam is a stochastic optimization method, so it's not especially fair to compare it to the other methods. But we do it anyway.
rng('default')
[x,f,info] = tt_opt_adam(xinit, fh, gh, 'subiters', 500, 'maxfails', 2, ...
    'maxiters', 20000, 'printitn', 500, 'fdesc', 'Function: exact', ...
    'gdesc', 'Gradient: exact');
Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Gradient: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 2
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08
Begin Main Loop
Epoch  0: f = 3.700000e+01
Epoch  3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch 14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch 32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
End Main Loop
Final f: 1.7803e-06
Setup time: 0.00 seconds
Optimization time: 0.02 seconds
Number of epochs: 32
Total iterations: 16000
Exit condition: Number of failed epochs > 2
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 3.608177e-01
Let's tighten the convergence by allowing more failed epochs (maxfails = 6), so the method keeps running with progressively smaller learning rates.
rng(1)
[x,f,info] = tt_opt_adam(xinit, fh, gh, 'subiters', 500, 'maxfails', 6, ...
    'maxiters', 20000, 'printitn', 500, 'fdesc', 'Function: exact', ...
    'gdesc', 'Gradient: exact');
Adam Stochastic Optimization
Number of Variables: 10
Function: exact
Gradient: exact
Max iterations (epochs): 20000
Iterations per epoch: 500
Learning rate / decay / maxfails: 0.01 0.1 6
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08
Begin Main Loop
Epoch    0: f = 3.700000e+01
Epoch    3: f = 4.895148e-06, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch   14: f = 4.624758e-06, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch   32: f = 1.787285e-06, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
Epoch  500: f = 8.408196e-07, step = 1e-05
Epoch 1000: f = 2.292266e-07, step = 1e-05
Epoch 1011: f = 2.220675e-07, step = 1e-05, nfails = 4 (resetting to solution from last epoch)
Epoch 1500: f = 4.355436e-08, step = 1e-06
Epoch 2000: f = 4.495148e-09, step = 1e-06
Epoch 2234: f = 1.418108e-09, step = 1e-06, nfails = 5 (resetting to solution from last epoch)
Epoch 2500: f = 4.538938e-10, step = 1e-07
Epoch 3000: f = 3.828861e-11, step = 1e-07
Epoch 3188: f = 1.485887e-11, step = 1e-07, nfails = 6 (resetting to solution from last epoch)
Epoch 3500: f = 6.286379e-12, step = 1e-08
Epoch 4000: f = 1.176231e-12, step = 1e-08
Epoch 4500: f = 1.811833e-13, step = 1e-08
Epoch 4838: f = 4.966636e-14, step = 1e-08, nfails = 7 (resetting to solution from last epoch)
End Main Loop
Final f: 4.9634e-14
Setup time: 0.00 seconds
Optimization time: 3.50 seconds
Number of epochs: 4838
Total iterations: 2419000
Exit condition: Number of failed epochs > 6
Check how close to optimal
fprintf('||x-xopt||/||xopt|| = %e\n', norm(x-xopt)/norm(xopt));
||x-xopt||/||xopt|| = 4.335430e-05
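The separate fh and gh handles exist because, in typical uses of Adam, the function and gradient are estimated from random samples rather than computed exactly. As a hedged illustration only (not a toolbox example), one can mimic a noisy gradient oracle by perturbing the exact gradient:

% Hedged illustration: add Gaussian noise to the exact gradient to mimic
% the sampled gradient estimates Adam is designed for. The noise level
% (0.01) is arbitrary.
gh_noisy = @(x) optfunc(x,true) + 0.01*randn(length(x),1);
rng('default')
[x,f,info] = tt_opt_adam(xinit, fh, gh_noisy, 'subiters', 500, ...
    'maxfails', 2, 'maxiters', 2000, 'printitn', 500);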
