All-at-once optimization for CP tensor decomposition
We explain how to use the cp_opt_legacy function, which implements the CP-OPT method for fitting the CP model via direct or all-at-once optimization. This is in contrast to the cp_als function, which implements CP-ALS and fits the CP model via alternating optimization. The CP-OPT method is described in the following reference:
- E. Acar, D. M. Dunlavy and T. G. Kolda, A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions, J. Chemometrics, 25(2):67-86, 2011, http://doi.org/10.1002/cem.1335
Contents
- Newer version available
- Third-party optimization software
- Check that the software is installed.
- Create an example problem.
- Create initial guess using 'nvecs'
- Call the cp_opt_legacy method
- Check the output
- Evaluate the output
- Overfitting example
- Nonnegative factorization
- Create an example problem.
- Generate initial guess of the correct size
- Call the cp_opt_legacy method
- Check the output
- Evaluate the output
Newer version available
This documentation is for the legacy implementation of CP-OPT. We recommend using the newer version, described here.
Third-party optimization software
The cp_opt_legacy method uses third-party optimization software to do the optimization. You can use either L-BFGS-B or the Poblano Toolbox. The remainder of these instructions assume L-BFGS-B is being used. See here for instructions on using cp_opt_legacy with Poblano.
Check that the software is installed.
Be sure that lbfgsb is in your path.
help lbfgsb
x = lbfgsb( fcn, l, u ) uses the lbfgsb v.3.0 library (fortran files must
  be installed; see compile_mex.m) which is the L-BFGS-B algorithm.
  The algorithm is similar to the L-BFGS quasi-Newton algorithm, but also
  handles bound constraints via an active-set type iteration. This version
  is based on the modified C code L-BFGS-B-C, and so has a slightly
  different calling syntax than previous versions.

  The minimization problem that it solves is:
        min_x f(x) subject to l <= x <= u

  'fcn' is a function handle that accepts an input, 'x', and returns two
  outputs, 'f' (function value), and 'g' (function gradient).

  'l' and 'u' are column-vectors of constraints. Set their values to Inf
  if you want to ignore them. (You can set some values to Inf, but keep
  others enforced.)

  The full format of the function is:
        [x,f,info] = lbfgsb( fcn, l, u, opts )
  where the output 'f' has the value of the function f at the final
  iterate and 'info' is a structure with useful information
  (self-explanatory, except for info.err: the first column of info.err is
  the history of the function values f, and the second column is the
  history of norm( gradient, Inf )).

  The 'opts' structure allows you to pass further options. Possible field
  name values:
        opts.x0          The starting value (default: all zeros)
        opts.m           Number of limited-memory vectors to use in the
                         algorithm. Try 3 <= m <= 20. (default: 5)
        opts.factr       Tolerance setting (see this source code for more
                         info) (default: 1e7). This is later multiplied
                         by machine epsilon.
        opts.pgtol       Another tolerance setting, relating to
                         norm(gradient,Inf) (default: 1e-5)
        opts.maxIts      How many iterations to allow (default: 100)
        opts.maxTotalIts How many iterations to allow, including
                         linesearch iterations (default: 5000)
        opts.printEvery  How often to display information (default: 1)
        opts.errFcn      A function handle (or cell array of several
                         function handles) that computes whatever you
                         want. The output will be printed to the screen
                         every 'printEvery' iterations. (default: [])
                         Results are saved in columns 3 and higher of
                         the info.err variable.

  Stephen Becker, srbecker@alumni.caltech.edu, Feb 14, 2012
  Updated Feb 21 2015, Stephen Becker, stephen.becker@colorado.edu
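Before moving on, it can help to sanity-check the solver itself. The following is a minimal sketch (not part of the original example) that uses only the calling syntax documented in the help text above: it verifies lbfgsb is on the path and solves a tiny bound-constrained quadratic.

% Minimal sketch: verify lbfgsb is available, then minimize
% ||x - c||^2 subject to 0 <= x <= 1 using the interface above.
if isempty(which('lbfgsb'))
    error('lbfgsb not found; compile L-BFGS-B-C and add its Matlab directory to the path.');
end
c = [2; -1; 0.5];                            % unconstrained minimizer (partly outside the box)
fcn = @(x) deal(sum((x - c).^2), 2*(x - c)); % returns [f, g] when called with two outputs
l = zeros(3,1);                              % lower bounds
u = ones(3,1);                               % upper bounds
opts = struct('x0', 0.5*ones(3,1));          % starting point
[x, f, info] = lbfgsb(fcn, l, u, opts);      % expect x = [1; 0; 0.5]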
Create an example problem.
Create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise.
rng('default'); %<- Setting random seed for reproducibility of this script
R = 5;
info = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10);
X = info.Data;
M_true = info.Soln;
Create initial guess using 'nvecs'
M_init = create_guess('Data', X, 'Num_Factors', R, 'Factor_Generator', 'nvecs');
Call the cp_opt_legacy method
Here is an example call to the cp_opt_legacy method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient.
[M,M0,output] = cp_opt_legacy(X, R, 'init', M_init);
Iter 10, f(x) = 2.127981e+03, ||grad||_infty = 5.04e+02
Iter 20, f(x) = 5.558298e+02, ||grad||_infty = 6.07e-01
Iter 30, f(x) = 5.551920e+02, ||grad||_infty = 2.89e+00
Iter 40, f(x) = 5.549826e+02, ||grad||_infty = 2.52e+00
Iter 50, f(x) = 5.544325e+02, ||grad||_infty = 5.78e-01
Iter 60, f(x) = 5.542962e+02, ||grad||_infty = 1.06e+00
Iter 70, f(x) = 5.540865e+02, ||grad||_infty = 9.96e-01
Iter 80, f(x) = 5.538628e+02, ||grad||_infty = 4.11e-01
Iter 90, f(x) = 5.537953e+02, ||grad||_infty = 2.73e-01
Iter 100, f(x) = 5.537615e+02, ||grad||_infty = 1.10e+00
Iter 110, f(x) = 5.537386e+02, ||grad||_infty = 3.27e-01
Iter 120, f(x) = 5.537120e+02, ||grad||_infty = 1.91e+00
Iter 130, f(x) = 5.536752e+02, ||grad||_infty = 7.03e-01
Iter 140, f(x) = 5.536120e+02, ||grad||_infty = 2.03e+00
Iter 150, f(x) = 5.535694e+02, ||grad||_infty = 4.18e-01
Iter 160, f(x) = 5.535360e+02, ||grad||_infty = 2.62e-01
Iter 170, f(x) = 5.535234e+02, ||grad||_infty = 4.82e-01
Iter 180, f(x) = 5.535053e+02, ||grad||_infty = 4.26e-01
Iter 190, f(x) = 5.534586e+02, ||grad||_infty = 5.68e-01
Iter 200, f(x) = 5.534267e+02, ||grad||_infty = 1.16e-01
Iter 210, f(x) = 5.534137e+02, ||grad||_infty = 1.39e-01
Iter 220, f(x) = 5.534067e+02, ||grad||_infty = 1.91e-01
Iter 230, f(x) = 5.533932e+02, ||grad||_infty = 4.43e-01
Iter 240, f(x) = 5.533720e+02, ||grad||_infty = 3.60e-01
Iter 250, f(x) = 5.533548e+02, ||grad||_infty = 1.90e-01
Iter 260, f(x) = 5.533373e+02, ||grad||_infty = 2.83e-01
Iter 270, f(x) = 5.533247e+02, ||grad||_infty = 1.36e-01
Iter 280, f(x) = 5.533052e+02, ||grad||_infty = 3.17e-01
Iter 290, f(x) = 5.532915e+02, ||grad||_infty = 4.59e-01
Iter 300, f(x) = 5.532845e+02, ||grad||_infty = 7.43e-02
Iter 310, f(x) = 5.532803e+02, ||grad||_infty = 1.29e-01
Iter 320, f(x) = 5.532758e+02, ||grad||_infty = 6.66e-01
Iter 330, f(x) = 5.532708e+02, ||grad||_infty = 2.85e-01
Iter 340, f(x) = 5.532651e+02, ||grad||_infty = 1.63e-01
Iter 350, f(x) = 5.532621e+02, ||grad||_infty = 7.22e-02
Iter 360, f(x) = 5.532617e+02, ||grad||_infty = 5.31e-02
Iter 370, f(x) = 5.532610e+02, ||grad||_infty = 1.71e-02
Iter 380, f(x) = 5.532605e+02, ||grad||_infty = 6.43e-02
Iter 390, f(x) = 5.532604e+02, ||grad||_infty = 2.71e-02
Iter 400, f(x) = 5.532602e+02, ||grad||_infty = 2.94e-02
Iter 410, f(x) = 5.532601e+02, ||grad||_infty = 2.49e-02
Iter 420, f(x) = 5.532598e+02, ||grad||_infty = 2.22e-02
Iter 430, f(x) = 5.532597e+02, ||grad||_infty = 3.14e-02
Iter 433, f(x) = 5.532597e+02, ||grad||_infty = 1.11e-02
Check the output
It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit message. The message CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH means that it has converged because the function value stopped improving.
exitmsg = output.ExitMsg
exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'
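Other exit messages (for example, stopping at the iteration limit) would warrant rerunning with more iterations or looser tolerances. A hedged sketch of a programmatic guard:

% Hedged sketch: warn if the solver stopped for any reason other than
% convergence.
if ~contains(output.ExitMsg, 'CONVERGENCE')
    warning('cp_opt_legacy did not converge: %s', output.ExitMsg);
end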
The fit is the percentage of the data that is explained by the model. Because we have noise, we do not expect the fit to be perfect.
fit = output.Fit
fit = 99.0220
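As a hedged cross-check (assuming the fit is defined as the percentage of the squared norm of the data explained by the model), we can recompute it directly from the factorization:

% Hedged check: recompute the fit as 100*(1 - ||X-M||^2/||X||^2);
% this should be close to output.Fit if that convention is used.
fit_check = 100 * (1 - norm(X - full(M))^2 / norm(X)^2)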
Evaluate the output
We can "score" the similarity of the model computed by CP and compare that with the truth. The score function on ktensor's gives a score in [0,1] with 1 indicating a perfect match. Because we have noise, we do not expect the fit to be perfect. See doc score for more details.
scr = score(M,M_true)
scr = 0.7998
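If the factor-by-factor correspondence is needed as well, score can return additional outputs; per doc score these include a normalized, permuted copy of the first model matched against the second. A hedged usage sketch:

% Hedged sketch: request the matched model in addition to the score.
[scr, M_matched] = score(M, M_true);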
Overfitting example
Re-using the same example as before, consider the case where we don't know R in advance. We might guess too high. Here we show a case where we guess R+1 factors rather than R.
% Generate initial guess of the correct size
M_plus_init = create_guess('Data', X, 'Num_Factors', R+1, ...
    'Factor_Generator', 'nvecs');
% Run the algorithm
[M_plus,~,output] = cp_opt_legacy(X, R+1, 'init', M_plus_init);
exitmsg = output.ExitMsg
fit = output.Fit
Iter 10, f(x) = 2.126594e+03, ||grad||_infty = 5.05e+02
Iter 20, f(x) = 5.560240e+02, ||grad||_infty = 6.22e-01
Iter 30, f(x) = 5.546591e+02, ||grad||_infty = 3.27e+00
Iter 40, f(x) = 5.540186e+02, ||grad||_infty = 4.62e+00
Iter 50, f(x) = 5.529459e+02, ||grad||_infty = 1.28e+00
Iter 60, f(x) = 5.528452e+02, ||grad||_infty = 7.27e-01
Iter 70, f(x) = 5.521932e+02, ||grad||_infty = 6.45e-01
Iter 80, f(x) = 5.520235e+02, ||grad||_infty = 1.20e+00
Iter 90, f(x) = 5.519666e+02, ||grad||_infty = 2.69e-01
Iter 100, f(x) = 5.517083e+02, ||grad||_infty = 4.69e-01
Iter 110, f(x) = 5.516412e+02, ||grad||_infty = 4.07e-01
Iter 120, f(x) = 5.515534e+02, ||grad||_infty = 5.94e-01
Iter 130, f(x) = 5.514324e+02, ||grad||_infty = 8.34e-01
Iter 140, f(x) = 5.513545e+02, ||grad||_infty = 9.00e-01
Iter 150, f(x) = 5.513256e+02, ||grad||_infty = 7.89e-01
Iter 160, f(x) = 5.512999e+02, ||grad||_infty = 4.67e-01
Iter 170, f(x) = 5.512438e+02, ||grad||_infty = 6.21e-01
Iter 180, f(x) = 5.512105e+02, ||grad||_infty = 1.65e-01
Iter 190, f(x) = 5.511846e+02, ||grad||_infty = 3.21e-01
Iter 200, f(x) = 5.511424e+02, ||grad||_infty = 2.31e-01
Iter 210, f(x) = 5.511226e+02, ||grad||_infty = 3.45e-01
Iter 220, f(x) = 5.510866e+02, ||grad||_infty = 2.70e-01
Iter 230, f(x) = 5.510718e+02, ||grad||_infty = 2.43e-01
Iter 240, f(x) = 5.510516e+02, ||grad||_infty = 1.42e-01
Iter 250, f(x) = 5.510357e+02, ||grad||_infty = 3.43e-01
Iter 260, f(x) = 5.510208e+02, ||grad||_infty = 3.08e-01
Iter 270, f(x) = 5.509572e+02, ||grad||_infty = 3.87e-01
Iter 280, f(x) = 5.509329e+02, ||grad||_infty = 4.97e-01
Iter 290, f(x) = 5.509159e+02, ||grad||_infty = 2.11e-01
Iter 300, f(x) = 5.509049e+02, ||grad||_infty = 3.04e-01
Iter 310, f(x) = 5.508860e+02, ||grad||_infty = 1.57e-01
Iter 320, f(x) = 5.508805e+02, ||grad||_infty = 2.29e-01
Iter 330, f(x) = 5.508741e+02, ||grad||_infty = 3.44e-01
Iter 340, f(x) = 5.508599e+02, ||grad||_infty = 2.41e-01
Iter 350, f(x) = 5.508430e+02, ||grad||_infty = 2.11e-01
Iter 360, f(x) = 5.508228e+02, ||grad||_infty = 8.22e-01
Iter 370, f(x) = 5.508002e+02, ||grad||_infty = 8.91e-01
Iter 380, f(x) = 5.507405e+02, ||grad||_infty = 3.57e-01
Iter 390, f(x) = 5.507079e+02, ||grad||_infty = 2.94e-01
Iter 400, f(x) = 5.506815e+02, ||grad||_infty = 6.07e-01
Iter 410, f(x) = 5.506510e+02, ||grad||_infty = 2.21e-01
Iter 420, f(x) = 5.506304e+02, ||grad||_infty = 7.05e-01
Iter 430, f(x) = 5.506208e+02, ||grad||_infty = 2.88e-01
Iter 440, f(x) = 5.506106e+02, ||grad||_infty = 2.40e-01
Iter 450, f(x) = 5.506046e+02, ||grad||_infty = 1.84e-01
Iter 460, f(x) = 5.505972e+02, ||grad||_infty = 2.19e-01
Iter 470, f(x) = 5.505936e+02, ||grad||_infty = 2.57e-01
Iter 480, f(x) = 5.505779e+02, ||grad||_infty = 4.13e-01
Iter 490, f(x) = 5.505685e+02, ||grad||_infty = 2.86e-01
Iter 500, f(x) = 5.505542e+02, ||grad||_infty = 1.57e-01
Iter 510, f(x) = 5.505468e+02, ||grad||_infty = 2.88e-01
Iter 520, f(x) = 5.505433e+02, ||grad||_infty = 4.07e-02
Iter 530, f(x) = 5.505419e+02, ||grad||_infty = 1.50e-01
Iter 540, f(x) = 5.505398e+02, ||grad||_infty = 1.05e-01
Iter 550, f(x) = 5.505385e+02, ||grad||_infty = 6.65e-02
Iter 560, f(x) = 5.505370e+02, ||grad||_infty = 1.03e-01
Iter 570, f(x) = 5.505354e+02, ||grad||_infty = 4.84e-01
Iter 580, f(x) = 5.505325e+02, ||grad||_infty = 2.54e-01
Iter 590, f(x) = 5.505280e+02, ||grad||_infty = 8.43e-02
Iter 600, f(x) = 5.505260e+02, ||grad||_infty = 2.18e-01
Iter 610, f(x) = 5.505247e+02, ||grad||_infty = 2.90e-01
Iter 620, f(x) = 5.505232e+02, ||grad||_infty = 4.18e-02
Iter 630, f(x) = 5.505220e+02, ||grad||_infty = 1.61e-01
Iter 640, f(x) = 5.505188e+02, ||grad||_infty = 2.31e-01
Iter 650, f(x) = 5.505171e+02, ||grad||_infty = 1.25e-01
Iter 660, f(x) = 5.505131e+02, ||grad||_infty = 1.27e-01
Iter 670, f(x) = 5.505088e+02, ||grad||_infty = 3.52e-02
Iter 680, f(x) = 5.505053e+02, ||grad||_infty = 1.21e-01
Iter 690, f(x) = 5.505033e+02, ||grad||_infty = 1.42e-01
Iter 700, f(x) = 5.505020e+02, ||grad||_infty = 7.37e-02
Iter 710, f(x) = 5.505013e+02, ||grad||_infty = 1.10e-01
Iter 720, f(x) = 5.505007e+02, ||grad||_infty = 9.15e-02
Iter 730, f(x) = 5.505001e+02, ||grad||_infty = 6.26e-02
Iter 740, f(x) = 5.504991e+02, ||grad||_infty = 6.09e-02
Iter 750, f(x) = 5.504979e+02, ||grad||_infty = 9.48e-02
Iter 760, f(x) = 5.504951e+02, ||grad||_infty = 3.52e-02
Iter 770, f(x) = 5.504935e+02, ||grad||_infty = 2.88e-02
Iter 780, f(x) = 5.504921e+02, ||grad||_infty = 1.49e-02
Iter 790, f(x) = 5.504916e+02, ||grad||_infty = 2.79e-02
Iter 800, f(x) = 5.504911e+02, ||grad||_infty = 9.91e-02
Iter 810, f(x) = 5.504902e+02, ||grad||_infty = 2.42e-01
Iter 820, f(x) = 5.504894e+02, ||grad||_infty = 6.55e-02
Iter 830, f(x) = 5.504886e+02, ||grad||_infty = 3.77e-02
Iter 840, f(x) = 5.504881e+02, ||grad||_infty = 1.69e-02
Iter 850, f(x) = 5.504880e+02, ||grad||_infty = 1.85e-02
Iter 860, f(x) = 5.504879e+02, ||grad||_infty = 8.73e-03
Iter 870, f(x) = 5.504878e+02, ||grad||_infty = 7.87e-03
Iter 880, f(x) = 5.504877e+02, ||grad||_infty = 8.39e-03
Iter 890, f(x) = 5.504876e+02, ||grad||_infty = 6.39e-03
Iter 900, f(x) = 5.504875e+02, ||grad||_infty = 2.64e-02
Iter 906, f(x) = 5.504874e+02, ||grad||_infty = 7.51e-03
exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'
fit = 99.0269
% Check the answer (1 is perfect)
scr = score(M_plus, M_true)
scr = 0.8000
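When R is genuinely unknown, one practical approach is to sweep a few candidate ranks and watch where the fit levels off. A hedged sketch (a hypothetical loop, not part of the original example):

% Hedged sketch: record the fit for several candidate ranks. The gains
% typically flatten once the true rank is reached (near r = 5 here,
% where the extra component added only ~0.005 percentage points).
fits = zeros(1, 7);
for r = 1:7
    Mr_init = create_guess('Data', X, 'Num_Factors', r, ...
        'Factor_Generator', 'nvecs');
    [~, ~, out_r] = cp_opt_legacy(X, r, 'init', Mr_init);
    fits(r) = out_r.Fit;
end
fits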
Nonnegative factorization
We can employ lower bounds to get a nonnegative factorization.
Create an example problem.
Create an example 50 x 40 x 30 tensor with rank 5 and add 10% noise. We select nonnegative factor matrices and lambdas. The create_problem function doesn't really know how to add noise without going negative, so we hack it to make the observed tensor nonnegative by zeroing out its negative entries.
R = 5;
info = create_problem('Size', [50 40 30], 'Num_Factors', R, 'Noise', 0.10, ...
    'Factor_Generator', 'rand', 'Lambda_Generator', 'rand');
X = info.Data .* (info.Data > 0); % Force it to be nonnegative
M_true = info.Soln;
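As a quick sanity check (a hedged sketch, not part of the original example), confirm that the thresholding left no negative entries in the observed tensor:

% Hedged check: the smallest entry of X should now be >= 0.
min_entry = min(X.data(:))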
Generate initial guess of the correct size
M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'rand');
Call the cp_opt_legacy method
Here we specify a lower bound of zero with the last two arguments.
[M,M0,output] = cp_opt_legacy(X, R, 'init', M_init,'lower',0);
Iter 10, f(x) = 7.931858e+01, ||grad||_infty = 1.19e+01
Iter 20, f(x) = 3.180133e+01, ||grad||_infty = 1.99e+00
Iter 30, f(x) = 2.570572e+01, ||grad||_infty = 2.00e+00
Iter 40, f(x) = 2.292782e+01, ||grad||_infty = 1.14e+00
Iter 50, f(x) = 2.217219e+01, ||grad||_infty = 1.04e+00
Iter 60, f(x) = 2.177179e+01, ||grad||_infty = 1.11e+00
Iter 70, f(x) = 2.154988e+01, ||grad||_infty = 7.88e-01
Iter 80, f(x) = 2.145103e+01, ||grad||_infty = 6.99e-01
Iter 90, f(x) = 2.141750e+01, ||grad||_infty = 5.78e-01
Iter 100, f(x) = 2.138105e+01, ||grad||_infty = 1.79e-01
Iter 110, f(x) = 2.136567e+01, ||grad||_infty = 1.85e-01
Iter 120, f(x) = 2.134807e+01, ||grad||_infty = 8.66e-02
Iter 130, f(x) = 2.133420e+01, ||grad||_infty = 5.54e-02
Iter 140, f(x) = 2.132942e+01, ||grad||_infty = 4.39e-02
Iter 150, f(x) = 2.132709e+01, ||grad||_infty = 6.41e-02
Iter 160, f(x) = 2.132540e+01, ||grad||_infty = 6.99e-02
Iter 170, f(x) = 2.132462e+01, ||grad||_infty = 7.59e-02
Iter 180, f(x) = 2.132414e+01, ||grad||_infty = 6.65e-02
Iter 190, f(x) = 2.132367e+01, ||grad||_infty = 6.93e-02
Iter 200, f(x) = 2.132355e+01, ||grad||_infty = 7.86e-02
Iter 210, f(x) = 2.132348e+01, ||grad||_infty = 8.06e-02
Iter 220, f(x) = 2.132344e+01, ||grad||_infty = 7.62e-02
Iter 230, f(x) = 2.132342e+01, ||grad||_infty = 7.22e-02
Iter 240, f(x) = 2.132342e+01, ||grad||_infty = 6.96e-02
Iter 250, f(x) = 2.132341e+01, ||grad||_infty = 6.94e-02
Iter 260, f(x) = 2.132341e+01, ||grad||_infty = 6.98e-02
Iter 270, f(x) = 2.132341e+01, ||grad||_infty = 6.91e-02
Iter 271, f(x) = 2.132341e+01, ||grad||_infty = 6.91e-02
Check the output
exitmsg = output.ExitMsg
exitmsg = 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.'
The fit is the percentage of the data that is explained by the model. Because we have noise, we do not expect the fit to be perfect.
fit = output.Fit
fit = 99.0351
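Since we imposed a lower bound of zero, it is also worth confirming that the recovered factor matrices honor it. A hedged sketch (M.u is the cell array of factor matrices in a ktensor):

% Hedged check: every entry of every factor matrix should be >= 0
% because of the 'lower', 0 constraint.
all_nonneg = all(cellfun(@(A) all(A(:) >= 0), M.u))   % expect logical 1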
Evaluate the output
We can "score" the similarity of the model computed by CP and compare that with the truth. The score function on ktensor's gives a score in [0,1] with 1 indicating a perfect match. Because we have noise, we do not expect the fit to be perfect. See doc score for more details.
scr = score(M,M_true)
scr = 0.9752