# Implicit Symmetric CP Decomposition for Symmetric K-Tensors

The function `cp_isym` computes a symmetric CP decomposition of a symmetric tensor that is in symmetric k-tensor format, meaning that it is a sum of symmetric outer products and stored as a `symktensor` object. The decomposition is described in the following reference:

• S. Sherman, T. G. Kolda, Estimating Higher-Order Moments Using Symmetric Tensor Decomposition, SIAM J. Matrix Analysis and Applications, 41:1369-1387, 2020, https://doi.org/10.1137/19m1299633

## Requirements

This code requires an optimization solver. Our examples use the L-BFGS-B optimization method; see Optimization Options for details of installation and usage as well as other options for optimization solvers.

## Create a sample problem based on a Gaussian mixture model

Here we use the third-order moment (d=3) for a problem with random variables of dimension n=100. We take a mixture of r=5 Gaussians and collect p=1250 observations.

```matlab
d = 3;
n = 100;
r = 5;
p = 250*r;
```

We construct the means so that they are collinear; specifically, the cosine of the angle between any two mean vectors is 0.5. This is a harder problem than purely random vectors, which are nearly orthogonal in high dimensions.

```matlab
rng('default');
collinearity = 0.5;
Atrue = matrandcong(n,r,collinearity);
```
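
As an optional sanity check (assuming `matrandcong` returns unit-norm columns with the requested congruence, as its name suggests), the Gram matrix of the means should have ones on the diagonal and 0.5 everywhere else:

```matlab
% Gram matrix of the means; expect ones on the diagonal and 0.5 off it.
G = Atrue' * Atrue;
maxerr = max(max(abs(G - (0.5*ones(r) + 0.5*eye(r)))));
fprintf('Max deviation from target Gram matrix: %.1e\n', maxerr);
```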

Each Gaussian gets equal weight in the mixture.

```matlab
wghts = 1/r * ones(r,1);
```

The true solution is based on the means and their prevalence in the mixture.

```matlab
Mtrue = symktensor(wghts,Atrue,d);
Mtrue = normalize(Mtrue);
```

The data tensor is based on noisy observations from the mixture model. The vector `idx` determines which Gaussian each observation is sampled from, and the samples are perturbed by noise with standard deviation `stddev`. The result is assembled into an observation tensor in `symktensor` format.

```matlab
idx = randsample(r,p,true,wghts);
stddev = 1e-3;
noise = stddev * randn(n,p);
V = Atrue(:,idx) + noise;
X = symktensor(1/p*ones(p,1),V,d);
X = normalize(X,0);
```
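
As an aside, a standard identity (not specific to this toolbox) explains why the `'Xnormsqr','exact'` option used below is affordable: because X is a weighted sum of symmetric rank-one terms, its squared norm reduces to pairwise inner products of the observation vectors, with no need to form the dense n-by-n-by-n tensor:

$$\|X\|^2 = \sum_{i=1}^{p}\sum_{j=1}^{p} w_i\, w_j \,\bigl(v_i^\top v_j\bigr)^{d},$$

which follows from $\langle u^{\circ d}, v^{\circ d}\rangle = (u^\top v)^d$ applied term by term.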

## Recommend multiple runs of CP-ISYM

Since this is a non-convex optimization problem, it is critically important to do multiple runs of the optimization. In the code below, we run 10 times. Not every run will converge to the global minimum. We take the solution with the lowest function value, which should ideally yield the best match to the true solution. For our artificial problem, we can actually verify how well each computed solution matches the ideal solution via its score.

```matlab
rng(7)
nruns = 10;
M = cell(nruns,1);
info = cell(nruns,1);
for i = 1:nruns
    fprintf('------------- Run %d of %d --------------\n', i, nruns);
    [M{i},info{i}] = cp_isym(X,r,'Xnormsqr','exact','printitn',0);
    fprintf('Final F: %.2e, Score: %.2f\n', info{i}.f, score(M{i},Mtrue));
    if i == 1
        ifbest = i;
        fbest = info{i}.f;
        sfbest = score(M{i},Mtrue);
    elseif info{i}.f < fbest
        ifbest = i;
        fbest = info{i}.f;
        sfbest = score(M{i},Mtrue);
    end
end
```
```
------------- Run 1 of 10 --------------
Final F: 1.94e-02, Score: 0.38
------------- Run 2 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 3 of 10 --------------
Final F: 1.20e-09, Score: 0.95
------------- Run 4 of 10 --------------
Final F: 7.76e-09, Score: 0.95
------------- Run 5 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 6 of 10 --------------
Final F: 5.01e-09, Score: 0.95
------------- Run 7 of 10 --------------
Final F: 3.65e-09, Score: 0.95
------------- Run 8 of 10 --------------
Final F: 7.29e-09, Score: 0.95
------------- Run 9 of 10 --------------
Final F: 1.94e-02, Score: 0.41
------------- Run 10 of 10 --------------
Final F: 6.14e-02, Score: 0.00
```
```matlab
fprintf('------------- Best of %d runs --------------\n', nruns);
fprintf('Run %d -> Final F: %.2e, Score: %.2f\n', ifbest, fbest, sfbest);
```
```
------------- Best of 10 runs --------------
Run 3 -> Final F: 1.20e-09, Score: 0.95
```
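
As a compact alternative to tracking the best run inside the loop, the winning run can also be selected after the fact from the `M` and `info` cell arrays (a sketch):

```matlab
% Extract each run's final function value and pick the minimum.
fvals = cellfun(@(s) s.f, info);
[fbest,ifbest] = min(fvals);
Mbest = M{ifbest};
fprintf('Best run: %d, F: %.2e\n', ifbest, fbest);
```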

## Options for CP-ISYM with L-BFGS-B (default) optimization solver

The CP solution should produce something close to Mtrue, but it will not be exact because of the randomness in building the observation tensor. Here we show what happens when the true solution is provided as the initial guess. The function value should converge toward zero, and the similarity score should be close to one. The method defaults to 'lbfgsb'.

```matlab
[M,info] = cp_isym(X,r,'Xnormsqr','exact','init',Mtrue);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     1, f(x) = 1.798804e-04, ||grad||_infty = 1.54e-02, 4.50e-03
Iter     2, f(x) = 2.009885e-06, ||grad||_infty = 1.15e-03, 5.64e-03
Iter     3, f(x) = 1.320009e-06, ||grad||_infty = 7.90e-04, 6.92e-03
Iter     4, f(x) = 9.160013e-07, ||grad||_infty = 2.02e-04, 8.24e-03
Iter     5, f(x) = 6.929470e-07, ||grad||_infty = 1.16e-04, 9.59e-03
Iter     6, f(x) = 2.975243e-07, ||grad||_infty = 2.70e-04, 1.09e-02
Iter     7, f(x) = 1.647469e-07, ||grad||_infty = 1.26e-04, 1.23e-02
Iter     8, f(x) = 1.401774e-07, ||grad||_infty = 6.75e-05, 1.37e-02
Iter     9, f(x) = 1.275847e-07, ||grad||_infty = 7.95e-05, 1.50e-02
Iter    10, f(x) = 7.883570e-08, ||grad||_infty = 1.39e-04, 1.70e-02
Iter    11, f(x) = 5.170653e-08, ||grad||_infty = 3.19e-05, 1.83e-02
Iter    12, f(x) = 3.582501e-08, ||grad||_infty = 1.63e-05, 1.95e-02
Iter    13, f(x) = 1.921946e-08, ||grad||_infty = 9.09e-05, 2.10e-02
Iter    14, f(x) = 6.336888e-09, ||grad||_infty = 4.28e-05, 2.22e-02
Iter    15, f(x) = 3.691645e-09, ||grad||_infty = 3.04e-05, 2.37e-02
Iter    16, f(x) = 2.252858e-09, ||grad||_infty = 9.17e-06, 2.51e-02
End Main Loop

Final f: 2.2529e-09
Setup time: 0.0024 seconds
Optimization time: 0.026 seconds
Iterations: 16
Total iterations: 35

*** Final F: 2.25e-09, Score: 0.95 ***

```

It is also possible to specify the initial guess as a factor matrix.

```matlab
[M,info] = cp_isym(X,r,'Xnormsqr','exact','init',Atrue);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     1, f(x) = 1.798804e-04, ||grad||_infty = 1.54e-02, 4.42e-03
Iter     2, f(x) = 2.009885e-06, ||grad||_infty = 1.15e-03, 5.64e-03
Iter     3, f(x) = 1.320009e-06, ||grad||_infty = 7.90e-04, 7.02e-03
Iter     4, f(x) = 9.160013e-07, ||grad||_infty = 2.02e-04, 8.23e-03
Iter     5, f(x) = 6.929470e-07, ||grad||_infty = 1.16e-04, 1.00e-02
Iter     6, f(x) = 2.975243e-07, ||grad||_infty = 2.70e-04, 1.14e-02
Iter     7, f(x) = 1.647469e-07, ||grad||_infty = 1.26e-04, 1.26e-02
Iter     8, f(x) = 1.401774e-07, ||grad||_infty = 6.75e-05, 1.39e-02
Iter     9, f(x) = 1.275847e-07, ||grad||_infty = 7.95e-05, 1.53e-02
Iter    10, f(x) = 7.883570e-08, ||grad||_infty = 1.39e-04, 1.67e-02
Iter    11, f(x) = 5.170653e-08, ||grad||_infty = 3.19e-05, 1.80e-02
Iter    12, f(x) = 3.582501e-08, ||grad||_infty = 1.63e-05, 1.93e-02
Iter    13, f(x) = 1.921946e-08, ||grad||_infty = 9.09e-05, 2.06e-02
Iter    14, f(x) = 6.336888e-09, ||grad||_infty = 4.28e-05, 2.19e-02
Iter    15, f(x) = 3.691645e-09, ||grad||_infty = 3.04e-05, 2.31e-02
Iter    16, f(x) = 2.252858e-09, ||grad||_infty = 9.17e-06, 2.46e-02
End Main Loop

Final f: 2.2529e-09
Setup time: 0.0016 seconds
Optimization time: 0.025 seconds
Iterations: 16
Total iterations: 35

*** Final F: 2.25e-09, Score: 0.95 ***

```

So far we've been using the exact objective function, which is close to zero at the optimal solution. We can, however, ignore the constant |X|^2 term in the objective function, which saves some preprocessing cost. The main disadvantage of ignoring the constant term is that the final function value will no longer be near zero.

```matlab
[M,info] = cp_isym(X,r,'Xnormsqr',0,'init',Atrue,'printitn',1);
fprintf('\n\t\t*** Final F (adjusted): %.2e, Score: %.2f ***\n\n', ...
    info.f + norm(X)^2, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     1, f(x) = -2.108869e-01, ||grad||_infty = 5.67e-02, 3.00e-03
Iter     2, f(x) = -2.551309e-01, ||grad||_infty = 4.68e-02, 5.40e-03
Iter     3, f(x) = -2.954389e-01, ||grad||_infty = 1.44e-02, 7.86e-03
Iter     4, f(x) = -2.983326e-01, ||grad||_infty = 1.31e-02, 8.97e-03
Iter     5, f(x) = -2.998836e-01, ||grad||_infty = 5.63e-03, 1.03e-02
Iter     6, f(x) = -3.003519e-01, ||grad||_infty = 5.88e-04, 1.20e-02
Iter     7, f(x) = -3.003573e-01, ||grad||_infty = 7.43e-05, 1.36e-02
Iter     8, f(x) = -3.003574e-01, ||grad||_infty = 5.83e-05, 1.49e-02
Iter     9, f(x) = -3.003576e-01, ||grad||_infty = 6.35e-05, 1.62e-02
Iter    10, f(x) = -3.003578e-01, ||grad||_infty = 7.69e-05, 1.76e-02
Iter    11, f(x) = -3.003579e-01, ||grad||_infty = 6.13e-05, 2.02e-02
Iter    12, f(x) = -3.003580e-01, ||grad||_infty = 1.96e-05, 2.14e-02
Iter    13, f(x) = -3.003580e-01, ||grad||_infty = 2.51e-05, 2.26e-02
Iter    14, f(x) = -3.003580e-01, ||grad||_infty = 3.02e-06, 2.38e-02
End Main Loop

Final f: -3.0036e-01
Setup time: 0.0013 seconds
Optimization time: 0.024 seconds
Iterations: 14
Total iterations: 33

*** Final F (adjusted): 4.60e-10, Score: 0.95 ***

```
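
The "adjusted" value printed above relies on the fact that the constant term only shifts the objective. Expanding the exact objective gives

$$\|X - M\|^2 = \|X\|^2 - 2\,\langle X, M\rangle + \|M\|^2,$$

so setting 'Xnormsqr' to 0 drops the first term without changing the minimizer, and adding norm(X)^2 back to the final function value recovers the exact objective value.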

By default, the method uses the randomized range finder ('rrf') to generate the initial guess; it generally works well. Here we specify it explicitly.

```matlab
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init','rrf');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     5, f(x) = 4.265680e-02, ||grad||_infty = 1.81e-02, 6.42e-03
Iter    10, f(x) = 2.260594e-02, ||grad||_infty = 1.28e-02, 1.04e-02
Iter    15, f(x) = 1.363733e-02, ||grad||_infty = 1.88e-02, 1.48e-02
Iter    20, f(x) = 1.381487e-03, ||grad||_infty = 1.22e-02, 1.95e-02
Iter    25, f(x) = 5.283198e-05, ||grad||_infty = 1.45e-03, 2.32e-02
Iter    30, f(x) = 6.142064e-07, ||grad||_infty = 1.11e-04, 2.67e-02
Iter    35, f(x) = 1.823079e-08, ||grad||_infty = 1.78e-05, 3.09e-02
Iter    39, f(x) = 8.114700e-10, ||grad||_infty = 3.67e-06, 3.46e-02
End Main Loop

Final f: 8.1147e-10
Setup time: 0.00097 seconds
Optimization time: 0.035 seconds
Iterations: 39
Total iterations: 86

*** Final F: 8.11e-10, Score: 0.95 ***

```

If we instead run with a random Gaussian initial guess, we are unlikely to converge.

```matlab
rng(77)
[M,info,M0] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init',@randn);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, subiters = 10

Begin Main Loop
Iter     5, f(x) = 1.568859e+00, ||grad||_infty = 2.43e+03, 4.85e-03
Iter    10, f(x) = 3.002914e-01, ||grad||_infty = 1.21e-02, 9.12e-03
Iter    11, f(x) = 3.002914e-01, ||grad||_infty = 1.68e-03, 9.79e-03
End Main Loop

Final f: 3.0029e-01
Setup time: 0.0032 seconds
Optimization time: 0.01 seconds
Iterations: 11
Total iterations: 24
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.

*** Final F: 3.00e-01, Score: 0.00 ***

```

Sometimes a random initialization does a little better if the convergence tolerances are tightened and the maximum number of iterations is increased. We also increase the memory of the limited-memory BFGS method to see if that helps. The result improves a little but is still nowhere near the solution obtained with the randomized range finder.

```matlab
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'init',M0,...
    'ftol',1e-14,'gtol',1e-8,'maxiters',10000,'m',25);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
L-BFGS-B Optimization (https://github.com/stephenbeckr/L-BFGS-B-C)
Order, Size: 3, 100
Lower bound: -Inf, Upper bound: Inf
Parameters: m=25, ftol=1e-14, gtol=1e-08, maxiters = 10000, subiters = 10

Begin Main Loop
Iter     5, f(x) = 1.568859e+00, ||grad||_infty = 2.43e+03, 4.86e-03
Iter    10, f(x) = 3.002914e-01, ||grad||_infty = 8.83e-03, 8.24e-03
Iter    15, f(x) = 3.002914e-01, ||grad||_infty = 9.08e-04, 1.36e-02
Iter    20, f(x) = 3.002914e-01, ||grad||_infty = 1.39e-02, 1.75e-02
Iter    25, f(x) = 3.002914e-01, ||grad||_infty = 1.59e-01, 2.18e-02
Iter    30, f(x) = 3.002899e-01, ||grad||_infty = 1.81e+00, 2.69e-02
Iter    35, f(x) = 2.191333e-01, ||grad||_infty = 3.14e+02, 3.63e-02
Iter    40, f(x) = 1.203681e-01, ||grad||_infty = 4.07e+03, 4.29e-02
Iter    45, f(x) = 1.116589e-01, ||grad||_infty = 7.46e+02, 5.05e-02
Iter    50, f(x) = 1.041222e-01, ||grad||_infty = 4.91e+01, 6.26e-02
Iter    55, f(x) = 1.034506e-01, ||grad||_infty = 1.35e-03, 6.81e-02
Iter    60, f(x) = 1.034506e-01, ||grad||_infty = 4.54e-02, 7.33e-02
Iter    65, f(x) = 1.034504e-01, ||grad||_infty = 5.71e-01, 7.93e-02
Iter    70, f(x) = 1.034220e-01, ||grad||_infty = 6.39e+00, 8.47e-02
Iter    75, f(x) = 1.021516e-01, ||grad||_infty = 1.85e+02, 9.17e-02
Iter    80, f(x) = 8.415741e-02, ||grad||_infty = 5.40e+02, 9.89e-02
Iter    85, f(x) = 8.404990e-02, ||grad||_infty = 6.41e+00, 1.05e-01
Iter    90, f(x) = 8.404952e-02, ||grad||_infty = 1.19e-01, 1.12e-01
Iter    95, f(x) = 8.404951e-02, ||grad||_infty = 9.01e-03, 1.18e-01
Iter    97, f(x) = 8.404951e-02, ||grad||_infty = 5.18e-05, 1.21e-01
End Main Loop

Final f: 8.4050e-02
Setup time: 0.00064 seconds
Optimization time: 0.12 seconds
Iterations: 97
Total iterations: 214
Exit condition: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH.

*** Final F: 8.40e-02, Score: 0.07 ***

```

## Options for CP-ISYM with L-BFGS optimization solver from Poblano

The Poblano L-BFGS solver is selected by setting 'method' to 'lbfgs'; this requires the Poblano Toolbox.

```matlab
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'method','lbfgs');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Poblano L-BFGS Optimization
Order, Size: 3, 100
Parameters: m=5, ftol=1e-10, gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
Iter  FuncEvals       F(X)          ||G(X)||/N
------ --------- ---------------- ----------------
0         1       3.92027347       0.02425307
5        18       0.04889771       0.00026180
10        30       0.02227461       0.00010159
15        45       0.01590084       0.00016813
20        60       0.00156181       0.00008811
25        70       0.00005713       0.00001458
26        73       0.00001608       0.00000982
End Main Loop

Final f: 1.6075e-05
Setup time: 0.0018 seconds
Optimization time: 0.087 seconds
Iterations: 26
Total iterations: 73
Exit condition: Successful termination based on StopTol

*** Final F: 1.61e-05, Score: 0.95 ***

```

## Options for CP-ISYM with Quasi-Newton optimization solver from Optimization Toolbox

The quasi-Newton solver from the MATLAB Optimization Toolbox is selected by setting 'method' to 'fminunc'.

```matlab
rng(1)
[M,info] = cp_isym(X,r,'Xnormsqr','exact','printitn',5,'method','fminunc');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', info.f, score(M,Mtrue));
```
```
Quasi-Newton Optimization (via Optimization Toolbox)
Quasi-Newton Optimization (via Optimization Toolbox)
Order, Size: 3, 100
Parameters: gtol=1e-05, maxiters = 1000, maxsubiters=10000

Begin Main Loop
First-order
Iteration  Func-count       f(x)        Step-size       optimality
0           1          3.92027                          1.95
1           3         0.275394       0.164109         0.0628
2           7          0.14028        12.0063           0.11
3           8         0.128657              1         0.0684
4          10         0.085529            0.5         0.0396
5          11        0.0815813              1         0.0462
6          12        0.0798902              1        0.00739
7          14        0.0776973             10         0.0267
8          15        0.0750789              1         0.0245
9          17        0.0704473       0.480552          0.042
10          18        0.0670865              1         0.0368
11          19        0.0616269              1         0.0325
12          20         0.055824              1         0.0328
13          21        0.0544128              1         0.0339
14          22        0.0505937              1         0.0154
15          23        0.0481738              1         0.0168
16          24        0.0416669              1          0.054
17          25        0.0381286              1         0.0336
18          26        0.0335275              1         0.0159
19          27        0.0317987              1         0.0146
First-order
Iteration  Func-count       f(x)        Step-size       optimality
20          28        0.0301083              1         0.0111
21          29        0.0293649              1         0.0161
22          30         0.027827              1        0.00893
23          31        0.0265521              1        0.00975
24          32        0.0261249              1        0.00918
25          33        0.0251787              1        0.00625
26          34          0.02455              1        0.00806
27          35         0.023779              1        0.00911
28          36        0.0230934              1        0.00845
29          37        0.0224019              1        0.00844
30          38        0.0220964              1        0.00687
31          39        0.0218681              1         0.0049
32          40         0.021706              1        0.00375
33          41        0.0215136              1         0.0033
34          42        0.0213676              1        0.00304
35          43        0.0212558              1        0.00323
36          44        0.0211222              1        0.00409
37          45        0.0208738              1        0.00475
38          46        0.0205572              1        0.00656
39          47        0.0201352              1         0.0065
First-order
Iteration  Func-count       f(x)        Step-size       optimality
40          48        0.0196422              1        0.00387
41          49        0.0193932              1        0.00333
42          50        0.0191319              1        0.00872
43          51        0.0186628              1        0.00878
44          52         0.017685              1        0.00907
45          53        0.0169716              1        0.00861
46          54          0.01577              1        0.00994
47          56        0.0131656            0.5         0.0106
48          57        0.0117538              1         0.0153
49          58       0.00879083              1         0.0134
50          59       0.00839261              1          0.012
51          60       0.00394004              1        0.00878
52          61       0.00304266              1          0.013
53          62       0.00244168              1         0.0069
54          63       0.00225538              1        0.00627
55          64        0.0014968              1        0.00464
56          65       0.00105029              1        0.00366
57          66      0.000674101              1        0.00456
58          67      0.000456871              1         0.0042
59          68      0.000245825              1        0.00299
First-order
Iteration  Func-count       f(x)        Step-size       optimality
60          69      9.62095e-05              1         0.0024
61          70      4.02262e-05              1        0.00114
62          71      2.70099e-05              1       0.000664
63          72      2.21994e-05              1       0.000556
64          73      1.73964e-05              1       0.000481
65          74      1.28057e-05              1       0.000433
66          75      1.05719e-05              1       0.000388
67          76       9.4027e-06              1        0.00034
68          77       7.7716e-06              1       0.000308
69          78      5.32426e-06              1       0.000286
70          79      2.70776e-06              1       0.000259
71          80      1.17346e-06              1       0.000219
72          81      6.75895e-07              1       0.000149
73          82      5.27851e-07              1       0.000111
74          83       4.0086e-07              1       9.15e-05
75          84      2.47388e-07              1        6.9e-05
76          85      1.30409e-07              1       5.92e-05
77          86      7.85393e-08              1       4.16e-05
78          87      6.17683e-08              1       3.58e-05
79          88      5.12695e-08              1       2.56e-05

Local minimum found.

Optimization completed because the size of the gradient is less than
the selected value of the optimality tolerance.

End Main Loop

Final f: 5.1269e-08
Setup time: 1.1 seconds
Optimization time: 0.63 seconds
Iterations: 79
Total iterations: 88
Exit condition: Gradient norm smaller than tolerance

*** Final F: 5.13e-08, Score: 0.95 ***

```

## Options for CP-ISYM with Adam stochastic optimization solver

When the number of observations is very large, a stochastic method may be faster. Our example is very small (p=1250), but we use it nonetheless to demonstrate the capabilities of the stochastic method. It is selected by setting 'method' to 'adam'.

```matlab
rng(5)
[M,info] = cp_isym(X,r,'method','adam');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Order, Size: 3, 100
Function: 1000 samples out of 1250 observations
Gradient: using 100 samples  out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f~ = 4.828625e+00
Epoch  1: f~ = -2.657838e-01, step = 0.01
Epoch  2: f~ = -2.832830e-01, step = 0.01
Epoch  3: f~ = -2.982487e-01, step = 0.01
Epoch  4: f~ = -2.985366e-01, step = 0.01
Epoch  5: f~ = -2.984525e-01, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  6: f~ = -3.001280e-01, step = 0.001
Epoch  7: f~ = -2.996104e-01, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f~: -3.0013e-01
Setup time: 0.01 seconds
Optimization time: 0.40 seconds
Number of epochs: 7
Total iterations: 700
Exit condition: Number of failed epochs > 1

*** Final F: 1.51e-04, Score: 0.95 ***

```

Since this problem is small, we can compute the function value exactly by setting 'fsamp' to 'exact'.

```matlab
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact');
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Order, Size: 3, 100
Function: exact
Gradient: using 100 samples  out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 3.595388e-02, step = 0.01
Epoch  2: f = 1.618031e-02, step = 0.01
Epoch  3: f = 1.507292e-03, step = 0.01
Epoch  4: f = 2.553210e-03, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  5: f = 1.624327e-04, step = 0.001
Epoch  6: f = 1.601540e-04, step = 0.001
Epoch  7: f = 3.981757e-04, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 1.6015e-04
Setup time: 0.01 seconds
Optimization time: 0.41 seconds
Number of epochs: 7
Total iterations: 700
Exit condition: Number of failed epochs > 1

*** Final F: 1.60e-04, Score: 0.96 ***

```

We can also change the number of samples in each stochastic gradient ('gsamp') and correspondingly increase the number of iterations per epoch ('subiters').

```matlab
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact',...
    'gsamp',10,'subiters',250);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Order, Size: 3, 100
Function: exact
Gradient: using 10 samples  out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 250
Learning rate / decay / maxfails: 0.01 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 2.239609e-02, step = 0.01
Epoch  2: f = 6.190717e-03, step = 0.01
Epoch  3: f = 1.955220e-02, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  4: f = 6.693318e-03, step = 0.001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 6.1907e-03
Setup time: 0.00 seconds
Optimization time: 0.26 seconds
Number of epochs: 4
Total iterations: 1000
Exit condition: Number of failed epochs > 1

*** Final F: 6.19e-03, Score: 0.89 ***

```

We can further improve the result by increasing the number of times the step length is decreased before termination ('maxfails').

```matlab
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact',...
    'gsamp',10,'subiters',250,'maxfails',4);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Order, Size: 3, 100
Function: exact
Gradient: using 10 samples  out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 250
Learning rate / decay / maxfails: 0.01 0.1 4
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 2.239609e-02, step = 0.01
Epoch  2: f = 6.190717e-03, step = 0.01
Epoch  3: f = 1.955220e-02, step = 0.01, nfails = 1 (resetting to solution from last epoch)
Epoch  4: f = 6.693318e-03, step = 0.001, nfails = 2 (resetting to solution from last epoch)
Epoch  5: f = 1.111507e-03, step = 0.0001
Epoch  6: f = 3.658467e-04, step = 0.0001
Epoch  7: f = 1.958052e-04, step = 0.0001
Epoch  8: f = 2.450883e-04, step = 0.0001, nfails = 3 (resetting to solution from last epoch)
Epoch  9: f = 1.802417e-04, step = 1e-05
Epoch 10: f = 1.692018e-04, step = 1e-05
Epoch 11: f = 1.604512e-04, step = 1e-05
Epoch 12: f = 1.384002e-04, step = 1e-05
Epoch 13: f = 1.418440e-04, step = 1e-05, nfails = 4 (resetting to solution from last epoch)
Epoch 14: f = 1.356883e-04, step = 1e-06
Epoch 15: f = 1.352304e-04, step = 1e-06
Epoch 16: f = 1.333213e-04, step = 1e-06
Epoch 17: f = 1.320493e-04, step = 1e-06
Epoch 18: f = 1.304712e-04, step = 1e-06
Epoch 19: f = 1.304975e-04, step = 1e-06, nfails = 5 (resetting to solution from last epoch)
End Main Loop

Final f: 1.3047e-04
Setup time: 0.01 seconds
Optimization time: 1.41 seconds
Number of epochs: 19
Total iterations: 4750
Exit condition: Number of failed epochs > 4

*** Final F: 1.30e-04, Score: 0.97 ***

```

We can also change the initial learning rate via 'rate'.

```matlab
rng(5)
[M,info] = cp_isym(X,r,'method','adam','fsamp','exact','rate',1e-3);
fprintf('\n\t\t*** Final F: %.2e, Score: %.2f ***\n\n', f_implicit(M,X,norm(X)^2), score(M,Mtrue));
```
```
CP-ISYM Implict Symmetric CP Optimzation
Order, Size: 3, 100
Function: exact
Gradient: using 100 samples  out of 1250 observations
Max iterations (epochs): 100
Iterations per epoch: 100
Learning rate / decay / maxfails: 0.001 0.1 1
Backup to best known solution after epoch fail? true
beta1 / beta2 / epsilon: 0.9 0.999 1e-08

Begin Main Loop
Epoch  0: f = 5.123527e+00
Epoch  1: f = 1.636029e-01, step = 0.001
Epoch  2: f = 6.960158e-02, step = 0.001
Epoch  3: f = 4.769470e-02, step = 0.001
Epoch  4: f = 3.871023e-02, step = 0.001
Epoch  5: f = 3.369702e-02, step = 0.001
Epoch  6: f = 3.012290e-02, step = 0.001
Epoch  7: f = 2.722983e-02, step = 0.001
Epoch  8: f = 2.466563e-02, step = 0.001
Epoch  9: f = 2.200866e-02, step = 0.001
Epoch 10: f = 1.999008e-02, step = 0.001
Epoch 11: f = 1.800170e-02, step = 0.001
Epoch 12: f = 1.660821e-02, step = 0.001
Epoch 13: f = 1.525271e-02, step = 0.001
Epoch 14: f = 1.399967e-02, step = 0.001
Epoch 15: f = 1.185572e-02, step = 0.001
Epoch 16: f = 9.517354e-03, step = 0.001
Epoch 17: f = 6.242648e-03, step = 0.001
Epoch 18: f = 3.505938e-03, step = 0.001
Epoch 19: f = 1.700016e-03, step = 0.001
Epoch 20: f = 7.406634e-04, step = 0.001
Epoch 21: f = 4.796099e-04, step = 0.001
Epoch 22: f = 4.453608e-04, step = 0.001
Epoch 23: f = 1.749952e-04, step = 0.001
Epoch 24: f = 1.949576e-04, step = 0.001, nfails = 1 (resetting to solution from last epoch)
Epoch 25: f = 6.149062e-05, step = 0.0001
Epoch 26: f = 7.074476e-05, step = 0.0001, nfails = 2 (resetting to solution from last epoch)
End Main Loop

Final f: 6.1491e-05
Setup time: 0.01 seconds
Optimization time: 1.13 seconds
Number of epochs: 26
Total iterations: 2600
Exit condition: Number of failed epochs > 1

*** Final F: 6.15e-05, Score: 0.96 ***

```