Large scale nonlinear optimization with fmincon in Scilab vs Matlab

Hello,

I have recently presented the new features of Scilab that allow solving large-scale systems of nonlinear equations:

https://scilab.discourse.group/t/using-kinsol-to-solve-a-nonlinear-pde

In the above example, the good performance of Scilab comes from the combined use of the kinsol Sundials solver and of the sparse approximation of the Jacobian with the ColPack library, both new features of Scilab 2024.0.

This minimal surface problem can also be stated as an optimization problem, which gives more robust (more global) convergence for highly curved surfaces and/or more complicated topologies. However, this requires an optimization method able to handle large-scale problems, which in our case means exploiting the sparsity of the Hessian and approximating it. Both features are available in Scilab thanks to the Sci-IPOpt toolbox and its Matlab-like wrapper fmincon, which is used in the script below. To run this example you first have to install fmincon by typing at Scilab's command line:

--> atomsSystemUpdate()
--> atomsInstall("fmincon")

then restart Scilab.
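After restarting, you can optionally check that the module is available by typing:

--> atomsIsInstalled("fmincon")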

Problem statement

We consider the same minimal surface as the one previously obtained by solving the Euler equations with kinsol, but this time we use a triangulation of the domain [0,1]\times[0,1], which allows the surface area of a given height field to be computed by summing the areas of the triangles. The exact gradient is computed, and the Hessian of fungrad is approximated by taking its sparsity pattern into account (the pattern is the same as the Jacobian pattern of the Euler equations).
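For a triangle with vertices (x_k, y_k, z_k), k=1,2,3, the code accumulates twice its area, i.e. the norm of the cross product of two edge vectors (the factor 2 does not change the minimizer):

$$
2A=\sqrt{p_x^2+p_y^2+p_z^2},\qquad
\begin{aligned}
p_x &= (y_2-y_1)(z_3-z_1)-(y_3-y_1)(z_2-z_1),\\
p_y &= -(x_2-x_1)(z_3-z_1)+(x_3-x_1)(z_2-z_1),\\
p_z &= (x_2-x_1)(y_3-y_1)-(y_2-y_1)(x_3-x_1).
\end{aligned}
$$

These are the variables px, py and pz2 (= p_z^2) of the function below; its gradient is obtained by reverse differentiation of the sum over all triangles: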

function [S,dsdf] = fungrad(f)
    // compute the surface area and its gradient using the triangulation
    z2mz1 = I21*f; z3mz1 = I31*f;
    px =  y2my1.*z3mz1 - y3my1.*z2mz1;
    py = -x2mx1.*z3mz1 + x3mx1.*z2mz1;
    sq = sqrt(px.*px+py.*py+pz2);
    S = sum(sq);
    if argn(1) > 1
        // compute gradient by reverse differentiation of S
        dsdpx = px./sq; dsdpy = py./sq;
        dsdz3mz1 =  dsdpx.*y2my1 - dsdpy.*x2mx1;
        dsdz2mz1 = -dsdpx.*y3my1 + dsdpy.*x3mx1;
        dsdf = (dsdz2mz1'*I21 + dsdz3mz1'*I31)';
    end
end

// define the mesh of the (x,y) domain
n = 40;
N = n^2;
x = linspace(0,1,n);
[X,Y] = meshgrid(x,x);
X = X(:);
Y = Y(:);
[tri,bdy] = mesh2d(X(:),Y(:));
bdy($)=[];

// helper matrices and vectors for normals computation
nt = size(tri,2);
I21 = sparse([1:nt 1:nt; tri(1,:) tri(2,:)]',[-ones(1,nt), ones(1,nt)],[nt,N]);
I31 = sparse([1:nt 1:nt; tri(1,:) tri(3,:)]',[-ones(1,nt), ones(1,nt)],[nt,N]);
x2mx1 = I21*X; x3mx1 = I31*X; 
y2my1 = I21*Y;  y3my1 = I31*Y;
pz2 = (x2mx1.*y3my1-y2my1.*x3mx1).^2;

// Hessian pattern
d1x=sparse(ones(n-1,1)); d0x=sparse(ones(n,1));
grad = (-diag(d1x,-1) + diag(d1x,1) );
gradx = grad .*. speye(n,n); grady = speye(n,n) .*. grad;
lap = (diag(d1x,-1)+diag(d1x,1)-2*diag(d0x));
lapxy = lap .*. speye(n,n) + speye(n,n) .*. lap+gradx*grady;
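// the resulting pattern is a 9-point stencil (each node is coupled with itself
// and its 8 grid neighbours), which covers all edges of the triangulation;
// it can be visualized with e.g. PlotSparse(lapxy)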

// Dirichlet boundary condition
// cnd = sign(sin(4*(X+Y))); // alternative boundary condition (unused)
cnd = 0.5*sign(sin(2*%pi*X).*sin(2*%pi*Y));
I = speye(N,N);

opt = optimoptions("fmincon");
opt.Display="iter";   
opt.OptimalityTolerance = 1e-5;
opt.SpecifyObjectiveGradient=1;
opt.HessianApproximation = "finite-difference";
opt.HessPattern=lapxy;

problem = struct();
problem.objective = fungrad;
problem.x0 = zeros(N,1);
problem.Aeq = I(bdy,:);
problem.beq = cnd(bdy);
problem.options = opt;

tic
[f,S] = fmincon(problem);
disp(sprintf("elapsed time = %g seconds\n",toc()))

clf
mesh(x,x,matrix(f,n,n))

We get the following output (Scilab 2024.0 on a Mac mini M1):

 Iter  Func-count  Function value   Feasibility   Step Length     Step Norm   First-order
                                                                               optimality
    0           1    2.000000e+00    5.0000e-01    0.0000e+00    0.0000e+00    5.0000e-01
    1           2    3.258430e+00    1.2326e-32    1.0000e+00    5.0000e-01    1.5791e+00
    2           3    3.218325e+00    1.4483e-31    1.0000e+00    2.0581e-01    2.0205e-02
    3           4    3.213194e+00    1.7333e-33    1.0000e+00    4.8281e-02    8.0155e-03
    4           5    3.212804e+00    1.9259e-34    1.0000e+00    1.5123e-02    1.3365e-03
    5           6    3.212799e+00    4.8148e-35    1.0000e+00    2.1899e-03    2.8175e-05
    6           7    3.212799e+00    6.5828e-37    1.0000e+00    4.3534e-05    1.2123e-08

Termination message: Optimal solution found

Elapsed time = 0.082521 seconds
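As a side note, the exact gradient returned by fungrad can be checked against a finite-difference approximation. Here is a minimal sketch (not part of the script above), using Scilab's numderivative:

// sanity check of the reverse-mode gradient of fungrad (illustrative only)
f0 = 0.1*rand(N,1,"normal");        // arbitrary test point
[S0, g] = fungrad(f0);              // analytic gradient (column vector)
gfd = numderivative(fungrad, f0);   // finite-difference gradient (row vector)
mprintf("relative gradient error = %g\n", norm(g'-gfd)/norm(gfd));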

Matlab code

We can already note that the optimal surface is computed 2 to 3 times faster than by solving the Euler equations, but let us now focus on comparing these results with those obtained with a very similar code in Matlab.
In Scilab's fmincon only the interior-point algorithm of IPOpt can be used, whereas in Matlab's fmincon two large-scale algorithms are available: interior-point and trust-region-reflective. Strangely, the latter was the faster of the two, but this may be due to the way the search direction is computed: unlike IPOpt, which uses a sparse direct solver (MUMPS) for the linear systems, the trust-region-reflective algorithm of Matlab's fmincon uses conjugate gradient iterations, which may not be the best choice because the proposed preconditioner tuning is very basic.

The Matlab code is the following:

% define the mesh of the (x,y) domain
n = 40;  % same discretization as in the Scilab script (increase up to n=1000 for the benchmark below)
N = n^2;
x = linspace(0,1,n);
[X,Y] = meshgrid(x,x);
X = X(:);
Y = Y(:);
dt = delaunayTriangulation(X,Y);
bdy = convexHull(dt);
bdy(end)=[];
tri = dt.ConnectivityList';

% helper matrices and vectors for normals computation
nt = size(tri,2);
I21 = sparse([1:nt 1:nt],[tri(1,:) tri(2,:)],[-ones(1,nt), ones(1,nt)],nt,N);
I31 = sparse([1:nt 1:nt],[tri(1,:) tri(3,:)],[-ones(1,nt), ones(1,nt)],nt,N);
x2mx1 = I21*X; x3mx1 = I31*X; 
y2my1 = I21*Y; y3my1 = I31*Y;
pz2 = (x2mx1.*y3my1-y2my1.*x3mx1).^2;

% Hessian pattern
d1x=sparse(ones(n-1,1)); d0x=sparse(ones(n,1));
grad = (-diag(d1x,-1) + diag(d1x,1) );
gradx = kron(grad,speye(n,n)); grady = kron(speye(n,n),grad);
lap = diag(d1x,-1)+diag(d1x,1)-2*diag(d0x);
lapxy = kron(lap,speye(n,n)) + kron(speye(n,n),lap)+gradx*grady;

% Dirichlet boundary condition
cnd = 0.5*sign(sin(2*pi*X).*sin(2*pi*Y));
I = speye(N,N);

opt = optimoptions("fmincon");
opt.Algorithm = "trust-region-reflective";
opt.Display = "iter";   
opt.OptimalityTolerance = 1e-5;
opt.SpecifyObjectiveGradient = true;
opt.HessianApproximation = "finite-difference";
opt.HessPattern = lapxy;
opt.SubproblemAlgorithm = "cg";

problem = struct();
problem.solver = "fmincon";
problem.objective = @(f) fungrad(f,I21,I31,x2mx1,x3mx1,y2my1,y3my1,pz2);
problem.x0 = zeros(N,1);
problem.Aeq = I(bdy,:);
problem.beq = cnd(bdy);
problem.options = opt;

tic;
[f,S] = fmincon(problem);
toc

mesh(x,x,reshape(f,n,n))

function [S,dsdf] = fungrad(f,I21,I31,x2mx1,x3mx1,y2my1,y3my1,pz2)
    % compute the surface area and its gradient using the triangulation
    z2mz1 = I21*f; z3mz1 = I31*f;
    px =  y2my1.*z3mz1 - y3my1.*z2mz1;
    py = -x2mx1.*z3mz1 + x3mx1.*z2mz1;
    sq = sqrt(px.*px+py.*py+pz2);
    S = sum(sq);
    if nargout > 1
        % compute gradient by reverse differentiation of S
        dsdpx = px./sq; dsdpy = py./sq;
        dsdz3mz1 =  dsdpx.*y2my1 - dsdpy.*x2mx1;
        dsdz2mz1 = -dsdpx.*y3my1 + dsdpy.*x3mx1;
        dsdf = (dsdz2mz1'*I21 + dsdz3mz1'*I31)';
    end
end

and its output for the same discretization (n=40):


                                Norm of      First-order 
 Iteration        f(x)          step          optimality   CG-iterations
     0            3.88676                        0.0875                
     1            3.67648        8.62723          0.127          59
     2            3.40117        2.15681           0.15          25
     3            3.40117        2.42382           0.15           7
     4            3.33628       0.539202          0.165           0
     5            3.29896         0.1348          0.134           4
     6            3.27428       0.269601         0.0418          34
     7             3.2514       0.539202         0.0193          38
     8            3.22442         1.0784        0.00448          39
     9            3.21315        1.14838        0.00106          40
    10            3.21281       0.127864        0.00029          18
    11             3.2128      0.0487419       2.52e-05          40
    12             3.2128     0.00107931       2.56e-06          24

Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

<stopping criteria details>
Elapsed time is 0.104935 seconds.

For this discretization level n=40 (1600 unknowns) we obtain similar performance for Scilab and Matlab, but for larger values of n (up to n=1000, i.e. 1e6 unknowns) Scilab is clearly faster (Scilab 2024.0, macOS version, on a Mac mini M1):

    n      Scilab                               Matlab
           Elapsed time (s)   Final value       Elapsed time (s)   Final value
   40           0.08          3.2127986               0.1          3.2127986
  100           0.36          3.2210509               0.59         3.2210509
  150           0.82          3.2229525               2.02         3.2229525
  200           1.49          3.2239155               5.18         3.2239155
  300           3.19          3.2248879              15.41         3.2248879
  400           6.21          3.2253780              36.28         3.2253781
  500          10.83          3.2256734             114.52         3.2256739
 1000         110.67          3.2262754            1297.31         3.2262703

As I already said, the comparison may or may not be considered fair since we cannot use strictly the same algorithms, but the fastest available algorithm was used for each software, and these results show that Scilab is a serious alternative to Matlab for this type of problem.
