Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nonlinear] sparsity pattern of Hessian with :(x * y) #2527

Closed
amontoison opened this issue Jul 22, 2024 · 17 comments
Closed

[Nonlinear] sparsity pattern of Hessian with :(x * y) #2527

amontoison opened this issue Jul 22, 2024 · 17 comments
Labels
Submodule: Nonlinear About the Nonlinear submodule

Comments

@amontoison
Copy link
Contributor

amontoison commented Jul 22, 2024

As discussed with @mlubin during JuMP-dev 2024, we recently observed with Guillaume Dalle (@gdalle) that JuMP detects additional non-zeros on the diagonal of the Hessian than what is needed.

We also observed other discrepancies:

I don't think that it's a bug, just a part of the code that was probably not optimized for corner cases.

@odow
Copy link
Member

odow commented Jul 22, 2024

I think this is expected. We don't try to compress the non-zero terms, even in trivial cases. Is there any evidence that this is a performance concern?

@odow odow added Project: next-gen nonlinear support Issues relating to nonlinear support Submodule: Nonlinear About the Nonlinear submodule and removed Project: next-gen nonlinear support Issues relating to nonlinear support labels Jul 22, 2024
@odow odow changed the title [NLP] Sparsity pattern of Hessian [Nonlinear] sparsity pattern of Hessian Jul 22, 2024
@odow
Copy link
Member

odow commented Jul 22, 2024

As a trivial example, a problem with one variable can have > 1 element in the Hessian:

julia> using JuMP, Ipopt

julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> @variable(model, x)
x

julia> @objective(model, Min, sin(x))
sin(x)

julia> @constraint(model, sin(x) >= 0)
sin(x) - 0.0  0

julia> optimize!(model)
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        2

@amontoison
Copy link
Contributor Author

amontoison commented Jul 22, 2024

The issue is not that you don't compress the non-zero terms; it is that you detect non-zero terms where there are zeros in the Hessian.

If these additional nnz and on the diagonal, it doesn't impact the coloring and thus the performance.
You just allocate more storage than needed (+n in the worst case).

If the detected non-zeros are not on the diagonal, you might require more colors (and thus more directional derivatives) to retrieve all non-zeros of the Hessian of the Lagrangian.

@odow
Copy link
Member

odow commented Jul 22, 2024

Do you have an example?

@odow
Copy link
Member

odow commented Jul 22, 2024

Is the claim something like this:

julia> import MathOptInterface as MOI

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> objective = :(sin($(x[1])) + $(x[2])^1)
:(sin(MOI.VariableIndex(1)) + MOI.VariableIndex(2) ^ 1)

julia> MOI.Nonlinear.set_objective(model, objective)

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
2-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)

Where (2, 2) in the Hessian is always 0.0?

Are there examples there this matters?

@amontoison
Copy link
Contributor Author

amontoison commented Jul 22, 2024

using NLPModels, NLPModelsJuMP, JuMP

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i]  x[i = 1:8]  uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6])  0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4])  0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5])  0)
@NLconstraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333  0)
@NLconstraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4]  0)
@NLconstraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5]  0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(row, cols, vals)
julia> sparse(row, cols, vals)
8×8 SparseMatrixCSC{Int64, Int64} with 13 stored entries:
 1              
   1            
     1          
   1    1        
     1    1      
 1          1    
   1          1  
     1          1

@amontoison
Copy link
Contributor Author

amontoison commented Jul 22, 2024

It seems that the new nonlinear API fixed something 🤔

using NLPModels, NLPModelsJuMP, JuMP

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i]  x[i = 1:8]  uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6])  0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4])  0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5])  0)
@constraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333  0)
@constraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4]  0)
@constraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5]  0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
sparse(rows, cols, vals, 8, 8)
8×8 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
               
               
               
   1            
     1          
 1              
   1            
     1          

Benoît helped me to support it today and I never tested it before tonight.

Update: I don't use the non linear API in that case, the constraints are quadratic.

@odow
Copy link
Member

odow commented Jul 22, 2024

Your @NLconstraints are actually quadratic, so you don't need to do AD on them.

Here's the underlying issue:

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> MOI.Nonlinear.set_objective(model, :($(x[1]) * $(x[2])))

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (2, 1)

@odow
Copy link
Member

odow commented Jul 22, 2024

So I assume somewhere we are detecting Hessian sparsity and if op(x, y) then we are adding (x, x), (x, y), and (y, y). We should special-case *.

@odow odow changed the title [Nonlinear] sparsity pattern of Hessian [Nonlinear] sparsity pattern of Hessian with :(x * y) Jul 22, 2024
@amontoison
Copy link
Contributor Author

Nice catch Oscar 👍
I will try to isolate an example tomorrow with additional non-zeros that are not on the diagonal.

@amontoison
Copy link
Contributor Author

amontoison commented Jul 22, 2024

@odow I found a simple example with a coefficient off-diagonal:

using NLPModels, NLPModelsJuMP, JuMP

nlp = Model()

x0 = [1745, 12000, 110, 3048, 1974, 89.2, 92.8, 8, 3.6, 145]
lvar = [0.00001, 0.00001, 0.00001, 0.00001, 0.00001, 85, 90, 3, 1.2, 145]
uvar = [2000, 16000, 120, 5000, 2000, 93, 95, 12, 4, 162]
@variable(nlp, lvar[i]  x[i = 1:10]  uvar[i], start = x0[i])

a = 0.99
b = 0.9

@expression(nlp, g1, 35.82 - 0.222 * x[10] - b * x[9])
@expression(nlp, g2, -133 + 3 * x[7] - a * x[10])
@expression(nlp, g5, 1.12 * x[1] + 0.13167 * x[1] * x[8] - 0.00667 * x[1] * x[8]^2 - a * x[4])
@expression(nlp, g6, 57.425 + 1.098 * x[8] - 0.038 * x[8]^2 + 0.325 * x[6] - a * x[7])

@constraint(nlp, g1  0)
@constraint(nlp, g2  0)
@constraint(nlp, -g1 + x[9] * (1 / b - b)  0)
@constraint(nlp, -g2 + (1 / a - a) * x[10]  0)
@constraint(nlp, g5  0)
@constraint(nlp, g6  0)
@constraint(nlp, -g5 + (1 / a - a) * x[4]  0)
@constraint(nlp, -g6 + (1 / a - a) * x[7]  0)
@constraint(nlp, 1.22 * x[4] - x[1] - x[5] == 0)
@constraint(nlp, 98000 * x[3] / (x[4] * x[9] + 1000 * x[3]) - x[6] == 0)
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

@objective(nlp, Min, 5.04 * x[1] + 0.035 * x[2] + 10 * x[3] + 3.36 * x[5] - 0.063 * x[4] * x[7])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)
10×10 SparseMatrixCSC{Int64, Int64} with 16 stored entries:
 3                  
 1  1                
     1              
     1  2            
 1  1      1          
                   
       1      1      
 2              4    
     1  1          1  
                   

I don't understand why H[5,2] is a non-zero.

I tried:

@constraint(nlp, x[2] / x[1] + x[5] / x[1] - x[8] == 0)

instead of

@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

and H[5,2] is not considered anymore as a non-zero.

@amontoison
Copy link
Contributor Author

using NLPModels, NLPModelsJuMP, JuMP

nlp = Model()

@variable(nlp,  x[i = 1:10])
@constraint(nlp, sum(x[i] for i = 1:10) / x[1]  == 0)
@objective(nlp, Min, x[1]^4)

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)

The Hessian is completely dense 😨

10×10 SparseMatrixCSC{Int64, Int64} with 55 stored entries:
 2                  
 1  1                
 1  1  1              
 1  1  1  1            
 1  1  1  1  1          
 1  1  1  1  1  1        
 1  1  1  1  1  1  1      
 1  1  1  1  1  1  1  1    
 1  1  1  1  1  1  1  1  1  
 1  1  1  1  1  1  1  1  1  1

@odow
Copy link
Member

odow commented Jul 23, 2024

I think this is even somewhat expected from our sparsity algorithm. We don't recognize that sum(x) / y is decomposable. The sparsity algorithm trades simplicity for some false positives. I guess the sparseconnectivitytracer trades exactness for some cases where it is much slower than JuMP. I can see how this one might be problematic:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/458d8141bbe7ffc4552c713ec755241de7b4fa15/src/PureJuMP/tetra.jl#L37-L62

I'm still interested in seeing a model where our choice makes an important runtime difference (other than artificial examples like the one above).

@odow
Copy link
Member

odow commented Sep 18, 2024

Perhaps this issue is really asking for better insight on how different formulations might perform:

julia> using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

julia> function nlp_hessian(model)
           nlp = MathOptNLPModel(model)
           rows, cols = hess_structure(nlp)
           n = num_variables(model)
           return SparseArrays.sparse(rows, cols, ones(Int, length(rows)), n, n)
       end
nlp_hessian (generic function with 5 methods)

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x) / x[1]  == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 10 stored entries:
 1      
 1  1    
 1  1  1  
 1  1  1  1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x[i] / x[1] for i in 1:4) == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 7 stored entries:
 1      
 1  1    
 1    1  
 1      1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @variable(model, y)
           @constraint(model, y == sum(x))
           @constraint(model, y / x[1] == 0)
           nlp_hessian(model)
       end
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
 1        
         
         
         
 1        1

@gdalle
Copy link

gdalle commented Sep 19, 2024

That's a really cool insight actually!

@odow
Copy link
Member

odow commented Dec 17, 2024

This came up again in https://discourse.julialang.org/t/too-many-nonzeroes-for-nlp-in-jump-and-examodels/123901

@odow
Copy link
Member

odow commented Jan 14, 2025

I think I'm going to close this issue. There's certainly scope for writing a new AD backend, but I'm very reticent to go making changes to our existing one. The case of x * y isn't the issue because the false non-zeros appear on the diagonal (we think that H[x, x] is non-zero). The larger question of detecting sparsity with separable affine expressions is much more complicated to fix with the current design.

jump-dev/JuMP.jl#3664 can add support for analyzing the Hessian structure.

@odow odow closed this as completed Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Submodule: Nonlinear About the Nonlinear submodule
Development

No branches or pull requests

3 participants