-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Nonlinear] sparsity pattern of Hessian with :(x * y) #2527
Comments
I think this is expected. We don't try to compress the non-zero terms, even in trivial cases. Is there any evidence that this is a performance concern? |
As a trivial example, a problem with one variable can have > 1 element in the Hessian: julia> using JuMP, Ipopt
julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt
julia> @variable(model, x)
x
julia> @objective(model, Min, sin(x))
sin(x)
julia> @constraint(model, sin(x) >= 0)
sin(x) - 0.0 ≥ 0
julia> optimize!(model)
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 1
Number of nonzeros in Lagrangian Hessian.............: 2 |
The issue is not that you don't compress the non-zero terms; it is that you detect non-zero terms where there are zeros in the Hessian. If these additional nnz and on the diagonal, it doesn't impact the coloring and thus the performance. If the detected non-zeros are not on the diagonal, you might require more colors (and thus more directional derivatives) to retrieve all non-zeros of the Hessian of the Lagrangian. |
Do you have an example? |
Is the claim something like this: julia> import MathOptInterface as MOI
julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
MOI.VariableIndex(1)
MOI.VariableIndex(2)
julia> model = MOI.Nonlinear.Model();
julia> objective = :(sin($(x[1])) + $(x[2])^1)
:(sin(MOI.VariableIndex(1)) + MOI.VariableIndex(2) ^ 1)
julia> MOI.Nonlinear.set_objective(model, objective)
julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);
julia> MOI.initialize(evaluator, [:Hess])
julia> MOI.hessian_lagrangian_structure(evaluator)
2-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2) Where Are there examples there this matters? |
using NLPModels, NLPModelsJuMP, JuMP
nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])
@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@NLconstraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@NLconstraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@NLconstraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)
@objective(nlp, Min, x[1] + x[2] + x[3])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(row, cols, vals) julia> sparse(row, cols, vals)
8×8 SparseMatrixCSC{Int64, Int64} with 13 stored entries:
1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ 1 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ 1 ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ ⋅ 1 ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ 1 ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ 1 |
It seems that the new nonlinear API fixed something 🤔 using NLPModels, NLPModelsJuMP, JuMP
nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])
@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@constraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@constraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@constraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)
@objective(nlp, Min, x[1] + x[2] + x[3])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
sparse(rows, cols, vals, 8, 8) 8×8 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ Benoît helped me to support it today and I never tested it before tonight. Update: I don't use the non linear API in that case, the constraints are quadratic. |
Your Here's the underlying issue: julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
MOI.VariableIndex(1)
MOI.VariableIndex(2)
julia> model = MOI.Nonlinear.Model();
julia> MOI.Nonlinear.set_objective(model, :($(x[1]) * $(x[2])))
julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);
julia> MOI.initialize(evaluator, [:Hess])
julia> MOI.hessian_lagrangian_structure(evaluator)
3-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1) |
So I assume somewhere we are detecting Hessian sparsity and if |
Nice catch Oscar 👍 |
@odow I found a simple example with a coefficient off-diagonal: using NLPModels, NLPModelsJuMP, JuMP
nlp = Model()
x0 = [1745, 12000, 110, 3048, 1974, 89.2, 92.8, 8, 3.6, 145]
lvar = [0.00001, 0.00001, 0.00001, 0.00001, 0.00001, 85, 90, 3, 1.2, 145]
uvar = [2000, 16000, 120, 5000, 2000, 93, 95, 12, 4, 162]
@variable(nlp, lvar[i] ≤ x[i = 1:10] ≤ uvar[i], start = x0[i])
a = 0.99
b = 0.9
@expression(nlp, g1, 35.82 - 0.222 * x[10] - b * x[9])
@expression(nlp, g2, -133 + 3 * x[7] - a * x[10])
@expression(nlp, g5, 1.12 * x[1] + 0.13167 * x[1] * x[8] - 0.00667 * x[1] * x[8]^2 - a * x[4])
@expression(nlp, g6, 57.425 + 1.098 * x[8] - 0.038 * x[8]^2 + 0.325 * x[6] - a * x[7])
@constraint(nlp, g1 ≥ 0)
@constraint(nlp, g2 ≥ 0)
@constraint(nlp, -g1 + x[9] * (1 / b - b) ≥ 0)
@constraint(nlp, -g2 + (1 / a - a) * x[10] ≥ 0)
@constraint(nlp, g5 ≥ 0)
@constraint(nlp, g6 ≥ 0)
@constraint(nlp, -g5 + (1 / a - a) * x[4] ≥ 0)
@constraint(nlp, -g6 + (1 / a - a) * x[7] ≥ 0)
@constraint(nlp, 1.22 * x[4] - x[1] - x[5] == 0)
@constraint(nlp, 98000 * x[3] / (x[4] * x[9] + 1000 * x[3]) - x[6] == 0)
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)
@objective(nlp, Min, 5.04 * x[1] + 0.035 * x[2] + 10 * x[3] + 3.36 * x[5] - 0.063 * x[4] * x[7]) nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10) 10×10 SparseMatrixCSC{Int64, Int64} with 16 stored entries:
3 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 1 ⋅ ⋅ 1 ⋅ ⋅ ⋅
2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 4 ⋅ ⋅
⋅ ⋅ 1 1 ⋅ ⋅ ⋅ ⋅ 1 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ I don't understand why I tried: @constraint(nlp, x[2] / x[1] + x[5] / x[1] - x[8] == 0) instead of @constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0) and |
using NLPModels, NLPModelsJuMP, JuMP
nlp = Model()
@variable(nlp, x[i = 1:10])
@constraint(nlp, sum(x[i] for i = 1:10) / x[1] == 0)
@objective(nlp, Min, x[1]^4)
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10) The Hessian is completely dense 😨 10×10 SparseMatrixCSC{Int64, Int64} with 55 stored entries:
2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 1 ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 1 1 ⋅ ⋅ ⋅
1 1 1 1 1 1 1 1 ⋅ ⋅
1 1 1 1 1 1 1 1 1 ⋅
1 1 1 1 1 1 1 1 1 1 |
I think this is even somewhat expected from our sparsity algorithm. We don't recognize that I'm still interested in seeing a model where our choice makes an important runtime difference (other than artificial examples like the one above). |
Perhaps this issue is really asking for better insight on how different formulations might perform: julia> using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
julia> function nlp_hessian(model)
nlp = MathOptNLPModel(model)
rows, cols = hess_structure(nlp)
n = num_variables(model)
return SparseArrays.sparse(rows, cols, ones(Int, length(rows)), n, n)
end
nlp_hessian (generic function with 5 methods)
julia> begin
model = Model()
@variable(model, x[1:4])
@constraint(model, sum(x) / x[1] == 0)
nlp_hessian(model)
end
4×4 SparseMatrixCSC{Int64, Int64} with 10 stored entries:
1 ⋅ ⋅ ⋅
1 1 ⋅ ⋅
1 1 1 ⋅
1 1 1 1
julia> begin
model = Model()
@variable(model, x[1:4])
@constraint(model, sum(x[i] / x[1] for i in 1:4) == 0)
nlp_hessian(model)
end
4×4 SparseMatrixCSC{Int64, Int64} with 7 stored entries:
1 ⋅ ⋅ ⋅
1 1 ⋅ ⋅
1 ⋅ 1 ⋅
1 ⋅ ⋅ 1
julia> begin
model = Model()
@variable(model, x[1:4])
@variable(model, y)
@constraint(model, y == sum(x))
@constraint(model, y / x[1] == 0)
nlp_hessian(model)
end
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
1 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ 1 |
That's a really cool insight actually! |
I think I'm going to close this issue. There's certainly scope for writing a new AD backend, but I'm very reticent to go making changes to our existing one. The case of jump-dev/JuMP.jl#3664 can add support for analyzing the Hessian structure. |
As discussed with @mlubin during JuMP-dev 2024, we recently observed with Guillaume Dalle (@gdalle) that JuMP detects additional non-zeros on the diagonal of the Hessian than what is needed.
We also observed other discrepancies:
I don't think that it's a bug, just a part of the code that was probably not optimized for corner cases.
The text was updated successfully, but these errors were encountered: