Update README.md #14 (merged Jul 8, 2025)
| [![](https://img.shields.io/badge/Developed_by-PSOR_Lab-342674)](https://psor.uconn.edu/) | [![Build Status](https://github.com/PSORLab/SourceCodeMcCormick.jl/workflows/CI/badge.svg?branch=master)](https://github.com/PSORLab/SourceCodeMcCormick.jl/actions?query=workflow%3ACI) [![codecov](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl)|


This package uses source-code generation to create CUDA kernels that return evaluations of McCormick-based
relaxations. Expressions composed of `Symbolics.jl`-type variables can be passed into the main `SourceCodeMcCormick.jl`
(`SCMC`) function, `kgen`. The expression is then factored, algebraic rearrangements and other modifications
are applied as necessary, and a new, custom function is written that calculates inclusion monotonic interval
extensions, convex and concave relaxations, and subgradients of the convex and concave relaxations of the
original expression. These custom functions are meant to be used in, e.g., a branch-and-bound routine where
the optimizer needs to calculate relaxations for many nodes simultaneously using GPU resources. They are
designed to be used with `CUDA.CuArray{Float64}` objects; 64-bit (double-precision) numbers are recommended
to maintain accuracy.


## Basic Functionality

The primary user-facing function is `kgen` ("kernel generator"). The `kgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
convex relaxation subgradients, and concave relaxation subgradients for an input symbolic expression. Inputs
to the newly generated function are expected to be an output placeholder, followed by a separate `CuArray` for
each input variable, sorted in alphabetical order. The "input" variables must be `(m,3)`-dimensional `CuArray`s,
with the columns corresponding to the points where evaluations are requested, lower bounds, and upper bounds,
respectively. The "output" placeholder must be `(m,4+2n)`-dimensional, where `n` is the dimensionality of the
expression. Its columns hold, respectively, the convex and concave relaxations, the lower and upper bounds of
an interval extension, a subgradient of the convex relaxation, and a subgradient of the concave relaxation.
Each row of the inputs and output is independent of the others, allowing relaxation information to be calculated
at multiple points on (potentially) different domains. A demonstration of how to use `kgen` is shown here,
with the output compared to the multiple-dispatch-based `McCormick.jl`:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)
x_in = CuArray([1.0 0.5 3.0;])
y_in = CuArray([0.7 0.1 2.0;])
OUT = CUDA.zeros(Float64, 1, 8)

using McCormick
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)


julia> new_func(OUT, x_in, y_in) # SourceCodeMcCormick.jl
CUDA.HostKernel for f_3zSdMo7LFdg_1(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1})

julia> Array(OUT)
1×8 Matrix{Float64}:
0.228368 2.96348e12 -9.62507 1.06865e13 -2.32491 -3.62947 3.59209e12 -8.98023e11

julia> exp(xMC/yMC) - (xMC*yMC^2)/(yMC+1) # McCormick.jl
MC{2, NS}(0.22836802303235793, 2.963476144457207e12, [-9.62507, 1.06865e+13], [-2.3249068975747305, -3.6294726270892967], [3.592092296310309e12, -8.980230740778097e11], false)
```

In this example, the symbolic expression `exp(x/y) - (x*y^2)/(y+1)` is passed into `kgen`, which returns the
function `new_func`. Passing inputs representing the values and domain where calculations are desired for `x`
and `y` (i.e., evaluating `x` at `1.0` in the domain `[0.5, 3.0]` and `y` at `0.7` in the domain `[0.1, 2.0]`)
into `new_func` returns pointwise evaluations of {cv, cc, lo, hi, [cvgrad]..., [ccgrad]...} for the original
expression of `exp(x/y) - (x*y^2)/(y+1)` on the specified domain of `x` and `y`.
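To make the layout concrete, the columns of `OUT` from this example can be unpacked as follows (a minimal
sketch; the column ordering follows the `(m,4+2n)` convention described above, with `n = 2`):

```julia
out = Array(OUT)               # move the 1×8 result matrix to the host
cv, cc = out[1, 1], out[1, 2]  # convex and concave relaxation values
lo, hi = out[1, 3], out[1, 4]  # interval extension bounds
cv_grad = out[1, 5:6]          # subgradient of the convex relaxation (x- and y-components)
cc_grad = out[1, 7:8]          # subgradient of the concave relaxation (x- and y-components)
```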

Evaluating large numbers of points simultaneously using a GPU allows for faster relaxation calculations than
evaluating points individually using a CPU. This is demonstrated in the following example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA, McCormick, BenchmarkTools
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)

# CuArray values for SCMC timing
x_in = hcat(2.5.*CUDA.rand(Float64, 8192) .+ 0.5,
            0.5.*CUDA.ones(Float64, 8192),
            3.0.*CUDA.ones(Float64, 8192))
y_in = hcat(1.9.*CUDA.rand(Float64, 8192) .+ 0.1,
            0.1.*CUDA.ones(Float64, 8192),
            2.0.*CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

# McCormick objects for McCormick.jl timing
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)

########################################################
##### Timing
# CPU: Intel i7-9850H
# GPU: NVIDIA Quadro T2000


##### McCormick.jl (single evaluation on CPU)
@btime exp($xMC/$yMC) - ($xMC*$yMC^2)/($yMC+1);
# 236.941 ns (0 allocations: 0 bytes)


##### SourceCodeMcCormick.jl (8192 simultaneous evaluations on GPU)
@btime new_func($OUT, $x_in, $y_in);
# 75.700 μs (18 allocations: 384 bytes)

# Average time per evaluation = 75.700 μs / 8192 = 9.241 ns


# Time to move results back to CPU (if needed):
@btime Array($OUT);
# 107.200 μs (8 allocations: 512.16 KiB)

# Time including evaluation:
# 75.700 μs + 107.200 μs = 182.900 μs
# Average time per evaluation = 182.900 μs / 8192 = 22.327 ns
```

As shown by the two previous examples, functions generated by `kgen` can be significantly faster than
the same calculations done by `McCormick.jl` while providing the same information for a given input
expression. There is, however, a considerable time penalty for bringing the results from device memory (GPU)
back to host memory (CPU) due to the volume of information generated. Where possible, an algorithm designed
to make use of `SCMC`-generated functions should consume the calculated relaxation information on the
GPU rather than transfer it between hardware components.
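For instance, a batch lower-bounding step could reduce the interval lower bounds (column 3 of `OUT`)
entirely on the device, so that only a single scalar is transferred back to the host (a minimal sketch
using the arrays from the example above):

```julia
# GPU-side reduction; only the resulting scalar crosses back to the host.
best_lower_bound = minimum(OUT[:, 3])
```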


## Arguments for `kgen`

The `kgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with several additional arguments that affect `kgen`'s behavior. Some of these
functionalities and their use cases are shown in this section.

### 1) Overwrite

Functions and kernels created by `kgen` are human-readable and are [currently] saved in the
"src\kernel_writer\storage" folder with a title created by hashing the input expression and
the list of relevant variables. By saving the files in this way, calling `kgen` multiple times
with the same expression in the same Julia session can skip the steps of re-writing and
re-compiling the same expression. If the same expression is used across Julia sessions,
`kgen` can skip the re-writing step and instead compile the existing function and kernels.

In some cases, this behavior may not be desired. For example, if there is an error or interrupt
during the initial writing of the kernel, or if the source code of `kgen` is being adjusted
to modify how kernels are written, it is preferable to re-write the kernel from scratch
rather than rely on the existing generated code. For these cases, the setting `overwrite=true`
may be used to tell `kgen` to re-write and re-compile the generated code. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# First call with an entirely new expression
@time new_func = kgen(expr);
# 3.958299 seconds (3.71 M allocations: 162.940 MiB, 13.60% compilation time)

# Second call in the same Julia session
@time new_func = kgen(expr);
# 0.000310 seconds (252 allocations: 10.289 KiB)

# Third call, with a hint for `kgen` to re-write and re-compile
@time new_func = kgen(expr, overwrite=true);
# 3.933316 seconds (3.70 M allocations: 162.619 MiB, 13.10% compilation time)
```


### 2) Problem Variables

When using `SCMC` to calculate relaxations for expressions that are components of a larger optimization
problem (such as individual constraints), it is necessary to provide `kgen` with information about all of the
problem variables. For example, if the objective function in a problem is `x + y + z` and one constraint
is `y^2 - z <= 0`, the subgradients of the relaxations for the constraint should be 3-dimensional despite
the constraint depending only on `y` and `z`. Further, the dimensions must match across expressions such
that the first subgradient dimension always corresponds to `x`, the second always corresponds to `y`, and
so on. This is critically important for the primary use case of `SCMC`, and it is supported through an
optional second (positional) argument rather than a keyword argument. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y, z
obj = x + y + z
constr1 = y^2 - z

# A valid way to call `kgen`, which is incorrect in this case because the problem
# variables are {x, y, z}, but `constr1` only depends on {y, z}.
constr1_func_A = kgen(constr1)

# The more correct way to call `kgen` in this case, to inform it that the problem
# variables are {x, y, z}.
constr1_func_B = kgen(constr1, [x, y, z])
```

In this example, `constr1_func_A` assumes that the only variables are `y` and `z`, and it therefore expects
the `OUT` storage to be of size `(m,8)` (i.e., each subgradient is 2-dimensional). Because `kgen` is being
used for one constraint as part of an optimization problem, it is more correct to use `constr1_func_B`,
which expects `OUT` to be of size `(m,10)` (i.e., each subgradient is 3-dimensional). Note that this
is directly analogous to the typical use of `McCormick.jl`, where `MC` objects are constructed with the
larger problem in mind. E.g.:

```julia
using McCormick

xMC = MC{3,NS}(1.0, Interval(0.0, 2.0), 1)
yMC = MC{3,NS}(1.5, Interval(1.0, 3.0), 2)
zMC = MC{3,NS}(1.8, Interval(1.0, 4.0), 3)

obj = xMC + yMC + zMC
constr1 = yMC^2 - zMC
```

Here, for the user to calculate `constr1`, they must have already created the `MC` objects `yMC` and `zMC`.
In the construction of these objects, it was specified that the subgradient would be 3-dimensional (the
`3` in `MC{3,NS}`), and `yMC` and `zMC` were selected as the second and third dimensions of the subgradient
by the `2` and `3` that follow the call to `Interval()`. `SCMC` needs the same information; only the
format is different from `McCormick.jl`.

Note that if a list of problem variables is not provided, `SCMC` will collect the variables in the
expression it's provided and sort them alphabetically, and then numerically. I.e., `x1 < x2 < x10 < y1`.
If a variable list is provided, `SCMC` will respect the order of variables in the list without sorting
them. For example, if `constr1_func_B` were created by calling `kgen(constr1, [z, y, x])`, then
`constr1_func_B` would expect its input arguments to correspond to `(OUTPUT, z, y, x)` instead of
the alphabetical `(OUTPUT, x, y, z)`.
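As a sketch of how the 3-variable version would then be called (assuming, per the conventions described
earlier, one `(m,3)` input per problem variable in the order given to `kgen`, with domains matching the
`McCormick.jl` example above):

```julia
x_in = CuArray([1.0 0.0 2.0;])    # x evaluated at 1.0 on [0.0, 2.0]
y_in = CuArray([1.5 1.0 3.0;])    # y evaluated at 1.5 on [1.0, 3.0]
z_in = CuArray([1.8 1.0 4.0;])    # z evaluated at 1.8 on [1.0, 4.0]
OUT = CUDA.zeros(Float64, 1, 10)  # (m,4+2n) with n = 3
constr1_func_B(OUT, x_in, y_in, z_in)
# Columns 5:7 hold the convex subgradient and columns 8:10 the concave
# subgradient; the x-components are zero because `constr1 = y^2 - z`
# does not depend on x.
```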


### 3) Splitting

Internally, `kgen` may split the input expression into multiple subexpressions if the generated kernels
would be too long. In general, longer kernels are faster but require longer compilation time. The default
settings attempt to provide a balance of good performance and acceptably low compilation times, but
situations may arise where a different balance is preferred. The amount of splitting can be modified by
setting the keyword argument `splitting` to one of `{:low, :default, :high, :max}`, where higher values
indicate the (internal) creation of more, shorter kernels (faster compilation, slower performance), and lower
values indicate the (internal) creation of fewer, longer kernels (longer compilation, faster performance).
Actual compilation time and performance, as well as the impact of different `splitting` settings, are
expression-dependent. In most cases, however, the default value should be sufficient.

Here's an example showing how different values affect compilation time and performance:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# CuArray values for SCMC timing
x_in = hcat(2.5.*CUDA.rand(Float64, 8192) .+ 0.5,
            0.5.*CUDA.ones(Float64, 8192),
            3.0.*CUDA.ones(Float64, 8192))
y_in = hcat(1.9.*CUDA.rand(Float64, 8192) .+ 0.1,
            0.1.*CUDA.ones(Float64, 8192),
            2.0.*CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

##################################
# Default splitting (creates 1 kernel internally, based on the size of this expr)
@time new_func = kgen(expr, overwrite=true);
# 4.100798 seconds (3.70 M allocations: 162.637 MiB, 1.47% gc time, 12.29% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 77.000 μs (18 allocations: 384 bytes)

# --> This provides a good balance of compilation time and fast performance

##################################
# Higher splitting (creates 2 kernels internally)
@time new_func = kgen(expr, splitting=:high, overwrite=true);
# 3.452594 seconds (3.87 M allocations: 170.845 MiB, 13.87% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 117.000 μs (44 allocations: 960 bytes)

# --> Splitting the function into 2 shorter kernels yields a marginally faster
# compilation time, but performance is marginally worse due to needing twice
# as many calls to kernels

##################################
# Maximum splitting (creates 4 kernels internally)
@time new_func = kgen(expr, splitting=:max, overwrite=true);
# 5.903894 seconds (5.88 M allocations: 260.864 MiB, 12.91% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 184.900 μs (83 allocations: 1.91 KiB)

# --> Given the simplicity of the expression, splitting into 4 kernels does not
# provide benefit over the 2 kernels created with :high. Compilation time
# is longer because twice as many kernels need to be compiled (and each is
# not much less complicated than in the :high case), and performance is worse
# (because the number of kernel calls is doubled relative to the :high case
# without much decrease in complexity).
```

## The ParBB Algorithm

The intended use of `SCMC` is in conjunction with a branch-and-bound algorithm that is able
to make use of relaxation and subgradient information calculated in parallel on a
GPU. An implementation of such an algorithm is available for `SCMC` version 0.4 and is
under development for version 0.5. See the paper referenced in the section "Citing SourceCodeMcCormick" for
a more complete description of the ParBB algorithm for version 0.4. Briefly, ParBB is built
as an extension of the EAGO solver, and it works by parallelizing the node-processing routines
in such a way that tasks may be performed by a GPU.
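To illustrate the idea (a toy sketch only, not the actual ParBB implementation), a node-processing step
might evaluate an entire batch of nodes with a single `kgen`-generated function and fathom nodes directly
on the device; the batch construction and the `-5.0` incumbent below are hypothetical:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
f = kgen(exp(x/y) - (x*y^2)/(y+1))

m = 1024                             # hypothetical batch of 1024 nodes
xL = 0.5 .+ 2.0.*CUDA.rand(Float64, m); xU = xL .+ 0.5
yL = 0.1 .+ 1.4.*CUDA.rand(Float64, m); yU = yL .+ 0.5
x_in = hcat((xL .+ xU)./2, xL, xU)   # midpoint, lower, and upper bound per node
y_in = hcat((yL .+ yU)./2, yL, yU)
OUT = CUDA.zeros(Float64, m, 8)
f(OUT, x_in, y_in)

incumbent = -5.0                     # hypothetical best-known upper bound
keep = OUT[:, 3] .<= incumbent       # keep only nodes whose interval lower
                                     # bound does not exceed the incumbent
```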


## Citing SourceCodeMcCormick

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```
A BibTeX entry is given below:
```bibtex
@Article{doi:10.1080/10556788.2024.2396297,
author = {Robert X. Gottlieb and Pengfei Xu and Matthew D. Stuber},
journal = {Optimization Methods and Software},
title = {Automatic source code generation for deterministic global optimization with parallel architectures},
year = {2024},
pages = {1--39},
doi = {10.1080/10556788.2024.2396297},
eprint = {https://doi.org/10.1080/10556788.2024.2396297},
publisher = {Taylor \& Francis},
url = {https://doi.org/10.1080/10556788.2024.2396297},
}
```


# README for v0.4 (Deprecated)

This package uses source-code transformation to construct McCormick-based relaxations. Expressions composed
of `Symbolics.jl`-type variables can be passed into `SourceCodeMcCormick.jl` (`SCMC`) functions, after which
the expressions are factored, and generalized McCormick relaxation rules and inclusion monotonic interval
extension rules are applied. [...] 64-bit (double-precision) numbers are recommended
for relaxations to maintain accuracy.


## Basic Functionality (v0.4)

The primary user-facing function is `fgen` ("function generator"). The `fgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions, [...]

[...] SourceCodeMcCormick functionality that creates these functions which will even further improve
calculation speed.


## Arguments for `fgen` (v0.4)

The `fgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with many other possible arguments that affect `fgen`'s behavior. Some of these [...]

[...] these outputs does not impact the speed of the generated functions, since the irrelevant outputs
aren't used in any calculations.


## The ParBB Algorithm (v0.4)

The intended use of SourceCodeMcCormick is in conjunction with a branch-and-bound algorithm
that is able to make use of relaxations and subgradient information that are calculated in
parallel on a GPU. [...]

[...] that ParBB is able to make use of the faster computing hardware of GPUs and, even though more nodes are
processed, can converge more quickly overall.


## Limitations (v0.4)

(See "Citing SourceCodeMcCormick" for limitations on a pre-subgradient version of SourceCodeMcCormick.)

[...] more frequently with respect to the bounds on its domain may perform worse than expressions
where the structure of the relaxation is more consistent.


## Citing SourceCodeMcCormick (v0.3)
Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```