Commit 1e88c00 ("Update README.md", #14), 1 parent 26b6be0

README.md: 319 additions & 5 deletions
| [![](https://img.shields.io/badge/Developed_by-PSOR_Lab-342674)](https://psor.uconn.edu/) | [![Build Status](https://github.com/PSORLab/SourceCodeMcCormick.jl/workflows/CI/badge.svg?branch=master)](https://github.com/PSORLab/SourceCodeMcCormick.jl/actions?query=workflow%3ACI) [![codecov](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl)|

This package uses source-code generation to create CUDA kernels that return evaluations of McCormick-based
relaxations. Expressions composed of `Symbolics.jl`-type variables can be passed into the main `SourceCodeMcCormick.jl`
(`SCMC`) function, `kgen`. The expressions are then factored, algebraic rearrangements and other modifications
are applied as necessary, and a new, custom function is written that calculates inclusion monotonic interval
extensions, convex and concave relaxations, and subgradients of the convex and concave relaxations of the
original expression. These custom functions are meant to be used in, e.g., a branch-and-bound routine where
the optimizer would like to calculate relaxations for many nodes simultaneously using GPU resources. They are
designed to be used with `CUDA.CuArray{Float64}` objects; 64-bit (double-precision) numbers are recommended
to maintain accuracy.

## Basic Functionality

The primary user-facing function is `kgen` ("kernel generator"). The `kgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
convex relaxation subgradients, and concave relaxation subgradients for an input symbolic expression. Inputs
to the newly generated function are expected to be an output placeholder, followed by a separate `CuArray` for
each input variable, sorted in alphabetical order. The "input" variables must be `(m,3)`-dimensional `CuArray`s,
with the columns corresponding to the points where evaluations are requested, lower bounds, and upper bounds,
respectively. The "output" placeholder must be `(m,4+2n)`-dimensional, where `n` is the dimensionality of the
expression. Its columns hold, respectively, the convex and concave relaxations, the lower and upper bounds of
an interval extension, a subgradient of the convex relaxation, and a subgradient of the concave relaxation.
Each row `m` in the inputs and output is independent of the others, allowing relaxation information to be
calculated at multiple points on (potentially) different domains. A demonstration of how to use `kgen` is shown
here, with the output compared to the multiple-dispatch-based `McCormick.jl`:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)
x_in = CuArray([1.0 0.5 3.0;])
y_in = CuArray([0.7 0.1 2.0;])
OUT = CUDA.zeros(Float64, 1, 8)

using McCormick
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)


julia> new_func(OUT, x_in, y_in) # SourceCodeMcCormick.jl
CUDA.HostKernel for f_3zSdMo7LFdg_1(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1})

julia> Array(OUT)
1×8 Matrix{Float64}:
 0.228368  2.96348e12  -9.62507  1.06865e13  -2.32491  -3.62947  3.59209e12  -8.98023e11

julia> exp(xMC/yMC) - (xMC*yMC^2)/(yMC+1) # McCormick.jl
MC{2, NS}(0.22836802303235793, 2.963476144457207e12, [-9.62507, 1.06865e+13], [-2.3249068975747305, -3.6294726270892967], [3.592092296310309e12, -8.980230740778097e11], false)
```

In this example, the symbolic expression `exp(x/y) - (x*y^2)/(y+1)` is passed into `kgen`, which returns the
function `new_func`. Passing inputs representing the values and domain where calculations are desired for `x`
and `y` (i.e., evaluating `x` at `1.0` on the domain `[0.5, 3.0]` and `y` at `0.7` on the domain `[0.1, 2.0]`)
into `new_func` returns pointwise evaluations of {cv, cc, lo, hi, [cvgrad]..., [ccgrad]...} for the original
expression of `exp(x/y) - (x*y^2)/(y+1)` on the specified domain of `x` and `y`.
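To make the column layout concrete, the single row of `OUT` in this two-variable example can be unpacked as follows (a sketch; the names `cv`, `cc`, `cv_grad`, and `cc_grad` are illustrative, and the column ordering follows the description above):

```julia
out = Array(OUT)              # copy the results from device to host memory
cv, cc = out[1, 1], out[1, 2] # convex and concave relaxation values at (1.0, 0.7)
lo, hi = out[1, 3], out[1, 4] # lower and upper interval extension bounds
cv_grad = out[1, 5:6]         # subgradient of the convex relaxation (x and y components)
cc_grad = out[1, 7:8]         # subgradient of the concave relaxation (x and y components)
```

These values correspond one-to-one with the fields of the `MC` object returned by `McCormick.jl` in the comparison above.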

Evaluating large numbers of points simultaneously using a GPU allows for faster relaxation calculations than
evaluating points individually using a CPU. This is demonstrated in the following example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA, McCormick, BenchmarkTools
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)

# CuArray values for SCMC timing
x_in = hcat(2.5.*CUDA.rand(Float64, 8192) .+ 0.5,
            0.5.*CUDA.ones(Float64, 8192),
            3.0.*CUDA.ones(Float64, 8192))
y_in = hcat(1.9.*CUDA.rand(Float64, 8192) .+ 0.1,
            0.1.*CUDA.ones(Float64, 8192),
            2.0.*CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

# McCormick objects for McCormick.jl timing
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)

########################################################
##### Timing
# CPU: Intel i7-9850H
# GPU: NVIDIA Quadro T2000


##### McCormick.jl (single evaluation on CPU)
@btime exp($xMC/$yMC) - ($xMC*$yMC^2)/($yMC+1);
# 236.941 ns (0 allocations: 0 bytes)


##### SourceCodeMcCormick.jl (8192 simultaneous evaluations on GPU)
@btime new_func($OUT, $x_in, $y_in);
# 75.700 μs (18 allocations: 384 bytes)

# Average time per evaluation = 75.700 μs / 8192 = 9.241 ns


# Time to move results back to CPU (if needed):
@btime Array($OUT);
# 107.200 μs (8 allocations: 512.16 KiB)

# Time including evaluation:
# 75.700 μs + 107.200 μs = 182.900 μs
# Average time per evaluation = 182.900 μs / 8192 = 22.327 ns
```

As shown by the two previous examples, functions generated by `kgen` can be significantly faster than
the same calculations done by `McCormick.jl`, while providing the same information for a given input
expression. There is a considerable time penalty for bringing the results from device memory (GPU) back
to host memory (CPU) due to the volume of information generated. If possible, an algorithm designed
to make use of `SCMC`-generated functions should use the calculated relaxation information within the
GPU rather than transfer it between hardware components.
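For instance, a lower-bounding step in a branch-and-bound loop might reduce over the relaxation information entirely on the device, so that only a scalar crosses back to the host (a hypothetical sketch reusing `new_func`, `OUT`, `x_in`, and `y_in` from the timing example above):

```julia
new_func(OUT, x_in, y_in)             # relaxation information stays in device memory
weakest_lb = minimum(@view OUT[:, 3]) # reduce over the interval lower bounds on the GPU
# Only the scalar `weakest_lb` is transferred to the host, avoiding the
# much larger cost of copying the full 8192×8 result array.
```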

## Arguments for `kgen`

The `kgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with several possible arguments that affect `kgen`'s behavior. Some of these
functionalities and their use cases are shown in this section.

### 1) Overwrite

Functions and kernels created by `kgen` are human-readable and are [currently] saved in the
"src\kernel_writer\storage" folder with a title created by hashing the input expression and the
list of relevant variables. By saving the files in this way, calling `kgen` multiple times
with the same expression in the same Julia session can skip the steps of re-writing and
re-compiling the same expression. If the same expression is used across Julia sessions,
`kgen` can skip the re-writing step and instead compile the existing function and kernels.

In some cases, this behavior may not be desired. For example, if there is an error or interrupt
during the initial writing of the kernel, or if the source code of `kgen` is being adjusted
to modify how kernels are written, it is preferable to re-write the kernel from scratch
rather than rely on the existing generated code. For these cases, the setting `overwrite=true`
may be used to tell `kgen` to re-write and re-compile the generated code. For example:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# First call with an entirely new expression
@time new_func = kgen(expr);
# 3.958299 seconds (3.71 M allocations: 162.940 MiB, 13.60% compilation time)

# Second call in the same Julia session
@time new_func = kgen(expr);
# 0.000310 seconds (252 allocations: 10.289 KiB)

# Third call, with a hint for `kgen` to re-write and re-compile
@time new_func = kgen(expr, overwrite=true);
# 3.933316 seconds (3.70 M allocations: 162.619 MiB, 13.10% compilation time)
```

### 2) Problem Variables

When using `SCMC` to calculate relaxations for expressions that are components of a larger optimization
problem (such as individual constraints), it is necessary to provide `kgen` with information about all of the
problem variables. For example, if the objective function in a problem is `x + y + z` and one constraint
is `y^2 - z <= 0`, the subgradients of the relaxations for the constraint should be 3-dimensional despite
the constraint depending only on `y` and `z`. Further, the dimensions must match across expressions, such
that the first subgradient dimension always corresponds to `x`, the second always corresponds to `y`, and
so on. This is critically important for the primary use case of `SCMC` and is built in to work without
the need for a keyword argument. For example:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y, z
obj = x + y + z
constr1 = y^2 - z

# A valid way to call `kgen`, which is incorrect in this case because the problem
# variables are {x, y, z}, but `constr1` only depends on {y, z}.
constr1_func_A = kgen(constr1)

# The more correct way to call `kgen` in this case, to inform it that the problem
# variables are {x, y, z}.
constr1_func_B = kgen(constr1, [x, y, z])
```

In this example, `constr1_func_A` assumes that the only variables are `y` and `z`, and it therefore expects
the `OUT` storage to be of size `(m,8)` (i.e., each subgradient is 2-dimensional). Because `kgen` is being
used for one constraint as part of an optimization problem, it is more correct to use `constr1_func_B`,
which expects `OUT` to be of size `(m,10)` (i.e., each subgradient is 3-dimensional). Note that this
is directly analogous to the typical use of `McCormick.jl`, where `MC` objects are constructed with the
larger problem in mind. E.g.:
```julia
using McCormick

xMC = MC{3,NS}(1.0, Interval(0.0, 2.0), 1)
yMC = MC{3,NS}(1.5, Interval(1.0, 3.0), 2)
zMC = MC{3,NS}(1.8, Interval(1.0, 4.0), 3)

obj = xMC + yMC + zMC
constr1 = yMC^2 - zMC
```

Here, for the user to calculate `constr1`, they must have already created the `MC` objects `yMC` and `zMC`.
In the construction of these objects, it was specified that the subgradient would be 3-dimensional (the
`3` in `MC{3,NS}`), and `yMC` and `zMC` were selected as the second and third dimensions of the subgradient
by the `2` and `3` that follow the call to `Interval()`. `SCMC` needs the same information; only the
format differs from `McCormick.jl`.

Note that if a list of problem variables is not provided, `SCMC` will collect the variables in the
expression it is given and sort them alphabetically, then numerically (i.e., `x1 < x2 < x10 < y1`).
If a variable list is provided, `SCMC` will respect the order of variables in the list without sorting
them. For example, if `constr1_func_B` were created by calling `kgen(constr1, [z, y, x])`, then
`constr1_func_B` would expect its input arguments to correspond to `(OUTPUT, z, y, x)` instead of
the alphabetical `(OUTPUT, x, y, z)`.

### 3) Splitting

Internally, `kgen` may split the input expression into multiple subexpressions if the generated kernels
would otherwise be too long. In general, longer kernels are faster but require longer compilation times. The
default settings attempt to balance good performance against acceptably low compilation times, but
situations may arise where a different balance is preferred. The amount of splitting can be modified by
setting the keyword argument `splitting` to one of `{:low, :default, :high, :max}`, where higher values
indicate the (internal) creation of more, shorter kernels (faster compilation, slower performance), and lower
values indicate the (internal) creation of fewer, longer kernels (longer compilation, faster performance).
Actual compilation time and performance, as well as the impact of different `splitting` settings, are
expression-dependent. In most cases, however, the default value should be sufficient.

Here's an example showing how different values affect compilation time and performance:
```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# CuArray values for SCMC timing
x_in = hcat(2.5.*CUDA.rand(Float64, 8192) .+ 0.5,
            0.5.*CUDA.ones(Float64, 8192),
            3.0.*CUDA.ones(Float64, 8192))
y_in = hcat(1.9.*CUDA.rand(Float64, 8192) .+ 0.1,
            0.1.*CUDA.ones(Float64, 8192),
            2.0.*CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

##################################
# Default splitting (creates 1 kernel internally, based on the size of this expr)
@time new_func = kgen(expr, overwrite=true);
# 4.100798 seconds (3.70 M allocations: 162.637 MiB, 1.47% gc time, 12.29% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 77.000 μs (18 allocations: 384 bytes)

# --> This provides a good balance of compilation time and fast performance

##################################
# Higher splitting (creates 2 kernels internally)
@time new_func = kgen(expr, splitting=:high, overwrite=true);
# 3.452594 seconds (3.87 M allocations: 170.845 MiB, 13.87% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 117.000 μs (44 allocations: 960 bytes)

# --> Splitting the function into 2 shorter kernels yields a marginally faster
#     compilation time, but performance is marginally worse due to needing twice
#     as many calls to kernels

##################################
# Maximum splitting (creates 4 kernels internally)
@time new_func = kgen(expr, splitting=:max, overwrite=true);
# 5.903894 seconds (5.88 M allocations: 260.864 MiB, 12.91% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 184.900 μs (83 allocations: 1.91 KiB)

# --> Given the simplicity of the expression, splitting into 4 kernels does not
#     provide benefit over the 2 kernels created with :high. Compilation time
#     is longer because twice as many kernels need to be compiled (and each is
#     not much less complicated than in the :high case), and performance is worse
#     (because the number of kernel calls is doubled relative to the :high case
#     without much decrease in complexity).
```

## The ParBB Algorithm

The intended use of `SCMC` is in conjunction with a branch-and-bound algorithm that is able
to make use of relaxations and subgradient information calculated in parallel on a GPU.
An implementation of such an algorithm is available for `SCMC` version 0.4 and is
under development for version 0.5. See the paper referenced in the section "Citing SourceCodeMcCormick"
for a more complete description of the ParBB algorithm for version 0.4. Briefly, ParBB is built
as an extension of the EAGO solver, and it works by parallelizing the node processing routines
in such a way that tasks may be performed by a GPU.

## Citing SourceCodeMcCormick

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```
A BibTeX entry is given below:
```bibtex
@Article{doi:10.1080/10556788.2024.2396297,
  author    = {Robert X. Gottlieb and Pengfei Xu and Matthew D. Stuber},
  journal   = {Optimization Methods and Software},
  title     = {Automatic source code generation for deterministic global optimization with parallel architectures},
  year      = {2024},
  pages     = {1--39},
  doi       = {10.1080/10556788.2024.2396297},
  eprint    = {https://doi.org/10.1080/10556788.2024.2396297},
  publisher = {Taylor \& Francis},
  url       = {https://doi.org/10.1080/10556788.2024.2396297},
}
```

# README for v0.4 (Deprecated)

This package uses source-code transformation to construct McCormick-based relaxations. Expressions composed
of `Symbolics.jl`-type variables can be passed into `SourceCodeMcCormick.jl` (`SCMC`) functions, after which
the expressions are factored, generalized McCormick relaxation rules and inclusion monotonic interval

[...] numbers are recommended for relaxations to maintain accuracy.

## Basic Functionality (v0.4)

The primary user-facing function is `fgen` ("function generator"). The `fgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,

[...] calculation speed.

## Arguments for `fgen` (v0.4)

The `fgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with many other possible arguments that affect `fgen`'s behavior. Some of these

[...] aren't used in any calculations.

## The ParBB Algorithm (v0.4)

The intended use of SourceCodeMcCormick is in conjunction with a branch-and-bound algorithm
that is able to make use of relaxations and subgradient information that are calculated in

[...] processed, can converge more quickly overall.

## Limitations (v0.4)

(See "Citing SourceCodeMcCormick" for limitations on a pre-subgradient version of SourceCodeMcCormick.)

[...] where the structure of the relaxation is more consistent.

## Citing SourceCodeMcCormick (v0.3)

Please cite the following paper when using SourceCodeMcCormick.jl.
