| [![](https://img.shields.io/badge/Developed_by-PSOR_Lab-342674)](https://psor.uconn.edu/) | [![Build Status](https://github.com/PSORLab/SourceCodeMcCormick.jl/workflows/CI/badge.svg?branch=master)](https://github.com/PSORLab/SourceCodeMcCormick.jl/actions?query=workflow%3ACI) [![codecov](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/PSORLab/SourceCodeMcCormick.jl) |

This package uses source-code generation to create CUDA kernels that return evaluations of McCormick-based
relaxations. Expressions composed of `Symbolics.jl`-type variables can be passed into the main `SourceCodeMcCormick.jl`
(`SCMC`) function, `kgen`, after which the expressions are factored, algebraic rearrangements and other
modifications are applied as necessary, and a new, custom function is written that calculates inclusion
monotonic interval extensions, convex and concave relaxations, and subgradients of the convex and concave
relaxations of the original expression. These new custom functions are meant to be used in, e.g., a branch-and-bound
routine where the optimizer needs to calculate relaxations for many nodes simultaneously using GPU
resources. They are designed to be used with `CUDA.CuArray{Float64}` objects, with 64-bit (double-precision)
numbers recommended to maintain accuracy.
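The interval extensions and relaxations that these kernels evaluate follow standard McCormick rules. As a purely illustrative, CPU-side sketch of the quantities involved (written in plain Python, not the CUDA/Julia code this package generates), here is the rule for a single bilinear term `x*y` on a box domain:

```python
# Illustrative only: the interval extension and McCormick relaxations of
# w = x*y, the kind of per-operation rule an SCMC kernel chains together.

def interval_mul(xL, xU, yL, yU):
    """Inclusion-monotonic interval extension of x*y on [xL,xU] x [yL,yU]."""
    products = [xL*yL, xL*yU, xU*yL, xU*yU]
    return min(products), max(products)

def mccormick_mul(x, y, xL, xU, yL, yU):
    """McCormick convex (cv) and concave (cc) relaxations of x*y at (x, y)."""
    cv = max(yL*x + xL*y - xL*yL, yU*x + xU*y - xU*yU)
    cc = min(yU*x + xL*y - xL*yU, yL*x + xU*y - xU*yL)
    return cv, cc

# Same domains as the examples below: x in [0.5, 3.0], y in [0.1, 2.0]
lo, hi = interval_mul(0.5, 3.0, 0.1, 2.0)
cv, cc = mccormick_mul(1.0, 0.7, 0.5, 3.0, 0.1, 2.0)
# The relaxations bracket the true value and lie within the interval bounds:
assert lo <= cv <= 1.0 * 0.7 <= cc <= hi
```

The generated kernels additionally propagate subgradients of `cv` and `cc` alongside these values.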


## Basic Functionality

The primary user-facing function is `kgen` ("kernel generator"). The `kgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
convex relaxation subgradients, and concave relaxation subgradients for an input symbolic expression. Inputs
to the newly generated function are expected to be an output placeholder, followed by a separate `CuArray` for
each input variable, sorted in alphabetical order. The "input" variables must be `(m,3)`-dimensional `CuArray`s,
with the columns corresponding to points where evaluations are requested, lower bounds, and upper bounds,
respectively. The "output" placeholder must be `(m,4+2n)`-dimensional, where `n` is the dimensionality of the
expression. The columns will hold, respectively, the convex and concave relaxations, lower and upper bounds of
an interval extension, a subgradient of the convex relaxation, and a subgradient of the concave relaxation.
Each row `m` in the inputs and output is independent of the others, allowing calculation of relaxation information
at multiple points on (potentially) different domains. A demonstration of how to use `kgen` is shown here,
with the output compared to the multiple-dispatch-based `McCormick.jl`:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)
x_in = CuArray([1.0 0.5 3.0;])
y_in = CuArray([0.7 0.1 2.0;])
OUT = CUDA.zeros(Float64, 1, 8)

using McCormick
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)


julia> new_func(OUT, x_in, y_in)  # SourceCodeMcCormick.jl
CUDA.HostKernel for f_3zSdMo7LFdg_1(CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1}, CuDeviceMatrix{Float64, 1})

julia> Array(OUT)
1×8 Matrix{Float64}:
 0.228368  2.96348e12  -9.62507  1.06865e13  -2.32491  -3.62947  3.59209e12  -8.98023e11

julia> exp(xMC/yMC) - (xMC*yMC^2)/(yMC+1)  # McCormick.jl
MC{2, NS}(0.22836802303235793, 2.963476144457207e12, [-9.62507, 1.06865e+13], [-2.3249068975747305, -3.6294726270892967], [3.592092296310309e12, -8.980230740778097e11], false)
```

In this example, the symbolic expression `exp(x/y) - (x*y^2)/(y+1)` is passed into `kgen`, which returns the
function `new_func`. Passing inputs representing the values and domain where calculations are desired for `x`
and `y` (i.e., evaluating `x` at `1.0` on the domain `[0.5, 3.0]` and `y` at `0.7` on the domain `[0.1, 2.0]`)
into `new_func` returns pointwise evaluations of {cv, cc, lo, hi, [cvgrad]..., [ccgrad]...} for the original
expression `exp(x/y) - (x*y^2)/(y+1)` on the specified domain of `x` and `y`.
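The row-independent `(m,3)` input convention can be mimicked on the CPU to make the layout concrete. This hypothetical Python sketch (illustrative only, not `SCMC` code) batches interval bounds of `x^2`, with each row carrying its own evaluation point and domain:

```python
# Hypothetical CPU analogue of the (m,3) input convention: each row holds
# (evaluation point, lower bound, upper bound), and rows are independent,
# so many domains can be processed in one batched call.

def sqr_interval(row):
    """Interval extension of x^2 for one (point, lo, hi) row -> [lo, hi]."""
    _, lb, ub = row
    hi = max(lb*lb, ub*ub)
    lo = 0.0 if lb <= 0.0 <= ub else min(lb*lb, ub*ub)
    return [lo, hi]

x_in = [[1.0, 0.5, 3.0],    # evaluate at 1.0 on the domain [0.5, 3.0]
        [-1.0, -2.0, 2.0]]  # evaluate at -1.0 on the domain [-2.0, 2.0]
OUT = [sqr_interval(row) for row in x_in]
assert OUT == [[0.25, 9.0], [0.0, 4.0]]
```

On the GPU, each such row maps naturally to one thread, which is what makes the batched formulation fast.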

Evaluating large numbers of points simultaneously on a GPU allows for faster relaxation calculations than
evaluating points individually on a CPU. This is demonstrated in the following example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA, McCormick, BenchmarkTools
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)
new_func = kgen(expr)

# CuArray values for SCMC timing
x_in = hcat(2.5 .* CUDA.rand(Float64, 8192) .+ 0.5,
            0.5 .* CUDA.ones(Float64, 8192),
            3.0 .* CUDA.ones(Float64, 8192))
y_in = hcat(1.9 .* CUDA.rand(Float64, 8192) .+ 0.1,
            0.1 .* CUDA.ones(Float64, 8192),
            2.0 .* CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

# McCormick objects for McCormick.jl timing
xMC = MC{2,NS}(1.0, Interval(0.5, 3.0), 1)
yMC = MC{2,NS}(0.7, Interval(0.1, 2.0), 2)

########################################################
##### Timing
# CPU: Intel i7-9850H
# GPU: NVIDIA Quadro T2000


##### McCormick.jl (single evaluation on CPU)
@btime exp($xMC/$yMC) - ($xMC*$yMC^2)/($yMC+1);
# 236.941 ns (0 allocations: 0 bytes)


##### SourceCodeMcCormick.jl (8192 simultaneous evaluations on GPU)
@btime new_func($OUT, $x_in, $y_in);
# 75.700 μs (18 allocations: 384 bytes)

# Average time per evaluation = 75.700 μs / 8192 = 9.241 ns


# Time to move results back to CPU (if needed):
@btime Array($OUT);
# 107.200 μs (8 allocations: 512.16 KiB)

# Time including evaluation:
# 75.700 μs + 107.200 μs = 182.900 μs
# Average time per evaluation = 182.900 μs / 8192 = 22.327 ns
```

As shown by the two previous examples, functions generated by `kgen` can be significantly faster than
the same calculations done by `McCormick.jl`, and they provide the same information for a given input
expression. There is a considerable time penalty for bringing the results from device memory (GPU) back
to host memory (CPU) due to the volume of information generated. If possible, an algorithm designed
to make use of `SCMC`-generated functions should utilize the calculated relaxation information on the
GPU rather than transfer the information between hardware components.


## Arguments for `kgen`

The `kgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with several possible arguments that affect `kgen`'s behavior. Some of these
functionalities and their use cases are shown in this section.

### 1) Overwrite

Functions and kernels created by `kgen` are human-readable and are [currently] saved in the
"src\kernel_writer\storage" folder with a title created by hashing the input expression and
list of relevant variables. By saving the files in this way, calling `kgen` multiple times
with the same expression in the same Julia session can skip the steps of re-writing and
re-compiling the same expression. If the same expression is used across Julia sessions,
`kgen` can skip the re-writing step and instead compile the existing function and kernels.

In some cases, this behavior may not be desired. For example, if there is an error or interrupt
during the initial writing of the kernel, or if the source code of `kgen` is being adjusted
to modify how kernels are written, it is preferable to re-write the kernel from scratch
rather than rely on the existing generated code. For these cases, the setting `overwrite=true`
may be used to tell `kgen` to re-write and re-compile the generated code. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# First call with an entirely new expression
@time new_func = kgen(expr);
# 3.958299 seconds (3.71 M allocations: 162.940 MiB, 13.60% compilation time)

# Second call in the same Julia session
@time new_func = kgen(expr);
# 0.000310 seconds (252 allocations: 10.289 KiB)

# Third call, with a hint for `kgen` to re-write and re-compile
@time new_func = kgen(expr, overwrite=true);
# 3.933316 seconds (3.70 M allocations: 162.619 MiB, 13.10% compilation time)
```
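The caching behavior shown above amounts to keying generated source by a hash of the expression and its variable list, then skipping regeneration on a key hit. A hypothetical Python sketch of this pattern (not `SCMC`'s actual kernel writer; names here are invented for illustration):

```python
import hashlib

# Illustrative only: cache generated source keyed by a hash of the
# expression string and variable list, mimicking kgen's skip-if-exists logic.
cache = {}

def generate(expr, variables, overwrite=False):
    key = hashlib.sha1((expr + "|" + ",".join(variables)).encode()).hexdigest()
    if overwrite or key not in cache:
        # Stand-in for writing and compiling real kernel source:
        cache[key] = "generated source for " + key[:8]
    return cache[key]

a = generate("exp(x/y)", ["x", "y"])
b = generate("exp(x/y)", ["x", "y"])  # cache hit: no re-write, same object
assert a is b
```

With on-disk storage instead of a dictionary, the same key also survives across sessions, which is why only the compilation step is repeated when Julia restarts.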


### 2) Problem Variables

When using `SCMC` to calculate relaxations for expressions that are components of a larger optimization
problem (such as individual constraints), it is necessary to provide `kgen` with information about all of the
problem variables. For example, if the objective function in a problem is `x + y + z`, and one constraint
is `y^2 - z <= 0`, the subgradients of the relaxations for the constraint should be 3-dimensional despite
the constraint depending only on `y` and `z`. Further, the dimensions must match across expressions such
that the first subgradient dimension always corresponds to `x`, the second always corresponds to `y`, and
so on. This is critically important for the primary use case of `SCMC` and is built in to work without
the need for a keyword argument. For example:

```julia
using SourceCodeMcCormick, Symbolics, CUDA
Symbolics.@variables x, y, z
obj = x + y + z
constr1 = y^2 - z

# A valid way to call `kgen`, which is incorrect in this case because the problem
# variables are {x, y, z}, but `constr1` only depends on {y, z}.
constr1_func_A = kgen(constr1)

# The more correct way to call `kgen` in this case, to inform it that the problem
# variables are {x, y, z}.
constr1_func_B = kgen(constr1, [x, y, z])
```

In this example, `constr1_func_A` assumes that the only variables are `y` and `z`, and it therefore expects
the `OUT` storage to be of size `(m,8)` (i.e., each subgradient is 2-dimensional). Because `kgen` is being
used for one constraint as part of an optimization problem, it is more correct to use `constr1_func_B`,
which expects `OUT` to be of size `(m,10)` (i.e., each subgradient is 3-dimensional). Note that this
is directly analogous to the typical use of `McCormick.jl`, where `MC` objects are constructed with the
larger problem in mind. E.g.:

```julia
using McCormick

xMC = MC{3,NS}(1.0, Interval(0.0, 2.0), 1)
yMC = MC{3,NS}(1.5, Interval(1.0, 3.0), 2)
zMC = MC{3,NS}(1.8, Interval(1.0, 4.0), 3)

obj = xMC + yMC + zMC
constr1 = yMC^2 - zMC
```

Here, for the user to calculate `constr1`, they must have already created the `MC` objects `yMC` and `zMC`.
In the construction of these objects, it was specified that the subgradient would be 3-dimensional (the
`3` in `MC{3,NS}`), and `yMC` and `zMC` were selected as the second and third dimensions of the subgradient
by the `2` and `3` that follow the call to `Interval()`. `SCMC` needs the same information; only the
format differs from `McCormick.jl`.

Note that if a list of problem variables is not provided, `SCMC` will collect the variables in the
expression it's given and sort them alphabetically, and then numerically (i.e., `x1 < x2 < x10 < y1`).
If a variable list is provided, `SCMC` will respect the order of variables in the list without sorting
them. For example, if `constr1_func_B` were created by calling `kgen(constr1, [z, y, x])`, then
`constr1_func_B` would expect its input arguments to correspond to `(OUTPUT, z, y, x)` instead of
the alphabetical `(OUTPUT, x, y, z)`.
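The effect of supplying the full problem-variable list can be pictured as placing an expression's partial subgradient into the correct slots of a full-length subgradient. A hedged Python sketch of that bookkeeping (illustrative only, with invented values, not `SCMC` internals):

```python
# Illustrative only: embed a constraint's subgradient over its own variables
# {y, z} into the 3-dimensional subgradient of the full problem {x, y, z},
# mirroring why kgen(constr1, [x, y, z]) yields 3-dimensional subgradients.
problem_vars = ["x", "y", "z"]
expr_vars = ["y", "z"]
sub_expr = [2.5, -1.0]  # hypothetical d/dy and d/dz of a relaxation of y^2 - z

full = [0.0] * len(problem_vars)
for g, v in zip(sub_expr, expr_vars):
    full[problem_vars.index(v)] = g

# The x-component stays zero because the constraint does not depend on x.
assert full == [0.0, 2.5, -1.0]
```

Zero entries for absent variables are exactly what keeps subgradient dimensions consistent across the objective and every constraint.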
### 3) Splitting

Internally, `kgen` may split the input expression into multiple subexpressions if the generated kernels
would otherwise be too long. In general, longer kernels are faster but require longer compilation times. The default
settings attempt to balance good performance against acceptably low compilation times, but
situations may arise where a different balance is preferred. The amount of splitting can be modified by
setting the keyword argument `splitting` to one of `{:low, :default, :high, :max}`, where higher values
indicate the (internal) creation of more, shorter kernels (faster compilation, slower performance), and lower
values indicate the (internal) creation of fewer, longer kernels (longer compilation, faster performance).
Actual compilation time and performance, as well as the impact of different `splitting` settings, are
expression-dependent. In most cases, however, the default value should be sufficient.

Here's an example showing how different values affect compilation time and performance:
```julia
using SourceCodeMcCormick, Symbolics, CUDA, BenchmarkTools
Symbolics.@variables x, y
expr = exp(x/y) - (x*y^2)/(y+1)

# CuArray values for SCMC timing
x_in = hcat(2.5 .* CUDA.rand(Float64, 8192) .+ 0.5,
            0.5 .* CUDA.ones(Float64, 8192),
            3.0 .* CUDA.ones(Float64, 8192))
y_in = hcat(1.9 .* CUDA.rand(Float64, 8192) .+ 0.1,
            0.1 .* CUDA.ones(Float64, 8192),
            2.0 .* CUDA.ones(Float64, 8192))
OUT = CUDA.zeros(Float64, 8192, 8)

##################################
# Default splitting (creates 1 kernel internally, based on the size of this expr)
@time new_func = kgen(expr, overwrite=true);
# 4.100798 seconds (3.70 M allocations: 162.637 MiB, 1.47% gc time, 12.29% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 77.000 μs (18 allocations: 384 bytes)

# --> This provides a good balance of compilation time and fast performance

##################################
# Higher splitting (creates 2 kernels internally)
@time new_func = kgen(expr, splitting=:high, overwrite=true);
# 3.452594 seconds (3.87 M allocations: 170.845 MiB, 13.87% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 117.000 μs (44 allocations: 960 bytes)

# --> Splitting the function into 2 shorter kernels yields a marginally faster
#     compilation time, but performance is marginally worse due to needing twice
#     as many calls to kernels

##################################
# Maximum splitting (creates 4 kernels internally)
@time new_func = kgen(expr, splitting=:max, overwrite=true);
# 5.903894 seconds (5.88 M allocations: 260.864 MiB, 12.91% compilation time)

@btime new_func($OUT, $x_in, $y_in);
# 184.900 μs (83 allocations: 1.91 KiB)

# --> Given the simplicity of the expression, splitting into 4 kernels does not
#     provide benefit over the 2 kernels created with :high. Compilation time
#     is longer because twice as many kernels need to be compiled (and each is
#     not much less complicated than in the :high case), and performance is worse
#     (because the number of kernel calls is doubled relative to the :high case
#     without much decrease in complexity).
```
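The splitting being tuned here can be pictured as cutting the factored expression into intermediate assignments, each of which is a candidate to become its own (shorter) kernel. A loose Python sketch of the factoring idea only, not the code `kgen` actually emits:

```python
import math

# Illustrative only: the example expression factored into intermediate steps.
# With higher "splitting", groups of these steps become separate kernels, so
# each compilation unit shrinks but the number of kernel launches grows.
def evaluate_factored(x, y):
    v1 = x / y           # subexpression 1
    v2 = math.exp(v1)    # subexpression 2
    v3 = x * y**2        # subexpression 3
    v4 = v3 / (y + 1.0)  # subexpression 4
    return v2 - v4

direct = math.exp(1.0/0.7) - (1.0 * 0.7**2) / (0.7 + 1.0)
assert abs(evaluate_factored(1.0, 0.7) - direct) < 1e-12
```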

## The ParBB Algorithm

The intended use of `SCMC` is in conjunction with a branch-and-bound algorithm that is able
to make use of relaxations and subgradient information that are calculated in parallel on a
GPU. An implementation of such an algorithm is available for `SCMC` version 0.4, and one is
under development for version 0.5. See the paper referenced in the section "Citing SourceCodeMcCormick" for
a more complete description of the ParBB algorithm for version 0.4. Briefly, ParBB is built
as an extension of the EAGO solver, and it works by parallelizing the node-processing routines
in such a way that tasks may be performed by a GPU.
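The shape of that parallelization can be sketched with a toy branch-and-bound loop that bounds every open node in one batch per iteration, which is the batch of work ParBB would hand to the GPU. This is a hypothetical CPU illustration minimizing `x^2` with plain interval bounds, not ParBB or EAGO code:

```python
# Illustrative only: batched bounding inside a toy branch-and-bound loop.
def lower_bound(lb, ub):
    """Interval lower bound of f(x) = x^2 on [lb, ub]."""
    return 0.0 if lb <= 0.0 <= ub else min(lb*lb, ub*ub)

def branch_and_bound(lb, ub, tol=1e-6):
    nodes = [(lb, ub)]
    incumbent = min(lb*lb, ub*ub)  # upper bound from endpoint evaluations
    while nodes:
        # "Batch" step: bound every open node at once (GPU-friendly shape).
        bounds = [lower_bound(a, b) for a, b in nodes]
        incumbent = min(incumbent, *(min(a*a, b*b) for a, b in nodes))
        next_nodes = []
        for (a, b), lo in zip(nodes, bounds):
            if lo < incumbent - tol:  # node cannot be fathomed: branch it
                m = 0.5 * (a + b)
                next_nodes += [(a, m), (m, b)]
        nodes = next_nodes
    return incumbent

assert abs(branch_and_bound(-1.0, 2.0)) < 1e-6  # true minimum of x^2 is 0
```

In ParBB the per-node bounds come from the `SCMC`-generated relaxations and subgradients rather than bare interval arithmetic, but the batched node-processing structure is the same.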
## Citing SourceCodeMcCormick

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```
A BibTeX entry is given below:
```bibtex
@Article{doi:10.1080/10556788.2024.2396297,
  author    = {Robert X. Gottlieb and Pengfei Xu and Matthew D. Stuber},
  journal   = {Optimization Methods and Software},
  title     = {Automatic source code generation for deterministic global optimization with parallel architectures},
  year      = {2024},
  pages     = {1--39},
  doi       = {10.1080/10556788.2024.2396297},
  eprint    = {https://doi.org/10.1080/10556788.2024.2396297},
  publisher = {Taylor \& Francis},
  url       = {https://doi.org/10.1080/10556788.2024.2396297},
}
```


# README for v0.4 (Deprecated)

This package uses source-code transformation to construct McCormick-based relaxations. Expressions composed
of `Symbolics.jl`-type variables can be passed into `SourceCodeMcCormick.jl` (`SCMC`) functions, after which
the expressions are factored, generalized McCormick relaxation rules and inclusion monotonic interval
[...] numbers are recommended
for relaxations to maintain accuracy.

## Basic Functionality (v0.4)

The primary user-facing function is `fgen` ("function generator"). The `fgen` function returns a source-code-generated
function that provides evaluations of convex and concave relaxations, inclusion monotonic interval extensions,
[...] calculation speed.

## Arguments for `fgen` (v0.4)

The `fgen` function can be called with only a `Num`-type expression as an input (as in the examples in the
previous section), or with many other possible arguments that affect `fgen`'s behavior. Some of these
[...] aren't used in any calculations.

## The ParBB Algorithm (v0.4)

The intended use of SourceCodeMcCormick is in conjunction with a branch-and-bound algorithm
that is able to make use of relaxations and subgradient information that are calculated in
[...] processed, can converge more quickly overall.

## Limitations (v0.4)

(See "Citing SourceCodeMcCormick" for limitations on a pre-subgradient version of SourceCodeMcCormick.)
[...] where the structure of the relaxation is more consistent.

## Citing SourceCodeMcCormick (v0.3)

Please cite the following paper when using SourceCodeMcCormick.jl. In plain text form this is:
```
Gottlieb, R. X., Xu, P., and Stuber, M. D. Automatic source code generation for deterministic
global optimization with parallel architectures. Optimization Methods and Software, 1–39 (2024).
DOI: 10.1080/10556788.2024.2396297
```