Commit dbbfdb8

Jessica Shi committed: update scheduling documentation
1 parent c2c8d10
3 files changed: +112 -21 lines changed

codegen.html

Lines changed: 8 additions & 3 deletions
@@ -132,9 +132,14 @@ <h6 style="margin-bottom: 0px; margin-top: 18px">Input a tensor algebra expressi
 <div class="mdl-grid" style="padding-top: 6px">
 <div class="mdl-layout-spacer"></div>
 <div class="mdl-cell mdl-cell--9-col">
-<button id="btnSchedule" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="margin-bottom:10px; width: 30%">
-Add Scheduling Command
-</button>
+<div>
+<button id="btnSchedule" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="margin-bottom:10px; width: 30%">
+Add Scheduling Command
+</button>
+<div class="mdl-textfield" style="width: 69%">
+<span style="font-size: 14px;margin-left: 50px">Documentation on the scheduling language can be found here [URL to come].</span>
+</div>
+</div>
 <table class="mdl-data-table mdl-js-data-table" style="width: 100%;
 margin-bottom: 8px">
 <tbody id="tblSchedule">

documentation/docs/scheduling.md

Lines changed: 100 additions & 14 deletions
@@ -8,31 +8,30 @@ Tensor<double> x("x", {64}, {Dense});
 Tensor<double> y("y", {512}, {Dense});
 
 IndexVar i("i"), j("j");
-y(i) = A(i, j) * x(j);
+Access matrix = A(i, j);
+y(i) = matrix * x(j);
 IndexStmt stmt = y.getAssignment().concretize();
 ```
 ```c
 for (int32_t i = 0; i < A1_dimension; i++) {
-  double y_val = 0.0;
-  for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
-    int32_t j = A2_crd[jA];
-    y_val += A_vals[jA] * x_vals[j];
-  }
-  y_vals[i] = y_val;
+  for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+    int32_t j = A2_crd[jA];
+    y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
+  }
 }
 ```
 # Pos
 
-The `pos(i, ipos, access)` transformation takes in an index variable `i` that operates over the coordinate space of `access` and replaces it with a derived index variable `ipos` that operates over the same iteration range, but with respect to the position space.
+The `pos(i, ipos, access)` transformation takes in an index variable `i` that iterates over the coordinate space of `access` and replaces it with a derived index variable `ipos` that iterates over the same iteration range, but with respect to the position space.
 
 Since the `pos` transformation is not valid for dense level formats, for the SpMV example, the following would result in an error:
 ```c++
-stmt = stmt.pos(i, IndexVar("ipos"), A);
+stmt = stmt.pos(i, IndexVar("ipos"), matrix);
 ```
 
 We could instead have:
 ```c++
-stmt = stmt.pos(j, IndexVar("jpos"), A);
+stmt = stmt.pos(j, IndexVar("jpos"), matrix);
 ```
 ```c
 for (int32_t i = 0; i < A1_dimension; i++) {
@@ -50,9 +49,24 @@ for (int32_t i = 0; i < A1_dimension; i++) {
 
 The `fuse(i, j, f)` transformation takes in two index variables `i` and `j`, where `j` is directly nested under `i`, and collapses them into a fused index variable `f` that iterates over the product of the coordinates `i` and `j`.
 
-For the SpMV example, we could have:
+`fuse` helps facilitate other transformations, such as iterating over the position space of several index variables, as in this SpMV example:
 ```c++
-stmt = stmt.fuse(i, j, IndexVar("f"));
+IndexVar f("f");
+stmt = stmt.fuse(i, j, f);
+stmt = stmt.pos(f, IndexVar("fpos"), matrix);
+```
+```c
+for (int32_t fposA = 0; fposA < A2_pos[A1_dimension]; fposA++) {
+  if (fposA >= A2_pos[A1_dimension])
+    continue;
+
+  int32_t f = A2_crd[fposA];
+  while (fposA == A2_pos[(i_pos + 1)]) {
+    i_pos++;
+    i = i_pos;
+  }
+  y_vals[i] = y_vals[i] + A_vals[fposA] * x_vals[f];
+}
 ```
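(In the generated loop above, `i_pos` tracks the position of the current row while the `while` loop recovers its coordinate `i` from `A2_pos`; the declarations of `i_pos` and `i` presumably sit just above the excerpted window of the full generated kernel.)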
 
 # Split
@@ -61,7 +75,7 @@ The `split(i, i0, i1, splitFactor)` transformation splits (strip-mines) an index
 
 For the SpMV example, we could have:
 ```c++
-stmt = stmt.split(j, IndexVar("i0"), IndexVar("i1"), 16);
+stmt = stmt.split(i, IndexVar("i0"), IndexVar("i1"), 16);
 ```
 ```c
 for (int32_t i0 = 0; i0 < ((A1_dimension + 15) / 16); i0++) {
@@ -80,8 +94,39 @@ for (int32_t i0 = 0; i0 < ((A1_dimension + 15) / 16); i0++) {
 
 # Divide
 
+The `divide(i, i0, i1, divideFactor)` transformation divides an index variable `i` into two nested index variables `i0` and `i1`. The size of the outer index variable `i0` is then held constant at `divideFactor`, which must be a positive integer.
+
+[TODO example, divide not implemented yet.]
+
 # Precompute
 
+The `precompute(expr, i, iw, workspace)` transformation, which is described in more detail [here](http://tensor-compiler.org/taco-workspaces.pdf), leverages scratchpad memories and reorders computations to increase locality.
+
+Given a subexpression `expr` to precompute, an index variable `i` to precompute over, and an index variable `iw` (which can be the same as or different from `i`) to precompute with, the precomputed results are stored in the tensor variable `workspace`.
+
+For the SpMV example, if `rhs` is the right-hand side of the original statement, we could have:
+```c++
+TensorVar workspace("workspace", Type(Float64, {Dimension(64)}), taco::dense);
+stmt = stmt.precompute(rhs, j, j, workspace);
+```
+```c
+for (int32_t i = 0; i < A1_dimension; i++) {
+  double* restrict workspace = 0;
+  workspace = (double*)malloc(sizeof(double) * 64);
+  for (int32_t pworkspace = 0; pworkspace < 64; pworkspace++) {
+    workspace[pworkspace] = 0.0;
+  }
+  for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+    int32_t j = A2_crd[jA];
+    workspace[j] = A_vals[jA] * x_vals[j];
+  }
+  for (int32_t j = 0; j < 64; j++) {
+    y_vals[i] = y_vals[i] + workspace[j];
+  }
+  free(workspace);
+}
+```
+
 # Reorder
 
 The `reorder(vars)` transformation takes in a new ordering for a set of index variables in the expression that are directly nested in the iteration order.
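For the `divide` transformation in the Divide section above, no generated code can be shown yet since `divide` is unimplemented; a hypothetical sketch of the call, assuming only the documented `divide(i, i0, i1, divideFactor)` signature, might read:

```c++
// Hypothetical sketch only: divide is not implemented yet, so this call just
// illustrates the documented divide(i, i0, i1, divideFactor) signature.
IndexVar i0("i0"), i1("i1");
stmt = stmt.divide(i, i0, i1, 4);  // i0 would take exactly 4 values; i1 covers the rest
```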
@@ -101,11 +146,52 @@ for (int32_t jA = A2_pos[iA]; jA < A2_pos[(iA + 1)]; jA++) {
 
 # Bound
 
+The `bound(i, ibound, bound, bound_type)` transformation replaces an index variable `i` with an index variable `ibound` that obeys a compile-time constraint on its iteration space, incorporating knowledge about the size or structured sparsity pattern of the corresponding input. The meaning of `bound` depends on the `bound_type`.
+
+For the SpMV example, we could have
+```c++
+stmt = stmt.bound(i, IndexVar("ibound"), 100, BoundType::MaxExact);
+```
+```c
+for (int32_t ibound = 0; ibound < 100; ibound++) {
+  for (int32_t jA = A2_pos[ibound]; jA < A2_pos[(ibound + 1)]; jA++) {
+    int32_t j = A2_crd[jA];
+    y_vals[ibound] = y_vals[ibound] + A_vals[jA] * x_vals[j];
+  }
+}
+```
+
 # Unroll
 
-# Parallelize
+The `unroll(i, unrollFactor)` transformation unrolls the loop corresponding to an index variable `i` by `unrollFactor` iterations, where `unrollFactor` is a positive integer.
+
+[TODO example, can't get unroll to work?]
 
+# Parallelize
 
+The `parallelize(i, parallel_unit, output_race_strategy)` transformation tags an index variable `i` for parallel execution on hardware type `parallel_unit`. Data races are handled according to the `output_race_strategy`.
 
+Since the other transformations expect serial code, `parallelize` must come last in a series of transformations. For the SpMV example, we could have
+```c++
+IndexVar i0("i0"), i1("i1");
+stmt = stmt.split(i, i0, i1, 32);
+stmt = stmt.reorder({i0, i1, j});
+stmt = stmt.parallelize(i0, ParallelUnit::CPUThread, OutputRaceStrategy::NoRaces);
+```
+```c
+#pragma omp parallel for schedule(runtime)
+for (int32_t i0 = 0; i0 < ((A1_dimension + 31) / 32); i0++) {
+  for (int32_t i1 = 0; i1 < 32; i1++) {
+    int32_t i = i0 * 32 + i1;
+    if (i >= A1_dimension)
+      continue;
+
+    for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+      int32_t j = A2_crd[jA];
+      y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
+    }
+  }
+}
+```
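For the `unroll` transformation in the Unroll section above, the example is still marked TODO; a hypothetical sketch, assuming the documented `unroll(i, unrollFactor)` signature and reusing the split from the parallelize example, might read:

```c++
// Hypothetical sketch only: the unroll example in the docs is still TODO, so
// this just illustrates the documented unroll(i, unrollFactor) signature.
IndexVar i0("i0"), i1("i1");
stmt = stmt.split(i, i0, i1, 32);  // a fixed inner trip count makes unrolling feasible
stmt = stmt.unroll(i1, 4);         // unroll the inner loop by a factor of 4
```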


server/taco_server.py

Lines changed: 4 additions & 4 deletions
@@ -32,7 +32,7 @@ def do_POST(self):
         computePath = prefix + "taco_compute.c"
         assemblyPath = prefix + "taco_assembly.c"
         cmd = tacoPath + " " + cmd + " -write-source=" + writePath + " -write-compute=" + computePath + " -write-assembly=" + assemblyPath
-
+
         try:
             subprocess.check_output(str.split(cmd), timeout=3, stderr=subprocess.STDOUT)
             with open(writePath, 'r') as f:
@@ -52,10 +52,10 @@ def do_POST(self):
             if search is not None:
                 response['error'] = search.group()[3:-1]
             else:
-                response['error'] = 'Expression is currently not supported'
+                response['error'] = 'Expression and/or schedule is currently not supported'
                 logFile = "/home/ubuntu/errors.log"
-        except Exception as e:
-            response['error'] = 'Expression is currently not supported'
+        except:
+            response['error'] = 'Expression and/or schedule is currently not supported'
             logFile = "/home/ubuntu/errors.log"
 
         ip = ".".join(self.client_address[0].split('.')[0:-2]) + ".*.*"
