Commit 735be6f
Author: Jessica Shi
Commit message: add some more features
1 parent dbbfdb8

5 files changed: +229 additions, -137 deletions

codegen.html

Lines changed: 14 additions & 3 deletions

@@ -23,6 +23,7 @@
     <script src="javascripts/parser.js"></script>
     <script src="javascripts/parser-indices.js"></script>
     <script src="javascripts/parser-accesses.js"></script>
+    <script src="javascripts/default-schedules.js"></script>
     <script src="javascripts/demo.js"></script>
     <script type="text/x-mathjax-config">
       MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
@@ -132,12 +133,22 @@ <h6 style="margin-bottom: 0px; margin-top: 18px">Input a tensor algebra expressi
     <div class="mdl-grid" style="padding-top: 6px">
       <div class="mdl-layout-spacer"></div>
       <div class="mdl-cell mdl-cell--9-col">
+        <!-- <div>
+          <span style="font-size: 14px;margin-left: 50px">Documentation on the scheduling language can be found here [URL to come].</span>
+        </div> -->
         <div>
-          <button id="btnSchedule" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="margin-bottom:10px; width: 30%">
+          <button id="btnSchedule" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="margin-bottom: 10px; width: 30%">
             Add Scheduling Command
           </button>
-          <div class="mdl-textfield" style="width: 69%">
-            <span style="font-size: 14px;margin-left: 50px">Documentation on the scheduling language can be found here [URL to come].</span>
+
+          <div id="btnDefaults" style="float: right">
+            <button id="btnCPU" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="margin-right: 10px; width: inherit">
+              SpMV CPU
+            </button>
+
+            <button id="btnGPU" class="mdl-button mdl-js-button mdl-button--raised mdl-js-ripple-effect demo-btn" style="width: inherit">
+              SpMV GPU
+            </button>
           </div>
         </div>
       <table class="mdl-data-table mdl-js-data-table" style="width: 100%;
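The two new buttons (`btnCPU`, `btnGPU`) presumably load the default SpMV schedules defined in the new `javascripts/default-schedules.js` into the demo. As a rough sketch of the lookup side of that wiring (the helper `getDefaultSchedule` and the abbreviated tables below are hypothetical, not from this commit; the real click handlers would live in `javascripts/demo.js`):

```javascript
// Abbreviated, hypothetical copy of the shape of default_CPU_schedules /
// default_GPU_schedules from javascripts/default-schedules.js.
var cpuSchedules = {
  spmv: [
    { command: "split", parameters: ["i", "i0", "i1", 32] },
    { command: "reorder", numReordered: 3, parameters: ["i0", "i1", "j"] },
    { command: "parallelize", parameters: ["i0", "CPU Thread", "No Races"] }
  ]
};
var gpuSchedules = { spmv: [] };  // placeholder; see default-schedules.js

// Return the command list for a kernel on a target, or [] if none is defined.
function getDefaultSchedule(target, kernel) {
  var table = target === "GPU" ? gpuSchedules : cpuSchedules;
  return table[kernel] || [];
}
```

A button's click handler could then iterate over `getDefaultSchedule("CPU", "spmv")` and add one scheduling-command row to the table per entry.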

documentation/docs/scheduling.md

Lines changed: 44 additions & 22 deletions

@@ -92,17 +92,16 @@ for (int32_t i0 = 0; i0 < ((A1_dimension + 15) / 16); i0++) {
 }
 ```
 
-# Divide
+<!-- (not yet implemented) -->
+<!-- # Divide
 
-The `divide(i, i0, i1, divideFactor)` transformation divides an index variable `i` into two nested index variables `i0` and `i1`. The size of the outer index variable `i0` is then held constant at `divideFactor`, which must be a positive integer.
-
-[TODO example, divide not implemented yet.]
+The `divide(i, i0, i1, divideFactor)` transformation divides an index variable `i` into two nested index variables `i0` and `i1`. The size of the outer index variable `i0` is then held constant at `divideFactor`, which must be a positive integer. -->
 
 # Precompute
 
 The `precompute(expr, i, iw, workspace)` transformation, which is described in more detail [here](http://tensor-compiler.org/taco-workspaces.pdf), leverages scratchpad memories and reorders computations to increase locality.
 
-Given a subexpression `expr` to precompute, an index variable `i` to precompute over, and an index variable `iw` (can be the same or different as `i`) to precompute with, the precomputed results are stored in the tensor variable `workspace`.
+Given a subexpression `expr` to precompute, an index variable `i` to precompute over, and an index variable `iw` (which can be the same or different as `i`) to precompute with, the precomputed results are stored in the tensor variable `workspace`.
 
 For the SpMV example, if `rhs` is the right hand side of the original statement, we could have:
 ```c++
@@ -165,31 +164,54 @@ for (int32_t ibound = 0; ibound < 100; ibound++) {
 
 The `unroll(i, unrollFactor)` transformation unrolls the loop corresponding to an index variable `i` by `unrollFactor` number of iterations, where `unrollFactor` is a positive integer.
 
-[TODO example, can't get unroll to work?]
+For the SpMV example, we could have
+```c++
+stmt = stmt.split(i, i0, i1, 32);
+stmt = stmt.unroll(i0, 4);
+```
+```c
+if ((((A1_dimension + 31) / 32) * 32 + 32) + (((A1_dimension + 31) / 32) * 32 + 32) >= A1_dimension) {
+  for (int32_t i0 = 0; i0 < ((A1_dimension + 31) / 32); i0++) {
+    for (int32_t i1 = 0; i1 < 32; i1++) {
+      int32_t i = i0 * 32 + i1;
+      if (i >= A1_dimension)
+        continue;
+
+      for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+        int32_t j = A2_crd[jA];
+        y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
+      }
+    }
+  }
+}
+else {
+  #pragma unroll 4
+  for (int32_t i0 = 0; i0 < ((A1_dimension + 31) / 32); i0++) {
+    for (int32_t i1 = 0; i1 < 32; i1++) {
+      int32_t i = i0 * 32 + i1;
+      for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+        int32_t j = A2_crd[jA];
+        y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
+      }
+    }
+  }
+}
+```
 
 # Parallelize
 
-The `parallelize(i, parallel_unit, output_race_strategy)` transformation tags an index variable `i` for parallel execution on hardware type `parallel_unit`. Data races are handled by an `output_race_strategy`.
+The `parallelize(i, parallel_unit, output_race_strategy)` transformation tags an index variable `i` for parallel execution on hardware type `parallel_unit`. Data races are handled by an `output_race_strategy`. Since the other transformations expect serial code, `parallelize` must come last in a series of transformations.
 
-Since the other transformations expect serial code, `parallelize` must come last in a series of transformations. For the SpMV example, we could have
+For the SpMV example, we could have
 ```c++
-IndexVar i0("i0"), i1("i1");
-stmt = stmt.split(i, i0, i1, 32);
-stmt = stmt.reorder({i0, i1, j});
-stmt = stmt.parallelize(i0, ParallelUnit::CPUThread, OutputRaceStrategy::NoRaces);
+stmt = stmt.parallelize(i, ParallelUnit::CPUThread, OutputRaceStrategy::NoRaces);
 ```
 ```c
 #pragma omp parallel for schedule(runtime)
-for (int32_t i0 = 0; i0 < ((A1_dimension + 31) / 32); i0++) {
-  for (int32_t i1 = 0; i1 < 32; i1++) {
-    int32_t i = i0 * 32 + i1;
-    if (i >= A1_dimension)
-      continue;
-
-    for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
-      int32_t j = A2_crd[jA];
-      y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
-    }
+for (int32_t i = 0; i < A1_dimension; i++) {
+  for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
+    int32_t j = A2_crd[jA];
+    y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
   }
 }
 ```
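The updated documentation states that `parallelize` must come last in a series of transformations. That ordering rule can be checked mechanically against command lists shaped like the ones this commit adds in `javascripts/default-schedules.js`; the checker below is a hypothetical sketch, not part of the commit:

```javascript
// Verify that once a "parallelize" command appears in a schedule, every later
// command is also "parallelize" (several trailing parallelize commands are
// fine, as in the default GPU SpMV schedule).
function parallelizeComesLast(schedule) {
  var seen = false;
  for (var i = 0; i < schedule.length; i++) {
    if (schedule[i].command === "parallelize") {
      seen = true;
    } else if (seen) {
      return false;  // a serial transformation follows a parallelize
    }
  }
  return true;
}
```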

javascripts/default-schedules.js

Lines changed: 115 additions & 0 deletions

@@ -0,0 +1,115 @@
+
+var NNZ_PER_THREAD = 8;
+var WARP_SIZE = 32;
+var BLOCK_SIZE = 256;
+
+var default_CPU_schedules = {
+  spmv: [
+    {
+      command: "split",
+      parameters: ["i", "i0", "i1", 32]
+    },
+    {
+      command: "reorder",
+      numReordered: 3,
+      parameters: ["i0", "i1", "j"]
+    },
+    {
+      command: "parallelize",
+      parameters: ["i0", "CPU Thread", "No Races"]
+    }
+  ],
+  add: [],
+  ttv: [
+    {
+      command: "fuse",
+      parameters: ["i", "j", "f"]
+    },
+    {
+      command: "pos",
+      parameters: ["f", "fpos", "B"]
+    },
+    {
+      command: "split",
+      parameters: ["fpos", "chunk", "fpos2", 8]
+    },
+    {
+      command: "reorder",
+      numReordered: 3,
+      parameters: ["chunk", "fpos2", "k"]
+    },
+    {
+      command: "parallelize",
+      parameters: ["chunk", "CPU Thread", "No Races"]
+    }
+  ],
+  mttkrp: [
+    {
+      command: "reorder",
+      numReordered: 4,
+      parameters: ["i", "k", "l", "j"]
+    },
+    {
+      command: "precompute",
+      parameters: ["j", "j", "B(i,k,l) * D(l,j)"]
+    },
+    {
+      command: "split",
+      parameters: ["i", "i0", "i1", 32]
+    },
+    {
+      command: "parallelize",
+      parameters: ["i0", "CPU Thread", "No Races"]
+    }
+  ]
+}
+
+var default_GPU_schedules = {
+  spmv: [
+    {
+      command: "fuse",
+      parameters: ["i", "j", "f"]
+    },
+    {
+      command: "pos",
+      parameters: ["f", "fpos", "A"]
+    },
+    {
+      command: "split",
+      parameters: ["fpos", "block", "fpos1", NNZ_PER_THREAD * BLOCK_SIZE]
+    },
+    {
+      command: "split",
+      parameters: ["fpos1", "warp", "fpos2", NNZ_PER_THREAD * WARP_SIZE]
+    },
+    {
+      command: "split",
+      parameters: ["fpos2", "thread", "thr_nz", NNZ_PER_THREAD]
+    },
+    {
+      command: "reorder",
+      numReordered: 4,
+      parameters: ["block", "warp", "thread", "thr_nz"]
+    },
+    {
+      command: "precompute",
+      parameters: ["thr_nz", "thr_nz_pre", "A(i, j) * x(j)"]
+    },
+    {
+      command: "unroll",
+      parameters: ["thr_nz_pre", NNZ_PER_THREAD]
+    },
+    {
+      command: "parallelize",
+      parameters: ["block", "GPU Block", "Ignore Races"]
+    },
+    {
+      command: "parallelize",
+      parameters: ["warp", "GPU Warp", "Ignore Races"]
+    },
+    {
+      command: "parallelize",
+      parameters: ["thread", "GPU Thread", "Atomics"]
+    }
+  ]
+}
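Each schedule entry pairs a scheduling-language command name with its argument list, so rendering one as a command string is a one-liner; it also makes the GPU SpMV split factors concrete (8 * 256 = 2048 nonzeros per GPU block, 8 * 32 = 256 per warp). The pretty-printer `formatCommand` below is a hypothetical sketch, not part of this commit:

```javascript
// Hypothetical pretty-printer: render one schedule entry as a
// scheduling-language call, e.g. "split(fpos, block, fpos1, 2048)".
function formatCommand(entry) {
  return entry.command + "(" + entry.parameters.join(", ") + ")";
}

// Constants mirrored from default-schedules.js; the split factors below
// evaluate to the concrete per-block and per-warp nonzero counts.
var NNZ_PER_THREAD = 8, WARP_SIZE = 32, BLOCK_SIZE = 256;
var blockSplit = {
  command: "split",
  parameters: ["fpos", "block", "fpos1", NNZ_PER_THREAD * BLOCK_SIZE]
};
```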
