for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
18
-
int32_t j = A2_crd[jA];
19
-
y_val += A_vals[jA] * x_vals[j];
20
-
}
21
-
y_vals[i] = y_val;
17
+
for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
18
+
int32_t j = A2_crd[jA];
19
+
y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];
20
+
}
22
21
}
23
22
```
24
23
# Pos
25
24
26
-
The `pos(i, ipos, access)` transformation takes in an index variable `i` that operates over the coordinate space of `access` and replaces it with a derived index variable `ipos` that operates over the same iteration range, but with respect to the position space.
25
+
The `pos(i, ipos, access)` transformation takes in an index variable `i` that iterates over the coordinate space of `access` and replaces it with a derived index variable `ipos` that iterates over the same iteration range, but with respect to the position space.
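To make the coordinate/position distinction concrete, here is a hand-written sketch (not taco output) for a single row of a CSR matrix. The function names `row_dot_coord` and `row_dot_pos` are hypothetical; the array names follow the generated code in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Coordinate space: visit every column coordinate j of row i and check
   whether the row stores a value there (linear search, for brevity). */
double row_dot_coord(int32_t i, int32_t ncols, const int32_t *A2_pos,
                     const int32_t *A2_crd, const double *A_vals,
                     const double *x_vals) {
  assert(A2_pos[i] <= A2_pos[i + 1]);
  double sum = 0.0;
  for (int32_t j = 0; j < ncols; j++) {
    for (int32_t p = A2_pos[i]; p < A2_pos[i + 1]; p++) {
      if (A2_crd[p] == j) sum += A_vals[p] * x_vals[j];
    }
  }
  return sum;
}

/* Position space: visit only the stored positions of row i and recover
   the coordinate j from the crd array. */
double row_dot_pos(int32_t i, const int32_t *A2_pos, const int32_t *A2_crd,
                   const double *A_vals, const double *x_vals) {
  assert(A2_pos[i] <= A2_pos[i + 1]);
  double sum = 0.0;
  for (int32_t jpos = A2_pos[i]; jpos < A2_pos[i + 1]; jpos++) {
    int32_t j = A2_crd[jpos];
    sum += A_vals[jpos] * x_vals[j];
  }
  return sum;
}
```

Both functions compute the same value; the position-space version simply skips the coordinates that hold no stored entry.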
27
26
28
27
Since the `pos` transformation is not valid for dense level formats, for the SpMV example, the following would result in an error:
29
28
```c++
30
-
stmt = stmt.pos(i, IndexVar("ipos"), A);
29
+
stmt = stmt.pos(i, IndexVar("ipos"), matrix);
31
30
```
32
31
33
32
We could instead have:
34
33
```c++
35
-
stmt = stmt.pos(j, IndexVar("jpos"), A);
34
+
stmt = stmt.pos(j, IndexVar("jpos"), matrix);
36
35
```
37
36
```c
38
37
for (int32_t i = 0; i < A1_dimension; i++) {
@@ -50,9 +49,24 @@ for (int32_t i = 0; i < A1_dimension; i++) {
50
49
51
50
The `fuse(i, j, f)` transformation takes in two index variables `i` and `j`, where `j` is directly nested under `i`, and collapses them into a fused index variable `f` that iterates over the product of the coordinates `i` and `j`.
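As a rough illustration of what a fused loop over positions can look like, the hand-written sketch below computes CSR SpMV with a single loop over all stored entries. It is not taco's generated code: the function name `spmv_fused` is hypothetical, and recovering `i` with a forward scan is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* y = A * x for a CSR matrix, written as one loop over all stored
   positions (the fused i/j space) instead of two nested loops. */
void spmv_fused(int32_t nrows, const int32_t *A2_pos, const int32_t *A2_crd,
                const double *A_vals, const double *x_vals, double *y_vals) {
  assert(nrows >= 0);
  for (int32_t r = 0; r < nrows; r++) y_vals[r] = 0.0;
  int32_t i = 0;
  for (int32_t fpos = 0; fpos < A2_pos[nrows]; fpos++) {
    /* Advance i past rows that end at or before fpos (skips empty rows). */
    while (fpos >= A2_pos[i + 1]) i++;
    int32_t j = A2_crd[fpos];
    y_vals[i] += A_vals[fpos] * x_vals[j];
  }
}
```

Because the loop runs over stored entries rather than rows, its trip count is the number of nonzeros, which is what makes the fused variable a convenient target for later transformations.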
52
51
53
-
For the SpMV example, we could have:
52
+
`fuse` helps facilitate other transformations, such as iterating over the position space of several index variables, as in this SpMV example:
54
53
```c++
55
-
stmt = stmt.fuse(i, j, IndexVar("f"));
54
+
IndexVar f("f");
55
+
stmt = stmt.fuse(i, j, f);
56
+
stmt = stmt.pos(f, IndexVar("fpos"), matrix);
57
+
```
58
+
```c
59
+
for (int32_t fposA = 0; fposA < A2_pos[A1_dimension]; fposA++) {
The `divide(i, i0, i1, divideFactor)` transformation divides an index variable `i` into two nested index variables `i0` and `i1`. The size of the outer index variable `i0` is then held constant at `divideFactor`, which must be a positive integer.
98
+
99
+
[TODO example, divide not implemented yet.]
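In the meantime, the loop structure that the description implies can be sketched by hand. The following is not taco output; `spmv_divided` and `chunk` are hypothetical names, and the tail guard is an assumption about how a row count that is not a multiple of `divideFactor` would be handled.

```c
#include <assert.h>
#include <stdint.h>

/* CSR SpMV with the row loop divided into exactly divideFactor outer
   iterations (i0), each covering a contiguous chunk of rows (i1). */
void spmv_divided(int32_t nrows, int32_t divideFactor, const int32_t *A2_pos,
                  const int32_t *A2_crd, const double *A_vals,
                  const double *x_vals, double *y_vals) {
  assert(divideFactor > 0);
  int32_t chunk = (nrows + divideFactor - 1) / divideFactor; /* ceiling */
  for (int32_t i0 = 0; i0 < divideFactor; i0++) {
    for (int32_t i1 = 0; i1 < chunk; i1++) {
      int32_t i = i0 * chunk + i1;
      if (i >= nrows) break; /* tail guard for the last chunk */
      double y_val = 0.0;
      for (int32_t jA = A2_pos[i]; jA < A2_pos[i + 1]; jA++) {
        y_val += A_vals[jA] * x_vals[A2_crd[jA]];
      }
      y_vals[i] = y_val;
    }
  }
}
```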
100
+
83
101
# Precompute
84
102
103
+
The `precompute(expr, i, iw, workspace)` transformation, which is described in more detail [here](http://tensor-compiler.org/taco-workspaces.pdf), leverages scratchpad memories and reorders computations to increase locality.
104
+
105
+
Given a subexpression `expr` to precompute, an index variable `i` to precompute over, and an index variable `iw` to precompute with (which may be the same as or different from `i`), the precomputed results are stored in the tensor variable `workspace`.
106
+
107
+
For the SpMV example, if `rhs` is the right hand side of the original statement, we could have:
for (int32_t pworkspace = 0; pworkspace < 64; pworkspace++) {
117
+
workspace[pworkspace] = 0.0;
118
+
}
119
+
for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
120
+
int32_t j = A2_crd[jA];
121
+
workspace[j] = A_vals[jA] * x_vals[j];
122
+
}
123
+
for (int32_t j = 0; j < ; j++) {
124
+
y_vals[i] = y_vals[i] + workspace[j];
125
+
}
126
+
free(workspace);
127
+
}
128
+
```
129
+
85
130
# Reorder
86
131
87
132
The `reorder(vars)` transformation takes in a new ordering for a set of index variables in the expression that are directly nested in the iteration order.
@@ -101,11 +146,52 @@ for (int32_t jA = A2_pos[iA]; jA < A2_pos[(iA + 1)]; jA++) {
101
146
102
147
# Bound
103
148
149
+
The `bound(i, ibound, bound, bound_type)` transformation replaces an index variable `i` with an index variable `ibound` that obeys a compile-time constraint on its iteration space, incorporating knowledge about the size or structured sparsity pattern of the corresponding input. The meaning of `bound` depends on the `bound_type`.
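As a hand-written sketch of the general idea, assume the bound is an exact upper limit on the number of rows. The runtime dimension in the row loop can then be replaced by a constant that the C compiler sees. The macro `IBOUND`, its value, and the function name `spmv_bound` are hypothetical, and this is only one plausible reading of a bound type, not taco output.

```c
#include <assert.h>
#include <stdint.h>

#define IBOUND 3 /* assumed compile-time row count (hypothetical value) */

/* CSR SpMV where the row loop uses a compile-time bound instead of a
   dimension read at run time. */
void spmv_bound(const int32_t *A2_pos, const int32_t *A2_crd,
                const double *A_vals, const double *x_vals, double *y_vals) {
  assert(A2_pos[0] == 0);
  for (int32_t ibound = 0; ibound < IBOUND; ibound++) {
    double y_val = 0.0;
    for (int32_t jA = A2_pos[ibound]; jA < A2_pos[ibound + 1]; jA++) {
      y_val += A_vals[jA] * x_vals[A2_crd[jA]];
    }
    y_vals[ibound] = y_val;
  }
}
```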
The `unroll(i, unrollFactor)` transformation unrolls the loop corresponding to an index variable `i` by a factor of `unrollFactor`, where `unrollFactor` is a positive integer.
167
+
168
+
[TODO example, can't get unroll to work?]
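Until a generated example is available, the effect of unrolling can be shown by hand on a dense dot product. This sketch assumes an unroll factor of 4 and is not taco output; the function name `dot_unrolled4` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Dense dot product with the loop body replicated four times per
   iteration, followed by a remainder loop for the leftover elements. */
double dot_unrolled4(int32_t n, const double *a, const double *b) {
  assert(n >= 0);
  double sum = 0.0;
  int32_t i = 0;
  for (; i + 3 < n; i += 4) {
    sum += a[i] * b[i];
    sum += a[i + 1] * b[i + 1];
    sum += a[i + 2] * b[i + 2];
    sum += a[i + 3] * b[i + 3];
  }
  for (; i < n; i++) { /* remainder when n is not a multiple of 4 */
    sum += a[i] * b[i];
  }
  return sum;
}
```

The unrolled loop performs one bounds check per four multiply-adds, which is the loop-overhead saving that unrolling is meant to provide.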
107
169
170
+
# Parallelize
108
171
172
+
The `parallelize(i, parallel_unit, output_race_strategy)` transformation tags an index variable `i` for parallel execution on hardware type `parallel_unit`. Data races are handled by an `output_race_strategy`.
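As a hand-written sketch of the idea, assume the parallel unit is a CPU thread and the compiler supports OpenMP. The row loop of CSR SpMV can then be annotated as below. Each iteration writes a distinct `y_vals[i]`, so this particular loop has no output race to resolve. The function name `spmv_parallel` and the `schedule(static)` clause are choices made here, not taken from taco.

```c
#include <assert.h>
#include <stdint.h>

/* CSR SpMV with the row loop marked for parallel execution. Without
   OpenMP enabled, the pragma is ignored and the loop runs serially. */
void spmv_parallel(int32_t nrows, const int32_t *A2_pos,
                   const int32_t *A2_crd, const double *A_vals,
                   const double *x_vals, double *y_vals) {
  assert(nrows >= 0);
  #pragma omp parallel for schedule(static)
  for (int32_t i = 0; i < nrows; i++) {
    double y_val = 0.0; /* private to each iteration */
    for (int32_t jA = A2_pos[i]; jA < A2_pos[i + 1]; jA++) {
      y_val += A_vals[jA] * x_vals[A2_crd[jA]];
    }
    y_vals[i] = y_val;
  }
}
```

Parallelizing the inner `jA` loop instead would have several threads accumulating into the same output element, which is the kind of conflict a race strategy has to address.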
109
173
174
+
Since the other transformations expect serial code, `parallelize` must come last in a series of transformations. For the SpMV example, we could have