The `divide(i, i0, i1, divideFactor)` transformation divides an index variable `i` into two nested index variables `i0` and `i1`. The size of the outer index variable `i0` is then held constant at `divideFactor`, which must be a positive integer.
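
`divide` was not yet implemented when this example set was written, so no generated code is shown for it. Where it is available, the call mirrors `split`; the following is a hypothetical sketch (the index variable names `i0` and `i1` and the factor 4 are assumptions, and `stmt` and `i` come from the SpMV example above):

```c++
// Hypothetical sketch: carve the row loop i into divideFactor = 4 outer chunks.
// i0 ranges over the 4 chunks and i1 over the rows inside each chunk.
IndexVar i0("i0"), i1("i1");
stmt = stmt.divide(i, i0, i1, 4);
```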
# Precompute
The `precompute(expr, i, iw, workspace)` transformation, which is described in more detail [here](http://tensor-compiler.org/taco-workspaces.pdf), leverages scratchpad memories and reorders computations to increase locality.

Given a subexpression `expr` to precompute, an index variable `i` to precompute over, and an index variable `iw` (which can be the same or different as `i`) to precompute with, the precomputed results are stored in the tensor variable `workspace`.

For the SpMV example, if `rhs` is the right hand side of the original statement, we could have:
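
One possibility is sketched below (the workspace size of 64 matches the length of `x` in the SpMV example; the choice of a dense workspace format and precomputing over `j` are illustrative):

```c++
// Dense scratch vector that collects the products A(i, j) * x(j) for one
// row before they are reduced into y(i), improving locality of the output.
TensorVar workspace("workspace", Type(Float64, {Dimension(64)}), taco::dense);
stmt = stmt.precompute(rhs, j, j, workspace);
```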

# Unroll

The `unroll(i, unrollFactor)` transformation unrolls the loop corresponding to an index variable `i` by `unrollFactor` number of iterations, where `unrollFactor` is a positive integer.
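
One possible use for the SpMV example (the factors 32 and 4 and the fresh index variables `i0` and `i1` are illustrative choices, not requirements) is to split the row loop `i` and then unroll the resulting outer loop:

```c++
// Split the row loop i into an outer loop i0 and an inner loop i1 covering
// 32 rows per outer iteration, then unroll the outer loop by a factor of 4.
IndexVar i0("i0"), i1("i1");
stmt = stmt.split(i, i0, i1, 32);
stmt = stmt.unroll(i0, 4);
```

The innermost loop of the generated kernel is still the familiar CSR row-times-vector product (the enclosing loops introduced by the schedule are not shown):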
```c
for (int32_t jA = A2_pos[i]; jA < A2_pos[(i + 1)]; jA++) {
  int32_t j = A2_crd[jA];                          // column index of the nonzero
  y_vals[i] = y_vals[i] + A_vals[jA] * x_vals[j];  // y(i) += A(i,j) * x(j)
}
```
# Parallelize
The `parallelize(i, parallel_unit, output_race_strategy)` transformation tags an index variable `i` for parallel execution on hardware type `parallel_unit`. Data races are handled by an `output_race_strategy`. Since the other transformations expect serial code, `parallelize` must come last in a series of transformations.

For the SpMV example, we could have
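
A sketch of one possibility (the parallel unit and race strategy shown are just one valid combination; in this SpMV kernel each iteration of `i` writes a distinct `y(i)`, so no reduction handling is needed):

```c++
// Run the row loop i across CPU threads. Each iteration writes its own
// y(i), so we can declare that there are no races on the output.
stmt = stmt.parallelize(i, ParallelUnit::CPUThread, OutputRaceStrategy::NoRaces);
```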