summation algorithm (as implemented in Base.sum) starts losing accuracy as soon
as the condition number increases, computing only noise when the condition
number exceeds 1/ϵ≃10¹⁶. The same goes for the naive summation algorithm.
In contrast, both compensated algorithms (Kahan–Babuska–Neumaier and
Ogita–Rump–Oishi) still accurately compute the result at this point; they only
start losing accuracy beyond it, computing meaningless results when the
condition number reaches 1/ϵ²≃10³². In effect, these (simply) compensated
algorithms produce the same results as if a naive summation had been
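
To make the compensation idea concrete, here is a minimal scalar Julia sketch
of Kahan–Babuska–Neumaier summation. The function name `sum_kbn_sketch` is ours
for illustration only; the implementations benchmarked here are vectorized and
unrolled, not this simple loop.

```julia
# Minimal sketch of Kahan–Babuska–Neumaier (KBN) compensated summation.
# The compensation term `c` accumulates the low-order bits lost when
# `s + x` rounds, which is what pushes the accuracy limit from roughly
# 1/ϵ to roughly 1/ϵ² in condition number.
function sum_kbn_sketch(v)
    s = zero(eltype(v))   # running sum
    c = zero(eltype(v))   # running compensation (accumulated rounding errors)
    for x in v
        t = s + x
        if abs(s) >= abs(x)
            c += (s - t) + x   # rounding error of s + x when s dominates
        else
            c += (x - t) + s   # rounding error of s + x when x dominates
        end
        s = t
    end
    return s + c
end

# Classic Neumaier test case: the small terms are lost by plain summation
# but recovered by the compensation.
v = [1.0, 1e100, 1.0, -1e100]
@show sum(v)             # 0.0: the two 1.0 terms are absorbed and lost
@show sum_kbn_sketch(v)  # 2.0: the compensation recovers them
```
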
thousands of elements), the implementation is memory bound (as expected of a
typical BLAS1 operation), which is why we see significant drops in performance
when the vector can’t fit into the L1, L2 or L3 cache.
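
As a rough illustration (not the benchmark harness used for the results
above), one can observe this memory-bound behavior by sweeping vector lengths
across the cache levels and reporting effective bandwidth. This sketch assumes
the BenchmarkTools package; the lengths are illustrative and not tied to any
particular cache hierarchy.

```julia
using BenchmarkTools

# Sweep working-set sizes from L1-resident to DRAM-resident and report
# effective read bandwidth; throughput typically drops at each cache
# boundary. Lengths are illustrative: 2^12 Float64s ≈ 32 KiB, up to
# 2^24 ≈ 128 MiB.
for n in (2^12, 2^15, 2^18, 2^21, 2^24)
    v = rand(n)
    t = @belapsed sum($v)     # minimum elapsed time per call, in seconds
    gbps = 8n / t / 1e9       # 8 bytes read per Float64 element
    println(rpad(string(n), 10), round(gbps; digits = 2), " GB/s")
end
```
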
On this AVX512-enabled system, the Kahan–Babuska–Neumaier implementation tends
to be a little more efficient than the Ogita–Rump–Oishi algorithm (it would
generally be the opposite for AVX2 systems). When implemented with a suitable
unrolling level and cache prefetching, these implementations are CPU-bound when
vectors fit inside the L1 or L2 cache. However, when vectors are too large to