Commit 181528e
Fixing the performance issue/bug in copying large vectors when using --at-most-two-errors-per-detector flag (#45)
### Fixing the performance issue (and an implicit bug)
This PR fixes the performance issue of costly `std::vector` copy
operations performed when the `--at-most-two-errors-per-detector` flag
is enabled. As discussed in #27, the `next_next_blocked_errs =
next_blocked_errs` line used to consume a significant share of decoding
time: each time a new search state was explored/constructed, it made a
local copy of the blocked errors for the current state, so that changes
made to the current state would not affect states explored later. In
#27, I realized that these local copies of `std::vector` data structures
only had to be made when the `--at-most-two-errors-per-detector` flag
was enabled, because only then did _Tesseract_ make additional changes
to the vector of blocked errors that had to be reverted before exploring
the next search state. In this PR, I address the flag-enabled case as
well: copying an entire large vector for every new search state, only to
revert a few changes, is highly inefficient. Instead, I store a special
value `2` (rather than `true`/`1`) for the errors that are blocked due
to the `--at-most-two-errors-per-detector` flag and must be
unblocked/reverted in the next search state. This is possible to
implement now because boolean elements are stored as integers, rather
than bits that can only be `0` or `1`.
This change/optimization can be evaluated along several dimensions:
1. Before I started working on _Tesseract_, the
`--at-most-two-errors-per-detector` flag frequently triggered copy
operations of `std::vector<bool>`. As discussed in #25, this container
improves memory efficiency by packing boolean elements into individual
bits, but I replaced it with `std::vector<char>`, which drastically
improved speed: _Tesseract_ accesses these elements frequently, and the
bit-wise operations behind `std::vector<bool>` induced significant
overhead.
2. After the optimization in #25, we were copying `std::vector<char>`,
which required more time, as the elements are larger.
3. Finally, after the optimization in #34, we were copying
`std::vector<DetectorCostTuple>`, where each element is 8 bytes,
requiring even more time.
Since we were not using the `--at-most-two-errors-per-detector` flag
(it affects the accuracy of the decoder), I had focused my optimizations
on the flag-disabled path. However, the _Tesseract_ algorithm still
contained code that frequently copied large vectors when the flag was
enabled, only to revert a few changes. In this PR, I fix this
performance issue with a smarter strategy: storing a special value `2`
for the errors that need to be unblocked/reverted in the following
iteration of exploring a search state.
Below are graphs evaluating the performance improvement achieved by
removing these unnecessary copy operations when the
`--at-most-two-errors-per-detector` flag is enabled. Note that the
removed copies are of `std::vector<DetectorCostTuple>`, as this was the
last data representation I implemented before this PR. I also noticed
that this PR improves the accuracy of the decoder when the flag is used,
because the old code had a bug. The code below:
```cpp
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }
  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    for (size_t oei : d2e[d]) {
      next_next_detector_cost_tuples[oei].error_blocked = true;
    }
  }
}
```
contains the critical loop with the bug. This loop updates the number of
fired detectors for each error in `next_detector_cost_tuples`, but
blocks errors in `next_next_detector_cost_tuples`. When `get_detcost` is
then called, `next_next_detector_cost_tuples` is passed as the argument.
This inconsistency occurred only when the
`--at-most-two-errors-per-detector` flag was enabled: in that case,
`next_next_detector_cost_tuples` was constructed and passed to
`get_detcost`, while the updates to the fired-detector counts were made
on `next_detector_cost_tuples`. This explains the additional low
confidence results, as well as the 3 errors in a surface code benchmark
(r=11, p=0.002, 500 shots).
In this PR, the buggy loop is replaced with:
```cpp
for (int d : edets[ei]) {
  next_detectors[d] = !next_detectors[d];
  int fired = next_detectors[d] ? 1 : -1;
  next_num_detectors += fired;
  for (int oei : d2e[d]) {
    next_detector_cost_tuples[oei].detectors_count += fired;
  }
  if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
    for (size_t oei : d2e[d]) {
      next_detector_cost_tuples[oei].error_blocked =
          next_detector_cost_tuples[oei].error_blocked == 1 ? 1 : 2;
    }
  }
}
```
As explained earlier, this PR entirely removes
`next_next_detector_cost_tuples` and replaces it with the sentinel-based
strategy of reverting changes made to blocked errors. Removing this
array also fixes the bug: the updates to the fired-detector counts and
to the blocked errors are now always performed on the same
`next_detector_cost_tuples` array.
**Note: I evaluated the change in accuracy primarily by comparing the
number of low confidence results. I also measured the number of errors
when executing shots/simulations, but for the benchmarks below there
were no errors either before or after this PR. The only exception was
the surface code benchmark (r=11, p=0.002, 500 shots): for this
benchmark, I encountered 3 errors (out of 500 shots) before this PR and
0 after it.**
<img width="1778" height="870" alt="Screenshot 2025-07-18 7 40 19 PM"
src="https://github.com/user-attachments/assets/a1f54d20-ee00-43bf-8c28-f29aaf487d80"
/>
<img width="1778" height="876" alt="Screenshot 2025-07-18 7 40 39 PM"
src="https://github.com/user-attachments/assets/87279c31-abae-4860-9bab-749eb02631ee"
/>
<img width="1778" height="877" alt="Screenshot 2025-07-18 7 41 01 PM"
src="https://github.com/user-attachments/assets/07f820fa-c2de-4029-b5c4-52c0e035dc71"
/>
<img width="1778" height="875" alt="Screenshot 2025-07-18 7 41 21 PM"
src="https://github.com/user-attachments/assets/8a4275ed-a787-4733-9e2c-8c609406edbc"
/>
### Analyzing the impact of this flag on the performance and accuracy
**Now that the flag is fixed and optimized (on par with the version that
does not use the flag), we can analyze its impact on the performance and
accuracy of the decoder.**
I first analyzed the performance and accuracy impact of this flag using
the same benchmarks I used to test the performance/bug fix I implemented
in this PR. I noticed that for these benchmarks, the flag provides
somewhat better accuracy, but lower performance. Below are graphs that
compare the accuracy and performance with and without using the
`--at-most-two-errors-per-detector` flag.
<img width="1778" height="985" alt="Screenshot 2025-07-18 7 41 52 PM"
src="https://github.com/user-attachments/assets/d4e4f5f3-75e8-46e1-9fdf-51537316d652"
/>
<img width="1778" height="874" alt="Screenshot 2025-07-18 7 42 10 PM"
src="https://github.com/user-attachments/assets/b31b28cd-d9a9-4a2a-bae2-88f78a16a5e0"
/>
<img width="1778" height="986" alt="Screenshot 2025-07-18 7 42 27 PM"
src="https://github.com/user-attachments/assets/c6ea54d6-0738-44b3-924f-4632291e41e1"
/>
<img width="1778" height="883" alt="Screenshot 2025-07-18 7 42 44 PM"
src="https://github.com/user-attachments/assets/e5f64154-da58-406d-8e47-2be2a0004a37"
/>
### More data on the performance and accuracy impact of the flag
I performed additional experiments/benchmarks to collect more
comprehensive data on the impact of this flag on the performance and
accuracy of the _Tesseract_ decoder. Below are plots for various groups
of codes. They confirm that for most benchmarks the flag provides
somewhat better accuracy, but lower performance.
<img width="1763" height="980" alt="Screenshot 2025-07-18 1 50 52 PM"
src="https://github.com/user-attachments/assets/3ac277dd-4e9d-418c-956d-dc331ef12019"
/>
<img width="1763" height="981" alt="Screenshot 2025-07-18 1 52 49 PM"
src="https://github.com/user-attachments/assets/9c7e50ef-7bb2-4805-8e8c-d1df4152cc10"
/>
<img width="1762" height="981" alt="Screenshot 2025-07-18 1 55 53 PM"
src="https://github.com/user-attachments/assets/1803cccf-4f25-4b9a-bb2a-3818412f60de"
/>
<img width="1762" height="980" alt="Screenshot 2025-07-18 1 57 34 PM"
src="https://github.com/user-attachments/assets/b5645353-8168-4b39-9473-4c3ed425083c"
/>
<img width="1748" height="981" alt="Screenshot 2025-07-18 2 02 48 PM"
src="https://github.com/user-attachments/assets/1084b196-365a-4e3b-a65d-bacd19929760"
/>
<img width="1748" height="981" alt="Screenshot 2025-07-18 2 04 45 PM"
src="https://github.com/user-attachments/assets/c6cbcf78-26ab-48e2-a9a3-2ff1faf3c5dc"
/>
<img width="1756" height="989" alt="Screenshot 2025-07-18 2 08 19 PM"
src="https://github.com/user-attachments/assets/e5f98227-b5e0-4eba-885f-571908d183a0"
/>
<img width="1746" height="988" alt="Screenshot 2025-07-18 3 15 07 PM"
src="https://github.com/user-attachments/assets/81c6133e-e34d-403d-85fc-320042311120"
/>
**The results show that the decoding slowdown caused by this flag ranges
from around 0% to over 40%; in very rare cases, the flag provides a
(very small) performance improvement. The accuracy improvement ranges
from 0% to over 30%, indicating that this flag can significantly improve
accuracy.**
### Major contributions of the PR:
- Removes the performance degradation that my earlier optimizations
introduced for configurations using this flag (those optimizations
targeted configurations without the flag, where they significantly
improved decoding time)
- Completely removes the inefficient/redundant `std::vector` copy
operations propagated by the former `next_next_blocked_errs =
next_blocked_errs` line (mentioned in PR #27)
- Fixes the performance issue/bug that existed when using the
`--at-most-two-errors-per-detector` flag, where large vectors were
frequently copied in each decoding iteration only to revert a few
changes (note that this issue escalated with the changes in data
representation required by previous optimization strategies)
- Extensive experiments/benchmarks performed to evaluate the impact of
the performance issue/bug fix
- Extensive experiments/benchmarks performed to evaluate the impact of
the flag itself on the performance and accuracy of the decoder
### Does it provide better performance on any benchmark now?
I also ran a benchmark our team looked at during the last meeting, where
using the `--at-most-two-errors-per-detector` flag appeared to provide
better performance. Specifically, I ran:
`bazel build src:all && time ./bazel-bin/src/tesseract --pqlimit 200000
--beam 5 --num-det-orders 20 --sample-num-shots 20 --det-order-seed
13267562 --circuit
testdata/colorcodes/r\=9\,d\=9\,p\=0.002\,noise\=si1000\,c\=superdense_color_code_X\,q\=121\,gates\=cz.stim
--sample-seed 717347 --threads 1 --print-stats`
with and without the `--at-most-two-errors-per-detector` flag. The
execution time was 69.01 seconds without the flag and 74.23 seconds with
it. There were no errors or low confidence results in either run. I
believe the benchmark we looked at during the last meeting used an
installation of _Tesseract_ that predates my optimization from #34. If
so, this shows that my optimizations have a higher impact when the flag
is disabled, and that the performance improvement they provide outweighs
this flag's initial speedup.
**Conclusion: I am very confident that the current version of the
_Tesseract_ algorithm is faster without this flag, due to the
optimizations I implemented in the `get_detcost` function. When the
`--at-most-two-errors-per-detector` flag is enabled, more errors are
blocked, preventing them from influencing detectors' costs, and
therefore the `get_detcost` function itself. I invested a lot of time
accelerating `get_detcost`, so the other speedups this flag initially
achieved no longer outweigh the impact of #34.**
PR #47 contains the code/scripts I used to benchmark and compare color,
surface, and bicycle codes with and without using the
`--at-most-two-errors-per-detector` flag.
---------
Signed-off-by: Dragana Grbic <[email protected]>
Co-authored-by: noajshu <[email protected]>
Co-authored-by: LaLeh <[email protected]>