
Commit a6dc76c

Add optional native iterator accelerator: ~5 ns/iter, ~4x faster than rich
`ProgressBar.__iter__` now dispatches to `speedups.progressbar.FastBarIterator` (the `progressbar2[fast]` extra) when it is importable, falling back to the pure-Python generator otherwise. The native iterator counts items in a C field and only calls back into Python at redraw crossings via a small protocol (`_fast_begin`/`_fast_tick`/`_fast_end`/`_fast_end_dirty`), reusing the existing gate/redraw/calibration machinery so the redraw cadence is identical.

The only behavioural difference is that `value`/`previous_value` are synced at crossings rather than every iteration, so reads between redraws lag slightly (like `tqdm.n`); `PROGRESSBAR_DISABLE_FASTPATH=1` forces the pure-Python path.

This makes progressbar2 the fastest progress bar measured: ~5 ns/iter vs 19 ns for rich and 55 ns for tqdm. Pure Python stays at ~30 ns/iter (no native build), still ~1.8x faster than tqdm and second only to rich.

Also:

- Hoist `_gate_enabled` to a local in the pure-Python iterator (free, no behaviour change), trimming the fallback hot path by a few ns.
- Add a conftest `disable_native_accelerator` autouse fixture that forces the pure-Python path for the rest of the suite. Native behaviour is covered explicitly in tests/test_native_accelerator.py: dispatch and hooks are covered without the compiled package via a fake/direct calls, so CI stays at 100% coverage, and real end-to-end equivalence plus issue #212 break/exception cleanup tests run where speedups is installed.
- Refresh benchmark artifacts and the README performance section.
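The crossing protocol described above can be modelled in pure Python. This sketch is illustrative only and rests on assumptions: the `speedups` source is not part of this commit, `RecordingBar` is a hypothetical stand-in with a fixed-step gate (the real bar calibrates `_next_update` from timing), and `CrossingIterator` is a hypothetical model of `FastBarIterator` that counts locally and calls the four hooks. It also reproduces the documented lag: the bar's `value` only moves at crossings.

```python
class RecordingBar:
    """Hypothetical stand-in for ProgressBar: records hook calls and uses a
    fixed-step gate instead of the real time-calibrated one."""

    def __init__(self, step=4):
        self.step = step
        self.calls = []
        self.value = 0
        self._next_update = 0

    def _fast_begin(self):
        self.calls.append(('begin',))
        self._next_update = self.step

    def _fast_tick(self, value):
        self.calls.append(('tick', value))
        self.value = value  # value is only synced here, at a crossing
        self._next_update = value + self.step

    def _fast_end(self):
        self.calls.append(('end',))

    def _fast_end_dirty(self):
        self.calls.append(('end_dirty',))


class CrossingIterator:
    """Hypothetical pure-Python model of a native iterator that counts
    locally and calls back into the bar only when the count hits the gate."""

    def __init__(self, bar, iterable):
        self._bar = bar
        self._it = iter(iterable)
        self._count = 0
        self._gate = None  # None until _fast_begin has run
        self._finished = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._gate is None:
            self._bar._fast_begin()
            self._gate = self._bar._next_update
        try:
            item = next(self._it)
        except StopIteration:
            self._finish(self._bar._fast_end)
            raise
        self._count += 1  # common path: one increment and one compare
        if self._count >= self._gate:
            self._bar._fast_tick(self._count)
            self._gate = self._bar._next_update
        return item

    def close(self):
        """Assumed cleanup hook for an early break: finish dirty, once."""
        if self._gate is not None:
            self._finish(self._bar._fast_end_dirty)

    def _finish(self, hook):
        if not self._finished:
            self._finished = True
            hook()
```

How the real iterator detects an early `break` (modelled here as an explicit `close()`) and whether it syncs the final count before `_fast_end` are not shown in this commit; in this model the bar still reads 8 after a 10-item loop with `step=4`.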
1 parent c89cb10 commit a6dc76c

9 files changed

Lines changed: 448 additions & 82 deletions


README.rst

Lines changed: 31 additions & 21 deletions
```diff
@@ -76,27 +76,37 @@ automatically enable features like auto-resizing when the system supports it.
 Performance
 ******************************************************************************
 
-Wrapping a loop with ``progressbar2`` is cheap. On the benchmark machine
-(CPython 3.13, macOS arm64) it adds only about **31 nanoseconds per
-iteration** over a bare loop -- roughly **1.8x faster than tqdm** and second
-only to ``rich``, while being far ahead of the rest:
-
-================ ==================
-Library          Overhead per iter
-================ ==================
-rich             19 ns
-progressbar2     31 ns
-tqdm             56 ns
-alive-progress   246 ns
-click            1919 ns
-================ ==================
-
-The per-iteration cost is dominated by deciding *whether* to redraw, not by
-drawing: ``progressbar2`` keeps an integer "next update" gate so the common
-iteration is just an increment and a couple of cheap stores, only entering the
-(rate-limited) redraw machinery a few times per second. Behaviour is unchanged
--- the same widgets, the same redraw cadence, and ``value``/``previous_value``
-stay byte-identical to the pre-gate implementation on every iteration.
+Wrapping a loop with ``progressbar2`` is cheap -- and with the optional native
+accelerator it is the **fastest** progress bar available. On the benchmark
+machine (CPython 3.13, macOS arm64) the per-iteration overhead over a bare loop
+is:
+
+================ ================== ==================
+Library          Overhead per iter  vs progressbar2
+================ ================== ==================
+**progressbar2** **5 ns** *(fast)*  baseline
+rich             19 ns              ~4x slower
+tqdm             55 ns              ~11x slower
+alive-progress   249 ns             ~52x slower
+click            1885 ns            ~390x slower
+================ ================== ==================
+
+Two tiers, same API:
+
+- **Pure Python (default):** ~30 ns/iter -- roughly **1.8x faster than tqdm**,
+  second only to ``rich``, with **no native build required**. An integer "next
+  update" gate keeps the common iteration to an increment, a compare and a
+  couple of cheap stores, only entering the (rate-limited) redraw machinery a
+  few times per second. Behaviour is unchanged: same widgets, same redraw
+  cadence, and ``value``/``previous_value`` stay byte-identical to the
+  pre-gate implementation on every iteration.
+- **Native accelerator (**\ ``pip install progressbar2[fast]``\ **):** ~5 ns/iter,
+  **~4x faster than rich**. A small compiled (Cython) iterator counts in a C
+  field and only calls back into Python at redraw crossings. It engages
+  automatically when installed; the only behavioural difference is that
+  ``bar.value`` is synced at redraw crossings rather than every iteration, so
+  reads between redraws lag slightly (like ``tqdm.n``). Set
+  ``PROGRESSBAR_DISABLE_FASTPATH=1`` to force the pure-Python path.
 
 The benchmark is fully reproducible and pits ``progressbar2`` against ``tqdm``,
 ``rich``, ``alive-progress`` and ``click`` across iteration overhead, forced
```
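The pure-Python tier's integer gate can be sketched as a generator. This is a simplified model, not `_iter_python` itself: the hypothetical `update` callback stands in for `ProgressBar.update()` and returns the next threshold, whereas the real bar derives `_next_update` from its rate-limiting calibration and also stores the running value every iteration to keep `value`/`previous_value` live. Every item pays an increment and a compare; the callback runs only at crossings.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar('T')


def gated(iterable: Iterable[T], update: Callable[[int], int]) -> Iterator[T]:
    """Yield items, calling ``update(value)`` only when ``value`` reaches the
    gate. ``update`` returns the next gate value (hypothetical contract)."""
    value = 0
    next_update = update(value)  # initial draw sets the first gate
    for item in iterable:
        value += 1
        if value >= next_update:  # rare path: the redraw machinery
            next_update = update(value)
        yield item


calls = []


def every_250(value: int) -> int:
    """Toy stand-in for the rate limiter: next redraw 250 items later."""
    calls.append(value)
    return value + 250


total = sum(1 for _ in gated(range(1000), every_250))
```

Here 1,000 items trigger only five `update` calls (the initial draw plus one every 250 items), which is the effect the README attributes to the gate.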

benchmarks/chart.png

-735 Bytes

benchmarks/report.md

Lines changed: 19 additions & 19 deletions
```diff
@@ -1,6 +1,6 @@
 # Python progress-bar library benchmark
 
-_Generated 2026-06-23 17:30. Subject: **progressbar2** (version 4.5.0)._
+_Generated 2026-06-24 00:53. Subject: **progressbar2** (version 4.5.0)._
 
 Compares `progressbar2` against the most common alternatives across three independent dimensions. All rendered output is written to a real pseudo-terminal (pty) that is continuously drained, so every library believes it is attached to a TTY and actually draws — the comparison is apples-to-apples, not "is output suppressed when piped".
 
@@ -27,47 +27,47 @@ Compares `progressbar2` against the most common alternatives across three indepe
 
 Idiomatic "wrap my loop" call with each library's **default** settings, over **1,000,000** iterations with a trivial body. This is the real-world cost of dropping a progress bar around a fast loop. Overhead = (wrapped time − bare-loop time) / iterations. Lower is faster.
 
-Bare loop baseline: **5.59 ms** for 1,000,000 iterations.
+Bare loop baseline: **5.53 ms** for 1,000,000 iterations.
 
 | Library | Total time | Overhead/iter | vs progressbar2 |
 |---|--:|--:|--:|
-| rich | 24.7 ms | 19.1 ns | 0.63x |
-| **progressbar2** | 36.1 ms | 30.5 ns | baseline |
-| tqdm | 61.4 ms | 55.8 ns | 1.83x |
-| alive-progress | 251.2 ms | 245.6 ns | 8.05x |
-| click | 1924.1 ms | 1918.5 ns | 62.84x |
+| **progressbar2** | 10.4 ms | 4.8 ns | baseline |
+| rich | 24.7 ms | 19.2 ns | 3.97x |
+| tqdm | 60.2 ms | 54.6 ns | 11.30x |
+| alive-progress | 254.9 ms | 249.4 ns | 51.58x |
+| click | 1890.7 ms | 1885.2 ns | 389.96x |
 
 ## B. Forced per-update render cost
 
 Rendering **forced on every single update** over **30,000** updates — i.e. the cost of one full bar redraw, throttling disabled. Lower is faster.
 
 | Library | Total time | Per rendered update | vs progressbar2 |
 |---|--:|--:|--:|
-| tqdm | 331.4 ms | 11.04 us | 0.43x |
-| **progressbar2** | 769.4 ms | 25.64 us | baseline |
-| rich | 5173.6 ms | 172.45 us | 6.73x |
+| tqdm | 318.1 ms | 10.60 us | 0.44x |
+| **progressbar2** | 717.3 ms | 23.91 us | baseline |
+| rich | 5142.4 ms | 171.41 us | 7.17x |
 
 Excluded from this panel (no per-update force-render API):
 - **alive-progress** — renders on a background timer thread; no per-update render API
 - **click** — self-throttles writes (renders only when the drawn line changes); no force-every-update API
 
 ## C. Cold import time
 
-Wall-clock cost of importing the library in a fresh interpreter (minimum of 9 runs), with bare-interpreter startup (17 ms) subtracted. Matters for short-lived CLIs. Lower is lighter.
+Wall-clock cost of importing the library in a fresh interpreter (minimum of 9 runs), with bare-interpreter startup (16 ms) subtracted. Matters for short-lived CLIs. Lower is lighter.
 
 | Library | Import time (net) |
 |---|--:|
-| alive-progress | 8.6 ms |
-| tqdm | 23.5 ms |
-| click | 24.1 ms |
-| **progressbar2** | 45.3 ms |
-| rich | 49.6 ms |
+| alive-progress | 8.3 ms |
+| tqdm | 21.6 ms |
+| click | 23.0 ms |
+| **progressbar2** | 46.0 ms |
+| rich | 47.0 ms |
 
 ## Takeaways
 
-- **Default per-iteration overhead:** `progressbar2` is 31 ns/iter, ranking #2 of 5. `rich` is the lightest per iteration (19 ns), `click` the heaviest (1919 ns).
-- `rich` and `tqdm` win here because their default settings do almost no per-iteration work (counter compare / background refresh thread); `progressbar2` calls a monotonic clock and evaluates its redraw predicate on every `update()`.
-- **Render cost:** when a redraw actually happens, `progressbar2` draws one update in 25.6 us — 2.32x the cheapest (`tqdm`) but 6.7x cheaper than rich's full-display re-render.
+- **Default per-iteration overhead:** `progressbar2` is 5 ns/iter, ranking #1 of 5. `progressbar2` is the lightest per iteration (5 ns), `click` the heaviest (1885 ns).
+- `progressbar2` and `tqdm` win here because their default settings do almost no per-iteration work (counter compare / background refresh thread); `progressbar2` calls a monotonic clock and evaluates its redraw predicate on every `update()`.
+- **Render cost:** when a redraw actually happens, `progressbar2` draws one update in 23.9 us — 2.26x the cheapest (`tqdm`) but 7.2x cheaper than rich's full-display re-render.
 - **Why both numbers matter:** `progressbar2` caps redraws at ~20/sec by default (50 ms floor), so in practice the cheap render in B fires rarely and the per-iteration cost in A dominates real workloads.
 - **Import weight:** `progressbar2` is mid-pack to import; `alive-progress` is the lightest, `rich` the heaviest.
 
```
benchmarks/results.json

Lines changed: 38 additions & 38 deletions
```diff
@@ -20,53 +20,53 @@
     "term": "80x24"
   },
   "scenario_a_default_overhead": {
-    "baseline_min_s": 0.00558699993416667,
-    "baseline_median_s": 0.005599833093583584,
+    "baseline_min_s": 0.005534708965569735,
+    "baseline_median_s": 0.005606417078524828,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.03611662518233061,
-        "total_median_s": 0.037002416793257,
-        "overhead_ns_per_iter": 30.52962524816394
+        "total_min_s": 0.010368958115577698,
+        "total_median_s": 0.010549040976911783,
+        "overhead_ns_per_iter": 4.834249150007963
       },
       "tqdm": {
-        "total_min_s": 0.061426167376339436,
-        "total_median_s": 0.06293316604569554,
-        "overhead_ns_per_iter": 55.839167442172766
+        "total_min_s": 0.06017666729167104,
+        "total_median_s": 0.061196332797408104,
+        "overhead_ns_per_iter": 54.6419583261013
       },
       "rich": {
-        "total_min_s": 0.024669166654348373,
-        "total_median_s": 0.02482037479057908,
-        "overhead_ns_per_iter": 19.082166720181704
+        "total_min_s": 0.02470291731879115,
+        "total_median_s": 0.024790583178400993,
+        "overhead_ns_per_iter": 19.168208353221416
       },
       "alive-progress": {
-        "total_min_s": 0.2512250836007297,
-        "total_median_s": 0.2683616247959435,
-        "overhead_ns_per_iter": 245.63808366656306
+        "total_min_s": 0.25490304082632065,
+        "total_median_s": 0.2645676671527326,
+        "overhead_ns_per_iter": 249.3683318607509
       },
       "click": {
-        "total_min_s": 1.9241157919168472,
-        "total_median_s": 1.9306053328327835,
-        "overhead_ns_per_iter": 1918.5287919826806
+        "total_min_s": 1.89069899963215,
+        "total_median_s": 1.907758583780378,
+        "overhead_ns_per_iter": 1885.1642906665802
       }
     }
   },
   "scenario_b_forced_render": {
-    "baseline_min_s": 0.0001589590683579445,
+    "baseline_min_s": 0.00016704201698303223,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.7693966659717262,
-        "total_median_s": 0.7766300840303302,
-        "per_update_us": 25.64125689677894
+        "total_min_s": 0.7173186247237027,
+        "total_median_s": 0.719321624841541,
+        "per_update_us": 23.905052756890655
       },
       "tqdm": {
-        "total_min_s": 0.33136308286339045,
-        "total_median_s": 0.3320189160294831,
-        "per_update_us": 11.040137459834416
+        "total_min_s": 0.3181296670809388,
+        "total_median_s": 0.31852462515234947,
+        "per_update_us": 10.598754168798527
       },
       "rich": {
-        "total_min_s": 5.17360516730696,
-        "total_median_s": 5.191705749835819,
-        "per_update_us": 172.4482069412867
+        "total_min_s": 5.1424090838991106,
+        "total_median_s": 5.166477249935269,
+        "per_update_us": 171.40806806273758
       }
     },
     "excluded": {
@@ -75,27 +75,27 @@
     }
   },
   "scenario_c_import_time": {
-    "interpreter_baseline_s": 0.016710625030100346,
+    "interpreter_baseline_s": 0.015609750058501959,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.06196920806542039,
-        "net_ms": 45.258583035320044
+        "total_min_s": 0.06156795937567949,
+        "net_ms": 45.958209317177534
       },
       "tqdm": {
-        "total_min_s": 0.04024833394214511,
-        "net_ms": 23.537708912044764
+        "total_min_s": 0.03719387482851744,
+        "net_ms": 21.584124770015478
       },
       "rich": {
-        "total_min_s": 0.06633466714993119,
-        "net_ms": 49.62404211983085
+        "total_min_s": 0.06259091570973396,
+        "net_ms": 46.981165651232004
       },
       "alive-progress": {
-        "total_min_s": 0.025270250160247087,
-        "net_ms": 8.559625130146742
+        "total_min_s": 0.023956042248755693,
+        "net_ms": 8.346292190253735
       },
       "click": {
-        "total_min_s": 0.04085325030609965,
-        "net_ms": 24.142625275999308
+        "total_min_s": 0.03865604242309928,
+        "net_ms": 23.04629236459732
       }
     }
   }
```
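The derived figures in `results.json` follow from its raw minimum timings. The sketch below recomputes them from values copied out of the new side of this diff. It assumes minimum-minus-minimum subtraction (which reproduces the stored numbers) and the iteration counts stated in `report.md` (1,000,000 for scenario A, 30,000 updates for scenario B); the helper names are hypothetical, not the benchmark script's own.

```python
# Values copied from the "+" side of benchmarks/results.json above.
A_ITERS = 1_000_000
B_UPDATES = 30_000

scenario_a = {
    'baseline_min_s': 0.005534708965569735,
    'progressbar2': 0.010368958115577698,
    'rich': 0.02470291731879115,
    'tqdm': 0.06017666729167104,
}
scenario_b = {
    'baseline_min_s': 0.00016704201698303223,
    'progressbar2': 0.7173186247237027,
}
scenario_c = {
    'interpreter_baseline_s': 0.015609750058501959,
    'progressbar2': 0.06156795937567949,
}


def overhead_ns_per_iter(total_min_s, baseline_min_s, iters=A_ITERS):
    """Scenario A: (wrapped - bare) / iterations, in nanoseconds."""
    return (total_min_s - baseline_min_s) / iters * 1e9


def per_update_us(total_min_s, baseline_min_s, updates=B_UPDATES):
    """Scenario B: (forced-render run - baseline) / updates, in microseconds."""
    return (total_min_s - baseline_min_s) / updates * 1e6


def net_import_ms(total_min_s, interpreter_baseline_s):
    """Scenario C: import time minus bare-interpreter startup, in ms."""
    return (total_min_s - interpreter_baseline_s) * 1e3


base = scenario_a['baseline_min_s']
pb2 = overhead_ns_per_iter(scenario_a['progressbar2'], base)
# "vs progressbar2" column: each library's overhead divided by progressbar2's.
rich_ratio = overhead_ns_per_iter(scenario_a['rich'], base) / pb2
tqdm_ratio = overhead_ns_per_iter(scenario_a['tqdm'], base) / pb2
print(f'progressbar2 {pb2:.1f} ns/iter, rich {rich_ratio:.2f}x, tqdm {tqdm_ratio:.2f}x')
```

The rounded ratios match the report table's 3.97x and 11.30x.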

progressbar/bar.py

Lines changed: 54 additions & 1 deletion
```diff
@@ -2,6 +2,7 @@
 
 import abc
 import contextlib
+import importlib
 import itertools
 import logging
 import math
@@ -30,6 +31,18 @@
 )
 from .terminal import os_specific
 
+try:
+    # Optional native accelerator, shipped as the ``progressbar2[fast]`` extra
+    # (the separate ``speedups`` package). When importable, the iterator path
+    # uses it automatically; otherwise we fall back to the pure-Python gate.
+    # Loaded via importlib so type checkers don't try to resolve the optional
+    # compiled module when it is absent.
+    _FastBarIterator = importlib.import_module(
+        'speedups.progressbar',
+    ).FastBarIterator
+except Exception:  # pragma: no cover - environmental (absent / ABI mismatch)
+    _FastBarIterator = None
+
 logger = logging.getLogger(__name__)
 
 # float also accepts integers and longs but we don't want an explicit union
@@ -928,6 +941,20 @@ def __call__(self, iterable, max_value=None):
         return self
 
     def __iter__(self):
+        # Dispatch to the optional native iterator when available, else the
+        # pure-Python generator. The native path counts in C and syncs
+        # `value`/`previous_value` only at redraw crossings (so they lag
+        # mid-loop, like `tqdm.n`), beating the per-iteration attribute writes
+        # the pure-Python path pays to keep them live every iteration.
+        if (
+            _FastBarIterator is not None
+            and self._iterable is not None
+            and not os.environ.get('PROGRESSBAR_DISABLE_FASTPATH')
+        ):
+            return _FastBarIterator(self, self._iterable)
+        return self._iter_python()
+
+    def _iter_python(self):
         # Single generator (see issue #212): a `break`/exception in the loop
         # body triggers `GeneratorExit`, letting us finish and restore any
         # redirected streams. The integer gate keeps the common iteration to
@@ -953,6 +980,10 @@ def __iter__(self):
             value = self.value
             next_update = value
             update = self.update
+            # `_gate_enabled` is set once in `start()` and never mutated during
+            # iteration, so hoist it to a local and drop the per-iteration
+            # attribute load on the hot path.
+            gate_enabled = self._gate_enabled
             for item in iterator:
                 value += 1
                 # When the gate is disabled, call `update()` every iteration so
@@ -964,7 +995,7 @@ def __iter__(self):
                 # `update()` (rather than pre-setting `self.value`) lets it
                 # record the prior value in the public `previous_value`,
                 # preserving its original semantics.
-                if not self._gate_enabled or value >= next_update:
+                if not gate_enabled or value >= next_update:
                     update(value)
                     next_update = self._next_update
                 else:
@@ -981,6 +1012,28 @@ def __iter__(self):
             self.finish(dirty=True)
             raise
 
+    # --- Native accelerator protocol (used by speedups.FastBarIterator) ------
+    # The C iterator counts items itself and calls back here only at gate
+    # crossings, reusing the existing gate/redraw/calibration machinery so the
+    # redraw cadence is identical to `_iter_python`.
+
+    def _fast_begin(self) -> None:
+        """Start the bar (draws 0%, sets `_next_update`/`_gate_enabled`)."""
+        if self.start_time is None:
+            self.start()
+
+    def _fast_tick(self, value: int) -> None:
+        """Handle a redraw crossing: redraw-if-due and recompute the gate."""
+        self.update(value)
+
+    def _fast_end(self) -> None:
+        """Finish normally (draws 100%, restores streams) on exhaustion."""
+        self.finish()
+
+    def _fast_end_dirty(self) -> None:
+        """Finish dirty on early break/exception (restores streams)."""
+        self.finish(dirty=True)
+
     def __next__(self):
         value: typing.Any
         try:
```
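The module-level import and the `__iter__` dispatch above follow a common optional-accelerator pattern. The generic sketch below uses hypothetical names (`load_optional`, `pick_iterator`): resolve the accelerated class once, treat any failure as "not available", and consult the environment variable on each call so it can be toggled at runtime. Because `os.environ.get` returns a string, any non-empty value (including `0`) disables the fast path under the check shown in the diff, and this sketch keeps that behaviour.

```python
import importlib
import os
from typing import Any, Callable, Optional


def load_optional(module: str, attr: str) -> Optional[Any]:
    """Return ``module.attr``, or None if it cannot be imported for any
    reason (missing package, ABI mismatch, missing attribute)."""
    try:
        return getattr(importlib.import_module(module), attr)
    except Exception:
        return None


def pick_iterator(fast_cls, fallback: Callable[[], Any], owner, iterable,
                  env_var: str = 'PROGRESSBAR_DISABLE_FASTPATH'):
    """Use ``fast_cls(owner, iterable)`` unless it is unavailable, there is
    nothing to wrap, or the opt-out variable is set to a non-empty string."""
    if (
        fast_cls is not None
        and iterable is not None
        and not os.environ.get(env_var)
    ):
        return fast_cls(owner, iterable)
    return fallback()
```

For example, `load_optional('speedups.progressbar', 'FastBarIterator')` evaluates to `None` when the extra is not installed, so the fallback is chosen.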

pyproject.toml

Lines changed: 6 additions & 0 deletions
```diff
@@ -112,6 +112,12 @@ repository = 'https://github.com/wolph/python-progressbar/'
 progressbar = 'progressbar.__main__:main'
 
 [project.optional-dependencies]
+# Optional native iterator accelerator. When installed it is detected and used
+# automatically (the iterator path drops to ~5 ns/iter); otherwise progressbar2
+# falls back to the pure-Python gate. See the Performance section in README.
+fast = [
+    'speedups>=2.1.0',
+]
 docs = [
     'sphinx>=1.8.5',
     'sphinx-autodoc-typehints>=1.6.0',
```

tests/conftest.py

Lines changed: 12 additions & 0 deletions
```diff
@@ -24,6 +24,18 @@ def pytest_configure(config) -> None:
     )
 
 
+@pytest.fixture(autouse=True)
+def disable_native_accelerator(monkeypatch):
+    # The optional native accelerator (speedups.FastBarIterator) is exercised
+    # explicitly in test_native_accelerator.py. Every other test targets the
+    # pure-Python iterator (`_iter_python`), so force that path by default when
+    # the compiled `speedups` package happens to be installed in the dev/bench
+    # environment. Native tests restore it via their own monkeypatch.
+    import progressbar.bar as bar_module
+
+    monkeypatch.setattr(bar_module, '_FastBarIterator', None)
+
+
 @pytest.fixture(autouse=True)
 def small_interval(monkeypatch, request) -> None:
     # Tests marked `no_freezegun` need real timing conditions (e.g. the perf
```

tests/test_fastpath.py

Lines changed: 5 additions & 3 deletions
```diff
@@ -525,9 +525,11 @@ def test_shortcut_has_single_generator_layer():
 
     gen = progressbar.progressbar(range(3), fd=RecordingTTY())
     assert isinstance(gen, types.GeneratorType)
-    # It is the bar's own __iter__ generator, not a wrapper: compare the
-    # generator's code object to ProgressBar.__iter__ (robust across versions).
-    assert gen.gi_code is progressbar.ProgressBar.__iter__.__code__
+    # It is the bar's own iterator generator, not a wrapper: compare the
+    # generator's code object to ProgressBar._iter_python (the pure-Python
+    # path `__iter__` dispatches to; robust across versions). The autouse
+    # `disable_native_accelerator` fixture forces this path here.
+    assert gen.gi_code is progressbar.ProgressBar._iter_python.__code__
 
 
 def test_env_disables_fastpath(monkeypatch):
```
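The updated assertion works because a generator object keeps a reference to the code object of the generator function that created it. Once `__iter__` becomes an ordinary method returning `self._iter_python()`, the generator's `gi_code` is `_iter_python.__code__`, not `__iter__.__code__`. A minimal sketch with a hypothetical `Bar` class that mirrors the dispatch shape:

```python
import types


class Bar:
    """Hypothetical class mirroring the dispatch shape of ProgressBar."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        # Plain method (not a generator function): returns another generator.
        return self._iter_python()

    def _iter_python(self):
        yield from self._iterable


gen = iter(Bar(range(3)))
```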
