fix: make previous_value byte-identical every iteration (airtight back-compat)

wolph · wolph · commit ab69cda4e9fb · 2026-06-22T12:38:56.000+02:00
Set previous_value in the iterator's gated-out branch too, so that reads
of bar.previous_value mid-loop match the pre-gate, every-iteration
semantics exactly (not just at redraws). This resolves the last open
concern from the backward-compatibility review. The extra store costs
~7 ns/iter (now ~31 ns vs ~56 ns for tqdm, still second-fastest after
rich); README and benchmark report are updated to the measured figure.
Also adds a per-iteration previous_value assertion to the liveness test.
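
For reviewers, a minimal sketch of the invariant this commit restores.
ToyBar is a hypothetical stand-in, not the real ProgressBar class, and
its fixed-stride gate (redraw_every) is an assumption made for brevity;
the real gate is rate-limited by time. Only the gated-out branch mirrors
the bar.py change in this commit.

```python
class ToyBar:
    """Hypothetical toy bar; illustrates the gated-out branch only."""

    def __init__(self, redraw_every=100):
        self.value = 0
        self.previous_value = 0
        self.redraws = 0  # counts trips through the "redraw machinery"
        self._redraw_every = redraw_every  # assumed fixed-stride gate
        self._next_update = 0

    def update(self, value):
        # Slow path: advance both fields, then "redraw" and move the gate.
        self.previous_value = self.value
        self.value = value
        self.redraws += 1
        self._next_update = value + self._redraw_every

    def __call__(self, iterable):
        self.update(0)  # forced initial draw, standing in for start()
        next_update = self._next_update
        for value, item in enumerate(iterable):
            if value >= next_update:
                self.update(value)
                next_update = self._next_update
            else:
                # Gated out: same two stores as update(), no redraw.
                self.previous_value = self.value
                self.value = value
            yield item


bar = ToyBar()
for i in bar(range(500)):
    # Holds on every iteration, not only on the ones that redraw.
    assert bar.value == i
    assert bar.previous_value == (i - 1 if i else 0)
```

Without the previous_value store in the else branch, the second
assertion would fail on the first gated-out iteration after a redraw.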
diff --git a/README.rst b/README.rst
@@ -77,25 +77,26 @@ Performance
 ******************************************************************************
 
 Wrapping a loop with ``progressbar2`` is cheap. On the benchmark machine
-(CPython 3.13, macOS arm64) it adds only about **24 nanoseconds per
-iteration** over a bare loop -- roughly **2.3x faster than tqdm** and within a
-few nanoseconds of ``rich``, while being far ahead of the rest:
+(CPython 3.13, macOS arm64) it adds only about **31 nanoseconds per
+iteration** over a bare loop -- roughly **1.8x faster than tqdm** and second
+only to ``rich``, while being far ahead of the rest:
 
 ================  ==================
 Library           Overhead per iter
 ================  ==================
 rich              19 ns
-progressbar2      24 ns
+progressbar2      31 ns
 tqdm              56 ns
-alive-progress    247 ns
-click             1878 ns
+alive-progress    262 ns
+click             1892 ns
 ================  ==================
 
 The per-iteration cost is dominated by deciding *whether* to redraw, not by
 drawing: ``progressbar2`` keeps an integer "next update" gate so the common
-iteration is just an increment and a compare, only entering the (rate-limited)
-redraw machinery a few times per second. Behaviour is unchanged -- the same
-widgets, the same redraw cadence.
+iteration is just an increment and a couple of cheap stores, only entering the
+(rate-limited) redraw machinery a few times per second. Behaviour is unchanged
+-- the same widgets, the same redraw cadence, and ``value``/``previous_value``
+stay byte-identical to the pre-gate implementation on every iteration.
 
 The benchmark is fully reproducible and pits ``progressbar2`` against ``tqdm``,
 ``rich``, ``alive-progress`` and ``click`` across iteration overhead, forced
diff --git a/benchmarks/chart.png b/benchmarks/chart.png
diff --git a/benchmarks/report.md b/benchmarks/report.md
@@ -1,6 +1,6 @@
 # Python progress-bar library benchmark
 
-_Generated 2026-06-22 02:41. Subject: **progressbar2** (version 4.5.0)._
+_Generated 2026-06-22 12:37. Subject: **progressbar2** (version 4.5.0)._
 
 Compares `progressbar2` against the most common alternatives across three independent dimensions. All rendered output is written to a real pseudo-terminal (pty) that is continuously drained, so every library believes it is attached to a TTY and actually draws — the comparison is apples-to-apples, not "is output suppressed when piped".
 
@@ -27,47 +27,47 @@ Compares `progressbar2` against the most common alternatives across three indepe
 
 Idiomatic "wrap my loop" call with each library's **default** settings, over **1,000,000** iterations with a trivial body. This is the real-world cost of dropping a progress bar around a fast loop. Overhead = (wrapped time − bare-loop time) / iterations. Lower is faster.
 
-Bare loop baseline: **5.52 ms** for 1,000,000 iterations.
+Bare loop baseline: **5.45 ms** for 1,000,000 iterations.
 
 | Library | Total time | Overhead/iter | vs progressbar2 |
 |---|--:|--:|--:|
-| rich | 24.5 ms | 19.0 ns | 0.80x |
-| **progressbar2** | 29.3 ms | 23.8 ns | baseline |
-| tqdm | 61.5 ms | 55.9 ns | 2.35x |
-| alive-progress | 252.6 ms | 247.1 ns | 10.38x |
-| click | 1883.9 ms | 1878.3 ns | 78.90x |
+| rich | 24.4 ms | 18.9 ns | 0.62x |
+| **progressbar2** | 36.1 ms | 30.6 ns | baseline |
+| tqdm | 61.1 ms | 55.6 ns | 1.82x |
+| alive-progress | 267.6 ms | 262.1 ns | 8.57x |
+| click | 1897.1 ms | 1891.6 ns | 61.82x |
 
 ## B. Forced per-update render cost
 
 Rendering **forced on every single update** over **30,000** updates — i.e. the cost of one full bar redraw, throttling disabled. Lower is faster.
 
 | Library | Total time | Per rendered update | vs progressbar2 |
 |---|--:|--:|--:|
-| tqdm | 328.6 ms | 10.95 us | 0.39x |
-| **progressbar2** | 843.8 ms | 28.12 us | baseline |
-| rich | 5146.9 ms | 171.56 us | 6.10x |
+| tqdm | 349.0 ms | 11.63 us | 0.43x |
+| **progressbar2** | 809.1 ms | 26.96 us | baseline |
+| rich | 5103.9 ms | 170.13 us | 6.31x |
 
 Excluded from this panel (no per-update force-render API):
 - **alive-progress** — renders on a background timer thread; no per-update render API
 - **click** — self-throttles writes (renders only when the drawn line changes); no force-every-update API
 
 ## C. Cold import time
 
-Wall-clock cost of importing the library in a fresh interpreter (minimum of 9 runs), with bare-interpreter startup (18 ms) subtracted. Matters for short-lived CLIs. Lower is lighter.
+Wall-clock cost of importing the library in a fresh interpreter (minimum of 9 runs), with bare-interpreter startup (15 ms) subtracted. Matters for short-lived CLIs. Lower is lighter.
 
 | Library | Import time (net) |
 |---|--:|
-| alive-progress | 11.0 ms |
-| tqdm | 25.6 ms |
-| click | 27.0 ms |
-| **progressbar2** | 49.8 ms |
-| rich | 53.0 ms |
+| alive-progress | 8.1 ms |
+| tqdm | 21.7 ms |
+| click | 23.4 ms |
+| **progressbar2** | 45.8 ms |
+| rich | 47.2 ms |
 
 ## Takeaways
 
-- **Default per-iteration overhead:** `progressbar2` is 24 ns/iter, ranking #2 of 5. `rich` is the lightest per iteration (19 ns), `click` the heaviest (1878 ns).
+- **Default per-iteration overhead:** `progressbar2` is 31 ns/iter, ranking #2 of 5. `rich` is the lightest per iteration (19 ns), `click` the heaviest (1892 ns).
   - `rich` and `tqdm` win here because their default settings do almost no per-iteration work (counter compare / background refresh thread); `progressbar2` calls a monotonic clock and evaluates its redraw predicate on every `update()`.
-- **Render cost:** when a redraw actually happens, `progressbar2` draws one update in 28.1 us — 2.57x the cheapest (`tqdm`) but 6.1x cheaper than rich's full-display re-render.
+- **Render cost:** when a redraw actually happens, `progressbar2` draws one update in 27.0 us — 2.32x the cheapest (`tqdm`) but 6.3x cheaper than rich's full-display re-render.
 - **Why both numbers matter:** `progressbar2` caps redraws at ~20/sec by default (50 ms floor), so in practice the cheap render in B fires rarely and the per-iteration cost in A dominates real workloads.
 - **Import weight:** `progressbar2` is mid-pack to import; `alive-progress` is the lightest, `rich` the heaviest.
 
diff --git a/benchmarks/results.json b/benchmarks/results.json
@@ -20,53 +20,53 @@
     "term": "80x24"
   },
   "scenario_a_default_overhead": {
-    "baseline_min_s": 0.0056461249478161335,
-    "baseline_median_s": 0.005713916849344969,
+    "baseline_min_s": 0.005453874822705984,
+    "baseline_median_s": 0.0054806252010166645,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.04969320772215724,
-        "total_median_s": 0.05024691578000784,
-        "overhead_ns_per_iter": 44.047082774341106
+        "total_min_s": 0.03605487523600459,
+        "total_median_s": 0.03639816725626588,
+        "overhead_ns_per_iter": 30.601000413298607
       },
       "tqdm": {
-        "total_min_s": 0.06316162506118417,
-        "total_median_s": 0.06452187523245811,
-        "overhead_ns_per_iter": 57.515500113368034
+        "total_min_s": 0.061058541759848595,
+        "total_median_s": 0.06176149984821677,
+        "overhead_ns_per_iter": 55.60466693714261
       },
       "rich": {
-        "total_min_s": 0.025369917042553425,
-        "total_median_s": 0.02551012486219406,
-        "overhead_ns_per_iter": 19.72379209473729
+        "total_min_s": 0.024352333042770624,
+        "total_median_s": 0.024487290997058153,
+        "overhead_ns_per_iter": 18.89845822006464
       },
       "alive-progress": {
-        "total_min_s": 0.2668608748354018,
-        "total_median_s": 0.2882573329843581,
-        "overhead_ns_per_iter": 261.21474988758564
+        "total_min_s": 0.2675985828973353,
+        "total_median_s": 0.2798731252551079,
+        "overhead_ns_per_iter": 262.1447080746293
       },
       "click": {
-        "total_min_s": 1.9724276666529477,
-        "total_median_s": 1.984468291979283,
-        "overhead_ns_per_iter": 1966.7815417051315
+        "total_min_s": 1.8970785411074758,
+        "total_median_s": 1.9143713326193392,
+        "overhead_ns_per_iter": 1891.6246662847698
       }
     }
   },
   "scenario_b_forced_render": {
-    "baseline_min_s": 0.00016858289018273354,
+    "baseline_min_s": 0.0001548328436911106,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.7714061671867967,
-        "total_median_s": 0.7764091249555349,
-        "per_update_us": 25.707919476553798
+        "total_min_s": 0.8090808331035078,
+        "total_median_s": 0.8101628748700023,
+        "per_update_us": 26.964200008660555
       },
       "tqdm": {
-        "total_min_s": 0.3309014579281211,
-        "total_median_s": 0.33203466702252626,
-        "per_update_us": 11.02442916793128
+        "total_min_s": 0.3489515413530171,
+        "total_median_s": 0.35359816579148173,
+        "per_update_us": 11.626556950310865
       },
       "rich": {
-        "total_min_s": 5.2165322080254555,
-        "total_median_s": 5.250087457709014,
-        "per_update_us": 173.8787875045091
+        "total_min_s": 5.103938166983426,
+        "total_median_s": 5.121730832848698,
+        "per_update_us": 170.12611113799116
       }
     },
     "excluded": {
@@ -75,27 +75,27 @@
     }
   },
   "scenario_c_import_time": {
-    "interpreter_baseline_s": 0.016857875045388937,
+    "interpreter_baseline_s": 0.014845416881144047,
     "libs": {
       "progressbar2": {
-        "total_min_s": 0.0627057496458292,
-        "net_ms": 45.847874600440264
+        "total_min_s": 0.060622042044997215,
+        "net_ms": 45.77662516385317
       },
       "tqdm": {
-        "total_min_s": 0.0390107911080122,
-        "net_ms": 22.152916062623262
+        "total_min_s": 0.03654904244467616,
+        "net_ms": 21.703625563532114
       },
       "rich": {
-        "total_min_s": 0.06456149974837899,
-        "net_ms": 47.703624702990055
+        "total_min_s": 0.06204712484031916,
+        "net_ms": 47.20170795917511
       },
       "alive-progress": {
-        "total_min_s": 0.02484762528911233,
-        "net_ms": 7.9897502437233925
+        "total_min_s": 0.02296954207122326,
+        "net_ms": 8.124125190079212
       },
       "click": {
-        "total_min_s": 0.04090962512418628,
-        "net_ms": 24.05175007879734
+        "total_min_s": 0.038272291887551546,
+        "net_ms": 23.4268750064075
       }
     }
   }
diff --git a/progressbar/bar.py b/progressbar/bar.py
@@ -968,8 +968,12 @@ def __iter__(self):
                     update(value)
                     next_update = self._next_update
                 else:
-                    # Gated out: keep bar.value live without entering the
-                    # redraw machinery (no `previous_value`/redraw change).
+                    # Gated out: advance bar.value AND previous_value (exactly
+                    # as update() would) without entering the redraw machinery,
+                    # so reads of bar.previous_value mid-loop stay identical to
+                    # the original every-iteration semantics. The gate's pixel
+                    # reference is the separate `_last_drawn_value`.
+                    self.previous_value = self.value
                     self.value = value
                 yield item
             self.finish()
diff --git a/tests/test_fastpath.py b/tests/test_fastpath.py
@@ -109,6 +109,13 @@ def test_value_is_live_during_iteration():
         # bar.value == i: value reflects items yielded so far (pre-increment),
         # so at the start of the body for item i, value is i (not i+1).
         assert bar.value == i, f'bar.value mismatch at i={i}: got {bar.value}'
+        # previous_value stays byte-identical to the pre-gate behavior on
+        # EVERY iteration (not just at redraws): the value before the current
+        # one (0 for the first item, set by start()'s forced draw).
+        expected_prev = i - 1 if i else 0
+        assert bar.previous_value == expected_prev, (
+            f'previous_value mismatch at i={i}: got {bar.previous_value}'
+        )
         last = i
     assert last == 499