Conversation
…ved-tokens-logging
Codecov Report ✅ All modified and coverable lines are covered by tests.
stephantul
left a comment
I think most of the code here is overly defensive: lots of guards and checks. But we control the data sources on both ends, so there is likely no need to check most of these things. Specifically, I think the `get`s with defaults of 0 actually hurt rather than help, because they obscure other bugs and can lead to silent over- or underestimations of the stats.
    index = SembleIndex(model, bm25, vicinity, chunks)
    index._file_sizes = SembleIndex._compute_file_sizes(path, chunks)
This can be `self._compute_file_sizes`. I would not call it here; just make it part of `__init__`, or, if it is fast enough, make it a property. There's also, afaik, no need to make this a staticmethod: calling
    index.recompute_file_sizes()
is completely fine. I realize you do need the root, but the root could be part of the SembleIndex.
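A minimal sketch of the suggestion, assuming the root lives on the index and file sizes become a lazy property (the constructor here is simplified and the names `root`, `file_sizes`, and the chunk's `file_path` attribute are illustrative, not the real API):

```python
from pathlib import Path


class SembleIndex:
    """Sketch only: store the root on the index so file sizes can be a
    lazily computed property instead of a staticmethod called externally."""

    def __init__(self, chunks, root):
        self.chunks = chunks    # assumed: each chunk has a .file_path relative to root
        self.root = Path(root)
        self._file_sizes = None  # computed on first access

    @property
    def file_sizes(self):
        if self._file_sizes is None:
            self._file_sizes = self._compute_file_sizes()
        return self._file_sizes

    def _compute_file_sizes(self):
        # Instance method: reads self.root, so no path argument is needed.
        paths = {chunk.file_path for chunk in self.chunks}
        return {p: (self.root / p).stat().st_size for p in paths}
```

With this shape, callers just read `index.file_sizes` and the computation happens once, on demand.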
    results = search_semantic(target.content, self.model, self._semantic_index, self.chunks, top_k + 1, selector)
    return [r for r in results if r.chunk != target][:top_k]
    results = [r for r in results if r.chunk != target][:top_k]
    if self._file_sizes:
I don't think this ever needs to be checked. It's probably just better to populate file_sizes here if it doesn't exist already.
    """Save stats about a search or find_related call to the stats file."""
    try:
        snippet_chars = sum(len(result.chunk.content) for result in results)
        file_chars = sum(file_sizes.get(path, 0) for path in {result.chunk.file_path for result in results})
It is probably better to disregard chunks for files for which we don't have size information. In fact, I think not having a specific file here points to some other failure.
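A sketch of what the comment suggests, with a hypothetical helper name (`total_file_chars` is not in the PR): sum only over files that actually have a size entry, rather than folding missing ones in as 0.

```python
def total_file_chars(results, file_sizes):
    # Count sizes only for files we have entries for; a path missing from
    # file_sizes likely indicates a bug upstream, so don't count it as 0.
    paths = {result.chunk.file_path for result in results}
    return sum(file_sizes[p] for p in paths if p in file_sizes)
```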
    file_chars = sum(file_sizes.get(path, 0) for path in {result.chunk.file_path for result in results})

    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
Don't write ISO format; just dump the timestamp.
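As a sketch of the suggested change, the record would carry a numeric Unix timestamp instead of an ISO string, so downstream bucketing is plain arithmetic with no string parsing:

```python
from datetime import datetime, timezone

# Store a numeric Unix timestamp rather than an ISO string.
record = {"ts": datetime.now(timezone.utc).timestamp()}
```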
        in_today = record_date == today
        in_last_7 = record_date > seven_days_ago
    except ValueError:
        in_today = in_last_7 = False  # unparseable timestamp: count in All time only
How could this happen? AFAIK we write this ourselves. In any case, you should not parse a timestamp to a date and then reparse it again; just use a timestamp.
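With numeric timestamps, the bucketing above reduces to two comparisons. A sketch, with a hypothetical `bucket_flags` helper (`now` is passed in for testability):

```python
from datetime import datetime, timedelta, timezone


def bucket_flags(ts, now):
    """Sketch: bucket a Unix timestamp without any string parsing."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    seven_days_ago = now - timedelta(days=7)
    # Both buckets are plain float comparisons against precomputed bounds.
    return ts >= start_of_today.timestamp(), ts >= seven_days_ago.timestamp()
```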
    ]
    for label, bucket in summary.buckets.items():
        saved_chars = max(0, bucket.file_chars - bucket.snippet_chars)
        saved_tokens = saved_chars // 4  # standard ~4 chars/token approximation
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        continue
Maybe warn here; we don't expect this to happen.
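A sketch of the suggested behavior, wrapped in a hypothetical `load_records` helper: since we write the stats file ourselves, a corrupt line is unexpected and worth surfacing.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_records(lines):
    # A corrupt line in our own stats file is unexpected:
    # warn instead of skipping silently.
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            logger.warning("Skipping unparseable stats line: %r", line)
    return records
```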
        continue
    snippet_chars = record.get("snippet_chars", 0)
    file_chars = record.get("file_chars", 0)
    call_type = record.get("call", "search")
When would we expect any of these to be missing? We write these ourselves.
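A sketch of the implied fix, with a hypothetical `unpack_record` helper: index the keys directly so a missing key raises a loud `KeyError` instead of being masked by a default.

```python
import json


def unpack_record(line):
    # We write the stats file ourselves, so a missing key is a bug;
    # direct indexing fails loudly rather than defaulting.
    record = json.loads(line)
    return record["snippet_chars"], record["file_chars"], record["call"]
```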
        raise ValueError(f"Unknown search mode: {mode!r}")
    else:
        raise ValueError(f"Unknown search mode: {mode!r}")
    if self._file_sizes:
This PR adds `semble savings`, a CLI command that tracks and displays token savings across all searches. Stats are recorded automatically to `~/.semble/savings.jsonl` on every search. There's also a verbose output that shows the calls per call type, but it's not that interesting (yet) since we only have `search` and `find_related` atm. Example output 💅 ✨: