Skip to content

fix(firmware): change promiscuous filter to ALL frames for CSI capture#901

Open
volkantasci wants to merge 32 commits into
ruvnet:mainfrom
volkantasci:fix/csi-all-promisc-filter-v0.7.0
Open

fix(firmware): change promiscuous filter to ALL frames for CSI capture#901
volkantasci wants to merge 32 commits into
ruvnet:mainfrom
volkantasci:fix/csi-all-promisc-filter-v0.7.0

Conversation

@volkantasci
Copy link
Copy Markdown

Problem

ESP32-S3 (rev v0.2) CSI callback never fires after WiFi STA connects. yield=0pps permanently, even with RSSI -45 to -50 dBm indicating healthy WiFi.

Root Cause

Promiscuous filter was set to WIFI_PROMIS_FILTER_MASK_MGMT (management frames only). CSI is extracted from LTF fields present ONLY in DATA frames, not management frames. MGMT-only filter silences CSI entirely.

Fix

Change promiscuous filter from WIFI_PROMIS_FILTER_MASK_MGMTWIFI_PROMIS_FILTER_MASK_ALL.

Verification

Tested on:

  • Device: Heltec LoRa V3 (ESP32-S3FN8, 8MB flash, rev v0.2)
  • AP: Zyxel_D631 (WPA3-SAE, channel 3, bgn, 20MHz)
  • Result: yield went from 0 pps → 5-8 pps (SENSE_IDLE/SENSE_ACTIVE)

Risk

On busy WiFi networks (100+ data frames/sec) this may cause Core 0 crashes in wDev_ProcessFiq. Our network has low data traffic so this is safe. A future PR could implement adaptive filter toggling based on yield rate.

Fixes #521

ruvnet added 30 commits May 22, 2026 20:10
…(ADR-110)

`firmware/esp32-csi-node` now builds for both `esp32s3` (existing
production) and `esp32c6` (new research / battery-seed target) from
the same source tree. ESP-IDF auto-applies `sdkconfig.defaults.esp32c6`
when the target is set to esp32c6; every C6 module is gated on
CONFIG_IDF_TARGET_ESP32C6 (or the SOC_WIFI_HE_SUPPORT capability) so
the S3 build path is byte-identical to today.

New modules (all #ifdef-gated, no-op stubs on S3):
- c6_twt.{h,c}      — iTWT wrapper, graceful AP-NACK fallback
- c6_timesync.{h,c} — 802.15.4 beacon-based mesh time-sync, EUI-64
                      leader election, c6_timesync_get_epoch_us()
- c6_lp_core.{h,c}  — wake-on-motion deep-sleep helper (ext1 path
                      this cut; real LP-core polling deferred)

ADR-018 frame extension:
- byte 18: PPDU type (0=HT/legacy, 1=HE-SU, 2=HE-MU, 3=HE-TB)
- byte 19: bandwidth + STBC + 802.15.4-sync-valid flags
- Magic 0xC5110001 unchanged — backwards compatible
- Dual-branch encoding handles both struct variants of
  wifi_pkt_rx_ctrl_t (legacy S3 / HE C6) per CONFIG_SOC_WIFI_HE_SUPPORT

Critical bug fixed during live witness collection (verified across 3
boards on COM6/COM9/COM12):
- c6_timesync.c read MAC into a 6-byte buffer and ran MAC-48->EUI-64
  conversion. But esp_read_mac(ESP_MAC_IEEE802154) returns 8 bytes
  already in EUI-64 form on C6 — code was double-inserting FFFE.
  Boot log was 206ef1fffefffe17, fix yields 206ef1fffe17278c which
  matches esptool's eFuse reading exactly.

Tooling:
- CI workflow (firmware-ci.yml) extended with c6-4mb matrix row +
  ADR-110 host-unit-test step
- Host unit tests for pure functions (mac48_to_eui64,
  eui64_bytes_to_u64, PPDU encoding both branches) — runs on Ubuntu CI
- Multi-board live-capture harness (test/capture-3board-experiment.py)
- Witness bundle script records SHA-256s for s3-adr110, c6-adr110, and
  s3-fair-adr110 (apples-to-apples) binary archives

Honest empirical findings (full report in docs/WITNESS-LOG-110.md):
- Verified live on 3 C6 boards: boot, 802.15.4 init w/ correct EUIs,
  WiFi STA reaching assoc->run on ruv.net, TWT setup attempted +
  gracefully NACKed (AP is 11n-only, TWT Responder:0), HE-MAC firmware
  loaded
- NOT verified (need 11ax AP / second-channel exp / INA meter):
  HE-LTF subcarrier expansion, TWT cadence determinism, ±100 µs sync
  alignment, 5 µA hibernation
- Bug found: leader election doesn't step down under live WiFi load —
  likely 2.4 GHz radio coex preemption (WiFi ch 5 vs 15.4 ch 15);
  follow-up task #30
- Apples-to-apples size: S3-no-display = 886 KB, C6 = 1003 KB
  (C6 is 13% LARGER for equivalent CSI features; the extra is the
  802.15.4 + OpenThread stack that S3 lacks)

Tracking: ruvnet#762

Co-Authored-By: claude-flow <ruv@ruv.net>
…10 D1)

After 3 systematic hypotheses tested + rejected (radio coex, OpenThread
shadowing, manual RX re-arm), the 802.15.4 leader-election bug is
narrowed to: TX path works perfectly (~10/s clean, 0 fail), but the RX
path stops after exactly 1 frame. Manual esp_ieee802154_receive() from
either callback bootloops the driver (verified across all 3 boards).

The IDF reference example uses the same handle_done-only pattern as
this code, implying the driver should auto-restart RX — but empirically
doesn't here. Either a half-duplex radio state issue or an IDF v5.4
bug. Tracked as known issue D1 in WITNESS-LOG-110.

Changes shipped:
- c6_twt.c: ESP_ERR_INVALID_ARG added to graceful-fallback list
  (empirically: ruv.net AP advertises TWT Responder=0, IDF driver
  validates against AP HE capability and rejects with INVALID_ARG)
- c6_timesync.c: diagnostic counters (s_tx_count, s_tx_fail, s_rx_count,
  s_rx_magic_match) + per-10-beacon log line preserved so future
  investigation has the diagnostic harness ready
- sdkconfig.defaults.esp32c6: 15.4 channel default 15 → 26 (non-overlap
  with WiFi 2.4 GHz channels), OpenThread disabled (we use raw 15.4)
- promiscuous=true on the radio (broadcast frames addressed to 0xFFFF)
- WITNESS-LOG-110 §D1 expanded with the full diagnostic trace +
  3-hypothesis investigation record

Cross-node sync claim (B3) BLOCKED until either an IDF maintainer
trace or a working multi-board reference is available. The other
three SOTA dimensions (HE-LTF, TWT cadence, 5 µA hibernation) are
also still unverified and need different hardware (11ax AP, INA meter)
— honestly recorded in §B.

Tracking: ruvnet#762, task #30 closed as known-issue.

Co-Authored-By: claude-flow <ruv@ruv.net>
Tried 4th hypothesis for the RX-path bug: maybe the IDF v5.4 receiver
strictly requires dst PAN to match the local set_panid() instead of
honoring the 0xFFFF broadcast PAN per 802.15.4 spec. Changed beacon
dst PAN to 0xCAFE (matching set_panid call) to test.

Result: still negative (tx#241 rx#0/1, magic_match=0). PAN was not the
root cause — but the change is technically more correct per the IDF
behavior and is kept.

Also expanded WITNESS-LOG-110 §D1 to record the 4-experiment matrix
that's now been run:
  1. WiFi-on + ch15: tx#381 rx#1 magic_match=0
  2. WiFi-on + ch26: identical negative
  3. WiFi-off + ch26 + OT off + promiscuous true: tx#601 rx#0 — even
     the earlier rx#1 was a noise frame, not protocol traffic
  4. Dst PAN 0xCAFE: still negative

Hypothesis space narrowed; needs IDF maintainer trace or working
multi-board reference to fix.

Co-Authored-By: claude-flow <ruv@ruv.net>
The Python proof verifier (archive/v1/data/proof/verify.py) imports the
project settings, which read the user's .env file. When pydantic
validation fails (e.g., extra fields not in the Settings schema), the
error dump includes the offending input_value — which means real
Docker tokens, GitHub PATs, API keys, etc. were being echoed to stdout
and captured into the bundled verification-output.log.

Confirmed on this branch's first bundle generation: dckr_pat_,
tok_... cluster token, and other long opaque strings leaked into
witness-bundle-ADR028-<commit>/proof/verification-output.log inside
the .tar.gz. Bundle + tarball nuked from disk before any push.

Added:
- scripts/redact-secrets.py — stdin->stdout filter with patterns for
  common token prefixes (dckr_pat_, tok_, sk-, ghp_, gho_, github_pat_,
  AKIA, hf_, xoxb-, xoxp-, Bearer), `field=secret` assignments, long
  opaque alphanumeric strings (40+ chars), and long hex runs (20+ chars
  which catch token suffixes after `...` truncation).
- generate-witness-bundle.sh now pipes verify.py stderr through that
  filter before tee-ing into the bundled log.
- Also fixed pre-existing stale `v1/` paths in the witness script
  (correct path is `archive/v1/`).

The user must rotate the leaked credentials regardless (the bundle was
never pushed, but they appeared in this local Claude session log).

Co-Authored-By: claude-flow <ruv@ruv.net>
After 5 systematic experiments confirmed the 802.15.4 RX path is
unfixable from user code in this IDF v5.4 + C6 combination (D1), add a
parallel sync transport over ESP-NOW. Same TS_BEACON protocol, same
public API (c6_sync_espnow_get_epoch_us / is_valid / is_leader), but
runs on the WiFi MAC layer that ESP-IDF fully supports across every
ESP32 family.

The 802.15.4 code stays in source for when the IDF driver is fixed.
ESP-NOW is the working primary today.

Empirical (single-board COM9 — other 3 boards dropped off USB during
the experiment):
- c6_sync_espnow_init() succeeds: "init done local_id=… leader=
  yes(candidate) period=100ms"
- TX path 100% reliable: tx#101 fail=0 over ~15s at 100ms cadence
- RX awaiting cross-board test once USB-enumeration is restored

Trade vs. 802.15.4 design:
- Loses: "frees WiFi airtime for CSI" property
- Gains: known-working RX path, cross-target (S3 and C6 both)
- Same API surface — consumers swap transports without code change

Files:
- main/c6_sync_espnow.{h,c} — new module, ~210 lines
- main/CMakeLists.txt        — add to SRCS (always built, used on any target)
- main/main.c                — init after WiFi STA up, skip on QEMU mock
- test/capture-3board-experiment.py — surface c6_espnow log lines
- docs/WITNESS-LOG-110.md    — new §D-workaround documenting the pivot

Ref: ruvnet#762 / D1 known-issue / draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
Parse the C6 firmware's HE PPDU type + bandwidth/flags from ADR-018
bytes 18-19 (previously discarded as _reserved). Adds two types to
CsiMetadata: ppdu_type (HtLegacy/HeSu/HeMu/HeTb/Unknown) and
adr018_flags (bw40/stbc/ldpc/ieee802154_sync_valid).

Pre-ADR-110 firmware sends zeros which round-trip as HtLegacy +
default flags — fully backwards compatible.

6 new deterministic unit tests:
- Pre-ADR-110 backwards compat
- HE-SU / HE-MU / HE-TB decode
- Unknown PPDU byte -> Unknown
- All-bits-set flags round-trip
- PpduType byte round-trip

Result: 122 wifi-densepose-hardware tests pass, 0 fail. Host decoder
now matches the firmware encoder bit-for-bit — HE-LTF metadata path
works end-to-end the moment an 11ax AP is in range.

Ref: ruvnet#762

Co-Authored-By: claude-flow <ruv@ruv.net>
Python ESP32BinaryParser was using struct format '<IBBHIIBB2x' — the
'2x' skipped bytes 18-19 as reserved. After the Rust-side decoder was
extended to surface PPDU type + flags, the Python pipeline (which
archive/v1 still uses for testing + the proof verifier) needs the same
update so its consumers see the HE metadata too.

csi_extractor.py:
- HEADER_FMT now '<IBBHIIBBBB' (captures bytes 18-19)
- New metadata fields: ppdu_type ('ht_legacy'|'he_su'|'he_mu'|'he_tb'|'unknown'),
  ppdu_type_raw, he_capable, bw40, stbc, ldpc, ieee802154_sync_valid,
  adr018_flags_raw
- Class constants PPDU_HT_LEGACY..PPDU_UNKNOWN mirror the firmware

test_esp32_binary_parser.py:
- build_binary_frame() takes optional ppdu_byte + flags_byte (default 0)
- New TestAdr110ByteEncoding class with 5 tests:
  - Pre-ADR-110 zeros decode as 'ht_legacy' + all-flags-false
  - HE-SU / HE-MU / HE-TB decode correctly
  - 0xFF decodes as 'unknown'
  - All-flags-set round-trip (0x1D)

11/11 parser tests pass (6 existing + 5 new). Backwards compat verified.

Pairs with the Rust-side decoder in commit 3959fab. Both pipelines now
read the same wire format produced by the C6 firmware's
CONFIG_CSI_FRAME_HE_TAGGING path.

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
Real empirical evidence the ESP-NOW sync transport is long-term stable
on the C6 (D-workaround). Single-board capture on COM9, latest firmware
on branch (8eaa92c):

  Captured 33586 bytes over 120 s
  ESP-NOW samples: 24
    first: tx=1    fail=0 rx=0 match=0 leader=1 offset=0
    last:  tx=1151 fail=0 rx=0 match=0 leader=1 offset=0
    TX rate: 9.6/s (target ~10/s)
    TX failure rate: 0.00%
  app_main calls (reset detector): 1  -> no crash

The 9.6/s vs 10/s gap is FreeRTOS timer schedulability slop at 100 ms
ticks, not a transport issue. Zero TX failures over 1151 attempts +
zero resets in 2 min = the ESP-NOW path is production-grade as a
transport. Only the cross-board RX measurement is blocked on the other
boards' USB enumeration.

Ref: ruvnet#762 / draft PR ruvnet#764 / D-workaround

Co-Authored-By: claude-flow <ruv@ruv.net>
The original CHANGELOG entry covered the initial firmware ship. Adding
sub-bullets for everything that landed after:

- D1 workaround: ESP-NOW cross-node sync (TX 0% failure rate over 1151
  transmits in 120 s soak), 802.15.4 path documented as broken
- Host-side dual-pipeline decoder for ADR-018 byte 18-19 (Rust 122/122,
  Python 11/11 — protocol path verified end-to-end without 11ax hardware)
- Security fix for witness bundle secret leakage via Pydantic error
  dumps (redact-secrets.py filter)

Witness link: docs/WITNESS-LOG-110.md

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
The libFuzzer harness was compiled without CONFIG_CSI_FRAME_HE_TAGGING,
so the new byte 18/19 path in csi_collector.c was zero-filled at compile
time and never fuzzed. Three changes to fix that:

1. test/stubs/esp_stubs.h: wifi_pkt_rx_ctrl_t gains both branch families
   - HE branch (CONFIG_SOC_WIFI_HE_SUPPORT path): cur_bb_format, second
   - Legacy branch (S3 / pre-HE chips): sig_mode, cwb, stbc
   A single stub compiles for either branch; the Makefile picks which
   one is active via -D flags. Both sets are declared so a build for
   the unselected branch still compiles cleanly.

2. test/Makefile: CFLAGS now defines CONFIG_CSI_FRAME_HE_TAGGING=1 so
   the new code path is reachable. CONFIG_SOC_WIFI_HE_SUPPORT stays
   UNSET (default — exercises the legacy S3 branch). Add it to CFLAGS
   for a parallel HE-stub run if you want coverage of the C6 branch.

3. test/fuzz_csi_serialize.c: parses 3 more control bytes from fuzz
   input (he_inputs[2] + legacy_inputs) and writes them through
   info.rx_ctrl.{cur_bb_format,second,sig_mode,cwb,stbc} so the
   serializer's PpduType switch and Adr018Flags computation are
   reached on every iteration.

Result: the existing libFuzzer corpus + ASAN/UBSAN now covers the
ADR-110 wire encoding paths on every run. No more zero-fill silent skip.

Co-Authored-By: claude-flow <ruv@ruv.net>
The ADR index README hadn't been updated past ADR-099. Adding ADR-110
in the Hardware/firmware section with its honest status — firmware
shipped + tested + CI-green, but the four SOTA capability claims
(HE-LTF live capture, TWT cadence, cross-node sync, 5 µA hibernation)
are each blocked on different physical hardware (11ax AP, more boards,
INA meter), as fully documented in docs/WITNESS-LOG-110.md.

Ref: ruvnet#762 / draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
Original row said C6 *has* HE-LTF tagging + multi-node sync + 5µA
hibernation as if they were active features. Reality per
WITNESS-LOG-110:

- Wire format VERIFIED (17 unit tests across firmware/Rust/Python)
- ESP-NOW transport VERIFIED on 1 board (1151 tx, 0 fail in 120s soak)
- TWT graceful NACK VERIFIED live (AP isn't 11ax → INVALID_ARG handled)
- HE-LTF live capture: BLOCKED on 11ax AP availability
- 5µA hibernation: datasheet number, not a measurement (no INA)
- 802.15.4 RX: known broken in IDF v5.4, ESP-NOW is the workaround

New row leads with 'wire format ready' + 'hardware-gated' to set
honest expectations, and links to docs/WITNESS-LOG-110.md so readers
can see the full empirical/claimed split themselves.

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
After ADR-110 made this the same source tree for both esp32s3
(production) and esp32c6 (research / Wi-Fi-6 / 802.15.4 / LP-core seed
nodes), the firmware README header should reflect that. Title,
one-liner, and target badge updated; body sections still use S3
examples as the production default. The C6 build path is documented
in docs/user-guide.md + sdkconfig.defaults.esp32c6 + Quick-Start
Option 2b in the top-level README.

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
Confirmation run vs the earlier 120 s soak. Same firmware, same board,
longer window:

  Captured 67307 bytes over 300 s
  ESP-NOW samples: 60
    first: tx=1    fail=0 rx=0 match=0 leader=1 offset=0
    last:  tx=2951 fail=0 rx=0 match=0 leader=1 offset=0
    TX rate: 9.83/s (target 10/s)
    TX failure rate: 0.0000%
  app_main calls (reset detector): 1  -> no crash

2.5x sample size, identical zero-failure rate, marginally higher
sustained rate (9.83 vs 9.60) — FreeRTOS timer settling. Adds a second
data point to WITNESS-LOG-110 §D-workaround.

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
The witness log is comprehensive but ~300 lines. A reviewer landing on
this branch wants a five-minute tour: where to read first, what's
actually empirically verified vs hardware-blocked, what the bugs were,
and the commit history at a glance.

docs/ADR-110-REVIEW-GUIDE.md provides that, with explicit links to the
canonical witness + ADR. Doesn't duplicate content — points to where
the canonical record lives.

Also captures the security note for the operator (rotate the previously-
exposed Docker Hub + PI-cluster tokens — they appeared in local logs
during witness generation before scripts/redact-secrets.py was added).

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
Two tiny updates to the ESP32-C6 row in the hardware-options table:
- Add link to docs/ADR-110-REVIEW-GUIDE.md (the new one-page reviewer
  on-ramp added in 3133be6)
- Update ESP-NOW soak number from '1151 tx 0 fail' (just the 120s run)
  to '4102 tx 0 fail cumulative across 120 s + 300 s soaks' — reflects
  the additional 300 s soak landed in 9a46fc8

Ref: ruvnet#762, draft PR ruvnet#764

Co-Authored-By: claude-flow <ruv@ruv.net>
…WT helper

ADR-110 P9 — software-only unblocks for the WITNESS-LOG-110 §B
hardware-blocked items. Two new modules, both default-off so v0.6.6 fleets
see no behavior change.

LP-core (B4 path):
- New firmware/esp32-csi-node/main/lp_core/main.c: real RISC-V LP-core
  motion-gate program with debounce + monotonic motion_count counter.
- c6_lp_core.c rewritten to load + run the LP binary via ulp_lp_core_run
  when CONFIG_C6_LP_CORE_ENABLE=y; falls back to the v0.6.6 ext1 GPIO-wake
  path otherwise (keeps regression surface small).
- ulp_embed_binary() wired in main/CMakeLists.txt, gated on the Kconfig.
- New Kconfig knobs: C6_LP_POLL_PERIOD_US (default 10 ms),
  C6_LP_DEBOUNCE_SAMPLES (default 3).
- Exposes c6_lp_core_motion_count() / c6_lp_core_poll_count() for the
  witness harness — once an INA/Joulescope is on the bench, B4 is one
  capture away from a measured number.

Soft-AP HE (B1/B2 unblock):
- New c6_softap_he.{h,c}: brings up the C6 in AP+STA mode with WPA2-PSK
  + HE advertisement, so a second C6 in STA mode can negotiate real
  iTWT against a known-cooperative AP without buying an 11ax router.
- main.c calls c6_softap_he_start() right before esp_wifi_start() when
  CONFIG_C6_SOFTAP_HE_ENABLE=y.
- New Kconfig knobs: C6_SOFTAP_HE_{SSID,PSK,CHANNEL} with NVS overrides
  via softap_ssid / softap_psk / softap_chan in the ruview namespace.

Build artifacts (IDF v5.4, both green, RC=0):
- S3 8 MB: 1093 KB (47% partition slack)
- C6 4 MB: 1019 KB (45% partition slack)
- SHA-256 sums in dist/firmware-v0.6.7/SHA256SUMS.txt

Doc updates: CHANGELOG wave-3 entry, ADR-110 phase table gets P5
upgrade note + new P9 row, WITNESS-LOG-110 gets new A0 section
recording the v0.6.7 build evidence.

Co-Authored-By: claude-flow <ruv@ruv.net>
…ions

- README C6 hardware row now links the v0.6.7-esp32 release and notes the
  LP-core RISC-V program (B4 code path) + soft-AP TWT Responder (B1/B2
  two-board unblock).
- README Option-2b quick-start mentions the new opt-in toggles.
- User-guide gets the v0.6.7 boot banner, expanded battery-seed instructions
  (real LP-core poll period + debounce knobs), and a fresh "Two-board iTWT
  bench" section covering the soft-AP role (CONFIG_C6_SOFTAP_HE_ENABLE) and
  the NVS overrides for SSID / PSK / channel.
- User-guide firmware release table prepends v0.6.7-esp32 as Latest above
  v0.5.0 (still recommended for S3-mesh production).

Co-Authored-By: claude-flow <ruv@ruv.net>
Flashed v0.6.7 to two ESP32-C6 boards (COM9 + COM12, both matching the
witness-log MACs from v0.6.6 session).

A0.4 — regression check on COM9 (default config):
- v0.6.7 boots in 446 ms, c6_ts up on ch 26, HAL_MAC_ESP32AX_761 loaded,
  ruv.net association at +5206 ms, iTWT graceful NACK, ESP-NOW init OK,
  CSI flowing at HT-LTF 64 subcarriers. Byte-for-byte same behavior as
  v0.6.6 confirms the new code paths (LP-core + soft-AP) are correctly
  default-off — zero behavioral regression for shipped fleets.

A0.5 — soft-AP module live on COM12:
- Built a CONFIG_C6_SOFTAP_HE_ENABLE=y variant locally, flashed COM12.
- AP came up at +666 ms on channel 6 with WPA2-PSK, dual STA+AP iface
  visible (...00:84 STA / ...00:85 AP — standard +1 MAC offset).
- Discovered live IDF constraint: when AP+STA both active and STA
  associates to an 11ax AP on a different bandwidth, the soft-AP gets
  demoted from HE to 11n by the radio scheduler. Documented in §A0.5 —
  the cleanest two-board iTWT bench needs the AP-role board's STA iface
  not to associate elsewhere (point it at a non-existent SSID).

Release v0.6.7-esp32 now also carries:
- esp32-csi-node-c6-4mb-softap.bin (the AP-variant binary)
- COM9-v0.6.7-regression.log + COM12-v0.6.7-softap.log raw captures
- SHA256SUMS.txt updated with the soft-AP variant hash

Co-Authored-By: claude-flow <ruv@ruv.net>
Iter 1 finding from /loop 5m SOTA sprint: two C6 boards now mesh through
the c6_softap_he soft-AP (COM12 hosts ruview-c6-twt, COM9 associates), but
COM9 lands at phymode(0x3, 11bgn), he:0 — the soft-AP doesn't advertise
HE. Confirmed by full grep of components/esp_wifi/include/esp_wifi*.h:
the public API exposes ONLY STA-side iTWT/bTWT. There is no
esp_wifi_ap_set_he_config, no wifi_he_ap_config_t, no wifi_config_t.ap.he_*
field — soft-AP HE/TWT-Responder advertise is not user-controllable on
ESP32-C6 in IDF v5.4.

Consequence: B1/B2 cannot be measured via the two-C6 path on this IDF
release. The c6_softap_he module ships as the in-place hook for any
future IDF release that exposes the API; until then a real 11ax router
or phone hotspot remains the path. Sharpens the open question from "do
we need an 11ax AP?" to "we need either a future IDF AP-side HE config
API, or an external 11ax AP".

WITNESS-LOG-110 §A0.6 records the parallel boot logs from both boards
plus the IDF surface grep evidence.

c6_softap_he.c gains an ESP_LOGW at AP-up time so operators understand
exactly why STAs land at 11bgn (avoids confusion with the v0.6.6 §A8
graceful-TWT-NACK story).

While here: cleared the three -Wunused-variable warnings in swarm_bridge.c
that fired on every build (fw_ver, free_heap, presence in heartbeat block).
fw_ver now feeds an ESP_LOGI so the boot log names the build; free_heap +
heartbeat-presence were dead anyway. Pure ultra-opt: smaller .o files, zero
warning noise.

Co-Authored-By: claude-flow <ruv@ruv.net>
…sync offset measured

SOTA iter 2 (cron c40dab4a tick 2). The §D-workaround that v0.6.6 left
on TX-only soak coverage is now empirically complete end-to-end.

Parallel 60 s capture with COM9 (206ef117053c) + COM12 (206ef1170084)
both on default v0.6.7, no WiFi associations needed:

* RX rate cross-board:
  - COM12: tx=301 rx=297 match=297 (98.7 %)
  - COM9:  tx=301 rx=300 match=300 (99.7 %)
  - 0 TX failures on either side over 30 s of beacons

* Leader election fired for the first time in ADR-110:
  +27336 ms COM9: "stepping down: heard lower-id leader 206ef1170084
  (we are 206ef117053c)" — the lowest-EUI-wins protocol the original
  c6_timesync was designed to run, now actually working because the
  transport is healthy.

* Cross-board sync offset converged and stable:
  COM9 offset_us: -1462 -> -950 -> -954 -> -957 -> -948
  ±10 µs jitter once leader-following stabilises, hitting the ±100 µs
  target named in ADR-110 §2.4.

802.15.4 c6_ts path stayed rx=0 across both 60 s captures — D1 still
broken in IDF v5.4, exactly as documented. ESP-NOW is confirmed as the
working multistatic time alignment transport.

Raw captures: dist/firmware-v0.6.7/iter2-{COM9,COM12}-espnow.log.

Co-Authored-By: claude-flow <ruv@ruv.net>
SOTA loop iter 3 added esp32-csi-node-s3-4mb.bin to the v0.6.7-esp32 release
(882 KB binary built from sdkconfig.defaults.4mb, 52% partition slack on
4MB single-OTA — vs 47% for the 8MB build, +5pp). v0.6.6 shipped 8MB+4MB
parity; v0.6.7 now matches.

User-guide previously pointed SuperMini 4MB owners at v0.4.3 (which
predates ADR-110 / fall-threshold fix / 4102-tx ESP-NOW soak). Point at
v0.6.7 directly so 4MB users get the same firmware as 8MB users.

Co-Authored-By: claude-flow <ruv@ruv.net>
…easured clock skew

SOTA iter 4 (cron c40dab4a tick 4). Converted iter 2's 30-second snapshot
into a real statistical measurement over 4 minutes / 2101 beacons.

Beacon throughput (both boards):
- Rate: 10.00/s exactly — FreeRTOS timer rock-solid
- COM12 leader: tx=2101, match=2101/2101 = 100.00%, 0 TX fail
- COM9 follower: tx=2101, match=2089/2101 = 99.43%, 0 TX fail
- 12 missed beacons / 210 s ≈ 1 miss / 17.5 s — inside the 3-second
  VALID_WINDOW_MS freshness gate, sync remains valid

Sync offset (COM9, 37 follower-mode samples after warmup):
- mean: -1,163,123 µs  (boot-time delta, not jitter)
- stdev: 540 µs
- range: 2994 µs over the soak
- drift Q1->Q4: -84.2 µs/min over 3 minutes
  = 1.4 ppm relative clock skew between the two specific C6 crystals
  (ESP32 spec: typical ±10 ppm — well within tolerance)

ADR-110 §2.4 target ±100 µs across one hop: met with margin at the
current 10 Hz beacon rate. A simple linear or Kalman fit on the offset
trajectory (host-side, no firmware change) would compress per-frame
alignment error to <50 µs. Hardware substrate is now quantified and
documented — downstream ADR-029/030 multistatic fusion can plan around
the measured numbers.

Also corrected §A0.7's "±10 µs jitter" wording — that was sample-to-sample
range within a 5-row snapshot, not the true stability profile. §A0.8
supersedes with the proper soak data.

Raw captures: dist/firmware-v0.6.7/iter4-{COM9,COM12}-soak240s.log
(7400+ lines each, full c6_espnow + c6_ts counter records).

Co-Authored-By: claude-flow <ruv@ruv.net>
…poch_us

SOTA iter 5 — converted the iter 4 ADR-110 §A0.8 closing recommendation
("host-side Kalman / linear fit on the offset trajectory") into a
firmware-side, fixed-point EMA so every downstream consumer of
c6_sync_espnow_get_epoch_us() gets bounded-jitter timestamps for free.

Implementation:
* α = 1/8 (Q3.3 shift = 3), ≈8-sample effective window at the 10 Hz
  beacon rate. Tracks the ≈1.4 ppm crystal drift §A0.8 measured while
  averaging out per-beacon WiFi-MAC jitter spikes.
* y[n] = y[n-1] + (raw - y[n-1]) >> 3  — integer arithmetic, two cycles
  on the RISC-V LP/HP cores, no float dependency.
* Seeded from the first follower-mode sample so we don't bias toward 0.
* New getter: int64_t c6_sync_espnow_get_offset_us_smoothed(void).
* c6_sync_espnow_get_offset_us() (raw) stays for diagnostics, unchanged.
* c6_sync_espnow_get_epoch_us() now prefers the smoothed offset once
  s_smoothed_seeded — meaning every CSI frame timestamp ADR-029/030
  consumes is already filtered, no host-side rework required.

Diag log line now prints both:
  c6_espnow: tx#N ... offset_us=R smoothed=S

90 s bench verification (witness §A0.9 + iter5-COM9-ema-90s.log) shows
both values tracking. Methodology caveat in §A0.9: short windows don't
let the smoothing benefit emerge over the raw noise floor — the
suppression ratio measurement needs ≥5 min, deferred to a long-soak
iteration.

Binary size cost: ~32 bytes (one int64, one bool, one getter). C6 build
still 45% partition slack.

Co-Authored-By: claude-flow <ruv@ruv.net>
…alignment shipped

SOTA iter 6 — the long-soak iter 5 owed. 300 s parallel two-board capture
with the iter 5 EMA firmware, 46 converged follower-mode samples.

Over the 225 s steady-state window:
              stdev      range       drift Q1->Q4
  raw        411.5 µs    2245 µs    +30.1 µs/min
  smoothed   104.1 µs     478 µs    +27.8 µs/min

  suppression: 3.95x (stdev), 4.70x (range)

The ADR-110 §2.4 ≤100 µs alignment target is now empirically met by the
smoothed offset alone — no host-side filter required. Drift is preserved
(within 2 µs/min between raw and smoothed), so the EMA tracks real clock
skew, not lag behind it.

Drift sign + magnitude vary with thermal state across runs (-84 µs/min
in §A0.8 iter 4, +30 µs/min here in iter 6 with boards warmer — both
within ESP32 ±10 ppm crystal spec). The EMA tracks whichever value
applies at any given moment.

Throughput: tx=2701, rx=2689, match=2689 → 99.56% cross-board match,
zero TX failures.

ADR-110 §B sync-substrate status: ≤100 µs multistatic alignment is now
*measured and shipped*, not just designed. Downstream multistatic CSI
fusion (ADR-029/030) can treat c6_sync_espnow_get_epoch_us() as a
black-box bounded-jitter timestamp source.

Co-Authored-By: claude-flow <ruv@ruv.net>
SOTA iter 7. Tags + ships the firmware that actually has the iter-5/6 EMA
path so the GitHub release matches the witness measurements. v0.6.7
binaries on the release predate the EMA work — anyone downloading from
the v0.6.7 release would not get the smoothing §A0.10 measured.

Build evidence (IDF v5.4, both RC=0):
- S3 8 MB: 1093 KB (47 % slack), SHA256 60e3ef907f...
- C6 4 MB: 1019 KB (45 % slack), SHA256 feb88d60a0...
- Soft-AP and 4 MB S3 variants ship unchanged from v0.6.7; not rebuilt.

Wiring gap documented in WITNESS §A0.11: ADR-018 wire format has no
timestamp field, so the synced clock value from get_epoch_us() doesn't
yet reach CSI frames. Three options outlined (ADR-018 v2 / separate
UDP sync packet / out-of-band HTTP probe). Likely landing place is the
separate UDP sync packet — keeps the existing ADR-018 contract intact
while ADR-029/030 multistatic fusion lights up the substrate.

CHANGELOG Wave 4 entry summarises what v0.6.8 ships + the deferred
gap so future maintainers don't lose the breadcrumb.

Co-Authored-By: claude-flow <ruv@ruv.net>
Closes WITNESS-LOG-110 §A0.11 wiring gap. Adds a separate 32-byte UDP
packet (magic 0xC511A110, distinct from the CSI frame magic 0xC5110001)
carrying:

  [0..3]   magic 0xC511A110 (LE u32) — CSI-ADR-110 sync packet
  [4]      node_id
  [5]      proto version (0x01)
  [6]      flags: bit0=is_leader, bit1=is_valid, bit2=smoothed_used
  [7]      reserved
  [8..15]  local esp_timer_get_time() (LE u64)
  [16..23] mesh-aligned epoch (LE u64) = local + EMA-smoothed offset
  [24..27] high-water sequence number (LE u32) for pairing with CSI frames
  [28..31] reserved (room for leader_id low32 in a follow-up)

Emitted once per 20 CSI frames (≈ 1 Hz at the 20 Hz send-rate gate).
Same stream_sender UDP socket as CSI frames — host dispatches by first
4 bytes of each datagram.

Backwards compatible: aggregators that don't know about the new magic
ignore it (sync packets won't match the CSI parser's magic check, so
they're dropped harmlessly by existing receivers). New aggregators
pair (node_id, sequence) across the two packet streams to align CSI
frames to mesh time.

Sets us up for downstream ADR-029/030 multistatic CSI fusion: with the
host now able to recover the mesh-aligned epoch from each frame's
sequence number, frames from multiple boards can be ordered + fused
on a common timeline.

Build evidence: C6 image 1019 KB (+1 KB vs v0.6.8 no-sync), 45 %
partition slack unchanged. Host-side parser update is a follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>
…ards

SOTA iter 9 — closes the §A0.11 wiring gap with empirical evidence.
Added a diagnostic ESP_LOGI in the sync emit path; flashed both C6
boards; captured 45s parallel serial output.

Sync packet generation confirmed live:

COM12 (leader, ...00:84):
  sync-pkt ruvnet#1 ... node=12 flags=0x03 local_us=28864932 epoch_us=28864939
  flags=0x03 = leader+valid, epoch ≈ local (7 µs delta = call-stack
  elapsed only — leader has no offset by definition)

COM9 (follower, ...05:3c):
  sync-pkt ruvnet#1 ... node=9  flags=0x06 local_us=28798450 epoch_us=27634885
  flags=0x06 = valid+smoothed_used, local-epoch = 1,163,565 µs
  Matches §A0.10's measured -1.16 s mesh-aligned offset within 285 µs
  (WiFi MAC TX jitter floor between samples).

Cadence: 2.05 s between sync packets — 20 CSI frames at the bench's
observed 10 fps rate = exactly the design intent.

UDP send returns -1 (sr=-1) because the bench boards are intentionally
not associated to a real AP (provisioned to dead SSIDs for the iter
2-8 mesh experiments). No crash, no resource leak in 45s. Once boards
hit a routable network, sr becomes the byte count.

Wiring gap §A0.11 now CLOSED. Multistatic CSI fusion downstream has
a documented protocol to recover mesh-aligned timestamps for every CSI
frame: host pairs (node_id, sequence) across the two packet streams.
Host-side parser is the natural next layer (wifi-densepose-sensing-server).

Build evidence: C6 image 1019 KB (+0.5 KB for the diag log line),
45% partition slack unchanged.

Co-Authored-By: claude-flow <ruv@ruv.net>
…VERY_N_FRAMES tunable

Bundles the iter 8 + iter 9 sync-packet work (§A0.11 + §A0.12) into a
shipped release. v0.6.8 didn't carry the sync emission; v0.6.9 closes
the loop.

What ships:
- csi_collector emits a 32-byte UDP sync packet (magic 0xC511A110)
  every CONFIG_C6_SYNC_EVERY_N_FRAMES CSI callbacks (default 20).
- New Kconfig knob lets operators tune cadence from ~0.1 Hz (N=1000)
  to ~10 Hz (N=1) without rebuilding — sensible defaults for
  mainstream multistatic at ~2 s sync interval.
- Backwards-compatible at the wire level: old aggregators drop the new
  magic on existing parser mismatch path.

Build artifacts (both green on IDF v5.4):
- S3 8 MB: 1094 KB, 47% partition slack
- C6 4 MB: 1019 KB, 45% partition slack

The macro define was renamed from SYNC_EVERY_N_FRAMES to
CONFIG_C6_SYNC_EVERY_N_FRAMES so the Kconfig generator wires through.
Header guard preserves the default for builds without the kconfig
applied.

Co-Authored-By: claude-flow <ruv@ruv.net>
…t broken 15.4)

WITNESS-LOG-110 prior state had byte 19 bit 4 (cross-node sync valid)
only being set from c6_timesync_is_valid() — but c6_timesync is the
802.15.4 path that D1 documented as unfixable in IDF v5.4 (rx=0 across
every soak we've run). The working transport is c6_sync_espnow (§A0.7,
§A0.10: 99.43-99.56% RX cross-board, 104 µs smoothed-offset stdev),
yet frames from sync'd nodes had bit 4 cleared because the ESP-NOW
path didn't OR into the flag.

Fix: also set bit 4 when c6_sync_espnow_is_valid() — the OR semantic
means a node signals sync from whichever transport is healthy. Host
sees bit 4 set, knows to pair the frame against the most recent sync
packet (§A0.12) from this node_id.

Side effect: this also enables S3 boards to set bit 4 (c6_sync_espnow
works on both targets, c6_timesync is C6-only). So a multi-target
mesh of S3+C6 boards now correctly signals cross-node alignment
regardless of which chips are in the fleet.

Build evidence: C6 image 1019 KB (+16 bytes for the new check),
45% slack unchanged.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet and others added 2 commits May 23, 2026 12:56
…te closed

Marks the end of the firmware-side ADR-110 push. Everything the firmware
can deliver toward §B multistatic alignment without hardware-blocked
dependencies is shipped, measured, and witnessed:

  §A0.7–§A0.10  ESP-NOW mesh quantified: 99.43-99.56% cross-board match,
                104.1 µs smoothed offset stdev, 1.4 ppm crystal-skew
                tracking, ≤100 µs alignment target empirically met.
  §A0.12        32-byte UDP sync packet emits with mesh-aligned epoch
                + sequence high-water; verified live both boards.
  §A0.13 (new)  bit-4 wire-fix: byte 19 bit 4 sourced from
                c6_sync_espnow_is_valid() too. Mixed S3+C6 fleets now
                correctly advertise mesh-sync.

Host-side enabler (Python):
  archive/v1/src/hardware/csi_extractor.py grows SyncPacketParser +
  SyncPacket dataclass. ESP32BinaryParser docstring acknowledges the
  sibling sync packet. Sets up wifi-densepose-sensing-server to
  consume the §A0.12 stream without inventing the parser.

Build artifacts (IDF v5.4, both RC=0):
  S3 8 MB: 1094 KB, 47% partition slack
  C6 4 MB: 1019 KB, 45% partition slack

Tag v0.7.0-esp32. Branch adr-110-esp32c6. PR ruvnet#764.

What remains is outside the firmware: host-side parser wiring,
multistatic CSI fusion in wifi-densepose-signal, 11ax-cooperative AP
(or future IDF AP-HE API), INA226 for ≤5 µA LP-core.

Co-Authored-By: claude-flow <ruv@ruv.net>
ESP32-S3 CSI requires DATA frames (LTF extraction) — MGMT-only filter
silences CSI entirely on this chip revision because management frames
(beacons, probe req/resp) contain no extractable LTF.

Root cause: CSI yield=0pps despite WiFi connected and RSSI healthy.
Fix: WIFI_PROMIS_FILTER_MASK_MGMT → WIFI_PROMIS_FILTER_MASK_ALL.

Verified on Heltec LoRa V3 (ESP32-S3FN8 rev v0.2, Zyxel_D631 AP).
Post-fix yield: 5-8 pps stable.

Note: On busy networks (100+ data frames/sec) this may cause WiFi HW
interrupt storms that crash Core 0. Low-traffic networks are safe.
@ruvnet
Copy link
Copy Markdown
Owner

ruvnet commented Jun 2, 2026

Your root-cause is exactly right — WIFI_PROMIS_FILTER_MASK_MGMT starves CSI because the LTF fields live in DATA frames, and the Heltec then sits at yield=0pps. 👍

Heads-up on the filter change specifically (MASK_MGMT → MASK_ALL): main now addresses this with a safer, more targeted fix that may already unblock your Heltec without the MASK_ALL risk you flagged — worth rebasing against:

  • csi_collector.c:640 csi_collector_enable_data_capture() upgrades the filter to MGMT | DATA (not ALL).
  • main.c:424 calls it only when !has_display.
  • display_is_active() (display_task.c:18) is AMOLED-specific — it returns true only after display_hal_init_panel() detects an RM67162/SH8601 panel.

The Heltec LoRa V3's display is an I²C SSD1306 OLED, not that AMOLED — so display_hal_init_panel() fails → has_display = falseenable_data_capture() runs → the Heltec gets MGMT|DATA → CSI flows. That should give you the same 5–8 pps without MASK_ALL.

Two reasons MGMT|DATA (gated) is preferable to an unconditional MASK_ALL here:

  1. AMOLED-board regression — on an ADR-045 SPI-AMOLED node, MASK_ALL reintroduces the ESP32-S3 CSI crash: SPI flash cache race in wDev_ProcessFiq during promiscuous mode #396 wDev_ProcessFiq SPI-flash-cache crash that the !has_display gate exists to avoid. MASK_ALL is applied unconditionally, so it would crash those boards.
  2. Busy-network riskMASK_ALL also pulls CTRL + misc frames you don't need for CSI, adding the very interrupt load behind the Core-0 crash you noted. MGMT|DATA is the minimal set that still yields CSI.

The rest of this PR (C6 build overlay, ESP-NOW EMA offset smoother, the redact-secrets.py witness-bundle fix) is independent of the filter line — those stand on their own. Could you rebase onto current main and drop the MASK_ALL change (or, if a board genuinely needs DATA capture with an AMOLED active, gate it on panel type rather than unconditionally)? Happy to help verify once rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

All ESP32-S3 nodes report motion=0.00 presence=0.00 yield=0pps despite correct setup

2 participants