ArseniiPetrovich

No description provided.

ribasushi and others added 6 commits April 15, 2025 00:58
Currently various groups still rely on the StateMarketDeals dump, and
spend hours generating it via lotus.

Doubles as a demo of a lotus-less API implementation doing the same work in
~6 minutes from a car file, or ~20 minutes from a lotus node via RPC
(vastly unoptimized, can still be made MUCH faster)

Usage:
./filexp dump-statemarketdeals --rpc-fullnode=ws://localhost:1234 > SMD.json

Processing ~70 million entries using an inefficient "swarm of goroutines" puts
exorbitant pressure on GC, especially notable in low-memory environments.

While one cannot solve all of the problems (due to low-level IPLD libs being
meh), a number of reductions can help:

- switch to a static number of consumer-workers
- reuse a single static encode-buffer per worker (json.Encode does not help)
- keep as much as possible on the stack
- reuse the variables within loops where it makes sense
- pass the from-IPLD struct by ref where possible
- use a single mutex syncpoint, remove atomic incs entirely

Nearly all use cases of filexp (except for the FVM cmds) do not really
benefit from counting unique CIDs. In cases of large-scale dumps this
counting becomes so expensive that it dominates the execution time.

Make the counting optional except under FVM, gated by --count-unique-cids
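
One way such gating might look, as a sketch under stated assumptions: only the flag name `--count-unique-cids` comes from the commit message. The `cidTracker` type, its methods, and the `isFVMCommand` switch are hypothetical, and CIDs are represented as plain strings for brevity. The point is that a disabled tracker never allocates a set, so each visit costs a single nil check.

```go
package main

import (
	"flag"
	"fmt"
)

// cidTracker is a hypothetical helper that records distinct CIDs
// only when tracking was requested.
type cidTracker struct {
	seen map[string]struct{} // nil when tracking is disabled
}

func newCidTracker(enabled bool) *cidTracker {
	t := &cidTracker{}
	if enabled {
		t.seen = make(map[string]struct{})
	}
	return t
}

// visit records a CID, or does nothing when tracking is disabled.
func (t *cidTracker) visit(cid string) {
	if t.seen == nil {
		return
	}
	t.seen[cid] = struct{}{}
}

// unique returns the distinct count and whether tracking was enabled.
func (t *cidTracker) unique() (int, bool) {
	return len(t.seen), t.seen != nil
}

func main() {
	countUnique := flag.Bool("count-unique-cids", false,
		"track and report the number of unique CIDs visited")
	flag.Parse()

	isFVMCommand := false // hypothetical: FVM commands would force counting on
	t := newCidTracker(*countUnique || isFVMCommand)

	for _, c := range []string{"cid-a", "cid-b", "cid-a"} {
		t.visit(c)
	}
	if n, ok := t.unique(); ok {
		fmt.Println("unique CIDs:", n)
	}
}
```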

ArseniiPetrovich self-assigned this Jul 31, 2025
ribasushi deleted the f05dump branch August 2, 2025 08:35
3 participants