15 changes: 12 additions & 3 deletions AGENTS.md
@@ -17,16 +17,25 @@ progressively the deeper you go. Existing package guides include:

## Code style

All Go files must be `gofmt`-compliant. After modifying any `.go` file, run:
All Go files must be both `gofmt`- and `goimports`-compliant (`.golangci.yml`
enables the `gofmt` and `goimports` formatters). After modifying **any** `.go`
file, run **both** tools on **every** file you touched — not just the ones you
think changed formatting:

```bash
gofmt -s -w <file>
gofmt -s -w <file>...
goimports -w <file>... # groups/orders imports; catches the goimports linter
```

Or verify the whole tree (prints nothing when everything is clean):
`goimports` is required in addition to `gofmt`: `gofmt` alone does not separate
the stdlib import group from third-party imports, so a `gofmt`-clean file can
still fail the `goimports` linter.

Verify the whole tree (each prints nothing when everything is clean):

```bash
gofmt -s -l .
goimports -l .
```

## Lint, build & test
40 changes: 40 additions & 0 deletions app/abci.go
@@ -4,6 +4,7 @@ import (
"context"
"crypto/sha256"
"fmt"
"math"
"math/big"
"time"

@@ -13,6 +14,7 @@ import (
otelmetrics "go.opentelemetry.io/otel/metric"

"github.com/sei-protocol/sei-chain/app/legacyabci"
"github.com/sei-protocol/sei-chain/app/migration"
"github.com/sei-protocol/sei-chain/sei-cosmos/tasks"
"github.com/sei-protocol/sei-chain/sei-cosmos/telemetry"
sdk "github.com/sei-protocol/sei-chain/sei-cosmos/types"
@@ -53,12 +55,50 @@ func (app *App) BeginBlock(
	if app.HardForkManager.TargetHeightReached(ctx) {
		app.HardForkManager.ExecuteForTargetHeight(ctx)
	}
	app.applyMigrationBatchSize(ctx)

[blocker] applyMigrationBatchSize writes to committed state in BeginBlock: when the param is unset it calls subspace.Set(...), adding a key to the x/params store and changing the AppHash at the first block the new binary runs. That is state-machine-breaking and would diverge nodes on a rolling/uncoordinated upgrade, which seems at odds with the PR's non-app-hash-breaking label. Consider seeding the param via an upgrade handler / InitGenesis instead of lazily here, or confirm this ships at a coordinated upgrade height.
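
A toy model may make the concern concrete. The sketch below is not the x/params store or the real AppHash computation: `toyStore`, `appHash`, and `lazySeed` are hypothetical stand-ins, and the hash is a plain SHA-256 over sorted key/value pairs rather than a Merkle root. It only illustrates that seeding a previously absent key, even with the default value, changes a commitment over the store, while a second seed is a no-op.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// toyStore is a hypothetical stand-in for a committed KV store.
type toyStore map[string]string

// appHash hashes the sorted key/value pairs. Real stores use a Merkle tree;
// this is only a deterministic commitment for illustration.
func appHash(s toyStore) string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so adjacent fields cannot be confused.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(s[k]), s[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// lazySeed mimics the Has/Set pattern: write the default only if absent.
func lazySeed(s toyStore, key, def string) {
	if _, ok := s[key]; !ok {
		s[key] = def
	}
}

func main() {
	s := toyStore{"bank/supply": "100"}
	before := appHash(s)

	lazySeed(s, "migration/NumKeysToMigratePerBlock", "0")
	after := appHash(s)
	fmt.Println("hash changed by first seed:", before != after)

	lazySeed(s, "migration/NumKeysToMigratePerBlock", "0")
	fmt.Println("second seed is a no-op:", after == appHash(s))
}
```

In this model a node that has not yet run the seeding code reports the `before` hash while an upgraded node reports `after`, which is the divergence the comment describes for an uncoordinated rollout.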

	legacyabci.BeginBlock(ctx, height, votes, byzantineValidators, app.BeginBlockKeepers)
	return abci.ResponseBeginBlock{
		Events: sdk.MarkEventsToIndex(ctx.EventManager().ABCIEvents(), app.IndexEvents),
	}
}

// applyMigrationBatchSize paces the SC store's background data migration at the network-agreed rate.
// The NumKeysToMigratePerBlock gov param is read from chain state so every node
// applies the same value each block; a per-node rate would diverge the
// AppHash. 0 (the default until a gov proposal raises it) leaves the migration
// paused; it is the sole source of the rate (there is no node-local fallback).
func (app *App) applyMigrationBatchSize(ctx sdk.Context) {
	if app.rootStore == nil {
		return
	}
	numKeys := migration.DefaultNumKeysToMigratePerBlock
	if subspace, ok := app.ParamsKeeper.GetSubspace(migration.SubspaceName); ok {
		// The migration subspace has no owning module to seed it in InitGenesis,

[suggestion] The lazy seed correctly handles a fresh/never-set param deterministically, but because this subspace has no module ExportGenesis (x/params only exports Fees/CosmosGas params), the value is lost across a seid export/import: a mid-migration rate resets to the 0 default here and the drain pauses until governance re-raises it. Consider adding genesis import/export plumbing for this subspace, or documenting the caveat.

		// so lazily persist the default the first time we see it unset. This is
		// deterministic across nodes (every node runs BeginBlock identically) and
		// makes the param visible to gov: ParameterChangeProposal submission only
		// accepts a change when subspace.Has reports the key is already stored.
		if !subspace.Has(ctx, migration.KeyNumKeysToMigratePerBlock) {
			subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, migration.DefaultNumKeysToMigratePerBlock)
		}
		subspace.GetIfExists(ctx, migration.KeyNumKeysToMigratePerBlock, &numKeys)
	}
	if numKeys > uint64(math.MaxInt64) {
		numKeys = uint64(math.MaxInt64)
	}
	if err := app.rootStore.SetMigrationBatchSize(int(numKeys)); err != nil {
		// Never panic on the migration-rate update: log and continue. AppHash
		// verification is the safety net. If the rate/mode update fails on only
		// some nodes, those nodes' AppHash diverges and the normal AppHash
		// comparison halts them at the next block; no proactive panic needed.
		// If it fails on every node, all stay in the same (old) mode with an
		// identical AppHash, so the chain keeps moving and the level-triggered
		// trigger re-fires on a later block. Panicking here would needlessly
		// halt the whole chain in that all-fail case.
		logger.Error("failed to set SC migration batch size; continuing", "err", err)
	}
}

func (app *App) MidBlock(ctx sdk.Context, height int64) []abci.Event {
_, span := app.GetBaseApp().TracingInfo.StartWithContext("MidBlock", ctx.TraceSpanContext())
defer span.End()
124 changes: 124 additions & 0 deletions app/abci_test.go
@@ -0,0 +1,124 @@
package app

import (
	"context"
	"math"
	"testing"
	"time"

	"github.com/sei-protocol/sei-chain/app/migration"
	abci "github.com/sei-protocol/sei-chain/sei-tendermint/abci/types"
	tmproto "github.com/sei-protocol/sei-chain/sei-tendermint/proto/tendermint/types"
	"github.com/stretchr/testify/require"
)

// TestMigrationSubspaceRegistered verifies the generic "migration" params
// subspace is wired with its key table so governance can edit
// NumKeysToMigratePerBlock via a ParameterChangeProposal.
func TestMigrationSubspaceRegistered(t *testing.T) {
	a := Setup(t, false, false, false)
	subspace, ok := a.ParamsKeeper.GetSubspace(migration.SubspaceName)
	require.True(t, ok, "migration subspace must be registered")
	require.True(t, subspace.HasKeyTable(), "migration subspace must have a key table")

	ctx := a.NewContext(false, tmproto.Header{Height: 1, ChainID: "sei-test", Time: time.Now()})
	subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, uint64(123))
	var got uint64
	subspace.GetIfExists(ctx, migration.KeyNumKeysToMigratePerBlock, &got)
	require.Equal(t, uint64(123), got)
}

// TestApplyMigrationBatchSize covers the BeginBlock push: the gov param is
// read from chain state and forwarded into the SC commit store.
func TestApplyMigrationBatchSize(t *testing.T) {
	a := Setup(t, false, false, false)
	ctx := a.NewContext(false, tmproto.Header{Height: 1, ChainID: "sei-test", Time: time.Now()})

	subspace, ok := a.ParamsKeeper.GetSubspace(migration.SubspaceName)
	require.True(t, ok)

	// Unset param: the store receives the default (0 = paused).
	a.applyMigrationBatchSize(ctx)
	got, ok := a.rootStore.GetMigrationBatchSize()
	require.True(t, ok, "SC store should track a migration batch size")
	require.Equal(t, 0, got)

	// Governance raises the rate: BeginBlock forwards the new value.
	subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, uint64(500))
	a.applyMigrationBatchSize(ctx)
	got, _ = a.rootStore.GetMigrationBatchSize()
	require.Equal(t, 500, got)

	// Out-of-int64-range values are clamped to MaxInt64 (defensive; gov
	// validation only type-checks).
	subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, uint64(math.MaxUint64))
	a.applyMigrationBatchSize(ctx)
	got, _ = a.rootStore.GetMigrationBatchSize()
	require.Equal(t, math.MaxInt64, got)
}

// TestBeginBlockAppliesMigrationBatchSize exercises the full BeginBlock path
// (not the helper in isolation): it mimics a governance ParameterChangeProposal
// having set NumKeysToMigratePerBlock, then runs app.BeginBlock and asserts the
// new rate landed in the SC commit store.
func TestBeginBlockAppliesMigrationBatchSize(t *testing.T) {
	a := Setup(t, false, false, false)
	ctx := a.NewContext(false, tmproto.Header{Height: 2, ChainID: "sei-test", Time: time.Now()})

	// Sanity: nothing set yet, so the store is paused at 0.
	before, ok := a.rootStore.GetMigrationBatchSize()
	require.True(t, ok)
	require.Equal(t, 0, before)

	// Simulate the gov proposal landing in chain state.
	subspace, ok := a.ParamsKeeper.GetSubspace(migration.SubspaceName)
	require.True(t, ok)
	subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, uint64(321))

	// Run the real BeginBlock (checkHeight=false to skip height validation).
	require.NotPanics(t, func() {
		a.BeginBlock(ctx, 2, nil, nil, false)
	})

	after, _ := a.rootStore.GetMigrationBatchSize()
	require.Equal(t, 321, after, "BeginBlock should push the gov param into the SC store")
}

// TestMigrationBatchSizeTakesEffectNextBlock is the full end-to-end timing
// check: a governance proposal committed in block N (written into the block's
// deliver state, then Commit) only changes the SC store's migration rate when
// block N+1's BeginBlock runs and reads it from committed state.
func TestMigrationBatchSizeTakesEffectNextBlock(t *testing.T) {
	a := Setup(t, false, false, false)
	bg := context.Background()

	// Block 1: BeginBlock runs first (param still unset), then the gov
	// proposal lands by writing into this block's deliver state, then Commit
	// persists it to the committed multistore.
	_, err := a.FinalizeBlock(bg, &abci.RequestFinalizeBlock{
		Header: &tmproto.Header{ChainID: "sei-test", Height: 1, Time: time.Now()},
	})
	require.NoError(t, err)

	subspace, ok := a.ParamsKeeper.GetSubspace(migration.SubspaceName)
	require.True(t, ok)
	subspace.Set(a.GetContextForDeliverTx([]byte{}), migration.KeyNumKeysToMigratePerBlock, uint64(640))

	_, err = a.Commit(bg)
	require.NoError(t, err)

	// The param was committed in block 1, but BeginBlock(1) ran before it
	// existed, so the rate is still paused at this point.
	got, ok := a.rootStore.GetMigrationBatchSize()
	require.True(t, ok)
	require.Equal(t, 0, got, "param committed in block 1 must not take effect within block 1")

	// Block 2: BeginBlock reads the now-committed param and applies it.
	_, err = a.FinalizeBlock(bg, &abci.RequestFinalizeBlock{
		Header: &tmproto.Header{ChainID: "sei-test", Height: 2, Time: time.Now().Add(time.Second)},
	})
	require.NoError(t, err)

	got, _ = a.rootStore.GetMigrationBatchSize()
	require.Equal(t, 640, got, "migration rate must take effect on the block after the param is committed")
}
19 changes: 16 additions & 3 deletions app/app.go
@@ -42,7 +42,6 @@ import (
"github.com/sei-protocol/sei-chain/sei-cosmos/server/config"
servertypes "github.com/sei-protocol/sei-chain/sei-cosmos/server/types"
storetypes "github.com/sei-protocol/sei-chain/sei-cosmos/store/types"
storev2_rootmulti "github.com/sei-protocol/sei-chain/sei-cosmos/storev2/rootmulti"
sdk "github.com/sei-protocol/sei-chain/sei-cosmos/types"
sdkerrors "github.com/sei-protocol/sei-chain/sei-cosmos/types/errors"
genesistypes "github.com/sei-protocol/sei-chain/sei-cosmos/types/genesis"
@@ -110,6 +109,7 @@ import (
"github.com/sei-protocol/sei-chain/app/antedecorators"
"github.com/sei-protocol/sei-chain/app/benchmark"
"github.com/sei-protocol/sei-chain/app/legacyabci"
"github.com/sei-protocol/sei-chain/app/migration"
appparams "github.com/sei-protocol/sei-chain/app/params"
"github.com/sei-protocol/sei-chain/app/upgrades"
v0upgrade "github.com/sei-protocol/sei-chain/app/upgrades/v0"
@@ -140,6 +140,7 @@ import (
tmtypes "github.com/sei-protocol/sei-chain/sei-tendermint/types"
wasmkeeper "github.com/sei-protocol/sei-chain/sei-wasmd/x/wasm/keeper"

"github.com/sei-protocol/sei-chain/sei-cosmos/storev2/rootmulti"
"github.com/sei-protocol/sei-chain/utils"
utilmetrics "github.com/sei-protocol/sei-chain/utils/metrics"
"github.com/sei-protocol/sei-chain/wasmbinding"
@@ -469,6 +470,7 @@ type App struct {
genesisImportConfig genesistypes.GenesisImportConfig

stateStore seidb.StateStore
rootStore *rootmulti.Store
receiptStore receipt.ReceiptStore

forkInitializer func(sdk.Context)
@@ -549,6 +551,16 @@ func New(
option(app)
}

// The storev2 rootmulti store is the only supported commit multistore; its
// composite SC backend drives the in-flight memiavl->flatkv migration that
// BeginBlock paces via the migration gov param. Fail fast if the legacy
// root multistore is somehow in use.
rootStore, ok := app.CommitMultiStore().(*rootmulti.Store)
if !ok {
panic(fmt.Sprintf("unsupported commit multistore %T: expected *storev2_rootmulti.Store", app.CommitMultiStore()))
Comment thread
yzang2019 marked this conversation as resolved.
}
app.rootStore = rootStore

app.ParamsKeeper = initParamsKeeper(appCodec, cdc, keys[paramstypes.StoreKey], tkeys[paramstypes.TStoreKey])

// set the BaseApp's parameter store
@@ -746,7 +758,7 @@ func New(
app.EvmKeeper.SetTraceDB(traceDB)

if app.evmRPCConfig.TraceBakeUseSnapshot {
if rs, ok := app.CommitMultiStore().(*storev2_rootmulti.Store); ok {
if rs, ok := app.CommitMultiStore().(*rootmulti.Store); ok {
app.EvmKeeper.SetTraceSnapshotStore(evmkeeper.NewTraceSnapshotStore(app.evmRPCConfig.TraceBakeSnapshotWindow))
app.EvmKeeper.SetTraceSnapshotCapture(rs.SnapshotSCStore)
} else {
@@ -2670,7 +2682,7 @@ func (app *App) SnapshotAwareRPCContextProvider() evmrpc.TraceContextProvider {
return app.RPCContextProvider(i), func() {}
})
}
rs, ok := app.CommitMultiStore().(*storev2_rootmulti.Store)
rs, ok := app.CommitMultiStore().(*rootmulti.Store)
if !ok {
return evmrpc.TraceContextProvider(func(i int64) (sdk.Context, func()) {
return app.RPCContextProvider(i), func() {}
@@ -2933,6 +2945,7 @@ func initParamsKeeper(appCodec codec.BinaryCodec, legacyAmino *codec.LegacyAmino
paramsKeeper.Subspace(evmtypes.ModuleName)
paramsKeeper.Subspace(epochmoduletypes.ModuleName)
paramsKeeper.Subspace(tokenfactorytypes.ModuleName)
paramsKeeper.Subspace(migration.SubspaceName).WithKeyTable(migration.ParamKeyTable())
// this line is used by starport scaffolding # stargate/app/paramSubspace

return paramsKeeper
45 changes: 45 additions & 0 deletions app/migration/params.go
@@ -0,0 +1,45 @@
// Package migration defines the module-agnostic governance parameters
// that control the state-commitment store's background data migration
// (currently memiavl->flatkv).
//
// These live outside any business module on purpose: the migration rate
// applies to whichever stores the SC router is migrating, so it is an
// app/storage-level concern rather than EVM-specific. The value is held in a
// dedicated x/params subspace and is editable via the standard
// ParameterChangeProposal gov flow. The app reads it once per block in
// BeginBlock and pushes it into the SC commit store.
package migration
Comment on lines +1 to +11

🟡 Pre-existing nit (flagged by seidroid[bot] as a [suggestion]): The new migration x/params subspace registered at app/app.go:2948 has no owning AppModule, so no module's ExportGenesis ever persists NumKeysToMigratePerBlock. The params module's ExportGenesis only emits FeesParams and CosmosGasParams (it does not iterate registered subspaces), so running seid export mid-migration produces a genesis JSON with no migration.NumKeysToMigratePerBlock entry; bootstrapping a new chain from that genesis (testnet replication, fork migration, recovery) silently resets the rate to the default 0 (paused) on the first BeginBlock and the drain halts until governance re-files a proposal. Consider adding a minimal AppModule with InitGenesis/ExportGenesis for the migration subspace, or documenting the caveat in app/migration/params.go.

Extended reasoning...

What the bug is

The PR adds a generic migration x/params subspace (app/migration/params.go:1-45) and registers it at app/app.go:2948 via paramsKeeper.Subspace(migration.SubspaceName).WithKeyTable(migration.ParamKeyTable()). Every other subspace registered alongside it (auth, bank, staking, mint, distr, slashing, gov, ibc, oracle, wasm, evm, epoch, tokenfactory) belongs to a business module whose own AppModule.ExportGenesis round-trips the subspace's stored values. The new migration subspace has no owning AppModule in the ModuleManager — verified by reading the module list at app/app.go:877 — and the params module's own ExportGenesis (sei-cosmos/x/params/keeper/genesis.go) only writes FeesParams and CosmosGasParams to GenesisState:

func (k Keeper) ExportGenesis(ctx sdk.Context) *types.GenesisState {
    feesParams := k.GetFeesParams(ctx)
    cosmosGasParams := k.GetCosmosGasParams(ctx)
    return types.NewGenesisState(feesParams, cosmosGasParams)
}

It does not iterate over registered subspaces and emit their stored KV pairs. So nothing in the export pipeline ever serializes migration.NumKeysToMigratePerBlock.

How the failure manifests — step by step

  1. Governance raises migration.NumKeysToMigratePerBlock to, say, 5000 via a ParameterChangeProposal (exactly the path the new gov_proposal_test.yaml test exercises). The chain begins draining memiavl→flatkv at 5000 keys/block.
  2. While the migration is still in flight (boundary not yet at MigrationBoundaryComplete), an operator runs seid export to capture the current chain state — for a testnet replication, a coordinated fork, or recovery from a corrupted node.
  3. The exported genesis JSON contains every business module's params but no migration.NumKeysToMigratePerBlock entry: the params module's ExportGenesis dropped it, and no other module owns the subspace.
  4. A new chain is bootstrapped from this exported genesis. The migration subspace is registered in initParamsKeeper at app/app.go:2948, but its key-value store is empty: nothing wrote KeyNumKeysToMigratePerBlock during InitGenesis.
  5. At block 1 BeginBlock, applyMigrationBatchSize (app/abci.go:75-91) runs the lazy-seed branch:
    if !subspace.Has(ctx, migration.KeyNumKeysToMigratePerBlock) {
        subspace.Set(ctx, migration.KeyNumKeysToMigratePerBlock, migration.DefaultNumKeysToMigratePerBlock)
    }
    This persists DefaultNumKeysToMigratePerBlock = 0 (app/migration/params.go:28), which means paused.
  6. The SC store receives SetMigrationBatchSize(0), the MigrationManager's advanceMigration gate (sei-db/state_db/sc/migration/migration_manager.go:293-299, firstBatchInBlock && m.migrationBatchSize > 0) suppresses every boundary advance, and the drain silently halts. Block production continues normally (consensus is unaffected; caller writes still route through the migration manager), so there is no panic, no log warning, no metric anomaly — the operator has no signal that anything is wrong unless they actively poll migration.NumKeysToMigratePerBlock.
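
The six steps above can be reproduced in a toy model. Everything in this sketch is hypothetical (`genesis`, `chain`, `exportGenesis`, `importGenesis`, and `beginBlock` are illustrative stand-ins, not sei-chain types): the export format carries only the two parameter groups that the quoted ExportGenesis emits, and `beginBlock` applies the same absent-key lazy seed.

```go
package main

import "fmt"

const rateKey = "migration/NumKeysToMigratePerBlock"

// genesis is a hypothetical export format that, like the ExportGenesis quoted
// above, carries only two known parameter groups.
type genesis struct {
	FeesParams      string
	CosmosGasParams string
}

// chain is a toy chain whose subspace values live in a flat map.
type chain struct {
	params map[string]uint64
	fees   string
	gas    string
}

// exportGenesis drops everything in params: nothing iterates the subspaces.
func exportGenesis(c *chain) genesis {
	return genesis{FeesParams: c.fees, CosmosGasParams: c.gas}
}

// importGenesis boots a new chain with an empty params map.
func importGenesis(g genesis) *chain {
	return &chain{params: map[string]uint64{}, fees: g.FeesParams, gas: g.CosmosGasParams}
}

// beginBlock mimics the lazy seed and returns the effective per-block rate.
func beginBlock(c *chain) uint64 {
	if _, ok := c.params[rateKey]; !ok {
		c.params[rateKey] = 0 // default: paused
	}
	return c.params[rateKey]
}

func main() {
	running := &chain{params: map[string]uint64{rateKey: 5000}, fees: "f", gas: "g"}
	fmt.Println("rate before export:", beginBlock(running))

	restored := importGenesis(exportGenesis(running))
	fmt.Println("rate after import: ", beginBlock(restored))
}
```

The restored chain reports a rate of 0 because, from the seed's point of view, an absent key looks the same whether the chain is fresh or was exported mid-migration.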

Why existing code doesn't prevent it

  • The lazy-seed in applyMigrationBatchSize is deterministic (every node runs BeginBlock identically) and idempotent, so it correctly handles a fresh chain. But it cannot distinguish a fresh chain from an exported-during-migration restore: in both cases, the subspace key is absent, so it writes the same default. The exported chain's mid-migration rate is the load-bearing context the lazy seed has no way to recover.
  • The params module's ExportGenesis is shared across all chains using sei-cosmos and only serializes FeesParams / CosmosGasParams; modifying it to dump arbitrary subspaces would be a broader change than this PR.
  • The new abci_test.go tests cover applyMigrationBatchSize and the gov-proposal path, but none exercises an export/import round-trip.

Impact

This is not consensus-fatal: the resumed chain produces blocks correctly, just with the migration paused at the genesis boundary. Recovery is a single governance proposal to re-raise the param. seidroid[bot] itself labeled it [suggestion], not [blocker], in inline-comment 3482817040. The trigger surface is narrow (export during an active migration AND restart from genesis, not a regular restart which preserves the KVStore), and the migration is a one-time operational event that already requires coordinated governance action. But it turns seid export/import into a silent loss of migration progress, which is exactly the kind of operational pitfall an export-based fork or recovery would hit unexpectedly.

How to fix it

Three reasonable options, in increasing invasiveness:

  1. Document the caveat prominently in app/migration/params.go's package doc and in any export-tooling runbook: "the migration subspace is not exported by seid export; if the chain is mid-migration, re-issue the param-change proposal on the new chain."
  2. Log a startup warning in applyMigrationBatchSize (or in the SC store init) when sc-write-mode is a migration mode (migrate_evm, migrate_bank, migrate_all_but_bank) and the gov param is at the default 0, so operators have a chance to notice immediately rather than discovering the silent halt hours later via dashboard.
  3. Add a minimal AppModule for the migration subspace with InitGenesis/ExportGenesis (analogous to the other params-only modules), so the value round-trips through seid export/import deterministically.
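
As a rough sketch of option 2, the check could look something like the following. `pausedMigrationWarning` is a hypothetical helper, not an existing sei-chain function; the three mode strings are the ones named above, and `placeholder_non_migration_mode` is an invented value used only to exercise the negative case.

```go
package main

import "fmt"

// migrationModes lists the sc-write-mode values named in the comment above.
var migrationModes = map[string]bool{
	"migrate_evm":          true,
	"migrate_bank":         true,
	"migrate_all_but_bank": true,
}

// pausedMigrationWarning is a hypothetical helper: it returns a warning
// message when the node is in a migration mode but the gov rate is 0.
func pausedMigrationWarning(writeMode string, numKeys uint64) (string, bool) {
	if !migrationModes[writeMode] || numKeys != 0 {
		return "", false
	}
	msg := fmt.Sprintf(
		"sc-write-mode %q is a migration mode but NumKeysToMigratePerBlock is 0; the migration is paused",
		writeMode,
	)
	return msg, true
}

func main() {
	cases := []struct {
		mode string
		rate uint64
	}{
		{"migrate_evm", 0},
		{"migrate_evm", 5000},
		{"placeholder_non_migration_mode", 0}, // invented value for illustration
	}
	for _, tc := range cases {
		if msg, warn := pausedMigrationWarning(tc.mode, tc.rate); warn {
			fmt.Println("WARN:", msg)
		} else {
			fmt.Printf("ok: mode=%s rate=%d\n", tc.mode, tc.rate)
		}
	}
}
```

A real implementation would need to decide where the write mode is available and how often to log, since a per-block warning would be noisy on a chain that is intentionally paused.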

🔬 also observed by seidroid


import (
	"fmt"

	paramtypes "github.com/sei-protocol/sei-chain/sei-cosmos/x/params/types"
)

// SubspaceName is the x/params subspace that holds storage-migration controls.
const SubspaceName = "migration"

// KeyNumKeysToMigratePerBlock is the param key for the number of keys the
// in-flight SC migration advances per block.
var KeyNumKeysToMigratePerBlock = []byte("NumKeysToMigratePerBlock")

// DefaultNumKeysToMigratePerBlock leaves the migration paused. While it is 0
// (the default until a gov proposal raises it) the SC store does no migration
// work; this param is the sole source of the per-block rate.
const DefaultNumKeysToMigratePerBlock uint64 = 0

// ParamKeyTable returns the key table for the migration subspace.
func ParamKeyTable() paramtypes.KeyTable {
	return paramtypes.NewKeyTable(
		paramtypes.NewParamSetPair(KeyNumKeysToMigratePerBlock, new(uint64), validateNumKeysToMigratePerBlock),
	)
}

// validateNumKeysToMigratePerBlock only type-checks the value; any uint64 is a
// valid (consensus-deterministic) rate, with 0 meaning "paused".
func validateNumKeysToMigratePerBlock(i interface{}) error {
	if _, ok := i.(uint64); !ok {
		return fmt.Errorf("invalid parameter type: %T", i)
	}
	return nil
}
Comment on lines +38 to +45

🔴 Bug: validateNumKeysToMigratePerBlock only type-checks the value, so a ParameterChangeProposal can set the migration batch size to any uint64 (including math.MaxUint64). The applyMigrationBatchSize clamp caps it at math.MaxInt64, which then propagates to MemiavlMigrationIterator.NextBatch where make([]ValueToMigrate, 0, size) either panics (makeslice: cap out of range) or OOMs every validator deterministically — chain halt with no recovery path other than coordinated downgrade. Reject values above a sane maximum (e.g. 1_000_000) here; seidroid[bot] flagged this in inline-comment 3483034959 and only the int-cast overflow was addressed.

Extended reasoning...

Bug

validateNumKeysToMigratePerBlock at app/migration/params.go:40-45 accepts any uint64, so any value, including math.MaxUint64, passes ParameterChangeProposal validation. The new clamp added in this PR at app/abci.go:86-88 caps the value at math.MaxInt64, which addresses the cursor[bot] int-cast overflow concern but leaves the magnitude unbounded and so creates a new deterministic chain-halt vector.

The chain from gov proposal to consensus halt

  1. A ParameterChangeProposal sets migration.NumKeysToMigratePerBlock to a very large uint64 (typo, fat finger, or malicious proposal). Validation passes because it is type-checked only.
  2. After the voting period, the param lands in chain state. In the next BeginBlock, applyMigrationBatchSize reads it (app/abci.go:75-90), clamps to math.MaxInt64, and calls app.rootStore.SetMigrationBatchSize(int(math.MaxInt64)).
  3. CompositeCommitStore.SetMigrationBatchSize only clamps negatives to 0 — a positive MaxInt64 passes through unchanged into atomic.Int64 storage and is forwarded via cs.router.SetMigrationBatchSize.
  4. MigrationManager.SetMigrationBatchSize stores it as m.migrationBatchSize. On the first ApplyChangeSets of the block, advanceMigration is true (batch size > 0), so the manager calls m.iterator.NextBatch(m.migrationBatchSize).
  5. MemiavlMigrationIterator.NextBatch at sei-db/state_db/sc/migration/memiavl_migration_iterator.go:94 executes batch := make([]ValueToMigrate, 0, size).

ValueToMigrate is {ModuleName string; Key []byte; Value []byte} ≈ 16+24+24 = 64 bytes. make([]ValueToMigrate, 0, math.MaxInt64) asks the Go runtime to allocate MaxInt64 × 64B ≈ 590 EB. The runtime checks cap × elemSize against maxAlloc (well below MaxInt64 on 64-bit) and panics with runtime error: makeslice: cap out of range. The panic propagates out of ApplyChangeSets, fails Commit, and aborts block production.

Step-by-step proof

Concrete scenario with value 18446744073709551615 (MaxUint64):

  • Block N (proposal voting): validator nodes accept the proposal. validateNumKeysToMigratePerBlock(uint64(18446744073709551615)) returns nil because the type assertion i.(uint64) succeeds.
  • Block N+M (param applied): every validator runs BeginBlock. subspace.GetIfExists(ctx, KeyNumKeysToMigratePerBlock, &numKeys) sets numKeys = 18446744073709551615. The clamp if numKeys > uint64(math.MaxInt64) { numKeys = uint64(math.MaxInt64) } reduces it to 9223372036854775807. app.rootStore.SetMigrationBatchSize(9223372036854775807) is called.
  • Same block, first write: CompositeCommitStore.SetMigrationBatchSize(9223372036854775807) stores it (the < 0 clamp does not trigger). The router push reaches MigrationManager.migrationBatchSize = 9223372036854775807.
  • Same block, ApplyChangeSets: advanceMigration = firstBatchInBlock && (9223372036854775807 > 0) = true. NextBatch(9223372036854775807) runs. The first thing it does after the size <= 0 and Complete checks is make([]ValueToMigrate, 0, 9223372036854775807).
  • Runtime: runtime.makeslice computes cap * elemSize = 9223372036854775807 * 64, which overflows uintptr and triggers the cap-bounds check. Panic.

Because the gov param is consensus state, every validator reads the same value and panics identically at the same height, so the chain halts. The only ways out are a coordinated downgrade or a new gov proposal, and a halted chain cannot produce the blocks needed to vote on a proposal.

Even non-overflowing values are dangerous: 10^10 requests ~640 GB of contiguous slice memory, OOM-killing every validator the moment the param hits chain state.

Why existing code does not prevent it

The defenses in place address adjacent concerns:

  • validateNumKeysToMigratePerBlock does type-checking only (explicitly: "any uint64 is a valid (consensus-deterministic) rate").
  • The app/abci.go clamp prevents the int(uint64) cast from yielding a negative number on 64-bit (the cursor[bot] concern), but leaves the magnitude unbounded.
  • CompositeCommitStore.SetMigrationBatchSize normalizes negatives to 0, but the comment explicitly says "the lower layers therefore trust the batch size to be non-negative and do no validation of their own".
  • The relaxed migrationBatchSize <= 0 rejection in NewMigrationManager (now accepts 0) does not add an upper bound.

Reviewer history

seidroid[bot] in inline-comment 3483034959 explicitly suggested "bounding the param to a sane maximum (well under math.MaxInt) at validation time". yzang2019 replied in inline-comment 3483128268: "Tried that, looks like it would require a big refactory, will actually do a fallback here to 0 if its negative." The negative-clamp landed; the upper-bound suggestion did not.

Pre-existing vs PR-introduced

PR-introduced. The pre-PR migration rate was a node-local app.toml field (sc-keys-to-migrate-per-block) bounded by validator operators reading their own config. This PR removes that field and makes the rate a chain-wide governance param, but does not add the upper bound that governance-controlled values warrant.

Fix

One-line guard in validateNumKeysToMigratePerBlock:

const MaxNumKeysToMigratePerBlock = 1_000_000 // or another sane upper bound

func validateNumKeysToMigratePerBlock(i interface{}) error {
    v, ok := i.(uint64)
    if !ok {
        return fmt.Errorf("invalid parameter type: %T", i)
    }
    if v > MaxNumKeysToMigratePerBlock {
        return fmt.Errorf("NumKeysToMigratePerBlock must be <= %d, got %d", MaxNumKeysToMigratePerBlock, v)
    }
    return nil
}

This rejects the proposal at submission, before it can reach chain state.

Severity

normal. The trigger requires a successful gov proposal (quorum + majority yes), which is a real barrier — but governance error or a malicious proposal would deterministically halt the chain, and recovery would be impossible without a coordinated downgrade. The fix is trivial and was explicitly recommended by another reviewer.

🔬 also observed by 3483034959

36 changes: 36 additions & 0 deletions app/migration/params_test.go
@@ -0,0 +1,36 @@
package migration

import (
	"testing"

	paramtypes "github.com/sei-protocol/sei-chain/sei-cosmos/x/params/types"
	"github.com/stretchr/testify/require"
)

func TestDefaultLeavesMigrationPaused(t *testing.T) {
	require.Equal(t, uint64(0), DefaultNumKeysToMigratePerBlock,
		"default must be 0 so the migration stays paused until governance raises it")
}

func TestParamKeyTableRegistersKey(t *testing.T) {
	table := ParamKeyTable()

	// Re-registering the same key must panic with "duplicate parameter key",
	// which proves NumKeysToMigratePerBlock is already in the returned table.
	require.PanicsWithValue(t, "duplicate parameter key", func() {
		table.RegisterType(paramtypes.NewParamSetPair(
			KeyNumKeysToMigratePerBlock, new(uint64), validateNumKeysToMigratePerBlock))
	})
}

func TestValidateNumKeysToMigratePerBlock(t *testing.T) {
	// Any uint64 is valid, including 0 (paused) and large values.
	for _, v := range []uint64{0, 1, 1024, 1 << 40} {
		require.NoError(t, validateNumKeysToMigratePerBlock(v), "uint64 %d should be valid", v)
	}

	// Wrong types are rejected.
	for _, v := range []interface{}{int(1), int64(1), "1", float64(1), nil} {
		require.Error(t, validateNumKeysToMigratePerBlock(v), "value %v (%T) should be rejected", v, v)
	}
}