Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Access] Refactor storage collections for access node #7093

Draft
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

zhangchiqing
Copy link
Member

Working towards #6515

Review #7059 first.

This PR refactors the transactions and collection storage in access node to use the generic storage module.

@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2025

Codecov Report

Attention: Patch coverage is 6.38298% with 352 lines in your changes missing coverage. Please review.

Project coverage is 41.16%. Comparing base (89dfb91) to head (a9c3140).

Files with missing lines Patch % Lines
storage/mock/events_reader.go 0.00% 94 Missing ⚠️
storage/mock/transaction_results_reader.go 0.00% 72 Missing ⚠️
storage/mock/execution_results_reader.go 0.00% 63 Missing ⚠️
consensus/hotstuff/mocks/persister_reader.go 0.00% 50 Missing ⚠️
storage/mock/commits_reader.go 0.00% 28 Missing ⚠️
cmd/access/node_builder/access_node_builder.go 0.00% 23 Missing ⚠️
storage/store/collections.go 55.88% 11 Missing and 4 partials ⚠️
storage/operation/collections.go 20.00% 4 Missing ⚠️
storage/store/transactions.go 0.00% 2 Missing ⚠️
cmd/observer/node_builder/observer_builder.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7093      +/-   ##
==========================================
- Coverage   41.24%   41.16%   -0.08%     
==========================================
  Files        2171     2176       +5     
  Lines      190061   190382     +321     
==========================================
- Hits        78391    78377      -14     
- Misses     105127   105451     +324     
- Partials     6543     6554      +11     
Flag Coverage Δ
unittests 41.16% <6.38%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zhangchiqing zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from d72e501 to a5f43fb Compare March 6, 2025 17:09
@@ -218,6 +217,9 @@ func (builder *ExecutionNodeBuilder) LoadComponentsAndModules() {
Module("blobservice peer manager dependencies", exeNode.LoadBlobservicePeerManagerDependencies).
Module("bootstrap", exeNode.LoadBootstrapper).
Module("register store", exeNode.LoadRegisterStore).
AdminCommand("get-transactions", func(conf *NodeConfig) commands.AdminCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the exeNode.collections was not initialized until exeNode.LoadCollections is called.

That said, this change should be in a different PR, let me check.

Comment on lines 24 to 25
// IndexCollectionPayload indexes the transactions within the collection payload
// of a cluster block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this specific to collection cluster logic, or is this just indexing by blockID?

I'm wondering if we're overloading the codeIndexCollection to mean different things on ANs/ENs vs LNs

t.Run("Retrieve nonexistant", func(t *testing.T) {
var actual flow.LightCollection
err := operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
assert.Error(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.Error(t, err)
assert.ErrorIs(t, err, storage.ErrNotFound)
assert.Nil(t, actual)


var actual flow.LightCollection
err = operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
assert.Error(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you assert the specific error here and wherever we have sentinels returned


_ = db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
err := operation.InsertCollection(rw.Writer(), &expected)
assert.Nil(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assert.NoError() communicates your intent more clearly

Suggested change
assert.Nil(t, err)
assert.NoError(t, err)


func TestTransactions(t *testing.T) {

dbtest.RunWithDB(t, func(t *testing.T, db storage.DB) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a not found test as well

}

func NewCollections(db storage.DB, transactions *Transactions) *Collections {
c := &Collections{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about adding a cache? collections are commonly looked up on access nodes. totally fine to do later


func (c *Collections) Remove(colID flow.Identifier) error {
err := c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
return operation.RemoveCollection(rw.Writer(), colID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also remove the index?

// transaction is already indexed by a different collection, we should not index it again
// so that the access node will always return the same collection for a given transaction
// and return a consistent transaction result status.
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return an error here since LNs are supposed to prevent a tx from

  1. appearing multiple times in the same collection
  2. appearing in multiple collections

Comment on lines +967 to +970
notNil(builder.events),
notNil(builder.collections),
notNil(builder.transactions),
notNil(builder.lightTransactionResults),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for using notNil on only some inputs?

Shouldn't all the inputs be non-nil? And if an input is nil, won't the component using it quickly panic anyway?

Comment on lines 60 to 64
// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
// any error returned are exceptions
func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
// any error returned are exceptions
func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}
// RemoveCollectionByTransactionIndex removes a collection id indexed by a transaction id,
// created by [UnsafeIndexCollectionByTransaction].
// Any error returned is an exception.
func RemoveCollectionByTransactionIndex(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}

Naming to match the insert method for same index.

Comment on lines 85 to 88
if err != nil {
return nil, err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
return nil, err
}

The error is already checked above

@@ -52,3 +50,15 @@ func UnsafeIndexCollectionByTransaction(w storage.Writer, txID flow.Identifier,
func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {
// LookupCollectionByTransaction looks up the collection indexed by the given transaction ID,
// which is the collection in which the given transaction was included.
// No errors are expected during normal operaion.
func LookupCollectionByTransaction(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {

To match naming of other methods operating on the same index.

err = c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
// remove transaction indices
for _, txID := range col.Transactions {
err = operation.RemoveCollectionTransactionIndices(rw.Writer(), txID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections.Store above inserts transactions. Should Remove also remove the transactions?

}
continue
// the indexingByTx lock has ensured we are the only process indexing collection by transaction
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, cid)

Avoid re-computing the hash every loop iteration

Comment on lines 151 to 157
if err == nil {
// collection nodes have ensured that a transaction can only belong to one collection
// so if transaction is already indexed by a collection, check if it's the same collection.
// if not, return an error
if cid != differentColTxIsIn {
return fmt.Errorf("transaction %v is already indexed by a different collection %v", txID, differentColTxIsIn)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is substantially changing the behaviour.

Previously, we would skip re-indexing TXID->COLLECTIONID, if any index entry for TXID already existed. Now we are throwing an exception.

The reason we specifically check for the case of the index already existing is to make sure that we don't overwrite the index with a different collection ID, so that the information served by the Access API is consistent (if not correct). Now this scenario will cause an exception and likely the node will enter a crash-loop. To match the previous behaviour, the case of err == nil on line 151 should be a no-op.

It is true that we don't currently expect this scenario to happen, absent a cluster consensus bug, but we have had such bugs in the past, and in the mature system we need to tolerate Byzantine clusters. So I don't think this should throw an exception.

@zhangchiqing zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from 8bdd882 to 35692d7 Compare March 13, 2025 18:06
@@ -575,6 +577,15 @@ func (builder *FlowAccessNodeBuilder) BuildExecutionSyncComponents() *FlowAccess
AdminCommand("read-execution-data", func(config *cmd.NodeConfig) commands.AdminCommand {
return stateSyncCommands.NewReadExecutionDataCommand(builder.ExecutionDataStore)
}).
Module("transactions and collections storage", func(node *cmd.NodeConfig) error {
// TODO: needs to be wrapped with ChainedCollections module, otherwise once we switch
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link the issue as TODO here #6523 (comment) .

Will be addressed separately. We can review and approve this PR, but not merge until the TODO is completed.

cc @fxamacker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants