Skip to content

feat: Add KMS signer backend#3171

Draft
alpe wants to merge 4 commits intomainfrom
alex/3163_kms
Draft

feat: Add KMS signer backend#3171
alpe wants to merge 4 commits intomainfrom
alex/3163_kms

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Mar 17, 2026

🚧 Resolve #3163 - early version

Overview

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a53caa41-6b97-4608-920f-0d1dd3b4ed03

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch alex/3163_kms
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 17, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedMar 19, 2026, 10:49 AM

@claude
Copy link
Contributor

claude bot commented Mar 17, 2026

Claude finished @alpe's task in 2m 1s —— View job


Code Review: KMS Signer Backend

  • Gather context and understand the PR
  • Review core KMS signer implementation (pkg/signer/aws/signer.go)
  • Review signer factory (pkg/signer/factory/factory.go)
  • Review config changes (pkg/config/config.go)
  • Review cmd changes (pkg/cmd/init.go, pkg/cmd/run_node.go)
  • Review interface updates (pkg/signer/signer.go)
  • Review test coverage
  • Summarize findings

Summary

This PR adds AWS KMS as a signing backend, propagates context.Context through the signer interface, and introduces a factory to abstract signer creation. The overall design is solid — the KMSClient interface enables clean testability, the retry/backoff logic is well-implemented, and key caching with TTL support is a nice touch. The main concerns are around a few correctness issues, a stale-context bug, and zero test coverage on the factory.


Issues

🔴 Bug: GetPublic() ignores caller context during refresh

pkg/signer/aws/signer.go:217

refreshCtx, cancel := context.WithTimeout(context.Background(), s.opts.timeout())

context.Background() is used here instead of a caller-provided context. This means that even when the node is shutting down (and the caller's context is cancelled), the public key refresh can still block for up to opts.timeout(). Since GetPublic() is called from signing paths, this could delay shutdown. The Signer.GetPublic() interface doesn't take a ctx parameter, so this is a design-level issue — consider either passing a context (breaking the interface) or at least using a shared lifecycle context stored on the struct.

Fix this →


🔴 Default timeout discrepancy (5s in constructor vs 10s in docs/config)

pkg/signer/aws/signer.go:97

o := Options{Timeout: 5 * time.Second, MaxRetries: 3, CacheTTL: 0}

But Options.Timeout is documented as "Default: 10s", and pkg/config/defaults.go:99 sets KmsTimeout: DurationWrapper{10 * time.Second}. This means when using the factory with default config, the timeout is correctly 10s — but calling NewKmsSignerFromClient directly with nil opts gets 5s. The discrepancy is confusing and means direct callers get a different default than config-driven callers.

Fix this →


🟡 Race: double public-key fetch on TTL expiry

pkg/signer/aws/signer.go:211-230

Multiple goroutines calling GetPublic() concurrently when TTL has expired will all pass the expired check (each reads s.mu.RLock(), observes expiry, unlocks, then all call fetchPublicKey independently). This is benign — the last write wins and all writes are valid — but it's a thundering-herd problem under concurrent load. A simple fix is a sync.Once-reset pattern or a singleflight to coalesce concurrent refreshes.


🟡 timer leak in retry backoff

pkg/signer/aws/signer.go:179

case <-time.After(backoff):

time.After creates a time.Timer whose channel is never garbage-collected until the timer fires. In the happy path (context cancelled), the timer leaks until backoff elapses. Prefer time.NewTimer(backoff) with t.Stop() on early return to avoid the leak. While minor for small retry counts, it's a best-practice issue.


🟡 FlagSignerType help text mentions grpc which doesn't exist

pkg/config/config.go:580

cmd.Flags().String(FlagSignerType, def.Signer.SignerType, "type of signer to use (file, grpc, awskms)")

The factory (pkg/signer/factory/factory.go) only handles "file" and "awskms". The "grpc" mention is stale/incorrect and will confuse users — factory.go would return "unknown signer type: grpc".

Fix this →


🟡 KMS init during evnode init may surprise users

pkg/cmd/init.go:25

When SignerType == "awskms" and Aggregator == true, NewSignerForInit calls awssigner.NewKmsSigner, which immediately makes a GetPublicKey API call to AWS. A user running evnode init with --evnode.signer.signer_type=awskms will need live AWS credentials just to initialize the config — even if they plan to provide credentials at runtime. Consider documenting this requirement prominently or adding a --skip-kms-verify flag.


🟡 Unused helper methods timeout(), maxRetries(), cacheTTL()

pkg/signer/aws/signer.go:44-48

func (o *Options) timeout() time.Duration { return o.Timeout }
func (o *Options) maxRetries() int        { return o.MaxRetries }
func (o *Options) cacheTTL() time.Duration { return o.CacheTTL }

These are private one-liner wrappers with no behaviour — they just return the field. The call sites already have access to s.opts.Timeout directly. These wrappers add indirection without abstraction value.


🟡 SignerConfig.Validate() missing for file type

pkg/config/config.go:413

Validate() checks KMS config when SignerType == "awskms" but doesn't validate the file signer (e.g., SignerPath non-empty when aggregator is set). For consistency and early failure, consider adding symmetric validation.


🟢 Zero test coverage on factory (pkg/signer/factory/factory.go)

Codecov reports 0% coverage on the factory. This is the integration glue between config and signers, and it contains non-trivial path manipulation logic (the TrimSuffix("signer.json") + filepath.Abs pattern). Basic tests covering the "file" and "awskms" paths (with mocks) would significantly improve reliability.


Nits

  • pkg/signer/aws/signer.go:70: cfgOpts := []func(*awsconfig.LoadOptions) error{} can be simplified to var cfgOpts []func(*awsconfig.LoadOptions) error.
  • pkg/signer/aws/signer.go:267-270: The isRetryableKMSError string list is long — consider extracting to a var retryableCodes = map[string]bool{...} for readability.
  • pkg/signer/factory/factory.go:35: strings.TrimSuffix(config.Signer.SignerPath, "signer.json") is fragile if the user provides "/some/path" (doesn't end with "signer.json") — in that case the Abs call is on the raw path. This is likely fine per the comment but worth a test.
  • pkg/cmd/run_node.go:141-143: The awskms-specific log line is slightly inconsistent; other signer types don't get a log. Consider moving this into the factory or logging the type generically.
  • The TestGetPublic_RefreshTimeoutReturnsStale test uses time.Sleep(2 * time.Millisecond) which could be flaky in slow CI environments. Using a very short but deterministic mechanism (like injecting a mock clock) would be more robust.

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 62.76151% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.12%. Comparing base (d27d099) to head (d56fe5f).

Files with missing lines Patch % Lines
pkg/signer/aws/signer.go 70.86% 28 Missing and 9 partials ⚠️
pkg/signer/factory/factory.go 0.00% 27 Missing ⚠️
pkg/cmd/run_node.go 47.36% 6 Missing and 4 partials ⚠️
pkg/config/config.go 50.00% 6 Missing and 1 partial ⚠️
pkg/cmd/init.go 73.33% 2 Missing and 2 partials ⚠️
block/internal/submitting/da_submitter.go 80.00% 2 Missing ⚠️
pkg/signer/noop/signer.go 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3171      +/-   ##
==========================================
- Coverage   61.14%   61.12%   -0.02%     
==========================================
  Files         117      119       +2     
  Lines       12082    12254     +172     
==========================================
+ Hits         7387     7490     +103     
- Misses       3868     3929      +61     
- Partials      827      835       +8     
Flag Coverage Δ
combined 61.12% <62.76%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Add KMS signer backend

1 participant