Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade tests #483

Merged
merged 14 commits into from
Feb 14, 2025
Merged

Upgrade tests #483

merged 14 commits into from
Feb 14, 2025

Conversation

mkysel
Copy link
Collaborator

@mkysel mkysel commented Feb 12, 2025

Poor mans upgrade tests.

The tests are behind a protected env variable so they don't run all the time. If you want to run them set ENABLE_UPGRADE_TESTS=1

The upgrade test will always attempt to build the local docker image, which can be somewhat slow. Hence the improvements to dockerignore. We were always rebuilding the image.

These steps are required:

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Enhanced Docker environment shutdown now performs comprehensive cleanup by removing orphaned containers and unused volumes.
    • Introduced an environment configuration tool that validates required settings prior to launching the application.
    • New test suite added to facilitate testing of the upgrade process for the application.
    • Added a new step in the GitHub Actions workflow to run upgrade tests.
    • New utility functions for managing Docker-related functionalities in tests.
  • Bug Fixes

    • Corrected the Docker build command to ensure proper execution without errors.
  • Chores

    • Updated default Docker image name to use the GitHub Container Registry.
    • Added Go module management to Dockerfiles to ensure dependencies are handled correctly.
    • Introduced a .dockerignore file to optimize Docker build processes.

Copy link

coderabbitai bot commented Feb 12, 2025

Warning

Rate limit exceeded

@mkysel has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 57 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 03de3e7 and ae68abe.

📒 Files selected for processing (2)
  • .github/workflows/test.yml (2 hunks)
  • pkg/upgrade/main_test.go (1 hunks)

Walkthrough

The pull request updates several components across the repository. The Docker “down” script now invokes docker_compose down with additional options to remove orphan containers and named volumes. A utility function in the test package has been made public by renaming it from getCallerName to GetCallerName. A new shell script is added to load and validate environment variables from a local file. Additionally, a comprehensive upgrade test file is introduced with functions for environment setup, Docker container management, and log streaming to facilitate testing of the upgrade process.

Changes

File(s) Change Summary
dev/docker/down Modified Docker command to docker_compose down --remove-orphans --volumes to also clean up orphan containers and named volumes.
pkg/testutils/store.go Renamed getCallerName to public GetCallerName and updated its usage in newInstanceDB to reflect the change in visibility.
pkg/upgrade/scripts/load_env.sh New script added to load, validate, and display environment variables from a local file, with strict error handling.
pkg/upgrade/upgrade_test.go New test function TestUpgradeToLatest added to test the upgrade process across different versions.
dev/docker/Dockerfile Added lines to copy Go module files and download dependencies.
dev/docker/build Updated default value of DOCKER_IMAGE_NAME from xmtp/xmtpd to ghcr.io/xmtp/xmtpd.
dev/docker/Dockerfile-cli Added lines to copy Go module files and download dependencies.
.github/workflows/test.yml New step added to run upgrade tests in the GitHub Actions workflow.
.dockerignore New file created to specify patterns for files and directories to be ignored during Docker builds.
pkg/upgrade/docker_utils_test.go New file added with utility functions for testing Docker-related functionalities.
pkg/upgrade/main_test.go New test package added with setup and custom TestMain function for managing test execution.

Suggested Reviewers

  • fbac

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
dev/docker/down (1)

6-6: Document the cleanup behavior.

Consider adding a comment explaining that this command performs a thorough cleanup including orphaned containers and volumes. This helps other developers understand the implications of running this script.

 #!/usr/bin/env bash
 set -e
 
 . dev/docker/env
 
+# Performs a thorough cleanup of Docker resources:
+# - Stops and removes all services defined in docker-compose.yml
+# - Removes orphaned containers (services not in compose file)
+# - Removes all named volumes
 docker_compose down --remove-orphans --volumes
pkg/upgrade/scripts/load_env.sh (1)

1-3: Enhance error handling and debugging capabilities.

While the current error handling is good, it could be improved for better reliability and debugging.

Apply this diff to enhance error handling:

 #!/bin/bash

-set -eu
+set -euo pipefail
+
+# Error handling
+error() {
+  echo "Error: $1" >&2
+  exit 1
+}
+
+# Trap errors
+trap 'error "Command failed at line $LINENO"' ERR
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69568d3 and ded632e.

📒 Files selected for processing (6)
  • cmd/replication/main.go (1 hunks)
  • dev/docker/down (1 hunks)
  • pkg/config/options.go (1 hunks)
  • pkg/testutils/store.go (2 hunks)
  • pkg/upgrade/scripts/load_env.sh (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Test
pkg/upgrade/upgrade_test.go

[error] 121-138: Received unexpected error: exit status 125

🔇 Additional comments (4)
dev/docker/down (1)

6-6: LGTM! Verify impact on persistent data.

The addition of --remove-orphans --volumes flags ensures thorough cleanup of Docker resources, which is beneficial for testing environments. However, please verify this doesn't affect any data that needs to be preserved across restarts.

Run this script to check for volume persistence requirements:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check if any volumes are marked for persistence
# Look for volume configurations in docker-compose files and related docs

# Search for volume configurations
echo "Checking volume configurations in docker-compose files:"
fd -e yml -e yaml | xargs rg -l 'volumes:'

# Search for persistence-related comments/docs
echo -e "\nChecking for persistence-related documentation:"
rg -i "persist|storage|volume" README.md docs/

Length of output: 399


LGTM – Verified for Dev Environment Cleanup.
Our check of the volume configurations in dev/docker/docker-compose.yml and the absence of persistent-volume documentation indicate that the --remove-orphans --volumes flags are appropriate for the development/upgrade testing environment. However, ensure that this aggressive cleanup aligns with your expectations for any data persistence requirements if the environment is ever used beyond testing purposes.

pkg/testutils/store.go (1)

23-32: LGTM! Good visibility change.

Making GetCallerName public is appropriate as it's now used across packages for test database naming.

cmd/replication/main.go (1)

76-79: LGTM! Clean namespace override implementation.

The implementation properly handles the override while maintaining backward compatibility with the existing namespace generation.

pkg/config/options.go (1)

28-28: LGTM! Well-structured configuration option.

The NameOverride field is properly tagged and appropriately marked as hidden since it's primarily for testing purposes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
pkg/upgrade/upgrade_test.go (2)

51-58: Consider making configuration values configurable.

The function hardcodes configuration values which could make the tests less flexible. Consider making these values configurable through environment variables or test parameters.

Apply this diff to improve flexibility:

+const (
+    defaultReplicationEnabled = "true"
+    defaultIndexerEnabled    = "true"
+)
+
 func expandVars(vars map[string]string) {
-    vars["XMTPD_REPLICATION_ENABLE"] = "true"
-    vars["XMTPD_INDEXER_ENABLE"] = "true"
+    if _, exists := vars["XMTPD_REPLICATION_ENABLE"]; !exists {
+        vars["XMTPD_REPLICATION_ENABLE"] = defaultReplicationEnabled
+    }
+    if _, exists := vars["XMTPD_INDEXER_ENABLE"]; !exists {
+        vars["XMTPD_INDEXER_ENABLE"] = defaultIndexerEnabled
+    }

78-87: Remove unused function.

The dockerLogs function is not used in the codebase. Consider removing it or using it in place of streamDockerLogs where appropriate.

🧰 Tools
🪛 golangci-lint (1.62.2)

78-78: func dockerLogs is unused

(unused)

🪛 GitHub Check: Lint-Go

[failure] 78-78:
func dockerLogs is unused (unused)

pkg/upgrade/scripts/load_env.sh (1)

5-8: Enhance error reporting.

The error function could provide more context by including the script name and line number.

Apply this diff to improve error reporting:

 error() {
-  echo "Error: $1" >&2
+  echo "Error in $(basename "$0") at line $BASH_LINENO: $1" >&2
   exit 1
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ded632e and 4ad79a1.

📒 Files selected for processing (2)
  • pkg/upgrade/scripts/load_env.sh (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
pkg/upgrade/scripts/load_env.sh

[warning] 22-22: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

🪛 golangci-lint (1.62.2)
pkg/upgrade/upgrade_test.go

78-78: func dockerLogs is unused

(unused)


167-167: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go
pkg/upgrade/upgrade_test.go

[failure] 78-78:
func dockerLogs is unused (unused)


[failure] 167-167:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

🪛 GitHub Actions: Test
pkg/upgrade/upgrade_test.go

[error] 149-182: Received unexpected error: exit status 125

🔇 Additional comments (2)
pkg/upgrade/upgrade_test.go (2)

137-137: Add error handling for container cleanup.

The container cleanup silently ignores errors, which could leave orphaned containers.


178-186: Fix container startup order and environment sharing.

The test fails with error code 125 likely due to container startup issues.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
pkg/upgrade/upgrade_test.go (2)

178-186: ⚠️ Potential issue

Fix container startup order and environment sharing.

The test fails with database connection issues likely due to container startup order and environment sharing.

Apply this diff to improve reliability:

 func TestUpgradeFrom014(t *testing.T) {
+    t.Parallel()
+    
+    // First container environment
     envVars := constructVariables(t)
+    envVars["XMTPD_DB_NAME_OVERRIDE"] = "xmtpd_test_014_" + testutils.RandomStringLower(6)
     t.Logf("Starting old container")
     runContainer(t, "xmtpd_test_014", "ghcr.io/xmtp/xmtpd:0.1.4", envVars)
 
+    // Second container needs a new database
+    envVars = constructVariables(t)
+    envVars["XMTPD_DB_NAME_OVERRIDE"] = "xmtpd_test_dev_" + testutils.RandomStringLower(6)
     t.Logf("Starting new container")
     runContainer(t, "xmtpd_test_dev", "xmtp/xmtpd:dev", envVars)
 }
🧰 Tools
🪛 GitHub Actions: Test

[fatal] 181-181: Failed to connect to database


159-175: ⚠️ Potential issue

Replace hardcoded timeout with configurable wait time.

The 5-second timeout is hardcoded and might not be sufficient for all environments, especially given the database connection failures in the pipeline.

Apply this diff to improve timeout handling:

+const (
+    defaultStartupTimeout = 30 * time.Second
+    healthCheckInterval  = time.Second
+)
+
-    timeout := time.After(5 * time.Second)
+    timeout := time.After(defaultStartupTimeout)
+    ticker := time.NewTicker(healthCheckInterval)
+    defer ticker.Stop()
 
     for {
         select {
         case line, ok := <-logChan:
             if !ok {
                 t.Fatalf("Log stream closed before finding target log")
             }
-            t.Logf(line)
+            t.Log(line) // Fix staticcheck warning about printf-style function
             if strings.Contains(line, "replication.api\tserving grpc") {
                 t.Logf("Service started successfully")
                 return
             }
         case <-timeout:
-            t.Fatalf("Timeout: 'replication.api\tserving grpc' not found in logs within 5 seconds")
+            t.Fatalf("Timeout: 'replication.api\tserving grpc' not found in logs within %v", defaultStartupTimeout)
         }
     }
🧰 Tools
🪛 golangci-lint (1.62.2)

167-167: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go

[failure] 167-167:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

🧹 Nitpick comments (1)
pkg/upgrade/upgrade_test.go (1)

78-87: Remove unused function.

The dockerLogs function is defined but never used in the codebase.

Either remove this unused function or utilize it in the test logic, perhaps for debugging purposes when container startup fails.

🧰 Tools
🪛 golangci-lint (1.62.2)

78-78: func dockerLogs is unused

(unused)

🪛 GitHub Check: Lint-Go

[failure] 78-78:
func dockerLogs is unused (unused)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ad79a1 and c355adf.

📒 Files selected for processing (4)
  • dev/docker/down (1 hunks)
  • pkg/testutils/store.go (2 hunks)
  • pkg/upgrade/scripts/load_env.sh (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • dev/docker/down
  • pkg/testutils/store.go
🧰 Additional context used
🪛 Shellcheck (0.10.0)
pkg/upgrade/scripts/load_env.sh

[warning] 22-22: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

🪛 golangci-lint (1.62.2)
pkg/upgrade/upgrade_test.go

78-78: func dockerLogs is unused

(unused)


167-167: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go
pkg/upgrade/upgrade_test.go

[failure] 78-78:
func dockerLogs is unused (unused)


[failure] 167-167:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

🪛 GitHub Actions: Test
pkg/upgrade/upgrade_test.go

[fatal] 181-181: Failed to connect to database

🔇 Additional comments (4)
pkg/upgrade/upgrade_test.go (3)

17-21: Add error handling for runtime.Caller.

The function ignores potential errors from runtime.Caller. This could lead to silent failures if the call stack information is unavailable.

Apply this diff to improve error handling:

 func getScriptPath(scriptName string) string {
-    _, filename, _, _ := runtime.Caller(0)
+    _, filename, _, ok := runtime.Caller(0)
+    if !ok {
+        panic("failed to get caller information")
+    }
     baseDir := filepath.Dir(filename)
     return filepath.Join(baseDir, scriptName)
 }

98-124: Improve resource cleanup in streamDockerLogs.

The function could leak resources if the goroutine is not properly cleaned up. Also, consider adding context for timeout control.

Apply this diff to improve resource management:

-func streamDockerLogs(containerName string) (chan string, func(), error) {
+func streamDockerLogs(ctx context.Context, containerName string) (chan string, func(), error) {
     logsCmd := exec.Command("docker", "logs", "-f", containerName)
     stdoutPipe, err := logsCmd.StdoutPipe()
     if err != nil {
         return nil, nil, err
     }
 
     err = logsCmd.Start()
     if err != nil {
+        stdoutPipe.Close()
         return nil, nil, err
     }
 
     logChan := make(chan string)
     go func() {
+        defer close(logChan)
+        defer stdoutPipe.Close()
         scanner := bufio.NewScanner(stdoutPipe)
         for scanner.Scan() {
+            select {
+            case <-ctx.Done():
+                return
+            default:
                 logChan <- scanner.Text()
+            }
         }
-        close(logChan)
     }()
 
     cancelFunc := func() {
         _ = logsCmd.Process.Kill()
+        _ = logsCmd.Wait()
     }
 
     return logChan, cancelFunc, nil
 }

137-137: Add error handling for container cleanup.

The container cleanup silently ignores errors, which could leave orphaned containers.

Apply this diff:

-_ = dockerRmc(containerName)
+if err := dockerRmc(containerName); err != nil {
+  t.Logf("Failed to remove container %s: %v", containerName, err)
+}
pkg/upgrade/scripts/load_env.sh (1)

24-33: Add Docker-specific environment variables.

Given that the PR objectives require Docker-based testing, ensure that Docker-specific environment variables are included in the validation.

Apply this diff to add Docker environment validation:

 # a subset of all of them
 REQUIRED_VARS=(
   "XMTPD_SIGNER_PRIVATE_KEY"
   "XMTPD_DB_WRITER_CONNECTION_STRING"
   "XMTPD_CONTRACTS_RPC_URL"
+  # Docker-specific variables
+  "DOCKER_HOST"
+  "DOCKER_TLS_VERIFY"
+  "DOCKER_CERT_PATH"
 )

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/upgrade/upgrade_test.go (2)

176-176: Use t.Log instead of t.Logf for simple strings.

When logging a string without formatting, use t.Log instead of t.Logf.

Apply this diff:

-t.Logf(line)
+t.Log(line)
🧰 Tools
🪛 golangci-lint (1.62.2)

176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go

[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)


87-96: Remove unused dockerLogs function.

The dockerLogs function is defined but never used in the codebase.

Consider removing this unused function or documenting why it's kept for future use.

🧰 Tools
🪛 golangci-lint (1.62.2)

87-87: func dockerLogs is unused

(unused)

🪛 GitHub Check: Lint-Go

[failure] 87-87:
func dockerLogs is unused (unused)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c355adf and 993ca91.

📒 Files selected for processing (3)
  • dev/docker/Dockerfile (1 hunks)
  • dev/docker/build (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
🧰 Additional context used
🪛 golangci-lint (1.62.2)
pkg/upgrade/upgrade_test.go

87-87: func dockerLogs is unused

(unused)


176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go
pkg/upgrade/upgrade_test.go

[failure] 87-87:
func dockerLogs is unused (unused)


[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Test (Node)
🔇 Additional comments (6)
pkg/upgrade/upgrade_test.go (5)

26-30: Add error handling for runtime.Caller.

The function ignores potential errors from runtime.Caller which could lead to silent failures.


107-133: Improve resource cleanup in streamDockerLogs.

The function could leak resources if the goroutine is not properly cleaned up.


146-146: Add error handling for container cleanup.

The container cleanup silently ignores errors, which could leave orphaned containers.


168-184: Replace hardcoded timeout with configurable wait time.

The 5-second timeout is hardcoded and might not be sufficient for all environments.

🧰 Tools
🪛 golangci-lint (1.62.2)

176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go

[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)


193-194:

❓ Verification inconclusive

Verify image compatibility.

The test uses different image registries for old and new containers:

  • Old: ghcr.io/xmtp/xmtpd:0.1.4
  • New: xmtp/xmtpd:dev

Ensure that the image registries and tags are correct and accessible.


🏁 Script executed:

#!/bin/bash
# Check if both images are accessible
docker pull ghcr.io/xmtp/xmtpd:0.1.4
docker pull xmtp/xmtpd:dev

Length of output: 261


Image Compatibility Check: Manual Verification Required

The automated docker pull commands could not be executed because the docker command isn’t available in the current environment. This means the check for image accessibility wasn’t conclusive. Please verify manually (in an environment with Docker installed) that both images—ghcr.io/xmtp/xmtpd:0.1.4 (old) and xmtp/xmtpd:dev (new)—are correct, accessible, and compatible.

dev/docker/build (1)

5-5: LGTM! Using GitHub Container Registry.

The change to use ghcr.io as the default registry aligns with the upgrade test requirements and improves consistency.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/upgrade/upgrade_test.go (1)

176-176: Fix log formatting.

Use t.Log instead of t.Logf for non-formatted strings.

Apply this diff:

-t.Logf(line)
+t.Log(line)
🧰 Tools
🪛 golangci-lint (1.62.2)

176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go

[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 993ca91 and 5ff4c4f.

📒 Files selected for processing (4)
  • dev/docker/Dockerfile (1 hunks)
  • dev/docker/Dockerfile-cli (1 hunks)
  • dev/docker/build (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dev/docker/Dockerfile
🧰 Additional context used
🪛 golangci-lint (1.62.2)
pkg/upgrade/upgrade_test.go

18-18: const testFlag is unused

(unused)


20-20: func skipIfNotEnabled is unused

(unused)


87-87: func dockerLogs is unused

(unused)


176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go
pkg/upgrade/upgrade_test.go

[failure] 18-18:
const testFlag is unused (unused)


[failure] 20-20:
func skipIfNotEnabled is unused (unused)


[failure] 87-87:
func dockerLogs is unused (unused)


[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Test (Node)
🔇 Additional comments (6)
dev/docker/build (2)

1-3: LGTM! Improved script robustness.

The switch to bash with strict error handling using set -euo pipefail enhances script reliability.


5-8: LGTM! Added error handling function.

The error function provides consistent error reporting and exit behavior.

dev/docker/Dockerfile-cli (1)

10-12: LGTM! Improved Docker layer caching.

Downloading Go modules before copying application code optimizes build caching and reduces build times.

pkg/upgrade/upgrade_test.go (3)

26-30: Add error handling for runtime.Caller.

The function ignores potential errors from runtime.Caller.

Apply this diff:

 func getScriptPath(scriptName string) string {
-    _, filename, _, _ := runtime.Caller(0)
+    _, filename, _, ok := runtime.Caller(0)
+    if !ok {
+        panic("failed to get caller information")
+    }
     baseDir := filepath.Dir(filename)
     return filepath.Join(baseDir, scriptName)
 }

146-146: Add error handling for container cleanup.

The container cleanup silently ignores errors.

Apply this diff:

-_ = dockerRmc(containerName)
+if err := dockerRmc(containerName); err != nil {
+  t.Logf("Failed to remove container %s: %v", containerName, err)
+}

168-184: Replace hardcoded timeout with configurable wait time.

The 5-second timeout is hardcoded and might not be sufficient.

Apply this diff:

+const (
+    defaultStartupTimeout = 30 * time.Second
+    healthCheckInterval  = time.Second
+)
+
-    timeout := time.After(5 * time.Second)
+    timeout := time.After(defaultStartupTimeout)
+    ticker := time.NewTicker(healthCheckInterval)
+    defer ticker.Stop()
🧰 Tools
🪛 golangci-lint (1.62.2)

176-176: printf: non-constant format string in call to (*testing.common).Logf

(govet)

🪛 GitHub Check: Lint-Go

[failure] 176-176:
SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/upgrade/upgrade_test.go (2)

163-179: 🛠️ Refactor suggestion

Replace hardcoded timeout with configurable wait time.

The 5-second timeout is hardcoded and might not be sufficient for all environments. Consider making it configurable and implementing proper container health checks.

Apply this diff:

+const (
+    defaultStartupTimeout = 30 * time.Second
+    healthCheckInterval  = time.Second
+)
+
-    timeout := time.After(5 * time.Second)
+    timeout := time.After(defaultStartupTimeout)
+    ticker := time.NewTicker(healthCheckInterval)
+    defer ticker.Stop()

     for {
         select {
         case line, ok := <-logChan:
             if !ok {
                 t.Fatalf("Log stream closed before finding target log")
             }
             t.Log(line)
             if strings.Contains(line, "replication.api\tserving grpc") {
                 t.Logf("Service started successfully")
                 return
             }
         case <-timeout:
-            t.Fatalf("Timeout: 'replication.api\tserving grpc' not found in logs within 5 seconds")
+            t.Fatalf("Timeout: 'replication.api\tserving grpc' not found in logs within %v", defaultStartupTimeout)
         }
     }

210-217: 🛠️ Refactor suggestion

Fix container startup order and environment sharing.

The test could fail due to container startup issues and database conflicts. Consider:

  1. Running containers sequentially with proper cleanup
  2. Verifying container health before proceeding
  3. Using unique database names for each container

Apply this diff:

 func TestUpgradeFrom014(t *testing.T) {
+    t.Parallel()
+    
+    // First container environment
     envVars := constructVariables(t)
     t.Logf("Starting old container")
     runContainer(t, "xmtpd_test_014", "ghcr.io/xmtp/xmtpd:0.1.4", envVars)
 
+    // Second container needs a new database
+    envVars = constructVariables(t)
     t.Logf("Starting new container")
     runContainer(t, "xmtpd_test_dev", "xmtp/xmtpd:dev", envVars)
 }
🧹 Nitpick comments (1)
.github/workflows/test.yml (1)

45-50: Add timeout and test filtering options.

Consider adding a timeout and test filtering options to prevent long-running tests from blocking the workflow.

Apply this diff:

       - name: Run Upgrade Tests
         run: |
           export GOPATH="${HOME}/go/"
           export PATH="${PATH}:${GOPATH}/bin"
           export ENABLE_UPGRADE_TESTS=1
-          go test github.com/xmtp/xmtpd/pkg/upgrade -v
+          go test github.com/xmtp/xmtpd/pkg/upgrade -v -timeout 10m -run "^TestUpgrade"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ff4c4f and 2c253c6.

📒 Files selected for processing (2)
  • .github/workflows/test.yml (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Push Docker Images to GitHub Packages (xmtpd)
  • GitHub Check: Build pre-baked anvil-xmtpd
  • GitHub Check: Test (Node)
🔇 Additional comments (3)
pkg/upgrade/upgrade_test.go (3)

27-31: Add error handling for runtime.Caller.

The function ignores potential errors from runtime.Caller. This could lead to silent failures if the call stack information is unavailable.


97-123: Improve resource cleanup in streamDockerLogs.

The function could leak resources if the goroutine is not properly cleaned up. Also, consider adding context for timeout control.


136-136: Add error handling for container cleanup.

The container cleanup silently ignores errors, which could leave orphaned containers.

@mkysel mkysel marked this pull request as ready for review February 13, 2025 16:51
@mkysel mkysel requested a review from a team as a code owner February 13, 2025 16:51
@mkysel
Copy link
Collaborator Author

mkysel commented Feb 13, 2025

Setting up before all tests...
=== RUN   TestUpgradeFrom014
    upgrade_test.go:212: Starting old container
    upgrade_test.go:171: 2025-02-13T16:52:54.605Z       INFO    replication     Version: v0.1.4-d91129501d701a56825b54ad95054b607308ac2d
    upgrade_test.go:171: 2025-02-13T16:52:54.735Z       INFO    replication     Registrant identified   {"nodeId": 100}
    upgrade_test.go:171: 2025-02-13T16:52:54.735Z       INFO    replication     Connecting to mls validation service    {"url": "http://host.docker.internal:60051", "target": "host.docker.internal:60051"}
    upgrade_test.go:171: 2025-02-13T16:52:54.740Z       INFO    replication     Indexer service enabled
    upgrade_test.go:171: 2025-02-13T16:52:54.742Z       INFO    replication.api Creating API server
    upgrade_test.go:171: 2025-02-13T16:52:54.744Z       INFO    replication.subscribeWorker     Starting subscribe worker
    upgrade_test.go:171: 2025-02-13T16:52:54.746Z       INFO    replication     Replication service enabled
    upgrade_test.go:171: 2025-02-13T16:52:54.746Z       INFO    replication     API server started      {"port": 5050}
    upgrade_test.go:171: 2025-02-13T16:52:54.748Z       INFO    replication.api serving grpc    {"address": "[::]:5050"}
    upgrade_test.go:173: Service started successfully
    upgrade_test.go:215: Starting new container
    upgrade_test.go:171: 2025-02-13T16:52:54.985Z       INFO    replication     Version: v0.1.3-81-g2c253c6
    upgrade_test.go:171: 2025-02-13T16:52:54.991Z       INFO    replication     Successfully connected to DB    {"namespace": "testupgradefrom014_ybxdbn"}
    upgrade_test.go:171: 2025-02-13T16:52:55.016Z       INFO    replication     Starting server with blockchain nonce: 0
    upgrade_test.go:171: 2025-02-13T16:52:55.018Z       INFO    replication     Registrant identified   {"nodeId": 100}
    upgrade_test.go:171: 2025-02-13T16:52:55.018Z       INFO    replication     Connecting to mls validation service    {"url": "http://host.docker.internal:60051", "target": "host.docker.internal:60051"}
    upgrade_test.go:171: 2025-02-13T16:52:55.019Z       INFO    replication     Indexer service enabled
    upgrade_test.go:171: 2025-02-13T16:52:55.019Z       INFO    replication.api Creating API server
    upgrade_test.go:171: 2025-02-13T16:52:55.019Z       INFO    replication.subscribeWorker     Starting subscribe worker
    upgrade_test.go:171: 2025-02-13T16:52:55.022Z       INFO    replication     Replication service enabled
    upgrade_test.go:171: 2025-02-13T16:52:55.022Z       INFO    replication     Metadata service enabled
    upgrade_test.go:171: 2025-02-13T16:52:55.022Z       INFO    replication     API server started      {"port": 5050}
    upgrade_test.go:171: 2025-02-13T16:52:55.022Z       INFO    replication.api serving grpc    {"address": "[::]:5050"}
    upgrade_test.go:173: Service started successfully
--- PASS: TestUpgradeFrom014 (0.81s)
PASS
ok      github.com/xmtp/xmtpd/pkg/upgrade       13.215s

Upgrade works.

We will have to make sure it works with payloads in the DB as well.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
pkg/upgrade/docker_utils_test.go (3)

29-33: ⚠️ Potential issue

Add error handling for runtime.Caller.

The function ignores potential errors from runtime.Caller which could lead to silent failures.

Apply this diff:

 func getScriptPath(scriptName string) string {
-	_, filename, _, _ := runtime.Caller(0)
+	_, filename, _, ok := runtime.Caller(0)
+	if !ok {
+		panic("failed to get caller information")
+	}
 	baseDir := filepath.Dir(filename)
 	return filepath.Join(baseDir, scriptName)
 }

35-61: ⚠️ Potential issue

Add error handling for scanner errors.

The function should check for scanner errors after the scanning loop completes.

Apply this diff:

 func loadEnvFromShell() (map[string]string, error) {
 	scriptPath := getScriptPath("./scripts/load_env.sh")
 	cmd := exec.Command(scriptPath)
 	var outBuf, errBuf bytes.Buffer
 	cmd.Stdout = &outBuf
 	cmd.Stderr = &errBuf
 
 	err := cmd.Run()
 	if err != nil {
 		return nil, fmt.Errorf(
 			"error loading env via shell script: %v\nError: %s",
 			err,
 			errBuf.String(),
 		)
 	}
 
 	envMap := make(map[string]string)
 	scanner := bufio.NewScanner(&outBuf)
 	for scanner.Scan() {
 		line := scanner.Text()
 		parts := strings.SplitN(line, "=", 2)
 		if len(parts) == 2 {
 			envMap[parts[0]] = parts[1]
 		}
 	}
+	if err := scanner.Err(); err != nil {
+		return nil, fmt.Errorf("error scanning environment variables: %v", err)
+	}
 	return envMap, nil
 }

99-125: ⚠️ Potential issue

Improve resource cleanup in streamDockerLogs.

The function could leak resources if the goroutine is not properly cleaned up.

Apply this diff:

-func streamDockerLogs(containerName string) (chan string, func(), error) {
+func streamDockerLogs(ctx context.Context, containerName string) (chan string, func(), error) {
 	logsCmd := exec.Command("docker", "logs", "-f", containerName)
 	stdoutPipe, err := logsCmd.StdoutPipe()
 	if err != nil {
 		return nil, nil, err
 	}
 
 	err = logsCmd.Start()
 	if err != nil {
+		stdoutPipe.Close()
 		return nil, nil, err
 	}
 
 	logChan := make(chan string)
 	go func() {
+		defer close(logChan)
+		defer stdoutPipe.Close()
 		scanner := bufio.NewScanner(stdoutPipe)
 		for scanner.Scan() {
+			select {
+			case <-ctx.Done():
+				return
+			default:
 				logChan <- scanner.Text()
+			}
 		}
-		close(logChan)
 	}()
 
 	cancelFunc := func() {
 		_ = logsCmd.Process.Kill()
+		_ = logsCmd.Wait()
 	}
 
 	return logChan, cancelFunc, nil
 }
🧹 Nitpick comments (2)
pkg/upgrade/docker_utils_test.go (2)

184-208: Add context to buildDevImage function.

The function should accept a context parameter for proper timeout control.

Apply this diff:

-func buildDevImage() error {
+func buildDevImage(ctx context.Context) error {
 	scriptPath := getScriptPath("../../dev/docker/build")
 
-	// Set a 5-minute timeout
-	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
-	defer cancel()
-
 	cmd := exec.CommandContext(ctx, scriptPath)
 
 	var outBuf, errBuf bytes.Buffer

210-236: Add context to dockerPull function.

The function should accept a context parameter for proper timeout control.

Apply this diff:

-func dockerPull(imageName string) error {
+func dockerPull(ctx context.Context, imageName string) error {
-	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
-	defer cancel()
-
 	cmd := exec.CommandContext(ctx, "docker", "pull", imageName)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c253c6 and 0384bc9.

📒 Files selected for processing (4)
  • .dockerignore (1 hunks)
  • pkg/upgrade/docker_utils_test.go (1 hunks)
  • pkg/upgrade/main_test.go (1 hunks)
  • pkg/upgrade/upgrade_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .dockerignore
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Test (Node)

Copy link
Collaborator

@fbac fbac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. As a solution to have a running upgrade test looks good, my recommendation would be further refinement using testcontainers-go.

return logChan, cancelFunc, nil
}

func runContainer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the docker_utils are correct for a first approach, but we could polish and simplify the test by using testcontainers-go

@mkysel mkysel merged commit 808ae4e into main Feb 14, 2025
8 checks passed
@mkysel mkysel deleted the mkysel/upgrade branch February 14, 2025 15:49
@mkysel
Copy link
Collaborator Author

mkysel commented Feb 14, 2025

Relates to #392

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants