Skip to content

feat: record the migration events in metasrv #6579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

zyy17
Copy link
Collaborator

@zyy17 zyy17 commented Jul 23, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

  1. Add the event recorder in common/procedure with ProcedureEvent;
  2. Implement the event recorder in metasrv and collect the region migration events;

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions bot added size/S docs-not-required This change does not impact docs. labels Jul 23, 2025
@zyy17 zyy17 mentioned this pull request Jul 23, 2025
5 tasks
@zyy17 zyy17 force-pushed the feat/migration-events branch from dec58f7 to 627eafc Compare July 23, 2025 22:57
@github-actions github-actions bot added size/M and removed size/S labels Jul 24, 2025
@zyy17 zyy17 force-pushed the feat/migration-events branch from 06691e7 to 5ea281d Compare July 24, 2025 04:29
@zyy17 zyy17 force-pushed the feat/migration-events branch from 5ea281d to 13e2ae7 Compare July 24, 2025 06:10
@zyy17 zyy17 marked this pull request as ready for review July 24, 2025 07:02
@zyy17 zyy17 requested review from MichaelScofield and a team as code owners July 24, 2025 07:02
@zyy17 zyy17 requested review from WenyXu and Copilot July 24, 2025 07:02
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements event recording functionality for region migration procedures in the metasrv component. The feature tracks important events during the migration lifecycle and persists them to a system table for observability and debugging purposes.

  • Adds event recording infrastructure to capture region migration state changes
  • Integrates event recording into the procedure manager to track migration lifecycle events
  • Adds test verification to ensure migration events are properly recorded in the system table

Reviewed Changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests-integration/tests/region_migration.rs Adds comprehensive test verification for region migration events system table
tests-integration/src/cluster.rs Modifies cluster builder to support optional frontend server startup for test environments
src/meta-srv/src/metasrv/builder.rs Integrates event recorder into metasrv builder and procedure manager initialization
src/meta-srv/src/events/region_migration_event.rs Implements region migration event structure and serialization logic
src/common/procedure/src/local.rs Adds event recording capability to procedure execution lifecycle
src/common/event-recorder/src/recorder.rs Makes json_payload method optional with default empty implementation
Comments suppressed due to low confidence (4)

tests-integration/tests/region_migration.rs:316

  • Using DEFAULT_FLUSH_INTERVAL_SECONDS * 2 directly as a Duration is incorrect. DEFAULT_FLUSH_INTERVAL_SECONDS appears to be a numeric value, but tokio::time::sleep expects a Duration. This should be Duration::from_secs(DEFAULT_FLUSH_INTERVAL_SECONDS * 2).
    run_sql(

tests-integration/tests/region_migration.rs:342

  • [nitpick] The variable name procedure_id_for_closure is unnecessarily verbose and unclear. Consider renaming it to something more concise like procedure_id_clone or simply move the clone operation inline where it's used.
    let procedure_id_for_closure = procedure_id.clone();

tests-integration/tests/region_migration.rs:483

  • [nitpick] The variable name procedure_id_for_closure is unnecessarily verbose and unclear. Consider renaming it to something more concise like procedure_id_clone or simply move the clone operation inline where it's used.
    let procedure_id_for_closure = procedure_id.clone();

src/meta-srv/src/events/region_migration_event.rs:82

  • [nitpick] The variant name Done is ambiguous compared to other variants like Running and Failed. Consider renaming to Finished or Completed for better clarity and consistency with the comment.
    Done,

@zyy17 zyy17 force-pushed the feat/migration-events branch 3 times, most recently from 9d2bfa5 to 3e5feeb Compare July 24, 2025 08:16
@zyy17 zyy17 force-pushed the feat/migration-events branch from 3e5feeb to 66fb584 Compare July 24, 2025 08:26
@zyy17 zyy17 force-pushed the feat/migration-events branch from ed40050 to 88b1361 Compare July 24, 2025 20:35
@zyy17 zyy17 force-pushed the feat/migration-events branch from b0acdb1 to facf33b Compare July 24, 2025 21:31
@zyy17 zyy17 enabled auto-merge July 24, 2025 21:31
Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

Comment on lines +199 to +202
(RegionMigrationStatus::Failed, Some(error.to_string()))
}
ProcedureState::Retrying { error } => {
(RegionMigrationStatus::Retrying, Some(error.to_string()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use debug fmt?

if let Some(event_recorder) = &self.event_recorder {
event_recorder.record(Box::new(ProcedureEvent::new(
self.id,
self.procedure_dump_data.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The procedure generates different dump data at each step.

Comment on lines +54 to +69
async fn handle(&self, events: &[Box<dyn Event>]) -> Result<()> {
let region_migration_events: Box<Vec<Box<dyn Event>>> = Box::new(
events
.iter()
.filter_map(|event| {
event
.as_any()
.downcast_ref::<ProcedureEvent>()
.and_then(|procedure_event| {
RegionMigrationEvent::new_from_procedure_event(procedure_event)
.ok()
.flatten()
.map(|event| Box::new(event) as Box<dyn Event>)
})
})
.collect(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a method to the Procedure trait to expose recordable metadata? And we can build the Event directly in the recorder without having to convert it to another Event implementation in the handler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs. size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants