feat: Add job cancellation. #137
base: main
## Walkthrough
This update adds job and task cancellation features, including schema changes, storage API extensions, worker cancellation checks, and related tests. It also introduces thread-safe `ExecutorHandle` management and updates task executor state methods.
## Changes
| File(s) | Summary |
|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| src/spider/CMakeLists.txt, src/spider/worker/ExecutorHandle.cpp, src/spider/worker/ExecutorHandle.hpp | Adds `ExecutorHandle` class and integrates it into the build process. |
| src/spider/client/Job.hpp | Implements `cancel()` and `get_error()` methods for job cancellation and error retrieval. |
| src/spider/client/TaskContext.cpp | Adds `abort()` method to cancel a job from within a task, with immediate process termination. |
| src/spider/storage/MetadataStorage.hpp, src/spider/storage/mysql/MySqlStorage.hpp | Extends storage API with job/task cancellation and error message retrieval methods. |
| src/spider/storage/mysql/MySqlStorage.cpp | Implements job/task cancellation and error retrieval in MySQL storage; updates `task_fail` logic. |
| src/spider/storage/mysql/mysql_stmt.hpp, tools/scripts/storage/init_db.sql | Adds new `job_errors` table and corresponding SQL schema for storing job error messages. |
| src/spider/worker/TaskExecutor.cpp, src/spider/worker/TaskExecutor.hpp | Renames task state query methods to the `is_*` form (e.g. `succeed()` becomes `is_succeeded()`); adds `is_cancelled()` method. |
| src/spider/worker/worker.cpp | Implements cooperative task cancellation in the worker, integrates with heartbeat, and updates control flow. |
| tests/CMakeLists.txt, tests/client/cancel-test.cpp, tests/integration/test_cancel.py | Adds new client and integration tests for cancellation; updates test build configuration. |
| tests/storage/test-MetadataStorage.cpp | Adds unit tests for job cancellation and cancellation by task. |
| tests/worker/worker-test.cpp, tests/worker/worker-test.hpp | Adds test tasks for sleep and abort scenarios; updates test task declarations. |
| tests/worker/test-TaskExecutor.cpp | Updates test assertions to use new task executor state method names. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant Client
participant Driver
participant MetadataStorage
participant Worker
participant ExecutorHandle
participant TaskExecutor
Client->>Driver: submit_job()
Driver->>MetadataStorage: create_job()
MetadataStorage-->>Driver: job_id
Driver-->>Client: Job
Client->>Job: cancel()
Job->>MetadataStorage: cancel_job(job_id, message)
MetadataStorage-->>Job: result
loop Heartbeat
Worker->>ExecutorHandle: get_task_id()
alt Task running
Worker->>MetadataStorage: get_task_state(task_id)
MetadataStorage-->>Worker: state
alt state == Cancelled
Worker->>ExecutorHandle: get_executor()
Worker->>TaskExecutor: cancel()
end
end
end
Worker->>TaskExecutor: run()
alt Task cancelled
TaskExecutor-->>Worker: is_cancelled() == true
Worker-->>MetadataStorage: update task/job state to Cancelled
else Task completes
Worker-->>MetadataStorage: update task/job state to Succeeded/Failed
end
```
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/spider/storage/mysql/MySqlStorage.cpp (2)
1056-1083: Silent success and missing guard in `cancel_job()` (see earlier review)
This implementation still:
- Ignores the affected-row count of the `UPDATE` statements, so an invalid `job_id` will "succeed" silently.
- Allows multiple identical rows to accumulate in `job_errors`.
- Rolls back only on SQL exceptions, not on logical errors.
Please revisit the earlier suggestion (copied below) and apply it:
```diff
 int32_t updated = task_statement->executeUpdate();
 if (updated == 0) {
     static_cast<MySqlConnection&>(conn)->rollback();
     return StorageErr{StorageErrType::KeyNotFoundErr,
                       fmt::format("No running job with id {}", boost::uuids::to_string(id))};
 }
 updated = job_statement->executeUpdate();
 if (updated == 0) {
     static_cast<MySqlConnection&>(conn)->rollback();
     return StorageErr{StorageErrType::KeyNotFoundErr,
                       fmt::format("No job with id {}", boost::uuids::to_string(id))};
 }
-"INSERT INTO `job_errors` ..."
+"INSERT IGNORE INTO `job_errors` ..."
```
1085-1142: Same robustness issues in `cancel_job_by_task()`
The method duplicates the problems outlined for `cancel_job()` (lack of row-count checks and duplicate `job_errors` entries). Apply the same guards here for consistency and reliability.
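A minimal, self-contained sketch of the guard being requested, assuming an in-memory stand-in for the `jobs`, `tasks` and `job_errors` tables (`FakeDb`, `cancel_active_tasks`, `cancel_running_job` and the enums are hypothetical, not Spider's API). Copying the state before the writes plays the role of the transaction, so a zero affected-row count on the job update rolls back the earlier task update instead of reporting success. Note that `INSERT IGNORE` and `ON DUPLICATE KEY UPDATE` only deduplicate when `job_errors` has a `PRIMARY KEY` or `UNIQUE` index covering `job_id`; the sketch models that by keying the map on the job id.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real code runs SQL through MySQL Connector/C++.
enum class JobStatus { Running, Succeeded, Failed, Cancelled };
enum class TaskState { Pending, Ready, Running, Succeeded, Failed, Cancelled };
enum class ErrType { None, KeyNotFound };

struct FakeDb {
    std::map<std::string, JobStatus> jobs;
    // task id -> (owning job id, state)
    std::map<std::string, std::pair<std::string, TaskState>> tasks;
    // job id -> (func name, message); one row per job, like a PRIMARY KEY on job_id
    std::map<std::string, std::pair<std::string, std::string>> job_errors;
};

// Models `UPDATE tasks SET state = 'cancelled' WHERE job_id = ? AND state IN (...)`
// and returns the affected-row count.
auto cancel_active_tasks(FakeDb& db, std::string const& job_id) -> std::size_t {
    std::size_t updated = 0;
    for (auto& entry : db.tasks) {
        auto& [owner, state] = entry.second;
        bool const active = state == TaskState::Pending || state == TaskState::Ready
                            || state == TaskState::Running;
        if (owner == job_id && active) {
            state = TaskState::Cancelled;
            ++updated;
        }
    }
    return updated;
}

// Models `UPDATE jobs SET state = 'cancelled' WHERE id = ? AND state = 'running'`.
auto cancel_running_job(FakeDb& db, std::string const& job_id) -> std::size_t {
    auto const it = db.jobs.find(job_id);
    if (it == db.jobs.end() || it->second != JobStatus::Running) {
        return 0;
    }
    it->second = JobStatus::Cancelled;
    return 1;
}

auto cancel_job(FakeDb& db, std::string const& job_id, std::string const& func_name,
                std::string const& message) -> ErrType {
    FakeDb const snapshot = db;  // plays the role of START TRANSACTION
    cancel_active_tasks(db, job_id);
    if (cancel_running_job(db, job_id) == 0) {
        db = snapshot;  // plays the role of ROLLBACK on a logical error
        return ErrType::KeyNotFound;
    }
    // `emplace` keeps an existing row, mirroring INSERT IGNORE.
    db.job_errors.emplace(job_id, std::make_pair(func_name, message));
    return ErrType::None;
}
```

With this shape, a second cancellation of the same job, or a cancellation of an unknown or already-finished job, reports `KeyNotFound` and leaves every table exactly as it was.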
🧹 Nitpick comments (2)
src/spider/worker/ExecutorHandle.cpp (1)
31-35: `set()` does not guarantee visibility of paired members
`m_task_id` and `m_executor` are set individually. A racing reader could observe a new `task_id` paired with the previous executor (or vice versa). Pack them into a struct and assign once, or update `m_executor` first, then `m_task_id` (the latter being the validity flag).
src/spider/worker/ExecutorHandle.hpp (1)
18-22: Accessor methods should be `const`
Both `get_task_id()` and `get_executor()` do not modify the object and can safely be marked `const`, improving usability and conveying intent.
```diff
-[[nodiscard]] auto get_task_id() -> std::optional<boost::uuids::uuid>;
-[[nodiscard]] auto get_executor() -> TaskExecutor*;
+[[nodiscard]] auto get_task_id() const -> std::optional<boost::uuids::uuid>;
+[[nodiscard]] auto get_executor() const -> TaskExecutor*;
```
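One way to apply both nitpicks together, sketched under assumptions: `TaskId` is a `std::string` stand-in for `boost::uuids::uuid`, `TaskExecutor` is a stub, and `Entry`, `get()` and `cancel_if_current()` are hypothetical names rather than the PR's interface. Keeping the pair in one `std::optional<Entry>` behind a `mutable std::mutex` makes each update atomic from a reader's point of view and lets the accessors be `const`. Simply reordering the two assignments, without a lock or atomics, would not by itself guarantee visibility across threads.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Stand-ins (assumptions): Spider uses boost::uuids::uuid and its own TaskExecutor.
using TaskId = std::string;

struct TaskExecutor {
    bool cancelled = false;
    auto cancel() -> void { cancelled = true; }
};

class ExecutorHandle {
public:
    // The id and executor always travel together, so no reader can pair a new
    // id with an old executor.
    struct Entry {
        TaskId task_id;
        TaskExecutor* executor;
    };

    auto set(TaskId task_id, TaskExecutor* executor) -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_entry = Entry{std::move(task_id), executor};
    }

    auto clear() -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_entry.reset();
    }

    // `const` accessor; this compiles because the mutex is `mutable`.
    [[nodiscard]] auto get() const -> std::optional<Entry> {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return m_entry;
    }

    // Cancels under the lock, and only if the handle still refers to `task_id`,
    // so a task that finished (and was cleared) in the meantime is never touched.
    auto cancel_if_current(TaskId const& task_id) -> bool {
        std::lock_guard<std::mutex> const lock{m_mutex};
        if (!m_entry.has_value() || m_entry->task_id != task_id) {
            return false;
        }
        m_entry->executor->cancel();
        return true;
    }

private:
    mutable std::mutex m_mutex;
    std::optional<Entry> m_entry;
};
```

Cancelling inside the handle, rather than returning a raw `TaskExecutor*` to the heartbeat thread, assumes the task thread always calls `clear()` before the executor is destroyed; `clear()` then blocks until any in-flight cancellation has finished.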
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- src/spider/client/Job.hpp (2 hunks)
- src/spider/storage/MetadataStorage.hpp (2 hunks)
- src/spider/storage/mysql/MySqlStorage.cpp (3 hunks)
- src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
- src/spider/worker/ExecutorHandle.cpp (1 hunks)
- src/spider/worker/ExecutorHandle.hpp (1 hunks)
- src/spider/worker/worker.cpp (8 hunks)
- tests/CMakeLists.txt (2 hunks)
- tests/client/cancel-test.cpp (1 hunks)
- tests/storage/test-MetadataStorage.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/CMakeLists.txt
- src/spider/storage/mysql/MySqlStorage.hpp
- tests/client/cancel-test.cpp
- src/spider/client/Job.hpp
- src/spider/storage/MetadataStorage.hpp
- tests/storage/test-MetadataStorage.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/spider/worker/ExecutorHandle.cpp (1)
- src/spider/worker/ExecutorHandle.hpp (1): `task_id` (21-21)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: lint
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
🔇 Additional comments (8)
src/spider/storage/mysql/MySqlStorage.cpp (2)
1520-1551: Map unknown task state to error, not `Pending`
`string_to_task_state()` returns `Pending` for unknown strings. Thus, if the DB contains a typo, `get_task_state()` will silently report `Pending` instead of signalling an error.
```diff
-if (state_str.empty()) {
-    *state = spider::core::TaskState::Pending;
-} else {
-    // ...
-}
+if (state_str.empty()) {
+    static_cast<MySqlConnection&>(conn)->rollback();
+    return StorageErr{StorageErrType::OtherErr,
+                      fmt::format("Unknown task state '{}'", state_str)};
+}
```
1836-1845: Good defensive update in `task_fail()`
The added row-count check prevents false failure propagation when the task is no longer `running`. Nice catch!
src/spider/worker/worker.cpp (6)
50-50: Good addition of the `ExecutorHandle` include for managing task cancellation.
This inclusion properly supports the new cancellation mechanism introduced in this PR.
127-158: Well-implemented cancellation check with appropriate error handling.
The `check_task_cancel` function properly implements the cooperative cancellation mechanism by:
- Retrieving the current task ID from the executor handle
- Checking the task state from the metadata store
- Triggering cancellation when appropriate
The function contains thorough error handling and follows a clear, logical flow.
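The three steps above can be sketched as follows, assuming the storage query is passed in as a callback (`StateLookup`, with `std::nullopt` standing for a storage error) and stub types stand in for Spider's `TaskExecutor` and UUIDs; all names here are hypothetical. Returning an explicit outcome makes the storage-failure branch, where the task keeps running until the next heartbeat, easy to test and to document.

```cpp
#include <functional>
#include <optional>
#include <string>

using TaskId = std::string;  // stand-in for boost::uuids::uuid
enum class TaskState { Pending, Running, Succeeded, Failed, Cancelled };

struct Executor {
    bool cancelled = false;
    auto cancel() -> void { cancelled = true; }
};

// Snapshot of what the executor handle currently points at.
struct RunningTask {
    TaskId id;
    Executor* executor;
};

// Hypothetical storage query; std::nullopt models a failed query.
using StateLookup = std::function<std::optional<TaskState>(TaskId const&)>;

enum class CheckResult { NoTask, StorageError, StillActive, Cancelled };

// Intended to be called once per heartbeat.
auto check_task_cancel(std::optional<RunningTask> const& current, StateLookup const& lookup)
        -> CheckResult {
    if (!current.has_value()) {
        return CheckResult::NoTask;  // worker is idle
    }
    std::optional<TaskState> const state = lookup(current->id);
    if (!state.has_value()) {
        // Leave the task running; the next heartbeat retries the query.
        return CheckResult::StorageError;
    }
    if (*state != TaskState::Cancelled) {
        return CheckResult::StillActive;
    }
    current->executor->cancel();
    return CheckResult::Cancelled;
}
```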
164-164: Properly integrated cancellation check into the heartbeat loop.
Adding the `executor_handle` parameter and calling `check_task_cancel` in the heartbeat loop enables asynchronous detection of cancellation requests. This is an efficient approach that leverages the existing heartbeat mechanism.
Also applies to: 185-186
326-326: Improved method naming for better readability.
Renaming `succeed()` to `is_succeeded()` follows better naming conventions for boolean getter methods and improves code clarity.
388-388: Well-implemented task cancellation handling in the task loop.
The implementation:
- Properly tracks the current task with `executor_handle.set(task_id, &executor)`
- Clears the handle after execution with `executor_handle.clear()`
- Correctly handles cancellation cases with detailed comments explaining the behavior
The code cleanly integrates cancellation into the existing task execution flow without disrupting the normal path.
Also applies to: 430-431, 443-443, 446-453
558-559: Properly initialized and shared `executor_handle` across threads.
The `ExecutorHandle` is correctly instantiated in the main thread and passed to both the heartbeat and task threads, allowing for coordinated cancellation detection and handling.
Also applies to: 566-566, 574-574
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/spider/client/TaskContext.cpp (1)
74-87: Address the previous review feedback about exit code and documentation.
The implementation still has the same issues identified in the previous review:
- It uses the generic exit code `1` instead of a dedicated cancellation code such as `130`.
- It lacks documentation of the immediate-termination behaviour.
Please refer to the previous review comment for the detailed solution.
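To illustrate why a dedicated code helps, here is a POSIX-only sketch of how a parent process could tell an abort apart from an ordinary failure. The constant `cTaskAbortExitCode`, the `ExitKind` enum and both functions are hypothetical; 130 is the reviewer's suggestion and matches the shell convention of 128 plus the signal number of `SIGINT` (2). Because `WIFEXITED` distinguishes a normal `exit(130)` from death by a signal, the value is not confused with a real `SIGINT` kill when read through `waitpid`.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical constant for the "task aborted the job" exit status.
constexpr int cTaskAbortExitCode = 130;

enum class ExitKind { Succeeded, Aborted, Failed, Signalled };

// Classifies a status returned by waitpid().
auto classify_exit(int const status) -> ExitKind {
    if (!WIFEXITED(status)) {
        return ExitKind::Signalled;  // killed by a signal, not a normal exit
    }
    int const code = WEXITSTATUS(status);
    if (code == 0) {
        return ExitKind::Succeeded;
    }
    if (code == cTaskAbortExitCode) {
        return ExitKind::Aborted;
    }
    return ExitKind::Failed;
}

// Forks a child that exits with `child_exit_code` (a stand-in for a task
// process) and classifies how it ended.
auto run_child_and_classify(int const child_exit_code) -> ExitKind {
    pid_t const pid = fork();
    if (pid < 0) {
        return ExitKind::Failed;  // fork itself failed
    }
    if (pid == 0) {
        // _exit skips atexit handlers inherited from the parent.
        _exit(child_exit_code);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return ExitKind::Failed;
    }
    return classify_exit(status);
}
```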
src/spider/storage/mysql/MySqlStorage.cpp (2)
1056-1083: ⚠️ Potential issue: Validate target job and guard against silent failures
The `cancel_job()` method executes `UPDATE` statements without checking affected rows. If an invalid `job_id` is supplied, the function silently "succeeds", leaving the caller unaware that no job was actually cancelled. Additionally, unlike `cancel_job_by_task()`, this method doesn't insert into `job_errors`, creating an inconsistency in error tracking.
1085-1142: ⚠️ Potential issue: Add `UPDATE` validation and prevent duplicate error entries
Similar to `cancel_job()`, this method doesn't validate the affected row count after `UPDATE` operations. Additionally, multiple calls with the same task could insert duplicate entries into `job_errors`.
Consider:
- Checking affected rows after each `UPDATE`
- Using `INSERT IGNORE` or `ON DUPLICATE KEY UPDATE` to handle duplicate entries
```diff
- task_statement->executeUpdate();
+ int32_t updated = task_statement->executeUpdate();
+ // Log or handle cases where no tasks were updated
 ...
- job_statement->executeUpdate();
+ updated = job_statement->executeUpdate();
+ if (updated == 0) {
+     static_cast<MySqlConnection&>(conn)->rollback();
+     return StorageErr{StorageErrType::KeyNotFoundErr,
+                       fmt::format("No job found for task {}", boost::uuids::to_string(id))};
+ }
 ...
- "INSERT INTO `job_errors` (`job_id`, `func_name`, `message`) VALUES (?, ?, "
- "?) "
+ "INSERT INTO `job_errors` (`job_id`, `func_name`, `message`) VALUES (?, ?, ?) "
+ "ON DUPLICATE KEY UPDATE `message` = VALUES(`message`)"
```
🧹 Nitpick comments (1)
src/spider/storage/mysql/MySqlStorage.cpp (1)
1056-1083: Consider adding an optional message parameter to `cancel_job` for consistency
Currently, `cancel_job()` doesn't record any message in `job_errors`, while `cancel_job_by_task()` does. This inconsistency could make it difficult to track why jobs were cancelled. Consider either:
- Adding an optional message parameter to `cancel_job()`
- Always inserting a default message like "Cancelled by user" when no specific message is provided
This would ensure all cancellations are properly tracked in `job_errors`.
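If the first option is chosen, a non-virtual overload is one way to supply the default message. This is sketched against a hypothetical, simplified interface: the real `MetadataStorage` methods also take a `StorageConnection&` and a UUID, and `InMemoryStorage` and `cDefaultCancelMessage` are invented for illustration. An overload is preferable to a default argument on the virtual function, because default arguments are bound by the static type of the call expression, so a derived class that declared a different default would behave inconsistently depending on how it is called.

```cpp
#include <string>
#include <string_view>
#include <vector>

class MetadataStorage {
public:
    static constexpr std::string_view cDefaultCancelMessage{"Cancelled by user"};

    virtual ~MetadataStorage() = default;

    // Every implementation records a message; callers without one get the default.
    auto cancel_job(std::string const& job_id) -> bool {
        return cancel_job(job_id, cDefaultCancelMessage);
    }

    virtual auto cancel_job(std::string const& job_id, std::string_view message) -> bool = 0;
};

class InMemoryStorage : public MetadataStorage {
public:
    // Re-expose the one-argument overload, which the override below would otherwise hide.
    using MetadataStorage::cancel_job;

    auto cancel_job(std::string const& job_id, std::string_view message) -> bool override {
        m_messages.push_back(job_id + ": " + std::string{message});
        return true;
    }

    [[nodiscard]] auto messages() const -> std::vector<std::string> const& { return m_messages; }

private:
    std::vector<std::string> m_messages;
};
```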
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- src/spider/CMakeLists.txt (1 hunks)
- src/spider/client/Job.hpp (2 hunks)
- src/spider/client/TaskContext.cpp (2 hunks)
- src/spider/storage/MetadataStorage.hpp (2 hunks)
- src/spider/storage/mysql/MySqlStorage.cpp (3 hunks)
- src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
- src/spider/worker/TaskExecutor.cpp (2 hunks)
- src/spider/worker/TaskExecutor.hpp (1 hunks)
- src/spider/worker/worker.cpp (8 hunks)
- tests/CMakeLists.txt (2 hunks)
- tests/storage/test-MetadataStorage.cpp (1 hunks)
- tests/worker/test-TaskExecutor.cpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/spider/CMakeLists.txt
- tests/worker/test-TaskExecutor.cpp
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: lint
🔇 Additional comments (19)
src/spider/worker/TaskExecutor.cpp (2)
24-48: LGTM! Consistent method naming and proper cancellation state handling.
The method renames to use the `is_` prefix improve consistency and readability. The new `is_cancelled()` method follows the same thread-safe pattern as other state-checking methods.
85-90: Good defensive programming for cancellation handling.
The early state check prevents unnecessary message processing when the task is no longer in an active state (`Waiting` or `Running`). This is appropriate for the cancellation feature and helps ensure clean termination.
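The pattern described here can be sketched as a small state machine, assuming a reduced `TaskExecutor` with hypothetical `start()` and `on_result()` members; only the `is_*` predicates and `cancel()` come from the review, and everything else is illustrative. Every predicate and transition takes the same mutex, and both cancellation and result handling are no-ops unless the state is `Waiting` or `Running`, so whichever of the two arrives second cannot overwrite the first.

```cpp
#include <mutex>

// Hypothetical reduction of the executor's state tracking; the real class also
// manages the child process and its message pipe.
class TaskExecutor {
public:
    enum class State { Waiting, Running, Succeeded, Failed, Cancelled };

    [[nodiscard]] auto is_running() const -> bool { return state() == State::Running; }
    [[nodiscard]] auto is_succeeded() const -> bool { return state() == State::Succeeded; }
    [[nodiscard]] auto is_failed() const -> bool { return state() == State::Failed; }
    [[nodiscard]] auto is_cancelled() const -> bool { return state() == State::Cancelled; }

    auto start() -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        if (State::Waiting == m_state) { m_state = State::Running; }
    }

    // Only an active task can be cancelled; a finished one keeps its terminal state.
    auto cancel() -> bool {
        std::lock_guard<std::mutex> const lock{m_mutex};
        if (!is_active()) { return false; }
        m_state = State::Cancelled;
        return true;
    }

    // Handles a result message; ignored once the task is no longer active,
    // mirroring the early state check described above.
    auto on_result(bool const success) -> bool {
        std::lock_guard<std::mutex> const lock{m_mutex};
        if (!is_active()) { return false; }
        m_state = success ? State::Succeeded : State::Failed;
        return true;
    }

private:
    // Caller must hold m_mutex.
    [[nodiscard]] auto is_active() const -> bool {
        return State::Waiting == m_state || State::Running == m_state;
    }

    [[nodiscard]] auto state() const -> State {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return m_state;
    }

    mutable std::mutex m_mutex;
    State m_state{State::Waiting};
};
```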
tests/CMakeLists.txt (1)
89-99: LGTM! Proper test executable setup.
The new `cancel_test` executable follows the established pattern for test builds with appropriate library linking and dependency management.
Also applies to: 114-114
src/spider/client/Job.hpp (2)
85-97: LGTM! Consistent implementation following established patterns.
The `cancel()` method properly handles storage connection acquisition and error propagation, following the same pattern as other methods in this class.
167-192: LGTM! Proper error retrieval implementation.
The `get_error()` method correctly handles both connection scenarios (null and non-null `m_conn`) and follows the established error-handling pattern. The return type `std::pair<std::string, std::string>` matches the documented interface for task function name and error message.
src/spider/worker/TaskExecutor.hpp (1)
158-162: Good naming convention improvement!
The renaming to `is_*` style for boolean predicate methods follows C++ best practices and makes the API more intuitive. The addition of `is_cancelled()` properly extends the state-checking interface for the new cancellation feature.
tests/storage/test-MetadataStorage.cpp (2)
445-518: Well-structured job cancellation test!
The test comprehensively covers the job cancellation workflow:
- Correctly verifies that completed tasks retain their success state
- Ensures pending tasks transition to cancelled state
- Validates job status update
Good test coverage for the basic cancellation scenario.
520-600: Excellent test for task-initiated cancellation!
This test effectively validates the extended cancellation functionality:
- Tests cancellation triggered by a specific task ID
- Verifies error message storage and retrieval
- Confirms the association between error message and originating task
- Maintains consistency with state transition expectations
The test provides thorough coverage of the task-based cancellation feature.
src/spider/storage/mysql/MySqlStorage.hpp (1)
74-83: Clean interface extension for job cancellation!
The new methods properly extend the storage interface to support the cancellation feature:
- `cancel_job` and `cancel_job_by_task` provide two cancellation paths
- `get_job_message` enables error message retrieval with associated task context
- `get_task_state` adds necessary state-querying capability
The interface design is consistent with existing patterns and supports the requirements shown in the tests.
Also applies to: 99-100
src/spider/storage/MetadataStorage.hpp (1)
75-82: LGTM! Clean interface additions for job cancellation.
The new virtual methods are well-designed and properly documented. The separation of concerns between `cancel_job` (external cancellation) and `cancel_job_by_task` (internal cancellation with error message) is appropriate.
Also applies to: 83-94, 95-109, 130-132
src/spider/storage/mysql/MySqlStorage.cpp (3)
1144-1172: LGTM! Clean implementation with proper error handling for missing messages.
1520-1551: LGTM! Straightforward getter implementation following the established pattern.
1836-1845: Good defensive programming!
The added check prevents unnecessary job state updates and handles race conditions where the task might already be in a terminal state.
src/spider/worker/worker.cpp (6)
50-50: LGTM! The include is necessary for the new cancellation feature implementation.
127-159: Implementation looks good with proper thread safety.
The cancellation check logic is well-structured with appropriate error handling. The thread-safety concern from previous reviews has been addressed through the mutex-protected `ExecutorHandle` implementation.
Note: When `get_task_state` fails due to storage issues, the function returns early and the potentially cancelled task continues running. This might be the intended behavior to avoid disrupting tasks due to transient storage problems, but the decision is worth documenting.
160-202: Well-integrated cancellation check in heartbeat loop.
The storage connection handling uses proper RAII with `shared_ptr`, and the cancellation check is appropriately placed after acquiring the connection. The error handling and retry logic remain intact.
326-326: LGTM! Method rename follows the consistent `is_` prefix pattern for boolean predicates.
385-461: Proper executor handle lifecycle management and cancellation handling.
The implementation correctly:
- Sets the executor handle before task execution
- Clears it after completion to prevent stale references
- Skips result processing for cancelled tasks since the storage state is already updated
The detailed comments explaining the cancellation scenarios are helpful for maintainability.
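A scope guard is one way to make the set/clear pairing hold on every exit path, including early returns and exceptions. The sketch below is an assumption, not the PR's code: `ExecutorRegistration` is a hypothetical RAII wrapper, and the `ExecutorHandle`, `TaskExecutor` and `TaskId` types are minimal stand-ins.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

using TaskId = std::string;  // stand-in for boost::uuids::uuid
struct TaskExecutor {};      // stand-in for Spider's executor

// Minimal stand-in for the PR's mutex-protected handle.
class ExecutorHandle {
public:
    auto set(TaskId task_id, TaskExecutor* executor) -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_task_id = std::move(task_id);
        m_executor = executor;
    }

    auto clear() -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_task_id.reset();
        m_executor = nullptr;
    }

    [[nodiscard]] auto get_task_id() const -> std::optional<TaskId> {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return m_task_id;
    }

    [[nodiscard]] auto get_executor() const -> TaskExecutor* {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return m_executor;
    }

private:
    mutable std::mutex m_mutex;
    std::optional<TaskId> m_task_id;
    TaskExecutor* m_executor{nullptr};
};

// Registers the executor for the lifetime of a scope and clears the handle on exit.
class ExecutorRegistration {
public:
    ExecutorRegistration(ExecutorHandle& handle, TaskId task_id, TaskExecutor* executor)
            : m_handle{handle} {
        m_handle.set(std::move(task_id), executor);
    }

    ~ExecutorRegistration() { m_handle.clear(); }

    ExecutorRegistration(ExecutorRegistration const&) = delete;
    auto operator=(ExecutorRegistration const&) -> ExecutorRegistration& = delete;

private:
    ExecutorHandle& m_handle;
};
```

If used in the task loop, the guard would need to be declared after the executor it registers, so that it is destroyed first and the handle is cleared before the executor goes away.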
558-579: LGTM! The `ExecutorHandle` instance is correctly created and shared between the heartbeat and task threads, enabling proper cancellation coordination.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/spider/storage/mysql/MySqlStorage.cpp (2)
1079-1106: Address the previous review feedback on validation
The previous review feedback about validating the target job and checking affected-row counts has not been implemented. The function still unconditionally executes `UPDATE` statements without checking if any rows were affected, allowing invalid job IDs to silently "succeed".
1108-1165: Address the previous review feedback on validation and duplicate handling
The previous review feedback about validating `UPDATE` statements and preventing duplicate error rows has not been implemented. The function still doesn't check affected-row counts for `UPDATE` statements and uses a regular `INSERT` instead of `INSERT IGNORE` for the `job_errors` table.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- src/spider/CMakeLists.txt (1 hunks)
- src/spider/client/Job.hpp (2 hunks)
- src/spider/storage/MetadataStorage.hpp (2 hunks)
- src/spider/storage/mysql/MySqlStorage.cpp (3 hunks)
- src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
- src/spider/worker/ExecutorHandle.hpp (1 hunks)
- src/spider/worker/TaskExecutor.cpp (2 hunks)
- src/spider/worker/TaskExecutor.hpp (1 hunks)
- src/spider/worker/worker.cpp (8 hunks)
- tests/client/cancel-test.cpp (1 hunks)
- tests/integration/test_cancel.py (1 hunks)
- tests/storage/test-MetadataStorage.cpp (1 hunks)
- tests/worker/test-TaskExecutor.cpp (5 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/spider/worker/ExecutorHandle.hpp
🚧 Files skipped from review as they are similar to previous changes (10)
- tests/worker/test-TaskExecutor.cpp
- src/spider/worker/TaskExecutor.cpp
- src/spider/CMakeLists.txt
- src/spider/worker/TaskExecutor.hpp
- src/spider/client/Job.hpp
- src/spider/storage/mysql/MySqlStorage.hpp
- tests/client/cancel-test.cpp
- src/spider/storage/MetadataStorage.hpp
- src/spider/worker/worker.cpp
- tests/integration/test_cancel.py
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
🔇 Additional comments (5)
tests/storage/test-MetadataStorage.cpp (2)
451-515: LGTM! Solid test coverage for job cancellation.
This test case effectively validates the basic job cancellation functionality, correctly verifying that:
- The job status transitions to "Cancelled"
- Already finished tasks retain their "Succeed" state
- Pending/ready tasks transition to "Canceled" state
The test logic is sound and follows proper testing patterns.
517-589: LGTM! Comprehensive test for task-initiated cancellation.
This test case properly validates the advanced cancellation functionality including error message handling. The verification of `get_error_message()` functionality and correct task ID storage is excellent.
src/spider/storage/mysql/MySqlStorage.cpp (3)
1167-1195: Well-implemented query method
The implementation follows proper patterns with appropriate error handling, transaction management, and meaningful error messages.
1543-1574: Consistent implementation with proper error handling
The method follows established patterns in the codebase with appropriate error handling and transaction management.
1859-1868: Good defensive programming improvements
The addition of state validation and early return when no rows are affected improves the method's robustness and consistency.
Description
Add job cancellation support from inside the task through `TaskContext::abort` and from the user through `Job::cancel`.
Add a `job_errors` table to store the cancellation messages. Add storage functions to cancel a job and to get cancellation messages. Implement `TaskContext::abort` and `Job::cancel` in the client. Add a task cancellation check in the worker. Add unit tests for the new storage functions. Add integration tests for job cancellation.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
- `job_errors` table for storing error details.
Bug Fixes
Tests
Refactor
Chores