
Pipeline: add support for error handling #3359

Open · wants to merge 5 commits into base: pipelines-target-branch

Conversation

@shohamazon shohamazon commented Mar 12, 2025

This PR adds support for handling errors within a pipeline.
If an error occurs during pipeline execution, failed commands will be identified and retried based on their respective retry mechanisms.

For example, commands that fail with a MOVED error will be collected, the cluster topology will be updated accordingly, and the commands will be retried on the correct node.

This PR enhances error handling for pipeline execution in cluster mode.
Previously, all multi-key commands had to be mapped to the same slot, as cluster mode required key-based routing. However, with pipelines, we can now send commands involving different keys across multiple slots, allowing for more flexible command execution.

Why This Change Is Needed

Up until now, all multi-key commands in cluster mode had to be mapped to the same slot. With pipeline support, however, commands involving different keys can be executed together even if they belong to different slots. This introduces a challenge: because a sub-pipeline (the batch of commands sent to a specific node) is handled as a single unit, an error such as MOVED could affect the entire sub-pipeline even if only a single command within it caused the issue. As a result, the previous approach of retrying the whole sub-pipeline was incorrect, since unaffected commands were unnecessarily retried.

To address this, we now implement separate error handling specifically for pipelines, ensuring that only the failed commands are retried while maintaining correctness and efficiency.
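To make the mechanism concrete, here is a minimal, self-contained sketch of the per-command grouping described above. The type names, the string-based error matching, and the `RetryMethod` variants are illustrative stand-ins, not the crate's actual API:

```rust
use std::collections::HashMap;

// Stand-in types for illustration only; the real crate has richer
// `RetryMethod` and `ServerError` definitions and per-command bookkeeping.
#[derive(Debug, Hash, PartialEq, Eq)]
enum RetryMethod {
    MovedRedirect,
    RetryImmediately,
    NoRetry,
}

#[derive(Debug)]
struct ServerError(String);

// Pick a retry strategy from the error text (simplified: the real code
// inspects structured error kinds, not string prefixes).
fn retry_method_for(err: &ServerError) -> RetryMethod {
    if err.0.starts_with("MOVED") {
        RetryMethod::MovedRedirect
    } else if err.0.starts_with("TRYAGAIN") {
        RetryMethod::RetryImmediately
    } else {
        RetryMethod::NoRetry
    }
}

// Group only the failed commands (by their index in the pipeline) under the
// retry method that should handle them; successful commands are left untouched.
fn build_retry_map(
    results: Vec<(usize, Result<String, ServerError>)>,
) -> HashMap<RetryMethod, Vec<(usize, ServerError)>> {
    let mut retry_map: HashMap<RetryMethod, Vec<(usize, ServerError)>> = HashMap::new();
    for (index, result) in results {
        if let Err(err) = result {
            retry_map
                .entry(retry_method_for(&err))
                .or_default()
                .push((index, err));
        }
    }
    retry_map
}

fn main() {
    let results = vec![
        (0, Ok("OK".to_string())),
        (1, Err(ServerError("MOVED 1234 10.0.0.2:6379".to_string()))),
        (2, Ok("OK".to_string())),
    ];
    // Only command 1 ends up in the map, under `MovedRedirect`.
    println!("{:?}", build_retry_map(results));
}
```

Grouping by retry method is what lets a MOVED redirect update the topology once and re-route only the affected commands, while successful responses stay in place.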

Issue link

This Pull Request is linked to issue (URL): [REPLACE ME]

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one issue.
  • Commit message has a detailed description of what changed and why.
  • Tests are added or updated.
  • CHANGELOG.md and documentation files are updated.
  • Destination branch is correct - main or release
  • Create merge commit if merging release branch into main, squash otherwise.

@shohamazon shohamazon self-assigned this Mar 12, 2025
@shohamazon shohamazon added the Core changes label (used to mark a PR with significant changes that should trigger the full test matrix) Mar 12, 2025
Signed-off-by: Shoham Elias <[email protected]>
@shohamazon shohamazon marked this pull request as ready for review March 12, 2025 12:17
@shohamazon shohamazon requested a review from a team as a code owner March 12, 2025 12:17
@shohamazon shohamazon requested a review from barshaul March 12, 2025 12:29
Comment on lines -586 to 587
let value = result.and_then(|v| v.extract_error())?;
let value = result?;
match value {
Value::Array(mut values) => {
Collaborator

match result {
    Ok(Value::Array(mut values)) => {
...

Collaborator Author

why is that? we can raise an error if value is a redis error (like a parsing problem that occurred)

@@ -614,6 +615,7 @@ enum CmdArg<C> {
count: usize,
route: InternalSingleNodeRouting<C>,
sub_pipeline: bool,
retry: u32,
Collaborator

Please enrich your documentation for retries.
As PR description says

For example, commands that fail with a MOVED error will be collected, the cluster topology will be updated accordingly, and the commands will be retried on the correct node.

How are other errors handled?

Consider posting detailed docs to the wiki page and/or to the feature docs.

Collaborator

Do you have a limit on retries? I think we should have one.

Collaborator Author

Yes, of course I do 🙃
And I added the MOVED example simply because it's the simplest of them all

"Received a single response for a pipeline with multiple commands.".to_string(),
)),
},
Ok(Ok(Response::ClusterScanResult(_, _))) => ServerError::KnownError {
Collaborator

I see that in this line, lolwut

Collaborator Author

🤣🤣

/// - `Err((OperationTarget, RedisError))` if a node-level or reception error occurs.
/// - **Ok**: A `HashMap<RetryMethod, Vec<RetryEntry>>` mapping each retry method to the list of commands that failed and
/// should be retried.
/// - **Err**: A tuple `(OperationTarget, RedisError)` if a node-level or reception error occurs while processing responses.
Collaborator

but it never returns Err

Collaborator Author

why not? I agree that I might not have to document this, but it's still returning a Result

match self {
ServerError::ExtensionError { .. } => ErrorKind::ExtensionError,
ServerError::KnownError { kind, .. } => match kind {
ServerErrorKind::ResponseError => ErrorKind::ResponseError,
Collaborator

You could probably have a macro to do this for you.
At least you can avoid duplicating code on lines 225-236 and 266-277

Collaborator Author

yeah I agree, I will fix it
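For reference, a minimal sketch of the kind of declarative macro suggested here, so the kind-to-kind mapping is written once and reused at both duplicated call sites. The enums are stand-ins, not the crate's real `ServerErrorKind`/`ErrorKind` definitions:

```rust
// Stand-in enums, not the crate's real definitions; `#[allow(dead_code)]`
// only silences warnings for variants this sketch never constructs.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum ServerErrorKind {
    ResponseError,
    Moved,
    Ask,
}

#[derive(Debug, Clone, Copy)]
enum ErrorKind {
    ResponseError,
    Moved,
    Ask,
}

// Expands one list of `ServerErrorKind => ErrorKind` pairs into a `match`,
// so both duplicated conversion blocks can call the same macro.
macro_rules! map_kind {
    ($kind:expr, { $($server:ident => $client:ident),+ $(,)? }) => {
        match $kind {
            $(ServerErrorKind::$server => ErrorKind::$client,)+
        }
    };
}

fn to_error_kind(kind: ServerErrorKind) -> ErrorKind {
    map_kind!(kind, {
        ResponseError => ResponseError,
        Moved => Moved,
        Ask => Ask,
    })
}

fn main() {
    // Prints `Moved`.
    println!("{:?}", to_error_kind(ServerErrorKind::Moved));
}
```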

Signed-off-by: Shoham Elias <[email protected]>
Signed-off-by: Shoham Elias <[email protected]>
/// - **Ok**: A `HashMap<RetryMethod, Vec<RetryEntry>>` mapping each retry method to the list of commands that failed and
/// should be retried.
/// - **Err**: A tuple `(OperationTarget, RedisError)` if a node-level or reception error occurs while processing responses.
#[allow(clippy::type_complexity)]
pub fn process_pipeline_responses(
Collaborator

try to remove pub

pub fn process_pipeline_responses(
pipeline_responses: &mut PipelineResponses,
responses: Vec<Result<RedisResult<Response>, RecvError>>,
addresses_and_indices: AddressAndIndices,
) -> Result<(), (OperationTarget, RedisError)> {
) -> Result<
HashMap<RetryMethod, Vec<((usize, Option<usize>), String, ServerError)>>,
Collaborator

retryMap
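The note above presumably suggests a type alias along the lines of `RetryMap`. A possible shape, shown with stub types so the sketch compiles on its own (in the crate these would be the existing `RetryMethod`, `ServerError`, `OperationTarget`, and `RedisError`):

```rust
use std::collections::HashMap;

// Stub types so the sketch compiles on its own; in the crate these are the
// existing `RetryMethod`, `ServerError`, `OperationTarget`, and `RedisError`.
#[derive(Debug, Hash, PartialEq, Eq)]
enum RetryMethod {
    RetryImmediately,
}
#[derive(Debug)]
struct ServerError;
#[allow(dead_code)]
#[derive(Debug)]
struct OperationTarget;
#[allow(dead_code)]
#[derive(Debug)]
struct RedisError;

// One entry per failed command: (command index, optional inner index within a
// multi-slot command), the address it was sent to, and the error it returned.
type RetryEntry = ((usize, Option<usize>), String, ServerError);
type RetryMap = HashMap<RetryMethod, Vec<RetryEntry>>;

// The return type above could then shrink to `Result<RetryMap, (OperationTarget,
// RedisError)>`, likely making `#[allow(clippy::type_complexity)]` unnecessary.
fn process_pipeline_responses_sketch() -> Result<RetryMap, (OperationTarget, RedisError)> {
    let mut map = RetryMap::new();
    map.entry(RetryMethod::RetryImmediately)
        .or_default()
        .push(((0, None), "10.0.0.2:6379".to_string(), ServerError));
    Ok(map)
}

fn main() {
    println!("{:?}", process_pipeline_responses_sketch());
}
```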

/// - **Ok**: A `HashMap<RetryMethod, Vec<RetryEntry>>` mapping each retry method to the list of commands that failed and
/// should be retried.
/// - **Err**: A tuple `(OperationTarget, RedisError)` if a node-level or reception error occurs while processing responses.
#[allow(clippy::type_complexity)]
Collaborator

Suggested change
#[allow(clippy::type_complexity)]

@@ -467,43 +523,552 @@ pub fn process_pipeline_responses(
address.clone(),
)?;
}
continue;
Collaborator

add doc

@@ -467,43 +523,552 @@ pub fn process_pipeline_responses(
address.clone(),
)?;
}
continue;
}
Ok(Err(err)) => err.into(),
Collaborator

add doc

Collaborator

remove under to be above Err(err)

continue;
}
Ok(Err(err)) => err.into(),
Ok(Ok(Response::Single(_))) => ServerError::KnownError {
Collaborator

doc

}
Ok(Err(err)) => err.into(),
Ok(Ok(Response::Single(_))) => ServerError::KnownError {
kind: (ServerErrorKind::ResponseError),
Collaborator

change to ExtensionError

Ok(retry_map) => {
// If there are no retriable errors, or we have reached the maximum number of retries, we're done
if retry_map.is_empty() || retry >= retry_params.number_of_retries {
break Ok(());
Collaborator

return

RetryMethod::NoRetry => {
// The server error was already added to the pipeline responses, so we can just continue.
}
RetryMethod::Reconnect | RetryMethod::ReconnectAndRetry => {
Collaborator

when we need to reconnect without retrying, use trigger_refresh_connection_tasks

where
C: Clone + ConnectionLike + Connect + Send + Sync + 'static,
{
let retry_params = core
Collaborator

create retry params with lower min and max timeouts, as we don't want to stall the whole pipeline due to a failed node/shard/slot migration
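A rough sketch of what pipeline-specific retry bounds could look like; the `PipelineRetryParams` struct and its fields are hypothetical, not the crate's existing retry configuration:

```rust
use std::cmp::min;
use std::time::Duration;

// Hypothetical pipeline-specific retry parameters with tighter bounds than the
// general cluster retry settings, so one failing node/shard/slot migration
// cannot stall the whole pipeline for long.
struct PipelineRetryParams {
    number_of_retries: u32,
    min_wait_ms: u64,
    max_wait_ms: u64,
}

impl PipelineRetryParams {
    // Exponential backoff, clamped to [min_wait_ms, max_wait_ms].
    fn wait_time(&self, retry: u32) -> Duration {
        let backoff = self.min_wait_ms.saturating_mul(1u64 << retry.min(16));
        Duration::from_millis(min(self.max_wait_ms, backoff.max(self.min_wait_ms)))
    }
}

fn main() {
    let params = PipelineRetryParams {
        number_of_retries: 3,
        min_wait_ms: 10,
        max_wait_ms: 250,
    };
    for retry in 0..params.number_of_retries {
        println!("retry {retry}: wait {:?}", params.wait_time(retry));
    }
}
```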


futures::future::join_all(futures).await;
}
_ => {}
Collaborator

add unreachable! with an explanatory message

})?;

// Search for the response policy based on the index.
let routing_info = if inner_index.is_some() {
Collaborator

write about the relation between inner_index and response policies

Collaborator

try to add all related logic under the if inner_index.is_some() section


// Search for the response policy based on the index.
let routing_info = if inner_index.is_some() {
response_policies.and_then(|vec| {
Collaborator

check if response_policies can be a HashMap, to save the compute
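An illustrative sketch of the suggested lookup structure: build a `HashMap` keyed by command index once, then do O(1) lookups instead of scanning a `Vec` per response. The `ResponsePolicy` variants here are placeholders:

```rust
use std::collections::HashMap;

// Placeholder policy type; the real code carries the crate's `ResponsePolicy`
// alongside the pipeline index of the command it applies to.
#[derive(Debug)]
enum ResponsePolicy {
    AllSucceeded,
    OneSucceeded,
}

fn main() {
    // (pipeline index, policy) pairs as they might arrive today in a Vec.
    let response_policies: Vec<(usize, ResponsePolicy)> = vec![
        (0, ResponsePolicy::AllSucceeded),
        (3, ResponsePolicy::OneSucceeded),
    ];

    // Build the map once, then look up by index while aggregating responses,
    // instead of scanning the Vec for every command.
    let by_index: HashMap<usize, ResponsePolicy> = response_policies.into_iter().collect();

    let inner_index = 3usize;
    if let Some(policy) = by_index.get(&inner_index) {
        println!("policy for command {inner_index}: {policy:?}");
    }
}
```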

// Return the transformed command using the extracted indices.
Ok(command_for_multi_slot_indices(cmd.as_ref(), indices.1.iter()).into())
} else {
// For non-multi-slot commands, simply return a clone.
Collaborator

not a clone

pipeline_responses,
index,
inner_index,
Value::ServerError(server_error),
Collaborator

append the error


// Attempt to retrieve a connection for the given address.
let connection = {
let lock = core.conn_lock.read().expect(MUTEX_READ_ERR);
Collaborator

use get_connection with ByAddress

pipeline_responses,
index,
inner_index,
Value::ServerError(server_error),
Collaborator

append?

/// - The address of the node where the command was originally sent.
/// - The `ServerError` that occurred.
/// * `pipeline_responses` - A mutable reference to the collection of pipeline responses.
async fn retry_commands<C>(
Collaborator

the function name suggests that commands are being retried here, but they aren't

let redis_error: RedisError = error.clone().into();
let redirect_info = redis_error
.redirect()
.ok_or_else(|| (OperationTarget::FanOut, error.clone().into()))?;
Collaborator

server error

Labels
Core changes: used to mark a PR with significant changes that should trigger the full test matrix.
3 participants