-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: shutdown when we see non retryable udf errors #2204
Conversation
Signed-off-by: Yashash H L <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2204 +/- ##
==========================================
+ Coverage 63.93% 64.03% +0.09%
==========================================
Files 334 335 +1
Lines 40676 40719 +43
==========================================
+ Hits 26006 26073 +67
+ Misses 13609 13595 -14
+ Partials 1061 1051 -10 ☔ View full report in Codecov by Sentry. |
@@ -266,7 +270,7 @@ func (df *DataForward) forwardAChunk(ctx context.Context) { | |||
if err != nil { | |||
df.opts.logger.Errorw("failed to write to sink", zap.Error(err)) | |||
df.fromBufferPartition.NoAck(ctx, readOffsets) | |||
return | |||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we want to restart the container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a non retryable error, so we will have to restart.
// TODO(Retry-Sink): Check for ctx done separately? That should be covered in shutdown | ||
if ok, _ := df.IsShuttingDown(); err != nil && ok { | ||
|
||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any logic change for the operation (handlePostRetryFailures
) below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change is required for that function
log := logging.FromContext(df.ctx) | ||
stopped := make(chan struct{}) | ||
stopped := make(chan error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we introduce another channel dedicated for error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a offline discussion, we decided to go with a blocking call implementation for start which returns an error similar to http server. We will make this change after 1.4.
log.Infow("Exited for partition...", zap.String("fromPartition", fromBufferPartitionName)) | ||
log.Info("Exited for partition...", zap.String("partition", fromBufferPartitionName)) | ||
case err := <-stopped: // critical error case | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this trigger a successful container restart? do we need to trigger a crash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will trigger a successful container restart
Signed-off-by: Yashash H L <[email protected]>
@yhl25 please merge numaflow-go PR first to fix the e2es before merging this one. |
Signed-off-by: Yashash H L <[email protected]>
I reran the entire suite again and it is failing with
I am not sure why the previous run was successful, @KeranYang do you think the java SDK PR will solve this? |
Signed-off-by: Yashash H L <[email protected]>
Signed-off-by: Yashash H L <[email protected]>
Client side
Sink
Map
Source
Server side
Sink
Map
Source
numaproj/numaflow-go#165 should be merged first.