Fix OOP issue where context API exceptions are getting ignored #1677

Draft
wants to merge 2 commits into
base: v2.x

cgillum
Member

@cgillum cgillum commented Feb 4, 2021

Issue describing the changes in this PR

If an out-of-proc orchestration tries to call an activity that doesn't exist, the call silently fails in the Durable extension. As a result, the orchestration hangs indefinitely, waiting for the activity to complete.

The issue is actually broader than this: any exception thrown by the orchestration context object is ignored, with no tracing or other indication of a failure.

This PR fixes this issue by explicitly checking for exceptions in the out-of-proc shim and making sure the exception gets properly raised. The new result is that the orchestration in my example case will fail with a very clear error message.

[2021-02-04T18:26:35.489] Executing 'Functions.DurableFunctionsHttpStart1' (Reason='This function was programmatically called via the host APIs.', Id=26c6be14-9dea-43fa-b9e1-be1a3ef1666e)
[2021-02-04T18:26:35.770] Started orchestration with ID = '2afca7b8df5040e8926811a2e7c79f78'.
[2021-02-04T18:26:35.775] Executed 'Functions.DurableFunctionsHttpStart1' (Succeeded, Id=26c6be14-9dea-43fa-b9e1-be1a3ef1666e, Duration=294ms)
[2021-02-04T18:26:35.804] Executing 'Functions.DurableFunctionsOrchestrator1' (Reason='(null)', Id=ecd4be03-297a-42f7-9633-efe2776cb515)
[2021-02-04T18:26:35.855] 2afca7b8df5040e8926811a2e7c79f78: Function 'DurableFunctionsOrchestrator1 (Orchestrator)' failed with an error. Reason: System.ArgumentException: The function 'Hello' doesn't exist, is disabled, or is not an activity function. Additional info: The following are the known activity functions: 'DurableActivity1'.
[2021-02-04T18:26:35.856]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableTaskExtension.ThrowIfFunctionDoesNotExist(String name, FunctionType functionType) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\DurableTaskExtension.cs:line 1060
[2021-02-04T18:26:35.857]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.CallDurableTaskFunctionAsync[TResult](String functionName, FunctionType functionType, Boolean oneWay, String instanceId, String operation, RetryOptions retryOptions, Object input, Nullable`1 scheduledTimeUtc) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 522
[2021-02-04T18:26:35.859]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.ProcessAsyncActions(AsyncAction[][] actions) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 197
[2021-02-04T18:26:35.860]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.ScheduleDurableTaskEvents(OrchestrationInvocationResult result) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 87
[2021-02-04T18:26:35.862]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.HandleDurableTaskReplay(OrchestrationInvocationResult executionJson) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 51
[2021-02-04T18:26:35.863]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.TaskOrchestrationShim.InvokeUserCodeAndHandleResults(RegisteredFunctionInfo orchestratorInfo, OrchestrationContext innerContext) in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\TaskOrchestrationShim.cs:line 155. IsReplay: False. State: Failed. HubName: SampleHubVS. AppName: Sample-Hub-VS. SlotName: . ExtensionVersion: 2.4.1. SequenceNumber: 8. TaskEventId: -1
[2021-02-04T18:26:35.871] Executed 'Functions.DurableFunctionsOrchestrator1' (Failed, Id=ecd4be03-297a-42f7-9633-efe2776cb515, Duration=66ms)
[2021-02-04T18:26:35.873] System.Private.CoreLib: Exception while executing function: Functions.DurableFunctionsOrchestrator1. System.Private.CoreLib: Orchestrator function 'DurableFunctionsOrchestrator1' failed: The function 'Hello' doesn't exist, is disabled, or is not an activity function. Additional info: The following are the known activity functions: 'DurableActivity1'.
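
For readers who want to see the shape of the check, here is a minimal sketch of the idea rather than the actual shim code. `ShimSketch`, `CallActivityAsync`, and `ScheduleAsync` are hypothetical names made up for illustration; the only parts taken from .NET itself are `Task.IsFaulted` and the fact that awaiting a faulted task rethrows its exception.

```csharp
using System;
using System.Threading.Tasks;

public static class ShimSketch
{
    // Hypothetical stand-in for a context API call that validates the function name.
    public static Task CallActivityAsync(string name)
    {
        if (name != "DurableActivity1")
        {
            // Validation failures surface as an already-faulted task.
            return Task.FromException(new ArgumentException(
                $"The function '{name}' doesn't exist, is disabled, or is not an activity function."));
        }

        // A valid call stays pending until the activity completes (modeled as a never-completed task).
        return new TaskCompletionSource<object>().Task;
    }

    // If the call has already faulted, await it so the exception is rethrown
    // to the caller instead of being silently dropped.
    public static async Task ScheduleAsync(string name)
    {
        Task newTask = CallActivityAsync(name);
        if (newTask != null && newTask.IsFaulted)
        {
            await newTask;
        }
    }
}
```

In this sketch, `ShimSketch.ScheduleAsync("Hello")` returns a task that faults with the `ArgumentException`, while `ShimSketch.ScheduleAsync("DurableActivity1")` completes without error because the pending task is left alone.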

Resolves Azure/azure-functions-durable-js#197
Resolves Azure/azure-functions-durable-python#132

Pull request checklist

  • My changes do not require documentation changes
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • I have added all required tests (Unit tests, E2E tests)

@cgillum cgillum added the bug and out-of-proc (Impacts non-.NET languages, e.g. JavaScript, Python, or PowerShell, which execute out-of-process) labels Feb 4, 2021
@cgillum cgillum added this to the v2.4.2 milestone Feb 4, 2021
@cgillum cgillum self-assigned this Feb 4, 2021
Contributor

@ConnorMcMahon ConnorMcMahon left a comment

This makes sense to me, I just want to make sure I understand the full ramifications of what would happen when an API exception is thrown.

// Check to see if the context API call failed e.g. because the caller tried to schedule a function that doesn't exist.
if (newTask.IsFaulted)
{
    // Awaiting a faulted task will cause the exception to be thrown.
Contributor

Just to make sure I understand the ramifications of this:

I would expect that this means that the Azure Functions execution would still have succeeded, but the orchestration will fail.

I think that is acceptable/the best we can do right now.

Contributor

+1, and I would love some more clarity on this difference.

More concretely: what does it mean for an Azure Functions execution to succeed while the orchestrator fails? Is this just a difference between what shows up in FunctionsLogs versus DurableFunctionsLogs?

Member Author

Yes, the function execution would succeed but the orchestration would fail. It's not ideal, but it's incrementally better than where we are today. I think we'll need to revisit aspects of the out-of-proc design in order to make function executions fail. I have a couple of ideas on strategies we could take, but I'll save that for a different forum.

Member Author

Oops! I was wrong! It turns out that the function execution reports as failed even though the failure happens in the out-of-proc shim layer. I added logs in the PR description to illustrate. I suspect this is the case because the exception happens in the WebJobs middleware layer which I believe is part of the core function execution.

Contributor

@davidmrdavid davidmrdavid left a comment

Thanks for this fix, and for the refactor!
Before we merge this, do you mind sharing what error we see on the terminal when a non-existent activity is called in out-of-proc? Just so we can recognize it moving forward.
Also, a nit: I prefer the term OOProc over OOP, as otherwise I'm not sure if we're talking about object-oriented programming or out-of-process 😉

if (newTask != null)
{
    // Check to see if the context API call failed e.g. because the caller tried to schedule a function that doesn't exist.
    if (newTask.IsFaulted)
Contributor

When else would IsFaulted be set to true? I imagine it might also be true when the activity exists but throws an exception, is that right?

Member Author

Indeed, I need to test a normal activity failure scenario to see whether I'm causing something weird to happen in those cases. I'll validate this more before merging.
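
To make the concern concrete: in .NET, `Task.IsFaulted` only reports that a task ended with an unhandled exception, not why. The sketch below uses hypothetical names (`FaultedSketch`, `ValidationFailure`, `OperationFailure`, `Describe`) and plain tasks rather than anything from the extension; it is only meant to show that both kinds of failure look the same to an `IsFaulted` check, and makes no claim about how the shim sees activity failures during replay.

```csharp
using System;
using System.Threading.Tasks;

public static class FaultedSketch
{
    // Fault produced up front by validation, before anything is scheduled.
    public static Task ValidationFailure() =>
        Task.FromException(new ArgumentException("unknown function"));

    // Fault reported through a completion source, the way a finished-with-error operation might surface.
    public static Task OperationFailure()
    {
        var tcs = new TaskCompletionSource<object>();
        tcs.SetException(new InvalidOperationException("activity threw"));
        return tcs.Task;
    }

    // IsFaulted is true in both cases; only the wrapped exception type tells them apart.
    public static string Describe(Task task) =>
        task.IsFaulted ? task.Exception.InnerException.GetType().Name : "not faulted";
}
```

Any check that needs to treat the two cases differently would have to inspect the exception itself, since the flag alone cannot distinguish them.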

@cgillum cgillum marked this pull request as draft February 4, 2021 18:35
@cgillum
Member Author

cgillum commented Feb 4, 2021

Hmm. So in regular error handling cases I'm now seeing a null-ref exception in the orchestration replay. Not sure if it's a bug in my fix or if I'm running into a latent bug that was previously hidden.

System.NullReferenceException
  Message=Object reference not set to an instance of an object.
  Source=Microsoft.Azure.WebJobs.Extensions.DurableTask
  StackTrace:
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.<ScheduleDurableTaskEvents>d__4.MoveNext() in C:\GitHub\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 91

I've put this PR into draft mode so that I can spend more time working through the various scenarios.

@ConnorMcMahon
Contributor

@cgillum, I'm guessing this PR won't be making it into this release?

@ConnorMcMahon ConnorMcMahon modified the milestones: v2.4.2, v2.5.0 Mar 19, 2021
@ConnorMcMahon ConnorMcMahon modified the milestones: v2.5.0, vNext Jun 1, 2021
@davidmrdavid
Contributor

@cgillum, any chance we could revive this PR? I was reminded of it after seeing this issue (Azure/azure-functions-durable-python#132) again.

Labels
bug, out-of-proc (Impacts non-.NET languages, e.g. JavaScript, Python, or PowerShell, which execute out-of-process)
3 participants