.NET: Fix issue with resuming checkpoint after package version upgrade#6636
.NET: Fix issue with resuming checkpoint after package version upgrade#6636peibekwe wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses workflow checkpoint restore failures across SDK/package upgrades by making type identity comparisons version-tolerant (simple assembly name + type full name) and by adding a lenient TypeId → Type resolution path that ignores assembly version/culture/strong-name details. This improves checkpoint portability across redeploys where only assembly versions change.
Changes:
- Update
TypeIdequality, hashing, and matching to ignore version/culture/public key token (including those embedded in generic argument type names). - Update
WorkflowSessionenvelope type resolution to bind types by simple assembly name (version-tolerant) and cache results. - Add unit tests covering
TypeIdversion tolerance, lenient type resolution, and end-to-end checkpoint resume after version rewrites.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| dotnet/src/Microsoft.Agents.AI.Workflows/Checkpointing/TypeId.cs | Implements version-tolerant TypeId matching and normalization of generic argument assembly qualifiers. |
| dotnet/src/Microsoft.Agents.AI.Workflows/WorkflowSession.cs | Switches envelope type lookup to a lenient resolver and introduces a cache for resolved types. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/CheckpointVersionToleranceTests.cs | Adds an end-to-end test validating checkpoint restore after mutating persisted Version= strings. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/TypeIdVersionToleranceTests.cs | Adds focused unit tests for TypeId equality/matching/hash semantics across version changes. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/WorkflowSessionResolveTypeLenientTests.cs | Adds tests verifying ResolveTypeLenient behavior for version-mutated and generic type ids. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 90%
✓ Correctness
This PR correctly fixes checkpoint compatibility across package version upgrades. The TypeId equality/hashing changes are consistent, the regex for stripping version qualifiers from generic type arguments is correct, the GetSimpleAssemblyName fallback is robust, and ResolveTypeLenient correctly leverages partial-name binding. No correctness issues found.
✓ Security Reliability
This PR relaxes TypeId matching to ignore assembly version, culture, and public key token, allowing checkpoint resumption after package upgrades. The implementation is sound: the regex pattern is linear (no ReDoS risk), the lazy field initialization is thread-safe for deterministic reference-type computations, Type.GetType results are gated by the existing IExternalRequestEnvelope assignability check, and the exception filter in GetSimpleAssemblyName covers the documented throws from AssemblyName parsing. The static ConcurrentDictionary cache in ResolveTypeLenient grows unbounded but is keyed by TypeId equality (simple name + normalized type), which naturally limits cardinality to the set of distinct envelope types in the application. No high-severity security or reliability issues found.
✓ Test Coverage
The test coverage for this PR is comprehensive. Three new test classes cover the four fix areas: TypeIdVersionToleranceTests covers equality, hashing, IsMatch, NormalizeTypeName, and dictionary/set lookup across versions (including generic arguments); WorkflowSessionResolveTypeLenientTests covers the new ResolveTypeLenient method with version mutations and unknown types; CheckpointVersionToleranceTests provides a full integration test verifying end-to-end checkpoint resume after version rewriting. The tests use meaningful assertions (not just 'no exception') and verify both positive and negative cases. The only minor gap is the lack of a direct unit test for GetSimpleAssemblyName's edge cases, though it is thoroughly exercised indirectly.
✓ Failure Modes
The PR correctly implements version-tolerant type matching for checkpoint resumption. The TypeId equality/hash changes are internally consistent (SimpleAssemblyName for hashing, normalized type name comparison). The regex-based normalization handles standard .NET assembly-qualified name formats. GetSimpleAssemblyName has a robust fallback for malformed inputs. The static ConcurrentDictionary cache in ResolveTypeLenient may permanently cache null for unresolvable types, but the caller already handles null gracefully. No concrete silent-failure or data-loss paths identified.
✗ Design Approach
I found one design gap in the new version-tolerance approach: the implementation only normalizes embedded generic assembly qualifiers when they appear as the exact
Version + Culture + PublicKeyTokentriplet in that order. That leaves the change narrower than its stated contract and means some persistedTypeIdshapes will still fail equality/matching/type resolution even though the PR says version, culture, and signing metadata are all ignored.
Flagged Issues
-
TypeId.NormalizeTypeNameonly strips the exact, Version=..., Culture=..., PublicKeyToken=...sequence, so embedded generic type names that carry only one of those qualifiers or list them in a different order are still treated as different types. BecauseTypeId.Equals/GetHashCodedepend onNormalizedTypeName(TypeId.cs:69-89) andWorkflowSession.ResolveTypeLenientalso uses it (WorkflowSession.cs:333-335), the new compatibility story is incomplete for those inputs.
Automated review by peibekwe's agents
|
Flagged issue
Source: automated DevFlow PR review |
The CLR maintains strict rule and always produces the format: [Name], Version=[V], Culture=[C], PublicKeyToken=[T] |
Motivation & Context
When you upgrade the
Microsoft.Agents.AI.Workflowspackage and redeploy, any saved workflow checkpoint from the previous version becomes unusable. The framework refuses to load it with a generic "checkpoint is not compatible" error, even though nothing about the workflow actually changed; only the package's version number did.Hosts that persist checkpoints across deploys lose all in-flight work on every upgrade.
Description & Review Guide
With this fix, the framework no longer cares about version, culture, or strong-name signing info when matching types. It compares the type's name and the simple assembly's name (like
Microsoft.Agents.AI.Workflows), and ignores everything else.This fix is applied in four places that all had the same underlying bug:
List<ChatMessage>embed the version ofMicrosoft.Extensions.AI.Abstractionsinside the type name itself. We strip that version info before comparing, so an upgrade of that dependency doesn't break things either.Typeobject (for unwrapping declarative external requests) had its own copy of the same bug. Fixed and made consistent with the other path.Fixes #6466
Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.