fix dag version inflation caused by unmatched serialized result of task using reserialized command #61077

Open
wjddn279 wants to merge 6 commits into apache:main from wjddn279:fix-task-group-serialize-unmatched

@wjddn279
Contributor

@wjddn279 wjddn279 commented Jan 26, 2026

Closes: #60868

Why does the issue occur?

To summarize the issue: when dag_parsing occurs, there is no increase in the Dag version, but when the command airflow dags reserialize is executed, an increase in the Dag version is observed.

For the following Dag, I confirmed that the hash result from parsing in the dag_processor differs from the hash result through the airflow command. Upon checking the serialized values, I found that the order of the following two fields was different:

// dag_processor
'task_group': {
    "children": {
        "bear": ["bear", "operator"]
    }
}

// airflow command
'task_group': {
    "children": {
        "bear": ["operator", "bear"]
    }
}

Because sorting is supposed to be applied to these values before hashing, I investigated why the two orders differed.
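
As a minimal sketch of why element order alone is enough to produce a new version, the snippet below hashes both variants as JSON with sorted keys followed by MD5. The helper name digest and the trimmed-down dictionaries are illustrative assumptions, not Airflow code.

```python
import json
from hashlib import md5


def digest(data: dict) -> str:
    """MD5 over the JSON encoding with sorted keys (hypothetical helper)."""
    return md5(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


# Trimmed-down stand-ins for the two serialized results shown above.
from_dag_processor = {"task_group": {"children": {"bear": ["bear", "operator"]}}}
from_cli = {"task_group": {"children": {"bear": ["operator", "bear"]}}}

# sort_keys only orders dictionary keys; list element order is preserved,
# so the two payloads encode differently and therefore hash differently.
hashes_differ = digest(from_dag_processor) != digest(from_cli)
print(hashes_differ)  # True
```

Sorting keys does not help here because the difference sits inside a list value, not in the dictionary keys.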

Deep Dive

Initially, I checked whether the parsing logic differed between the dag_processor and the airflow command, but it was identical. I therefore logged and compared the serialize_dag value before sorting was applied.
The values were logged at three points: dag_data, data_, and data_json.

// serialized dag from airflow dags command 
'children': {'bear': (<DagAttributeTypes.OP: 'operator'>, 'bear')} 
// state after sorting dict
'children': {'bear': (<DagAttributeTypes.OP: 'operator'>, 'bear')} 
// state after json.loads in here
'children': {'bear': ['operator', 'bear']}

// serialized dag from dag_processor
'children': {'bear': ['operator', 'bear']}
// state after sorting dict
'children': {'bear': ['bear', 'operator']}
// state after json.loads in here
'children': {'bear': ['bear', 'operator']}

The parsing results were different from the start. In principle, the Dag parsing result should match what the airflow dags command generates. However, I inferred that in the dag_processor the format changes when the model is dumped for communication between subprocesses.

I confirmed that the result from the airflow dags command was also changed to match the dag_processor format after json.loads was applied (enum -> str, tuple -> list).
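
The two conversions (enum -> str, tuple -> list) can be reproduced with a plain JSON round-trip. This is only a sketch: the DagAttributeTypes class below is a stand-in that assumes the real enum is str-based, and json is used here in place of the actual inter-process encoding.

```python
import json
from enum import Enum


class DagAttributeTypes(str, Enum):
    """Stand-in for Airflow's enum; assumed to be a str-based Enum."""

    OP = "operator"


children = {"bear": (DagAttributeTypes.OP, "bear")}

# A JSON round-trip loses both the tuple type and the enum type.
round_tripped = json.loads(json.dumps(children))
print(round_tripped)  # {'bear': ['operator', 'bear']}
```

After the round-trip the value is an ordinary list of strings, which is the shape the sorting step already knows how to handle.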

Ironically, the dag_processor, where the format was changed, had sorting applied correctly, while the airflow dags command did not, resulting in the discrepancy between the two.

Solution

The standard solution would be to control the values that change in the dag_processor, but this appears to be impossible with the current approach. My first attempt therefore modified the existing task_group serialization method to align with the dag_processor's format.

In the current revision, I left the serialize logic unchanged to keep it consistent. Instead, I modified the sorting logic used to generate hash values so that sorting is also applied to tuple values.

I confirmed that applying sorted to a tuple of an enum and a str, such as (<DagAttributeTypes.OP: 'operator'>, 'bear'), orders the elements the same way as it would for plain strings, placing 'bear' before the enum member. Therefore, the resulting hash values are identical.
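
The idea can be sketched as follows. The function sort_nested is a simplified, hypothetical stand-in for the sorting step (it is not Airflow's _sort_serialized_dag_dict), and the enum is again assumed to be str-based.

```python
import json
from enum import Enum
from hashlib import md5


class DagAttributeTypes(str, Enum):
    """Stand-in for Airflow's enum; assumed to be a str-based Enum."""

    OP = "operator"


def sort_nested(value):
    """Recursively sort dict keys and string-only lists/tuples (simplified sketch)."""
    if isinstance(value, dict):
        return {k: sort_nested(v) for k, v in sorted(value.items())}
    if isinstance(value, (list, tuple)):  # tuples are now treated like lists
        items = [sort_nested(v) for v in value]
        # A str-based enum member is also an instance of str, so it sorts by its value.
        return sorted(items) if all(isinstance(i, str) for i in items) else items
    return value


def digest(data: dict) -> str:
    """Sort, JSON-encode with sorted keys, then MD5 (hypothetical helper)."""
    return md5(json.dumps(sort_nested(data), sort_keys=True).encode("utf-8")).hexdigest()


cli_payload = {"children": {"bear": (DagAttributeTypes.OP, "bear")}}
processor_payload = {"children": {"bear": ["operator", "bear"]}}

hashes_match = digest(cli_payload) == digest(processor_payload)
print(hashes_match)  # True
```

If the tuple branch is removed from sort_nested, the CLI payload keeps its original order and the two digests diverge again.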


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@uranusjr
Member

This is not the right fix. The tests can be fixed. The main Airflow implementation should not be changed.

@wjddn279
Contributor Author

The problem is clear. I've reviewed several approaches, but I'm not sure if this can be resolved without modifying the existing serialize logic. I applied minimal changes, but it's essentially a modification with no functional changes.

@ephraimbuddy
Contributor

You got this wrong: the sorted data used to create the hash is not stored in the DB. What you have in the DB is not sorted. You can read the hashing logic here:

def hash(cls, dag_data):
    """Hash the data to get the dag_hash."""
    dag_data = cls._sort_serialized_dag_dict(dag_data)
    data_ = dag_data.copy()
    # Remove fileloc from the hash so changes to fileloc
    # does not affect the hash. In 3.0+, a combination of
    # bundle_path and relative fileloc more correctly determines the
    # dag file location.
    data_["dag"].pop("fileloc", None)
    data_json = json.dumps(data_, sort_keys=True).encode("utf-8")
    return md5(data_json).hexdigest()
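
A standalone sketch of the same idea, for readers who want to run it: the function dag_hash below is a simplified stand-in that omits the sorting step and uses a deep copy, so it is an assumption-laden illustration rather than the Airflow method. It shows that the hash is computed on a copy (the input stays as it was) and that fileloc does not influence the result.

```python
import copy
import json
from hashlib import md5


def dag_hash(dag_data: dict) -> str:
    """Hash a serialized DAG dict, ignoring fileloc (simplified sketch)."""
    data_ = copy.deepcopy(dag_data)  # work on a copy so the caller's dict is untouched
    data_["dag"].pop("fileloc", None)
    return md5(json.dumps(data_, sort_keys=True).encode("utf-8")).hexdigest()


original = {"dag": {"dag_id": "example", "fileloc": "/a/example.py"}}
moved = {"dag": {"dag_id": "example", "fileloc": "/b/example.py"}}

same_hash = dag_hash(original) == dag_hash(moved)
input_untouched = "fileloc" in original["dag"]
print(same_hash, input_untouched)  # True True
```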

@wjddn279
Contributor Author

wjddn279 commented Jan 27, 2026

@ephraimbuddy

The problem in the current situation is that the hash value changes. The sorted order differs between "bear": ["bear", "operator"] and "bear": ["operator", "bear"], which makes the hash values differ.

@wjddn279 wjddn279 force-pushed the fix-task-group-serialize-unmatched branch from 6628c9d to 16927b5 Compare January 27, 2026 09:50
@wjddn279 wjddn279 requested a review from XD-DENG as a code owner January 27, 2026 09:50
@wjddn279
Contributor Author

@uranusjr @ephraimbuddy

Instead of changing the existing serialize logic, I modified the sorting method. While there is a difference in the serialized values between those created through dag_processor and those that aren't, I modified it so that the hash result values after sorting are the same.

Previously there was no sorting of tuples, but by adding it, I changed it so that the sort result values are the same.

@wjddn279 wjddn279 force-pushed the fix-task-group-serialize-unmatched branch from 16927b5 to 152d9cc Compare January 27, 2026 10:57
@ephraimbuddy
Contributor

@uranusjr @ephraimbuddy

Instead of changing the existing serialize logic, I modified the sorting method. While there is a difference in the serialized values between those created through dag_processor and those that aren't, I modified it so that the hash result values after sorting are the same.

Previously there was no sorting of tuples, but by adding it, I changed it so that the sort result values are the same.

Can you add a test for this in the cli for dag reserialize, ensuring that it gives the same value?

@wjddn279
Contributor Author

Can you add a test for this in the cli for dag reserialize, ensuring that it gives the same value?

Done! thanks!

@wjddn279 wjddn279 force-pushed the fix-task-group-serialize-unmatched branch from 357d51e to 2fdd227 Compare January 28, 2026 04:37
@wjddn279
Contributor Author

@ephraimbuddy @uranusjr

I added test code to verify that the hash value is identical to the existing dag_processor. If you check the test code, you can see the parts that change during conversion to bytes and decoding for inter-process socket communication (enum -> str, tuple -> list), as explained in the description.

@wjddn279
Contributor Author

wjddn279 commented Feb 6, 2026

@ephraimbuddy

Could you take another look at this?
The problem is clear and the solution seems solid. The test code also demonstrates both the issue reproduction and the fix.

@felicemcc

Hello, we still have the issue with the DAG reserialization process (#60868), apparently this fix was not merged with Airflow 3.1.7 :(

Do you have a timeline for this to be integrated?

Best regards

@wjddn279
Contributor Author

wjddn279 commented Mar 4, 2026

@ashb @kaxil @ephraimbuddy @amoghrajesh

A gentle ping to the code owners and reviewers!
Some users are waiting on this PR to resolve the issue.

Member

@ashb ashb left a comment

Will need to look at the tests you've added from a bigger screen. The new analysis looks plausible though

Member

@kaxil kaxil left a comment

Fix looks correct. The root cause is well understood: serialize_for_task_group() returns tuples (e.g. (DagAttributeTypes.OP, 'bear')) which survive as-is in the CLI reserialize path, but get converted to lists through msgpack round-trip in the dag_processor path. The sort function only handled list, so tuples fell through unsorted, causing hash mismatches and spurious DAG version bumps.

A few comments on the tests below.

@kaxil kaxil added this to the Airflow 3.1.9 milestone Mar 5, 2026
@wjddn279 wjddn279 force-pushed the fix-task-group-serialize-unmatched branch from 359e474 to 7f866dd Compare March 6, 2026 07:35
@wjddn279
Copy link
Contributor Author

wjddn279 commented Mar 6, 2026

@kaxil @ashb

I've addressed the requested review comments. Thanks!

Comment on lines +1077 to +1079
from airflow.dag_processing.processor import DagFileParsingResult, DagFileProcessorProcess
from airflow.sdk.execution_time.comms import _RequestFrame, _ResponseFrame
from airflow.serialization.serialized_objects import DagSerialization, LazyDeserializedDAG
Member

Please move the imports to the top of the file.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where executing airflow dags reserialize would unnecessarily create new DAG versions even though the DAG definition had not changed. The root cause was that task_group.children values, which are (DagAttributeTypes, task_id) tuples in the reserialize path, were not being sorted during hash computation in _sort_serialized_dag_dict. In contrast, the dag_processor's msgpack round-trip converts tuples to lists, which did get sorted — causing hash divergence for task IDs alphabetically before 'operator' (the string value of DagAttributeTypes.OP).

Changes:

  • Fix _sort_serialized_dag_dict to also process tuple values (in addition to list), ensuring (DagAttributeTypes.OP, 'bear') tuples get sorted identically to the dag_processor's ['operator', 'bear'] lists.
  • Add a regression test test_reserialize_should_make_equal_hash_with_dag_processor that simulates the dag_processor msgpack encoding/decoding and asserts hash equality.
  • Add a test DAG (test_dag_reserialize.py) with task ID 'bear' (which reproduces the bug, as 'bear' < 'operator' alphabetically).
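
To illustrate why only task IDs that sort before 'operator' trigger the divergence, here is a small sketch. The helper order_changes_when_sorted is hypothetical and only models the pair of strings, not the full serialized DAG.

```python
def order_changes_when_sorted(task_id: str, op_value: str = "operator") -> bool:
    """Return True if sorting [op_value, task_id] reorders the pair."""
    pair = [op_value, task_id]
    return sorted(pair) != pair


print(order_changes_when_sorted("bear"))   # True: 'bear' sorts before 'operator'
print(order_changes_when_sorted("zebra"))  # False: the pair is already in sorted order
```

When the pair is already in sorted order, the sorted list and the unsorted tuple encode to the same JSON, so the mismatch stays hidden for such task IDs.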

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
airflow-core/src/airflow/models/serialized_dag.py | Core bug fix: extend _sort_serialized_dag_dict to process tuples (in addition to lists) for consistent hash computation
airflow-core/tests/unit/cli/commands/test_dag_command.py | Regression test verifying that the hash from dag_reserialize matches the hash computed via the dag_processor msgpack round-trip flow
airflow-core/tests/unit/dags/test_dag_reserialize.py | Test DAG with task ID 'bear' specifically chosen to reproduce the original bug

Development

Successfully merging this pull request may close these issues.

No changes to DAG file, but reserialize results in multiple entries in serialized_dag table

7 participants