
@Fahad-Alam-Jamal

Fix DatabricksSqlOperator XCom pickle serialization

closes: #59103

Description

This PR fixes the issue where DatabricksSqlOperator fails with _pickle.PicklingError: Can't pickle <class 'airflow.providers.databricks.hooks.databricks_sql.Row'> when XCom push is enabled (do_xcom_push=True).

Root Cause

The Databricks SQL connector returns databricks.sql.types.Row objects, which are dynamically created classes that cannot be pickled. XCom requires all return values to be picklable for storage in the Airflow metadata database. When using the default fetch_all_handler, these unpicklable Row objects were returned directly without conversion.
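To illustrate the failure mode described above, here is a minimal sketch that uses only the standard library. It does not use the Databricks connector; the `make_row` helper is hypothetical and simply stands in for any code path that builds a row class at call time. Pickle stores classes by reference (module plus qualified name), so a class that is never bound to a module-level name cannot be resolved.

```python
import pickle
from collections import namedtuple


def make_row(fields, values):
    # Hypothetical helper: a fresh namedtuple class is created on every call
    # and only ever bound to a local name, so pickle cannot find it by
    # importing the module and looking up "Row".
    Row = namedtuple("Row", fields, rename=True)
    return Row(*values)


row = make_row(["id", "count(1)"], [1, 42])

try:
    pickle.dumps(row)
    pickling_failed = False
except (pickle.PicklingError, AttributeError) as exc:
    # The attribute lookup for the class on its module fails.
    pickling_failed = True
    print(f"pickle failed: {exc}")
```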

Solution

Introduced a new PicklableRow wrapper class in DatabricksSqlHook that:

  • Wraps unpicklable Row objects and makes them picklable via a custom __reduce__ method
  • Maintains full backward compatibility by delegating to an internal namedtuple
  • Supports all namedtuple interface operations: _fields, _asdict(), iteration, and attribute access
  • Properly handles field name renaming for invalid Python identifiers (e.g., count(1)_0)
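The points above could be sketched roughly as follows. This is an illustrative wrapper written for this description, not the code in the PR: the constructor signature, the `_original_fields` attribute, and the use of `namedtuple(..., rename=True)` are assumptions, and the renaming scheme here (positional names such as `_1`) differs from the `count(1)_0` style mentioned above.

```python
import pickle
from collections import namedtuple


class PicklableRow:
    """Illustrative wrapper around an internal namedtuple (hypothetical sketch)."""

    def __init__(self, fields, values):
        self._original_fields = tuple(fields)
        # rename=True swaps invalid identifiers for positional names ("_1", ...).
        row_cls = namedtuple("Row", self._original_fields, rename=True)
        self._row = row_cls(*values)

    @property
    def _fields(self):
        return self._row._fields

    def _asdict(self):
        # Keyed by the original column names, not the renamed ones.
        return dict(zip(self._original_fields, self._row))

    def __iter__(self):
        return iter(self._row)

    def __len__(self):
        return len(self._row)

    def __getitem__(self, index):
        return self._row[index]

    def __getattr__(self, name):
        # Reached only when normal lookup fails. Guard the internal attributes
        # so a half-initialised instance cannot recurse forever.
        if name in ("_row", "_original_fields"):
            raise AttributeError(name)
        return getattr(self._row, name)

    def __reduce__(self):
        # Rebuild from plain tuples via the module-level class, so pickle never
        # has to locate the dynamically created namedtuple class.
        return (PicklableRow, (self._original_fields, tuple(self._row)))


row = PicklableRow(["id", "count(1)"], [1, 42])
restored = pickle.loads(pickle.dumps(row))
print(restored._fields)
print(restored._asdict())
```

Because `__reduce__` returns only the module-level class and plain tuples, the pickle stream never references the dynamically created namedtuple class, which is what makes the round trip work in this sketch.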

Changes

  • Hook: Modified DatabricksSqlHook.run() to always convert Row objects to PicklableRow, even when no handler is provided
  • Hook: Updated _make_common_data_structure() to use PicklableRow instead of dynamic namedtuples
  • Tests: Added test_xcom_pickle_results_with_row_objects() to verify pickle serialization works correctly
  • Backward Compatibility: All 35 existing unit tests pass, confirming no breaking changes

Testing

  • ✅ All 35 unit tests pass, including the new pickle test
  • ✅ Verified pickle.dumps() and pickle.loads() work correctly on converted Row objects
  • ✅ Confirmed _fields attribute returns properly renamed field names
  • ✅ Verified _asdict() method returns dictionaries with original field names

@boring-cyborg

boring-cyborg bot commented Dec 7, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally. It's a heavy docker, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

Development

Successfully merging this pull request may close these issues.

Unable to XCom pickle results from DatabricksSqlOperator in Airflow 2.11.0
