Skip to content

Handle nullable columns #233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 63 commits into
base: main
Choose a base branch
from
Open

Handle nullable columns #233

wants to merge 63 commits into from

Conversation

hariso
Copy link
Contributor

@hariso hariso commented Dec 25, 2024

Description

This PR adds handling for nullable columns of a few types for which the schema couldn't be generated. With commit e961a17, a schema can be generated for the most frequently used types (except numeric and timestamp).

SQL NULLs need to be represented as nils in Go (i.e., in a record's payload). This means that we need to use pointers for such values. For example, a SQL timestamp is a *time.Time. The corresponding Avro schema needs to be a union schema where one of the types is Null, e.g.:

{
  "name": "timestamp_field",
  "type": [
    "null",
    {
      "type": "long",
      "logicalType": "local-timestamp-micros"
    }
  ]
}

This union schema can only be used with a pointer type and not with a value type. Using a union schema with Null to marshal a value type results in avro: unknown union type int.date.

Similarly, an Avro schema that has no Null cannot be used to marshal a pointer type. This results in an error like: avro: *time.Time is unsupported for Avro long.

This means that a nullable SQL type needs to result in a pointer type, and a not null SQL type needs to result in a "normal" Avro schema. For this reason, we need to fetch information about each column to determine whether it's nullable or not. This is done through the TableInfoFetcher.

The changes have been tested in a built-in and a standalone connector.

Additional changes:

  • Renamed a few test methods (which is why so many changes are reported, but the changes are easy to follow)
  • Existing test tables were changed to use NOT NULL columns: this makes verifying values easier (versus using lang.Ptr() everywhere)
  • Added an integration test that tests all the column types that we support.

Quick checks:

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed.

@hariso hariso self-assigned this Dec 31, 2024
@raulb raulb added this to the Next milestone Feb 8, 2025
@raulb raulb moved this from In Progress to Todo in Conduit Main Mar 11, 2025
@hariso hariso moved this from Todo to In Progress in Conduit Main May 29, 2025
@hariso hariso changed the title [WIP] Handle nullable columns Handle nullable columns Jun 20, 2025
@hariso hariso marked this pull request as ready for review June 20, 2025 13:28
@hariso hariso requested a review from a team as a code owner June 20, 2025 13:28
@hariso hariso enabled auto-merge (squash) June 20, 2025 13:29
@hariso hariso moved this from In Progress to Ready for review in Conduit Main Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for review
Development

Successfully merging this pull request may close these issues.

2 participants