Created Avro Schema to BigQuerySchema Convertor (#188)
This PR introduces a new utility class, `AvroToBigQuerySchemaTransform`,
to convert Avro schemas to BigQuery schemas. This is necessary for
creating BigQuery tables dynamically based on the Avro schema of the
incoming record in a Flink connector sink.

The following type conversions take place.
Reference: [Type Conversions while loading avro file in BigQuery](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions)

### Data Type Transformation:

| Avro Type | BigQuery Type | Notes |
|-----------------------|----------------------|----------------------------------------------------------------------|
| `string` | `STRING` | |
| `bytes` | `BYTES` | |
| `int` | `INT64` | |
| `long` | `INT64` | |
| `float` | `FLOAT64` | |
| `double` | `FLOAT64` | |
| `boolean` | `BOOL` | |
| `enum` | `STRING` | |
| `fixed` | `BYTES` | |
| `record` | `RECORD` | |
| `array` | `REPEATED` | |
| `union(null, type)` | `NULLABLE type` | |
| `logicalType: date` | `DATE` | |
| `logicalType: time` | `TIME` | |
| `logicalType: timestamp` | `TIMESTAMP` | |
| `logicalType: local-timestamp` | `DATETIME` | |
| `logicalType: decimal` | `NUMERIC`/`BIGNUMERIC` | Depending on precision; `STRING` if precision is out of range |
| `logicalType: geography_wkt` | `GEOGRAPHY` | |
| `logicalType: uuid` | `STRING` | |
| `logicalType: Json` | `JSON` | |
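
The scalar rows of the table can be read as a plain lookup, with `decimal` as the one case that needs a decision. The sketch below is illustrative only and is not the code in `AvroToBigQuerySchemaTransform`: the class and method names are hypothetical, it works on type names as strings rather than on Avro `Schema` objects, and the decimal thresholds are an assumption taken from BigQuery's documented `NUMERIC` (up to 29 integer digits, scale up to 9) and `BIGNUMERIC` (up to 38 integer digits, scale up to 38) ranges.

```java
import java.util.Map;

// Hypothetical, simplified illustration of the mapping table above.
// It does not use the Avro library and is not the PR's implementation.
class AvroTypeMappingSketch {

    // Scalar Avro types and the BigQuery type each one maps to.
    // `array` and `union(null, type)` are left out because they set a
    // field mode (REPEATED / NULLABLE) rather than a type.
    static final Map<String, String> SCALAR_TYPES = Map.ofEntries(
            Map.entry("string", "STRING"),
            Map.entry("bytes", "BYTES"),
            Map.entry("int", "INT64"),
            Map.entry("long", "INT64"),
            Map.entry("float", "FLOAT64"),
            Map.entry("double", "FLOAT64"),
            Map.entry("boolean", "BOOL"),
            Map.entry("enum", "STRING"),
            Map.entry("fixed", "BYTES"),
            Map.entry("record", "RECORD"));

    static String toBigQueryType(String avroType) {
        String bigQueryType = SCALAR_TYPES.get(avroType);
        if (bigQueryType == null) {
            // For example MAP, which the PR lists as unsupported.
            throw new UnsupportedOperationException(
                    "Avro type not supported by BigQuery: " + avroType);
        }
        return bigQueryType;
    }

    // Assumed thresholds: NUMERIC holds up to 29 integer digits with
    // scale <= 9; BIGNUMERIC up to 38 integer digits with scale <= 38.
    static String decimalToBigQueryType(int precision, int scale) {
        int integerDigits = precision - scale;
        if (integerDigits <= 29 && scale <= 9) {
            return "NUMERIC";
        }
        if (integerDigits <= 38 && scale <= 38) {
            return "BIGNUMERIC";
        }
        return "STRING"; // precision out of range for both numeric types
    }
}
```

Under these assumptions, `decimalToBigQueryType(38, 9)` selects `NUMERIC`, while a decimal with precision 80 and scale 2 falls through to `STRING`.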


### Exception Handling:

| Exception | Cause |
|---|---|
| `IllegalArgumentException` | <ul><li>The Avro Schema is not of RECORD type.</li><li>Unsupported Avro Field of type UNION. Only `['datatype']`, `['null', 'datatype']` or `['datatype', 'null']` are supported.</li><li>Precision of decimal field must be non-negative.</li><li>Scale of decimal field must be non-negative.</li><li>Scale of the field cannot exceed precision.</li><li>Array cannot have a NULLABLE element.</li></ul> |
| `IllegalStateException` | BigQuery ARRAY cannot have recursive ARRAY fields. |
| `UnsupportedOperationException` | <ul><li>BigQuery fields can only be nested 15 times.</li><li>The Avro type of the field is not supported by BigQuery (e.g., MAP type).</li><li>NULLABLE ARRAYS in UNION types are not supported.</li></ul> |
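
As a rough illustration of how the union and decimal rules above could be enforced, the sketch below models a union as a list of branch type names. It is a simplified stand-in, not the PR's code: the class and method names are hypothetical and it does not operate on Avro `Schema` objects.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical validation helpers mirroring the exception table above.
class UnionValidationSketch {

    // Returns the single non-null branch of a supported union:
    // ['datatype'], ['null', 'datatype'] or ['datatype', 'null'].
    static String unwrapUnion(List<String> branches) {
        List<String> nonNull = branches.stream()
                .filter(branch -> !branch.equals("null"))
                .collect(Collectors.toList());
        if (branches.size() > 2 || nonNull.size() != 1) {
            throw new IllegalArgumentException(
                    "Unsupported Avro Field of type UNION: " + branches);
        }
        if (isNullable(branches) && nonNull.get(0).equals("array")) {
            throw new UnsupportedOperationException(
                    "NULLABLE ARRAYS in UNION types are not supported.");
        }
        return nonNull.get(0);
    }

    // A union containing a null branch maps to a NULLABLE field.
    static boolean isNullable(List<String> branches) {
        return branches.contains("null");
    }

    // Checks the three decimal constraints listed in the table.
    static void validateDecimal(int precision, int scale) {
        if (precision < 0) {
            throw new IllegalArgumentException(
                    "Precision of decimal field must be non-negative.");
        }
        if (scale < 0) {
            throw new IllegalArgumentException(
                    "Scale of decimal field must be non-negative.");
        }
        if (scale > precision) {
            throw new IllegalArgumentException(
                    "Scale of the field cannot exceed precision.");
        }
    }
}
```

With this shape, `unwrapUnion(List.of("null", "string"))` yields `string` as a nullable field, while a three-branch union such as `["null", "string", "int"]` is rejected with `IllegalArgumentException`.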

/gcbrun
shashambhavi authored Dec 13, 2024
1 parent 8d519aa commit 4900b08
Showing 2 changed files with 1,129 additions and 0 deletions.