dan1elt0m/unitycatalog-pydantic

Unity Catalog Pydantic

Disclaimer: This project is unofficial and not affiliated with or endorsed by the official Unity Catalog team.

Simplifies managing Unity Catalog tables using Pydantic models.

Installation

pip install unitycatalog-pydantic

Examples

Create Table

Declare the columns as annotated class attributes. The type hints determine the column types in the generated schema, and the class docstring is used as the table comment.

from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel

class MyTable(UCModel):
    """Example table. This docstring is used as the table comment."""

    col1: str
    col2: int
    col3: float

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)
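To make the idea of deriving a schema from type hints concrete, here is a minimal standard-library sketch. It is not how unitycatalog-pydantic is implemented: the `TYPE_NAMES` mapping and the `columns_from_hints` helper are hypothetical, the type names are illustrative assumptions, and a plain class stands in for the `UCModel` subclass.

```python
from typing import get_type_hints

# Hypothetical mapping from Python types to column type names (an assumption
# for illustration; the library's real mapping may differ).
TYPE_NAMES = {str: "STRING", int: "LONG", float: "DOUBLE", bool: "BOOLEAN"}


class MyTable:
    """Example table."""

    col1: str
    col2: int
    col3: float


def columns_from_hints(cls: type) -> list:
    """Return one column description per annotated class attribute."""
    return [
        {"name": name, "type_name": TYPE_NAMES[hint], "position": position}
        for position, (name, hint) in enumerate(get_type_hints(cls).items())
    ]


columns = columns_from_hints(MyTable)
table_comment = MyTable.__doc__  # the docstring plays the role of the comment
```

The point of the sketch is only that annotations and the docstring carry enough information to describe a table without a separate schema definition.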

Retrieve Table

Retrieve the table's metadata using the class method get.

table_info = await MyTable.get(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)

Delete Table

Delete the table using the class method delete.

await MyTable.delete(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)
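The snippets above use a bare `await`, which only works inside a coroutine or an asyncio-aware REPL. The following sketch shows one way to run such calls from a regular script with `asyncio.run`. `FakeTable` is a hypothetical stand-in that makes no API calls, so the example runs without a catalog server; in real code you would call `MyTable.create(...)` and `MyTable.delete(...)` with a `tables_api` argument instead.

```python
import asyncio


class FakeTable:
    """Hypothetical stand-in for a UCModel subclass; it performs no API calls."""

    @classmethod
    async def create(cls, **kwargs) -> str:
        return f"created {kwargs['catalog_name']}.{kwargs['schema_name']}.{cls.__name__}"

    @classmethod
    async def delete(cls, **kwargs) -> None:
        return None


async def main() -> str:
    # Replace FakeTable with your UCModel subclass and add tables_api=...
    location = {"catalog_name": "my_catalog", "schema_name": "my_schema"}
    result = await FakeTable.create(**location)
    await FakeTable.delete(**location)
    return result


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
```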

Nested Models

Nested models are also supported. The nested model becomes a struct in the schema.

from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel

class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(UCModel):
    col1: str
    col2: NestedModel

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)
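As a rough illustration of how a nested model can map to a struct, the sketch below walks the type hints recursively. It uses standard-library dataclasses as stand-ins for the Pydantic models, and the `describe` helper, the `SCALARS` mapping, and the output shape are all assumptions made for illustration, not the library's actual schema format.

```python
from dataclasses import dataclass, is_dataclass
from typing import get_type_hints

# Illustrative scalar type names (an assumption, not the library's mapping).
SCALARS = {str: "string", int: "long", float: "double"}


@dataclass
class NestedModel:
    nested_col1: str
    nested_col2: int


@dataclass
class MyTable:
    col1: str
    col2: NestedModel


def describe(hint):
    """Return a scalar type name, or a struct description for a nested class."""
    if is_dataclass(hint):
        return {
            "type": "struct",
            "fields": [
                {"name": name, "type": describe(inner)}
                for name, inner in get_type_hints(hint).items()
            ],
        }
    return SCALARS[hint]


schema = describe(MyTable)
```

Here `col2` ends up as a struct with two fields of its own, while `col1` stays a plain string.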

Using a BaseModel as root model

A plain Pydantic BaseModel can also serve as the root model: pass it to the create_table function instead of subclassing UCModel. This is useful when the same models are used for other purposes as well.

from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import create_table

class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(BaseModel):
    col1: str
    col2: NestedModel

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await create_table(
    model=MyTable,
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)

Configuration

  • tables_api: The TablesApi client.
  • catalog_name: The catalog name.
  • schema_name: The schema name.
  • storage_location: The storage location.
  • table_type: The table type (default is TableType.EXTERNAL).
  • data_source_format: The data source format (default is DataSourceFormat.DELTA).
  • comment: A comment for the table. If not provided, the model's docstring is used.
  • properties: The properties of the table.
  • by_alias: Whether to use field aliases instead of field names for the columns (default is True).
  • json_schema_mode: The mode in which to generate the schema (default is validation).
  • alias: The table alias. If not provided, the class name is used.
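A sketch of overriding some of these options is shown below. It assumes that each option is accepted as a keyword argument by `create`, which the list suggests but does not state explicitly. The values are placeholders, and the string-to-string shape of `properties` is also an assumption.

```python
# Keyword arguments named after the options listed above. The values are
# placeholders; table_type and data_source_format are left at their defaults.
create_options = {
    "catalog_name": "my_catalog",
    "schema_name": "my_schema",
    "storage_location": "s3://my_bucket/my_path",
    "comment": "Overrides the model docstring",
    "properties": {"owner": "data-team"},  # assumed to be a str-to-str mapping
    "by_alias": False,  # use field names instead of aliases
}

# Inside a coroutine, with MyTable and tables_api defined as in the examples:
# table_info = await MyTable.create(tables_api=tables_api, **create_options)
```

Collecting the options in a dictionary keeps them reusable across several models that share the same catalog and schema.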

Caveats

Tested on Parquet, Delta, and CSV data source formats. Other formats may not work as expected.

  • Integration between Parquet and Unity Catalog types is currently limited. For instance, there is no way to specify the integer type, because Parquet doesn't recognize integer SQL types. The same applies to other types such as DATE and TIMESTAMP. This is an integration issue, not a problem with the library itself.
  • Nested models can't be used with the CSV data source format, because CSV doesn't support nested types. This is a limitation of the data source format, not of the library itself.
  • The latest version of DuckDB doesn't support reading some of the fields required by UC's ColumnInfo model, e.g., the precision fields. This is an integration issue, not a problem with the library itself.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.
