feat(python): add capability to read unity catalog (uc://) uris #3113

Open
wants to merge 5 commits into
base: main

Conversation

omkar-foss
Contributor

Description

This adds the capability to read directly from uc:// URIs using the local catalog-unity crate. It also exposes the UC temporary credentials in the storage_options of the DeltaTable instance so that polars or similar readers can use them.

Related Issue(s)

Documentation

N/A

@github-actions github-actions bot added the binding/python and binding/rust labels Jan 10, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@omkar-foss omkar-foss force-pushed the feat-uc-python branch 2 times, most recently from 8fcd7f4 to 0824abe Compare January 10, 2025 07:03
@omkar-foss omkar-foss changed the title feat(python): Add capability to read Unity Catalog (uc://) uris feat(python): add capability to read unity catalog (uc://) uris Jan 10, 2025
python/src/lib.rs (outdated, resolved)
/// Allow http url (e.g. http://localhost:8080/api/2.1/...)
/// Supported keys:
/// - `unity_allow_http_url`
AllowHttpUrl,
Contributor Author

This allows users to work with a local (non-https) Unity Catalog REST API with delta-rs.
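
A minimal sketch of how an option like this could gate plain-http endpoints. Only the variant name `AllowHttpUrl` and the key string `unity_allow_http_url` come from the diff above; the `ConfigKey` enum name, the case-insensitive matching, and the `check_workspace_url` helper are assumptions made for illustration, not the crate's actual code.

```rust
use std::str::FromStr;

/// Hypothetical, trimmed-down config key enum (assumed name).
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigKey {
    /// Allow http url (e.g. http://localhost:8080/api/2.1/...)
    AllowHttpUrl,
}

impl FromStr for ConfigKey {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption of this sketch.
        match s.to_ascii_lowercase().as_str() {
            "unity_allow_http_url" => Ok(Self::AllowHttpUrl),
            other => Err(format!("unknown config key: {other}")),
        }
    }
}

/// Hypothetical policy check: https is always accepted, plain http only
/// when the option is switched on.
pub fn check_workspace_url(url: &str, allow_http: bool) -> Result<(), String> {
    if url.starts_with("https://") || (allow_http && url.starts_with("http://")) {
        Ok(())
    } else {
        Err(format!("refusing non-https Unity Catalog URL: {url}"))
    }
}
```

With a check of this shape, a local server at `http://localhost:8080` is accepted only when the option is enabled, while https endpoints are unaffected.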

python/src/lib.rs (outdated, resolved)
@omkar-foss
Contributor Author

@ion-elgreco I've updated this PR to include the temp credentials functionality from PR #3078. Cheers.

@omkar-foss omkar-foss marked this pull request as ready for review February 3, 2025 14:30

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 16.32653% with 82 lines in your changes missing coverage. Please review.

Project coverage is 72.00%. Comparing base (b3efdfc) to head (31f7167).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/catalog-unity/src/lib.rs | 13.82% | 80 Missing and 1 partial ⚠️ |
| python/src/lib.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
- Coverage   72.10%   72.00%   -0.10%     
==========================================
  Files         138      138              
  Lines       45320    45414      +94     
  Branches    45320    45414      +94     
==========================================
+ Hits        32678    32701      +23     
- Misses      10567    10635      +68     
- Partials     2075     2078       +3     


python/src/lib.rs (outdated, resolved)
@ion-elgreco
Collaborator

ion-elgreco commented Feb 11, 2025

@omkar-foss looks good from my side!

@hntd187 can you please also take a look? Specifically, do the credentials need to be refreshed as part of the object store? Does UC also support writing temp credentials?

@omkar-foss
Contributor Author

omkar-foss commented Feb 11, 2025

I still need to add a few Python tests for this flow. I'll add them tomorrow, and if all is good, I suppose we can merge it.

Also, please note that our current UC REST integration is tuned for Databricks Unity Catalog. Unity Catalog OSS has slightly different response payloads for some APIs. For example, the Generate Temp Credentials API returns r2_temp_credentials and url in the JSON response for Databricks UC but not for OSS UC. This causes JSON decoding errors if you try the current UC client implementation against an OSS UC instance.

I can look into supporting OSS UC smoothly in a separate PR after this one. Cheers.
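
One way to tolerate the payload difference described in this comment is to model the provider-specific fields as optional. The sketch below is only an illustration under stated assumptions: `TempCredentialsPayload` and `from_fields` are hypothetical names, `expiration_time` stands in for a field both servers are assumed to send, and the already-decoded `HashMap` stands in for a JSON object so the example needs no external crates. In a serde-based client the same idea is usually expressed with `Option<T>` fields.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a temporary-credentials payload.
/// `r2_temp_credentials` and `url` are the two fields the comment says
/// Databricks UC returns and OSS UC omits.
#[derive(Debug, PartialEq)]
pub struct TempCredentialsPayload {
    pub expiration_time: String,
    pub r2_temp_credentials: Option<String>,
    pub url: Option<String>,
}

/// Build the struct from decoded key/value pairs. A missing optional
/// field becomes `None`; only a missing required field is an error.
pub fn from_fields(fields: &HashMap<String, String>) -> Result<TempCredentialsPayload, String> {
    let expiration_time = fields
        .get("expiration_time")
        .cloned()
        .ok_or_else(|| "missing required field: expiration_time".to_string())?;
    Ok(TempCredentialsPayload {
        expiration_time,
        r2_temp_credentials: fields.get("r2_temp_credentials").cloned(),
        url: fields.get("url").cloned(),
    })
}
```

Under this design an OSS-style payload without `url` decodes successfully, and callers decide what to do when the optional value is absent.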

@omkar-foss omkar-foss closed this Feb 11, 2025
@omkar-foss omkar-foss reopened this Feb 11, 2025
@ion-elgreco
Collaborator

@omkar-foss we could add an integration test with the OSS version to catch these things more easily

@omkar-foss
Contributor Author

> @omkar-foss we could add an integration test with the OSS version to catch these things more easily

Yes good idea! I'll add a failing test for Temp Creds API with the UC OSS JSON payload along with some other python tests.

This adds capability to read directly from uc:// uris using the
local catalog-unity crate. This also exposes the UC temporary
credentials in storage_options of the `DeltaTable` instance so
polars or similar readers can use it.

Signed-off-by: Omkar P <[email protected]>
@omkar-foss
Contributor Author

> > @omkar-foss we could add an integration test with the OSS version to catch these things more easily
>
> Yes good idea! I'll add a failing test for Temp Creds API with the UC OSS JSON payload along with some other python tests.

Hi, I've added Python integration tests for both UC Databricks and UC OSS with their respective mock payloads. I'm using Mockoon as the mock HTTP server in GitHub Actions for the integration tests.

@ion-elgreco
Collaborator

> > > @omkar-foss we could add an integration test with the OSS version to catch these things more easily
> >
> > Yes good idea! I'll add a failing test for Temp Creds API with the UC OSS JSON payload along with some other python tests.
>
> Hi, I've added Python integration tests for both UC Databricks and UC OSS with their respective mock payloads. I'm using Mockoon as the mock HTTP server in GitHub Actions for the integration tests.

Why not with the actual UC server?

@omkar-foss
Contributor Author

> Why not with the actual UC server?

Since I had to add a mock server for the UC Databricks integration test anyway, I thought I might as well use it for the UC OSS test too. That keeps things simple, since we're testing only two UC REST APIs (Temp Credentials and Get Table Details).

Let me know if you foresee a strong use case for adding the actual UC OSS server, if so, I'll add it here :)

@ion-elgreco
Collaborator

Let's give @hntd187 some time for a final look; otherwise we can merge at the end of the week.

Collaborator

@hntd187 hntd187 left a comment

Mostly good, I'd like this to not panic, instead returning errors

temp_creds.get_credentials().unwrap()
}
TableTempCredentialsResponse::Error(_error) => {
panic!("Unable to get temporary credentials from Unity Catalog.")
Collaborator

I don't think being unable to get temp creds should result in a panic, perhaps a nice error result?

Contributor Author

Done
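
A minimal sketch of the change requested in this thread: map the error response to an `Err` value instead of panicking. The type names `TableTempCredentialsResponse` and `UnityCatalogError` appear in the diff, but their shapes here are simplified stand-ins, and the `TemporaryCredentialsFetchFailure` variant and the `credentials_or_error` helper are hypothetical.

```rust
use std::collections::HashMap;
use std::fmt;

/// Simplified stand-in for the response enum quoted in the diff.
#[derive(Debug)]
pub enum TableTempCredentialsResponse {
    Success(HashMap<String, String>),
    Error(String),
}

/// Simplified stand-in for the crate's error type (variant name assumed).
#[derive(Debug, PartialEq)]
pub enum UnityCatalogError {
    TemporaryCredentialsFetchFailure { message: String },
}

impl fmt::Display for UnityCatalogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::TemporaryCredentialsFetchFailure { message } => {
                write!(f, "unable to get temporary credentials from Unity Catalog: {message}")
            }
        }
    }
}

impl std::error::Error for UnityCatalogError {}

/// Return the credentials, or a typed error the caller can handle.
pub fn credentials_or_error(
    res: TableTempCredentialsResponse,
) -> Result<HashMap<String, String>, UnityCatalogError> {
    match res {
        TableTempCredentialsResponse::Success(creds) => Ok(creds),
        TableTempCredentialsResponse::Error(message) => {
            Err(UnityCatalogError::TemporaryCredentialsFetchFailure { message })
        }
    }
}
```

Returning a typed error lets the Python binding surface the failure as an exception rather than aborting the process.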

) -> Result<(String, HashMap<String, String>), UnityCatalogError> {
let uri_parts: Vec<&str> = table_uri[5..].split('.').collect();
if uri_parts.len() != 3 {
panic!("Invalid Unity Catalog URI: {}", table_uri);
Collaborator

Return an error instead of panic

Contributor Author

Done
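
A sketch of URI parsing that returns an error instead of panicking, assuming the `uc://catalog.database.table` layout implied by the three-part split in the diff. The `InvalidUri` type and `parse_uc_uri` function are hypothetical names. Using `strip_prefix` instead of `table_uri[5..]` also avoids the slicing panic on inputs shorter than five bytes.

```rust
/// Hypothetical error carrying the offending URI.
#[derive(Debug, PartialEq)]
pub struct InvalidUri(pub String);

/// Split `uc://catalog.database.table` into its three parts.
pub fn parse_uc_uri(table_uri: &str) -> Result<(&str, &str, &str), InvalidUri> {
    // A wrong or missing scheme is an error, not a panic.
    let rest = table_uri
        .strip_prefix("uc://")
        .ok_or_else(|| InvalidUri(table_uri.to_string()))?;
    let parts: Vec<&str> = rest.split('.').collect();
    match parts.as_slice() {
        // Exactly three non-empty segments are required (assumed rule).
        [catalog, database, table] if parts.iter().all(|p| !p.is_empty()) => {
            Ok((*catalog, *database, *table))
        }
        _ => Err(InvalidUri(table_uri.to_string())),
    }
}
```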

let database_name = uri_parts[1];
let table_name = uri_parts[2];

let unity_catalog = match UnityCatalogBuilder::from_env().build() {
Collaborator

just use ? operator instead

Contributor Author

Yep, done
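
For readers less familiar with the suggestion, here is a small sketch of what replacing a `match` with the `?` operator looks like. Every name in it (`BuildError`, `CatalogError`, `build_client`) is hypothetical; the point is only that `?` returns early on `Err` and converts the error through a `From` impl.

```rust
/// Hypothetical error from a builder step.
#[derive(Debug, PartialEq)]
pub struct BuildError(pub String);

/// Hypothetical outer error type.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    Build(BuildError),
}

// `?` relies on this conversion to turn a BuildError into a CatalogError.
impl From<BuildError> for CatalogError {
    fn from(e: BuildError) -> Self {
        CatalogError::Build(e)
    }
}

/// Hypothetical builder step that can fail.
pub fn build_client(workspace_url: Option<&str>) -> Result<String, BuildError> {
    workspace_url
        .map(|u| format!("client for {u}"))
        .ok_or_else(|| BuildError("workspace URL not set".to_string()))
}

/// Verbose form: explicit match with an early return.
pub fn open_verbose(url: Option<&str>) -> Result<String, CatalogError> {
    let client = match build_client(url) {
        Ok(c) => c,
        Err(e) => return Err(CatalogError::from(e)),
    };
    Ok(client)
}

/// Concise form: identical behavior using `?`.
pub fn open_concise(url: Option<&str>) -> Result<String, CatalogError> {
    let client = build_client(url)?;
    Ok(client)
}
```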

.get_table_storage_location(Some(catalog_id.to_string()), database_name, table_name)
.await
{
Ok(s) => s,
Collaborator

just use ?

Contributor Author

Done

));

let (table_path, temp_creds) = match result {
Ok(tup) => tup,
Collaborator

use ?

Contributor Author

Done


let mut storage_options = options.0.clone();

if !temp_creds.is_empty() {
Collaborator

Does it matter if it's empty? extend should work on an empty collection

Contributor Author

Yes, silly me, thanks for this. Done.
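
The reviewer's point can be checked with a short sketch: `HashMap::extend` with an empty map is a no-op, so the emptiness guard is redundant. The `merge_storage_options` helper is a hypothetical name; note that, in this sketch, credentials overwrite user-supplied options with the same key, which is simply how `extend` behaves on a `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: combine user-supplied storage options with
/// temporary credentials, without checking for emptiness first.
pub fn merge_storage_options(
    user_options: &HashMap<String, String>,
    temp_creds: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut storage_options = user_options.clone();
    // Extending with an empty map leaves `storage_options` unchanged.
    storage_options.extend(temp_creds);
    storage_options
}
```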

@omkar-foss
Contributor Author

> Mostly good, I'd like this to not panic, instead returning errors

Thanks for your review, Steve. I've updated the PR as per your comments.

@omkar-foss omkar-foss requested a review from hntd187 February 14, 2025 06:08
@ion-elgreco
Collaborator

ion-elgreco commented Feb 14, 2025

@omkar-foss can you add an integration page to the docs? :)

.await?;
let credentials = match temp_creds_res {
TableTempCredentialsResponse::Success(temp_creds) => {
temp_creds.get_credentials().unwrap()
Collaborator

missed an error case here
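
The remaining `unwrap()` can be removed in the same spirit. The sketch below assumes, purely for illustration, that `get_credentials` returns an `Option`; the real return type is not shown in the diff. `TempCreds`, its field, and `CredsError` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical credentials holder with one optional provider field.
pub struct TempCreds {
    pub aws_access_key_id: Option<String>,
}

impl TempCreds {
    /// Assumed shape: `None` when no usable credentials are present.
    pub fn get_credentials(&self) -> Option<HashMap<String, String>> {
        let key = self.aws_access_key_id.as_ref()?;
        let mut out = HashMap::new();
        out.insert("aws_access_key_id".to_string(), key.clone());
        Some(out)
    }
}

/// Hypothetical error for the missing-credentials case.
#[derive(Debug, PartialEq)]
pub enum CredsError {
    Missing,
}

/// Convert the `Option` into a `Result` instead of calling `unwrap()`.
pub fn credentials(temp: &TempCreds) -> Result<HashMap<String, String>, CredsError> {
    temp.get_credentials().ok_or(CredsError::Missing)
}
```

If the real method returns a `Result` instead, the equivalent change would be to propagate it with `?` or `map_err`.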

Collaborator

@hntd187 hntd187 left a comment

Looks good, just one more error case nit :)

Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Python-side support for Unity Catalog
4 participants