-
Notifications
You must be signed in to change notification settings - Fork 111
Implements Token Federation for Python Driver #552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
…cksTokenFederationProvider
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
…nd enhance unit tests for accurate expiry verification
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
""" | ||
try: | ||
# Add protocol if missing to ensure proper parsing | ||
if not url1.startswith(("http://", "https://")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function of making url proper format be separated to a util. This is being used at multiple points
""" | ||
try: | ||
token = self.get_current_token() | ||
return {"Authorization": f"{token.token_type} {token.access_token}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are not passing other external headers apart from Authorization
. In case of Managed Identities we have more headers coming from external provideer
This is called by the ExternalAuthProvider to get headers for authentication. | ||
""" | ||
# First call the underlying credentials provider to get its headers | ||
header_factory = self.credentials_provider(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this header_factory doesn't
seem to be used anywhere
else: | ||
# Token is from a different host, need to exchange | ||
logger.debug("Token from different host, exchanging token") | ||
new_token = self._exchange_token(access_token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the exchange token fails, you are not storing the external credentials token. So if token exchange fails, you are repeatedly calling the external provider
header_factory = self.credentials_provider(*args, **kwargs) | ||
|
||
# Get the standard token endpoint if not already set | ||
if self.token_endpoint is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this here? Shouldn't this code be near or just before initiating token exchange. Feels very out of place in code logical flow
return self.external_headers | ||
|
||
# Return empty dict as a last resort | ||
return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by last resort ? Before this statement there is no case of error, I don't think we will ever reach here
self.token_endpoint, data=token_exchange_data, headers=self.EXCHANGE_HEADERS | ||
) | ||
|
||
if response.status_code != 200: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not HttpError?
@madhav-db
|
description: 'Identity federation client ID' | ||
required: true | ||
|
||
# Run on PRs that might affect token federation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this not run as part of release process? What if in a release there is no change in here listed code folders? We should still run this as part of release if not on every PR.
@@ -29,6 +34,7 @@ def __init__( | |||
tls_client_cert_file: Optional[str] = None, | |||
oauth_persistence=None, | |||
credentials_provider=None, | |||
identity_federation_client_id: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for workload identity federation flow?
Token Federation Support: | ||
----------------------- | ||
Currently, token federation is implemented as a separate auth type, but the goal is to | ||
refactor it as a feature that can work with any auth type. The current implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean that in future both will be supported? separate 'use_token_federation' as well as separate auth type? Deprecating anything which is released in a client is not easy.
|
||
TODO: Future refactoring needed: | ||
1. Add a use_token_federation flag that can be combined with any auth type | ||
2. Remove TOKEN_FEDERATION as an auth_type while maintaining backward compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be hard once introduced
Returns: | ||
str: The formatted hostname | ||
""" | ||
if not hostname.startswith("https://"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if someone has given as http://
# Ensure expiry is timezone-aware | ||
if expiry.tzinfo is None: | ||
# Convert naive datetime to aware datetime | ||
self.expiry = expiry.replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean that without timezone will be treated as UTC time?
|
||
def __str__(self) -> str: | ||
"""Return the token as a string in the format used for Authorization headers.""" | ||
return f"{self.token_type} {self.access_token}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
headers are typically of form "scheme token", where scheme can be like "bearer", "api key" etc. The token type more looks like our internal concepts.
I will be doing this work in a follow up PR. This change isn't related to token federation as a feature, and makes more sense to be done separately. |
Can you clarify why it is not related to token federation? |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
What type of PR is this?
Description
This PR adds token federation support to the Databricks SQL Python connector, which allows using external identity provider tokens (like GitHub Actions OIDC tokens) with Databricks SQL.
Key Changes
Core Implementation
Code Architecture
DatabricksTokenFederationProvider
class to handle token federationToken
class to manage token lifecycle and expiryAPI & Configuration
identity_federation_client_id
parameter for token federationTesting
Future Improvements
This PR enables Databricks SQL connector users to leverage external identity providers for authentication, particularly useful in CI/CD environments like GitHub Actions.
How is this tested?
Related Tickets & Documents
Notes for reviewers:
Token Federation Flow
1. Client Initialization
auth_type="token-federation"
and provides an external tokenaccess_token
or a customcredentials_provider
use_token_federation
flag2. Auth Provider Selection
get_auth_provider()
inauth.py
detects token federation auth typeDatabricksTokenFederationProvider
wrapper around the credential sourceTOKEN_FEDERATION
as an auth_type while maintaining backward compatibilityDatabricksOAuthProvider
,AccessTokenAuthProvider
, etc.)3. Token Evaluation
4. Token Exchange
5. Token Refresh
TOKEN_REFRESH_BUFFER_SECONDS = 10
):6. Fallback Handling
Future Provider Integration Plan
To properly integrate token federation with all auth providers in
authenticators.py
:Decorator Pattern Implementation:
AuthProvider
with token federation capabilitiesDatabricksOAuthProvider
,AccessTokenAuthProvider
, etc.Configuration Changes:
use_token_federation
boolean flag to connection parametersget_auth_provider()
to apply token federation wrapper when flag is setProvider Interface Enhancement:
CredentialsProvider
interface to expose necessary token informationDatabricksOAuthProvider
properly implements this interface for token accessBackward Compatibility:
auth_type="token-federation"
during transitionThe core token exchange functionality works well for any token, but the current architecture limits token federation to being a separate auth type. The primary improvement needed is architectural - enabling token federation to work with other auth types (including OAuth-based ones) while maintaining backward compatibility.