Skip to content

fix(auth): handle PermissionError on workload certificates to avoid startup hang and crash#17568

Open
nbayati wants to merge 4 commits into
googleapis:mainfrom
nbayati:fix/auth-mtls-permission-denied
Open

fix(auth): handle PermissionError on workload certificates to avoid startup hang and crash#17568
nbayati wants to merge 4 commits into
googleapis:mainfrom
nbayati:fix/auth-mtls-permission-denied

Conversation

@nbayati

@nbayati nbayati commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

In sandboxed environments (such as when running the gcloud CLI within a snap package manager sandbox), AppArmor rules can block reading the workload credentials directory or certificate files, raising a PermissionError.

Previously, os.path.exists() caught this PermissionError and returned False. The library interpreted this as the files not being ready yet (e.g. due to a startup race) and entered a 30-second retry loop before failing with a RefreshError.

This change:

  1. Updates _is_certificate_file_ready to use os.stat and propagate PermissionError.
  2. Catches PermissionError immediately during certificate lookup, logging a warning and falling back to unbound tokens without retrying or crashing.
  3. Adds corresponding unit tests.

Fixes b/522957573

…tartup hang and crash

In sandboxed environments (such as when running the gcloud CLI within a snap package manager sandbox), AppArmor rules can block reading the workload credentials directory or certificate files, raising a PermissionError.

Previously, `os.path.exists()` caught this PermissionError and returned `False`. The library interpreted this as the files not being ready yet (e.g. due to a startup race) and entered a 30-second retry loop before failing with a RefreshError.

This change:
1. Updates `_is_certificate_file_ready` to use `os.stat` and propagate `PermissionError`.
2. Catches `PermissionError` immediately during certificate lookup, logging a warning and falling back to unbound tokens without retrying or crashing.
3. Adds corresponding unit tests.
@nbayati nbayati requested review from a team as code owners June 24, 2026 20:46

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves error handling when checking and accessing agent identity certificates. Specifically, it updates _is_certificate_file_ready to propagate PermissionError while catching other OSError exceptions, and handles PermissionError in get_agent_identity_certificate_path by logging a warning and falling back to unbound tokens. Corresponding unit tests have been added to verify these scenarios. There are no review comments, so I have no additional feedback to provide.

@lsirac lsirac left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution!

System-level / Off-diff Finding:

Graceful OSError Handling in get_and_parse_agent_identity_certificate()

get_and_parse_agent_identity_certificate() (around line 208) does not handle exceptions when reading the certificate file. Because os.path.getsize only requires directory traversal / metadata search permissions, _is_certificate_file_ready can evaluate to True even for a file that the current process does not have read permissions to open (e.g., owned by root with 600). When open(cert_path, "rb") is called, it will crash with a raw PermissionError.

Wrap the file-read in a try...except OSError block to gracefully fail-closed with a controlled RefreshError:

    try:
        with open(cert_path, "rb") as cert_file:
            cert_bytes = cert_file.read()
    except OSError as e:
        raise exceptions.RefreshError(
            f"Failed to read agent identity certificate file at {cert_path}: {e}. "
            "Token binding protection is failing. You can turn off this protection by setting "
            f"{environment_vars.GOOGLE_API_PREVENT_AGENT_TOKEN_SHARING_FOR_GCP_SERVICES} to false "
            "to fall back to unbound tokens."
        ) from e

Comment thread packages/google-auth/google/auth/_agent_identity_utils.py Outdated
Comment thread packages/google-auth/google/auth/_agent_identity_utils.py
Comment thread packages/google-auth/google/auth/_agent_identity_utils.py
Comment thread packages/google-auth/tests/test_agent_identity_utils.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to catch here too. I think it's possible for get_agent_identity_certificate_path to work correctly but then have .read to raise a PermissionError since .stat and .read require different permissions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I have wrapped the file read block in a try...except OSError block to catch read-level permissions errors and fall back to unbound tokens.

if not path:
return False
try:
return os.path.getsize(path) > 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the PR description mentions using os.stat , but the code uses os.path.getsize . Functionally equivalent since getsize uses stat under the hood, but just pointing it out!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new commit uses os.stat directly. I'll double check the PR description after making all the changes requested by reviewers to ensure it's still valid. Thanks for pointing it out :)

@nbayati nbayati requested review from lsirac and macastelaz June 25, 2026 20:27
try:
with open(cert_path, "rb") as cert_file:
cert_bytes = cert_file.read()
except OSError as e:

@lsirac lsirac Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the fallback limited to PermissionError?

has_logged_cert_warning = True

except PermissionError as e:
_LOGGER.warning(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use warnings.warn since this is intended for developers. Applies throughout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants