Add pydoclint pre-commit hook and align docstrings to Google form#443

Merged
kenibrewer merged 5 commits into main from kenibrewer/pydoclint-hook on May 6, 2026

@kenibrewer
Member

@kenibrewer kenibrewer commented Apr 29, 2026

Description

Adds a pydoclint pre-commit hook (v0.8.3, --style=google) that, on every commit, checks each function's Args/Returns/Yields/Raises sections against its actual signature. The hook runs in pydoclint's default strict mode, so docstring arg types must be present and match the annotations, and Returns types must match the return annotation.

To pass the new hook, every Args/Returns block in cytotable/convert.py, cytotable/utils.py, cytotable/sources.py, and cytotable/warehouse/iceberg.py was migrated to the canonical Google form (`name (TYPE):`) with types matching the annotations exactly, and missing Raises: sections were added where functions raise.

This is a developer-tooling/quality change with no runtime behavior changes. It complements the existing ruff D rules, which check docstring presence and format, by also checking that each docstring agrees with its signature.
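
The `name (TYPE):` form and the strict-mode rule are easiest to see side by side. Below is a minimal sketch, assuming Python 3.9+ (for `ast.unparse`): `scale` is a hypothetical function written in the canonical Google form, and `find_arg_type_mismatches` is an illustrative checker that approximates only one strict-mode rule (each documented arg type must equal the annotation's source text). It is not pydoclint's implementation and ignores Returns, Yields, and Raises.

```python
import ast
import re
from typing import Dict, List

# Hypothetical function written in the canonical Google form.
EXAMPLE_SOURCE = '''
from typing import List


def scale(values: List[float], factor: float = 2.0) -> List[float]:
    """Multiply every value by a constant factor.

    Args:
        values (List[float]): Numbers to scale.
        factor (float): Multiplier applied to each value.

    Returns:
        List[float]: The scaled values.

    Raises:
        ValueError: If values is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return [v * factor for v in values]
'''

# Matches Google-form Args entries such as "    values (List[float]): ...".
ARG_ENTRY = re.compile(r"^\s+(\w+) \((.+?)\):")


def documented_arg_types(docstring: str) -> Dict[str, str]:
    """Map each arg name in the Args section to its documented TYPE."""
    types: Dict[str, str] = {}
    in_args = False
    for line in docstring.splitlines():
        if line == "Args:":
            in_args = True
        elif in_args and line and not line[0].isspace():
            break  # an unindented line starts the next section, e.g. "Returns:"
        elif in_args:
            match = ARG_ENTRY.match(line)
            if match:
                types[match.group(1)] = match.group(2)
    return types


def find_arg_type_mismatches(source: str) -> List[str]:
    """Report args whose documented type differs from the annotation text."""
    problems: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        documented = documented_arg_types(ast.get_docstring(node) or "")
        for arg in node.args.posonlyargs + node.args.args + node.args.kwonlyargs:
            if arg.arg in ("self", "cls"):
                continue
            annotated = ast.unparse(arg.annotation) if arg.annotation else None
            if documented.get(arg.arg) != annotated:
                problems.append(
                    f"{node.name}: {arg.arg} documented as "
                    f"{documented.get(arg.arg)!r}, annotated as {annotated!r}"
                )
    return problems


print(find_arg_type_mismatches(EXAMPLE_SOURCE))  # []
print(find_arg_type_mismatches(EXAMPLE_SOURCE.replace("factor (float)", "factor (int)")))
```

Because this sketch compares source text only, `List[float]` and `list[float]` would be reported as a mismatch here.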

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

  • Chores
    • Bumped the pre-commit mypy hook to a newer patch release.
    • Added a documentation linting hook to pre-commit to enforce Google-style docstrings.
    • Standardized and expanded docstrings, type annotations, and documented raised errors across the codebase.
    • Clarified utility typings and return descriptions for more accurate internal API documentation.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad909b40-0f66-4900-8a7b-0acbef789acd

📥 Commits

Reviewing files that changed from the base of the PR and between 4c9b3eb and c362074.

📒 Files selected for processing (5)
  • .pre-commit-config.yaml
  • cytotable/convert.py
  • cytotable/sources.py
  • cytotable/utils.py
  • cytotable/warehouse/iceberg.py
📝 Walkthrough

Bumps a pre-commit mypy hook and adds a pydoclint hook; standardizes and corrects docstrings and type annotations across modules; tightens three utility function signatures; removes an unreachable error branch from the file-line detection logic in sources; expands the Iceberg warehouse docstrings. No public API removals.

Changes

Pre-commit configuration

  • Workflow Config (.pre-commit-config.yaml): Bumps pre-commit/mirrors-mypy from v1.20.1 to v1.20.2 and adds jsh9/pydoclint (rev: 0.8.3) configured with --style=google, placed before pylint.

Docstring, typing, and small runtime tweaks

  • Docstring / Type corrections (cytotable/convert.py, cytotable/warehouse/iceberg.py, cytotable/utils.py, cytotable/sources.py): Widespread standardization and correction of docstring type annotations and Args/Returns formatting, plus added Raises sections (including CytoTableException, SchemaException, ImportError, ValueError, etc.).
  • Signature tightening / return types (cytotable/utils.py): Updated function signatures: _sqlite_mixed_type_query_to_parquet(...) -> pa.Table, _natural_sort(list_to_sort: List[Any]) -> List[Any], and cloud_glob(..., boto_s3_client: Optional[Any] = None). Updated the _duckdb_reader docstring, among others, to reflect loaded plugins.
  • File-line detection cleanup (cytotable/sources.py): _file_is_more_than_one_line drops its unreachable NoInputDataException branch (file.readline() returns '' at EOF rather than raising StopIteration), so it continues to return False for zero- and one-line files and still returns True for .sqlite/.npz and other multi-line inputs. The now-unused NoInputDataException import was removed.
  • High-level docs for Iceberg (cytotable/warehouse/iceberg.py): Expanded the write_iceberg_warehouse and _warehouse_dir docstrings to detail parameter types, pass-through conversion args, Iceberg-specific options, an explicit Returns: str, and the Raises scenarios (including missing pyiceberg).
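
For the tightened `_natural_sort(list_to_sort: List[Any]) -> List[Any]` signature, a docstring that passes strict mode names the same `List[Any]` type in both Args and Returns. A minimal sketch follows; the `natural_sort` body is a hypothetical natural-sort implementation for illustration, not CytoTable's `_natural_sort`.

```python
import re
from typing import Any, List, Union


def natural_sort(list_to_sort: List[Any]) -> List[Any]:
    """Sort items so that runs of digits compare as numbers.

    Args:
        list_to_sort (List[Any]): Items to sort; each is compared via str().

    Returns:
        List[Any]: A new, naturally ordered list (e.g. "file2" before "file10").
    """

    def sort_key(item: Any) -> List[Union[int, str]]:
        # re.split with a capturing group alternates text and digit runs,
        # so odd indexes are always digit runs and can be compared as ints.
        parts = re.split(r"(\d+)", str(item))
        return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]

    return sorted(list_to_sort, key=sort_key)


print(natural_sort(["file10.csv", "file2.csv", "File1.csv"]))
# ['File1.csv', 'file2.csv', 'file10.csv']
```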

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through docs with tidy care,
tightened types and left them fair.
A line-check softened, signatures tuned,
pre-commit helpers freshly buffed and pruned.
I nibble bugs — for now, all's square. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Description check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately reflects the main changes: adding a pydoclint pre-commit hook and aligning docstrings to Google style across multiple files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 97.44%, above the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
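
CodeRabbit does not say how it computes Docstring Coverage. As an assumption, the sketch below uses one plausible definition (functions and classes with a docstring, divided by all functions and classes) and applies the 80% threshold from the check above; CodeRabbit's actual counting may differ.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Compute the percentage of functions and classes that have a docstring.

    Args:
        source (str): Python source code to inspect.

    Returns:
        float: Documented definitions / all definitions * 100 (100.0 if none).
    """
    definitions = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not definitions:
        return 100.0
    documented = sum(1 for node in definitions if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(definitions)


SAMPLE = '''
def documented():
    """Has a docstring."""


def undocumented():
    return None
'''

coverage = docstring_coverage(SAMPLE)
print(f"{coverage:.2f}% (threshold 80.00%: {'pass' if coverage >= 80.0 else 'fail'})")
```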

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cytotable/sources.py`:
- Around lines 323-324: The docstring claims NoInputDataException is raised for
zero-line files but the actual code path currently returns False for
non-multi-line inputs (the branch with the existing "return False" near the
check around line 333); fix this by either (A) changing that branch to raise
NoInputDataException instead of returning False (replace the "return False" in
the non-multi-line path with "raise NoInputDataException(...)"), or (B) update
the docstring to remove the claim about NoInputDataException so it matches the
current return-False behavior—make the change next to the function that contains
the existing "return False" and keep NoInputDataException referenced
consistently.

In `@cytotable/utils.py`:
- Around line 208-209: The docstring for _sqlite_mixed_type_query_to_parquet is
incorrect: it claims to return a parquet file path but the function actually
returns a pyarrow Table (pa.Table) at the end of the function; update the return
contract to state it returns a pa.Table (or modify the function to write and
return a file path if that was intended). Specifically, edit the docstring for
_sqlite_mixed_type_query_to_parquet to describe the returned type as pa.Table
and include a brief note about the table representing the extracted data so
callers (and type checkers) match the actual return value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62a2cc3d-6e95-447f-a2e7-ae8ffb1e5bff

📥 Commits

Reviewing files that changed from the base of the PR and between a957113 and d19df0f.

📒 Files selected for processing (5)
  • .pre-commit-config.yaml
  • cytotable/convert.py
  • cytotable/sources.py
  • cytotable/utils.py
  • cytotable/warehouse/iceberg.py

Comment thread cytotable/sources.py Outdated
Comment thread cytotable/utils.py Outdated
kenibrewer added a commit that referenced this pull request Apr 29, 2026
Address CodeRabbit feedback on PR #443:

- sources.py: `_file_is_more_than_one_line` documented raising
  `NoInputDataException` for zero-line files, but the `except StopIteration`
  block was unreachable (`file.readline()` returns `''` on EOF, it does not
  raise). The function actually returned `False` for zero/one-line files, which
  matches existing test expectations. Remove the dead branch and the misleading
  `Raises:` section so the contract matches behavior; drop the now-unused
  `NoInputDataException` import.
- utils.py: `_sqlite_mixed_type_query_to_parquet` was annotated `-> str` but
  returns `pa.Table.from_pylist(results)`. Both call sites in convert.py and
  the existing test pass the result as a `pa.Table` (named `arrow_data_tbl`,
  passed as `table=` to `parquet.write_table`). Correct the annotation and
  docstring to `pa.Table`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
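
The EOF behavior this commit relies on is easy to confirm. The sketch below is a simplified, hypothetical stand-in for `_file_is_more_than_one_line` that handles plain text files only; per the walkthrough, the real function also returns True early for `.sqlite`/`.npz` inputs, which is not modeled here.

```python
import os
import tempfile


def file_is_more_than_one_line(path: str) -> bool:
    """Report whether a text file has at least two lines.

    Args:
        path (str): Path to the text file to inspect.

    Returns:
        bool: True if a second line exists, False for empty or one-line files.
    """
    with open(path, "r", encoding="utf-8") as file:
        # readline() signals EOF by returning "", never by raising
        # StopIteration, so an `except StopIteration` here would be dead code.
        first_line = file.readline()
        second_line = file.readline()
    return bool(first_line) and bool(second_line)


with tempfile.TemporaryDirectory() as tmp_dir:
    contents = {"empty.csv": "", "one.csv": "a,b\n", "two.csv": "a,b\n1,2\n"}
    for name, text in contents.items():
        path = os.path.join(tmp_dir, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        print(name, file_is_more_than_one_line(path))
    # empty.csv False / one.csv False / two.csv True
```

Only iterating the file with `next(file)` raises StopIteration at EOF, which is why the old `except` branch could never run.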
@kenibrewer kenibrewer marked this pull request as draft April 29, 2026 16:31
@kenibrewer kenibrewer marked this pull request as ready for review May 1, 2026 14:53
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cytotable/warehouse/iceberg.py`:
- Around lines 610-617: The docstring incorrectly states that missing optional
pyiceberg raises CytoTableException; update the docstring to match runtime
behavior by replacing that part to say the missing pyiceberg dependency raises
ImportError raised by _require_pyiceberg(), or alternatively change the
call-sites that use _require_pyiceberg() to catch ImportError and raise
CytoTableException instead—refer to the _require_pyiceberg() helper and the
CytoTableException symbol in your change so the documentation and runtime
behavior stay consistent.
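
Both resolutions the comment offers can be sketched side by side. `require_optional` documents `ImportError` directly (option A), and `write_warehouse` catches it and re-raises a project exception (option B). All names here are hypothetical stand-ins for `_require_pyiceberg()`, `CytoTableException`, and the Iceberg writer; this is not CytoTable's code.

```python
import importlib
from types import ModuleType


class CytoTableException(Exception):
    """Local stand-in for CytoTable's exception class in this sketch."""


def require_optional(module_name: str) -> ModuleType:
    """Option (A): import an optional dependency and document ImportError.

    Args:
        module_name (str): Importable name of the optional package.

    Returns:
        ModuleType: The imported module.

    Raises:
        ImportError: If the package is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency {module_name!r} is required for this feature."
        ) from exc


def write_warehouse(module_name: str = "pyiceberg") -> str:
    """Option (B): translate the missing-dependency ImportError for callers.

    Args:
        module_name (str): Optional package the warehouse writer needs.

    Returns:
        str: A placeholder result once the dependency is available.

    Raises:
        CytoTableException: If the optional dependency is missing.
    """
    try:
        require_optional(module_name)
    except ImportError as exc:
        raise CytoTableException(str(exc)) from exc
    return "dependency available"
```

Because `write_warehouse` contains the `raise CytoTableException(...)` statement itself, its Raises section describes a direct raise rather than one propagated from a helper.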

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e80a901-4d2e-4edc-a7e3-8d66db0a8af5

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca3fe9 and 216fde2.

📒 Files selected for processing (3)
  • cytotable/sources.py
  • cytotable/utils.py
  • cytotable/warehouse/iceberg.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cytotable/utils.py

Comment thread cytotable/warehouse/iceberg.py Outdated
Member

@d33bs d33bs left a comment

Thanks @kenibrewer! This LGTM! There's one remaining CodeRabbit review item that might need attention, but I couldn't tell if it was absolutely accurate or not (and it's a low-risk change generally).

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cytotable/warehouse/iceberg.py`:
- Line 623: Remove the invalid Ruff noqa marker by deleting the "# noqa: DOC503"
suffix from the end of the triple-quoted docstring (the closing triple quote) so the
docstring no longer carries the pydoclint-specific noqa; i.e., find the
closing docstring token in iceberg.py (the line containing """  # noqa: DOC503)
and remove the " # noqa: DOC503" portion so only the closing triple quotes
remain.
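
The DOC503 rationale given in the commit that added this marker (pydoclint sees only direct `raise` statements) can be illustrated with the standard `ast` module. This is a simplified sketch, not pydoclint's implementation: `_validate` and `write` are hypothetical, and only bare exception names are recognized.

```python
import ast
import re
from typing import Set

# Hypothetical module: write() documents an exception that only a helper raises.
SOURCE = '''
def _validate(joins):
    if not joins:
        raise ValueError("joins must not be empty")


def write(joins):
    """Write the warehouse.

    Raises:
        ValueError: Propagated from _validate when joins is empty.
    """
    _validate(joins)
'''


def directly_raised(func: ast.FunctionDef) -> Set[str]:
    """Collect exception names from raise statements inside func's own body.

    Nested function definitions are walked too, a simplification.
    """
    names: Set[str] = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            # Handle both `raise ValueError(...)` and `raise ValueError`.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(exc, ast.Name):
                names.add(exc.id)
    return names


def documented_raises(func: ast.FunctionDef) -> Set[str]:
    """Collect exception names listed in the Raises section of func's docstring."""
    names: Set[str] = set()
    in_raises = False
    for line in (ast.get_docstring(func) or "").splitlines():
        if line == "Raises:":
            in_raises = True
        elif in_raises and line and not line[0].isspace():
            break  # the next unindented section header ends Raises
        elif in_raises:
            match = re.match(r"\s+(\w+):", line)
            if match:
                names.add(match.group(1))
    return names


functions = {
    node.name: node
    for node in ast.parse(SOURCE).body
    if isinstance(node, ast.FunctionDef)
}
write_func = functions["write"]
# Documented but never raised in write() itself: "in docstring but not in body".
print(documented_raises(write_func) - directly_raised(write_func))  # {'ValueError'}
```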

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf0eaff7-1d32-4899-9780-2fecfae920f4

📥 Commits

Reviewing files that changed from the base of the PR and between 216fde2 and 4c9b3eb.

📒 Files selected for processing (1)
  • cytotable/warehouse/iceberg.py

Comment thread cytotable/warehouse/iceberg.py
kenibrewer and others added 5 commits May 5, 2026 12:06
Wire pydoclint 0.8.3 into .pre-commit-config.yaml with --style=google so
docstring sections (Args, Returns, Yields, Raises) are verified against
each function's signature on every commit. Convert all Args/Returns
blocks in convert.py, utils.py, sources.py, and warehouse/iceberg.py to
the canonical `name (TYPE):` Google form with types matching annotations
exactly, and add missing Raises sections where functions raise. Strict
mode (default) is enforced: types must be present in docstring args and
match the signature, and Returns types must match the return annotation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address CodeRabbit feedback on PR #443:

- sources.py: `_file_is_more_than_one_line` documented raising
  `NoInputDataException` for zero-line files, but the `except StopIteration`
  block was unreachable (`file.readline()` returns `''` on EOF, it does not
  raise). The function actually returned `False` for zero/one-line files, which
  matches existing test expectations. Remove the dead branch and the misleading
  `Raises:` section so the contract matches behavior; drop the now-unused
  `NoInputDataException` import.
- utils.py: `_sqlite_mixed_type_query_to_parquet` was annotated `-> str` but
  returns `pa.Table.from_pylist(results)`. Both call sites in convert.py and
  the existing test pass the result as a `pa.Table` (named `arrow_data_tbl`,
  passed as `table=` to `parquet.write_table`). Correct the annotation and
  docstring to `pa.Table`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous rewrite narrowed Raises: to only the directly raised
CytoTableException for an existing warehouse_path, but the function also
propagates ValueError from _validate_iceberg_join_prerequisites (empty
joins or missing page_keys["join"]) and additional CytoTableException
cases from _validate_image_export_prerequisites (missing image_dir,
non-existent referenced directories, missing join SQL or page_keys
when image export is requested) and from _require_pyiceberg.

Document all of these and suppress DOC503 with `# noqa: DOC503` on the
closing docstring line, since pydoclint can only see direct raise
statements and would otherwise flag the helper-propagated exceptions
as "in docstring but not in body".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Three helper functions added in PR #441 (`_glob_pattern_matches`,
`_glob_follow_symlinks`, `_walk_and_match`) carry untyped Args/Yields
sections. After rebasing this branch onto current main, pydoclint flags
DOC105/109/110/203/404 because the codebase now requires typed Google-
style docstrings. Add `name (TYPE):` and `TYPE:` blocks matching the
existing signatures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
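
The typed Yields form described in the last commit looks like the following. `walk_and_match` here is a hypothetical generator with a made-up signature (the real PR #441 helpers are not shown in this thread); the point is that each arg uses `name (TYPE):` and the Yields entry uses the bare `TYPE:` form, with `str` lining up with the `Iterator[str]` annotation.

```python
import fnmatch
import os
from typing import Iterator


def walk_and_match(root: str, pattern: str) -> Iterator[str]:
    """Yield files under root whose names match a glob pattern.

    Args:
        root (str): Directory to walk recursively.
        pattern (str): fnmatch-style pattern applied to each file name.

    Yields:
        str: Path of each matching file.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if fnmatch.fnmatch(filename, pattern):
                yield os.path.join(dirpath, filename)
```

For example, `list(walk_and_match("data", "*.csv"))` would collect every CSV path under a `data` directory.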
@kenibrewer kenibrewer force-pushed the kenibrewer/pydoclint-hook branch from 4c9b3eb to c362074 May 5, 2026 19:06
@kenibrewer kenibrewer merged commit 580cb4d into main May 6, 2026
11 checks passed
2 participants