Skip to content

fix(python-driver): add null-guards in ANTLR parser and relax runtime version pin#2372

Merged
MuhammadTahaNaveed merged 7 commits intoapache:masterfrom
uesleilima:fix/python-driver-antlr-parser-robustness
Apr 20, 2026
Merged

fix(python-driver): add null-guards in ANTLR parser and relax runtime version pin#2372
MuhammadTahaNaveed merged 7 commits intoapache:masterfrom
uesleilima:fix/python-driver-antlr-parser-robustness

Conversation

@uesleilima
Copy link
Copy Markdown
Contributor

Summary

Fix two related issues in the Python driver's ANTLR4 parsing pipeline:

1. ANTLR4 agtype parser crashes on vertices with complex properties (#2367)

The ResultVisitor in builder.py crashes with AttributeError: 'NoneType' object has no attribute 'stop' when deserializing certain vertex records whose properties contain large arrays, long text fields with special characters, or deeply nested structures.

Root cause: Several visitor methods access child nodes (.symbol, .getText(), .accept()) without null-checking first. When the ANTLR4 parser tree contains None nodes, these calls crash.

Fix: Add null-guards in visitAgValue, visitFloatLiteral, visitPair, visitObj, and handleAnnotatedValue. Also replaces shadowing of builtin dict with d in handleAnnotatedValue, and uses .get() for safer key access on parsed vertex/edge dicts.

2. antlr4-python3-runtime==4.11.1 is incompatible with Python >= 3.13 (#2368)

The exact version pin ==4.11.1 causes runtime errors on Python 3.13+ due to deprecated/removed CPython internals that ANTLR 4.11.x relies on.

Fix: Relax the constraint from ==4.11.1 to >=4.11.1,<5.0 in both pyproject.toml and requirements.txt. The ANTLR ATN serialized format is unchanged between 4.11 and 4.13, so the generated lexer/parser files are compatible. Validated with antlr4-python3-runtime==4.13.2 on Python 3.11-3.14.

Tests

Added 9 new test cases in test_agtypes.py:

  • Vertices with large array properties (12 elements)
  • Vertices with special characters in properties
  • Deeply nested property structures (3 levels)
  • Empty properties
  • Null property values
  • Edges with complex properties (arrays, booleans)
  • Paths with multiple edges
  • Empty/null input handling
  • Arrays of mixed types including nested arrays

All 12 tests pass with antlr4-python3-runtime==4.13.2.

Closes

Copy link
Copy Markdown

@tcolo tcolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uesleilima thanks alot of the patch - can you please address the issues in the review from claude code?
## Code Review: `fix(python-driver)`: add null-guards in ANTLR parser and relax runtime version pin

### Summary
This PR fixes two real bugs: a crash on `AttributeError` when deserializing complex vertex properties (#2367), and a version pin that blocks Python 3.13+ (#2368). The approach is sound and the test coverage is meaningful. A few things worth discussing before merging.

---

### 🔴 Issues to Address

**1. `visitFloatLiteral` — silent fallback may mask malformed input**

When `ctx.getChild(0)` returns `None`, the node is genuinely malformed. Silently returning a default value hides the problem — a test confirms no `AttributeError`, but an incorrect result is arguably worse. Consider raising `AGTypeError` consistently with how `visitPair` handles missing nodes:

```python
# Suggested
child = ctx.getChild(0)
if child is None:
    raise AGTypeError("Malformed float literal: missing child node")

2. handleAnnotatedValue — type validation failure mid-path construction

When a type check fails mid-path (e.g., vertex dict check passes, edge dict check raises AGTypeError), the partially constructed Path object may already have been mutated. It's discarded on exception so it's minor, but a comment explaining the intended behavior would help.


🟡 Suggestions

1. Add a comment explaining the version constraint

The change from ==4.11.1 to >=4.11.1,<5.0 is the right call — ANTLR4 maintains format compatibility within a major version. A brief inline comment in pyproject.toml will help the next person who wonders why it's not pinned more tightly:

# ANTLR4 runtime is format-compatible within major versions; tested on 4.13.2 with Python 3.11–3.14
"antlr4-python3-runtime>=4.11.1,<5.0"

2. Malformed input tests should assert the expected outcome, not just the absence of AttributeError

# Current pattern proves the old crash is gone, but doesn't pin the new contract
try:
    result = handler.parse(malformed_input)
except AttributeError:
    pytest.fail("Should not raise AttributeError")

# Better: assert what *should* happen
with pytest.raises(AGTypeError):
    handler.parse(malformed_input)

# OR if graceful None is expected:
result = handler.parse(malformed_input)
assert result is None

3. visitObj silently skips None values — visitPair hard-fails on them

This inconsistency could cause silent data loss (a property disappearing) instead of a parse error. Either align the behavior or add a comment explaining the intentional difference.


✅ What Looks Good

  • The visitAgValue null-checks are clean and correctly handle the annotated-type path.
  • Replacing dict["key"] with .get("key") in handleAnnotatedValue is the right call for externally-provided data.
  • Fixing the shadowed dict builtin is a good catch — subtle but real.
  • The nine new test cases cover solid breadth of the real-world inputs that triggered the original bug.
  • Both pyproject.toml and requirements.txt are kept in sync on the version change — these often drift.

Verdict: Request Changes

The core fix is correct and needed. Two points before merging:

  1. Clarify/fix visitFloatLiteral — silent 0.0 default vs. raising AGTypeError.
  2. Tighten the malformed-input tests to assert the expected outcome, not just the absence of AttributeError.

The version pin relaxation is ready to merge as-is.


Paste this into **Files changed → Review changes → Leave a comment** on the PR and select **Request changes**.

@uesleilima uesleilima requested a review from tcolo April 5, 2026 23:03
@MuhammadTahaNaveed MuhammadTahaNaveed requested review from Copilot and removed request for tcolo April 6, 2026 10:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses crashes in the Python driver’s ANTLR4 agtype parsing by adding null-guards/error handling in the visitor and updates packaging to allow newer antlr4-python3-runtime versions (needed for Python 3.13+ compatibility).

Changes:

  • Added null-guards and additional validation/error reporting in ResultVisitor / handleAnnotatedValue during agtype deserialization.
  • Relaxed antlr4-python3-runtime dependency pin to >=4.11.1,<5.0 in packaging files.
  • Expanded agtype parsing tests to cover complex vertices/edges/paths and malformed/truncated inputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
drivers/python/age/builder.py Adds null-guards and changes annotated-type object construction logic.
drivers/python/test_agtypes.py Adds multiple new parsing tests for complex and malformed agtype inputs.
drivers/python/requirements.txt Relaxes ANTLR runtime constraint for broader Python compatibility.
drivers/python/pyproject.toml Relaxes ANTLR runtime constraint and documents supported/tested ranges.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread drivers/python/age/builder.py
Comment thread drivers/python/age/builder.py Outdated
Comment thread drivers/python/test_agtypes.py Outdated
Comment thread drivers/python/test_agtypes.py Outdated
@uesleilima
Copy link
Copy Markdown
Contributor Author

All review feedback has been addressed:

tcolo's review:

  • visitFloatLiteral: now raises AGTypeError on malformed child node instead of silent fallback.
  • handleAnnotatedValue: added comment explaining partial-construction behavior.
  • visitObj/visitPair: added comment documenting that visitPair's validation makes the None-guard defensive-only.
  • pyproject.toml: added inline comment explaining ANTLR4 version range rationale.
  • ✅ Malformed/truncated input tests now assert AGTypeError or validate recovery shape (not just absence of AttributeError).

Copilot review:

  • ✅ Vertex and edge branches now default properties to {} via d.get("properties") or {}, preventing __getitem__ crashes.
  • ✅ Replaced weak assertNotIsInstance(result, type(NotImplemented)) with structural type checks.
  • ✅ Fixed test_truncated_agtype docstring to match actual assertion behavior.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread drivers/python/test_agtypes.py Outdated
Comment thread drivers/python/pyproject.toml Outdated
@MuhammadTahaNaveed
Copy link
Copy Markdown
Member

@uesleilima needs rebase

@uesleilima uesleilima force-pushed the fix/python-driver-antlr-parser-robustness branch from d675537 to 1e5e50d Compare April 20, 2026 08:00
@uesleilima
Copy link
Copy Markdown
Contributor Author

@MuhammadTahaNaveed rebased on latest master. Conflicts in age/age.py resolved by taking upstream's merged skip_load / load_from_plugins / "$user" setup from #2366 (no functional changes from this PR's scope). Full local CI suite (test_age_py.py, test_networkx.py, test_agtypes.py, test_security.py) passes against apache/age:dev_snapshot_master.

@MuhammadTahaNaveed
Copy link
Copy Markdown
Member

@uesleilima getting this on running test suite.

ANTLR runtime and generated code versions disagree: 4.13.2!=4.11.1
ANTLR runtime and generated code versions disagree: 4.13.2!=4.11.1

… version pin

Fix two related issues in the Python driver's ANTLR4 parsing pipeline:

1. Add null-guards in ResultVisitor methods (visitAgValue, visitFloatLiteral,
   visitPair, visitObj, handleAnnotatedValue) to prevent AttributeError crashes
   when the ANTLR4 parse tree contains None child nodes. This occurs with
   vertices that have complex properties (large arrays, special characters,
   deeply nested structures). (apache#2367)

2. Relax antlr4-python3-runtime version constraint from ==4.11.1 to
   >=4.11.1,<5.0 in both pyproject.toml and requirements.txt. The 4.11.1
   pin is incompatible with Python >= 3.13. The ANTLR ATN serialized format
   is unchanged between 4.11 and 4.13, so the generated lexer/parser files
   are compatible. Validated with antlr4-python3-runtime==4.13.2 on
   Python 3.11-3.14. (apache#2368)

Also replaces shadowing of builtin 'dict' in handleAnnotatedValue with 'd',
and uses .get() for safer key access on parsed vertex/edge dicts.

Closes apache#2367
Closes apache#2368
Verify that malformed and truncated agtype strings raise AGTypeError
(or recover gracefully) rather than crashing with AttributeError.
This tests the null-guards added to the ANTLR parser visitor.

Made-with: Cursor
- visitFloatLiteral: raise AGTypeError on malformed child node instead
  of silently returning a fallback value
- visitObj: add comment documenting that visitPair's validation makes
  the None-guard defensive-only
- handleAnnotatedValue: add comment explaining partial-construction
  behavior on type-check failure
- pyproject.toml: add comment explaining ANTLR4 version range rationale
- Tests: assert AGTypeError (or graceful recovery) for malformed and
  truncated inputs, not just absence of AttributeError

Made-with: Cursor
- handleAnnotatedValue: default properties to {} when missing from
  parsed dict, preventing __getitem__ crashes on access
- Tests: replace weak assertNotIsInstance with structural type checks
- Fix truncated test docstring to match actual assertion behavior

Made-with: Cursor
- Use PostgreSQL "$user" placeholder in SET search_path.
- Exercise real escapes and Unicode in special-characters vertex test (json.dumps).
- Add Python 3.9 trove classifier to match requires-python and dependency comment.

Made-with: Cursor
… escape semantics)

- Build vertex agtype with string concat to avoid invalid f-string braces.
- Assert stored description matches parser behavior: JSON escapes remain
  literal, UTF-8 decodes normally (ensure_ascii=False on json.dumps).

Made-with: Cursor
…ntime-version-mismatch warning

The generated lexer/parser were hardcoded to check for ANTLR runtime
4.11.1, which triggered a noisy 'ANTLR runtime and generated code
versions disagree' warning when installed against a newer runtime like
4.13.2. Regenerating from Agtype.g4 with the 4.13.2 tool aligns the
generated checkVersion() call with the default-installed runtime in
the allowed dependency range and eliminates the warning.

Bumps the declared floor of antlr4-python3-runtime to 4.13.2 so the
default install path is warning-free.

Made-with: Cursor
@uesleilima uesleilima force-pushed the fix/python-driver-antlr-parser-robustness branch from 1e5e50d to 30edd80 Compare April 20, 2026 10:14
@uesleilima
Copy link
Copy Markdown
Contributor Author

@MuhammadTahaNaveed Good catch — the warning came from the generated lexer/parser hardcoding checkVersion("4.11.1") while this PR relaxed the runtime pin to >=4.11.1,<5.0, so any default install pulled 4.13.2 and triggered the mismatch.

Fixed in 1b3167a1 by:

  1. Regenerating AgtypeLexer.py, AgtypeParser.py, AgtypeListener.py, AgtypeVisitor.py from Agtype.g4 using ANTLR 4.13.2 (preserving the ASF license header). Diff is minimal: checkVersion("4.11.1")checkVersion("4.13.2") plus a couple of cosmetic ANTLR-tool style changes (if __name__ is not None and "." in __name__if "." in __name__).
  2. Bumping the dependency floor to antlr4-python3-runtime>=4.13.2,<5.0 so a fresh pip install is warning-free out of the box.

Verified locally with antlr4-python3-runtime==4.13.2 against apache/age:dev_snapshot_master — full CI-equivalent suite (test_age_py.py 25 tests, test_networkx.py 11, test_agtypes.py 14, test_security.py 46) passes with no version-disagree warning emitted.

@MuhammadTahaNaveed MuhammadTahaNaveed merged commit 26bd4f4 into apache:master Apr 20, 2026
6 checks passed
@uesleilima uesleilima deleted the fix/python-driver-antlr-parser-robustness branch April 20, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants