Investigation of Astroid's node end location attributes (end_lineno and end_col_offset) #1245
Investigation into New Astroid Attributes
Astroid provides built-in "end location" attributes. These attributes are:
end_lineno: the 1-indexed line number where the node ends.
end_col_offset: the 0-indexed column offset where the node ends. This value points just past the last character of the code represented by the node, so the end position is exclusive.
These attributes are part of the base NodeNG class, which is the foundation for all AST node classes in astroid.
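As a quick check of these conventions, the sketch below uses Python's built-in ast module rather than astroid itself. It assumes Python 3.8 or later, where ast nodes carry end positions; to my understanding, for nodes that map directly onto ast nodes (such as Name, Const and Call), astroid's rebuilder copies these same values, so astroid should report identical numbers for them. The sample source and the location helper are made up for illustration.

```python
import ast

SOURCE = "x = 1\nprint(x)\n"


def location(node):
    """Return (lineno, end_lineno, col_offset, end_col_offset) for an ast node."""
    return (node.lineno, node.end_lineno, node.col_offset, node.end_col_offset)


tree = ast.parse(SOURCE)
assign = tree.body[0]
target = assign.targets[0]  # ast.Name in Store context; astroid represents it as AssignName
value = assign.value        # ast.Constant; astroid represents it as Const
call = tree.body[1].value   # ast.Call

print(location(target))  # (1, 1, 0, 1)
print(location(value))   # (1, 1, 4, 5)
print(location(call))    # (2, 2, 0, 8)

# end_col_offset is exclusive, so it can be used directly as a slice bound.
print(ast.get_source_segment(SOURCE, call))  # print(x)
```

Lines are 1-indexed while columns are 0-indexed, which is why the one-character name x spans columns 0 to 1.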
Examples with Primitive Type Nodes
Here is the list of primitive nodes in Astroid ("nodes without children"):
1. Const
Astroid correctly reports the Const node's end location as line 4, offset 6.
2. AssignName
Astroid correctly reports the AssignName node's end location as line 4, offset 1.
3. Name
Astroid correctly reports the Name node's end location as line 2, offset 3.
4. DelName
Astroid correctly reports the DelName node's end location as line 2, offset 5.
5. Break
Astroid correctly reports the Break node's end location as line 3, offset 9.
6. Continue
Astroid correctly reports the Continue node's end location as line 4, offset 16.
7. Global
Astroid correctly reports the Global node's end location as line 2, offset 8.
8. Import
Astroid correctly reports the Import node's end location as line 2, offset 14.
9. ImportFrom
Astroid correctly reports the ImportFrom node's end location as line 2, offset 26.
10. Nonlocal
Astroid correctly reports the Nonlocal node's end location as line 2, offset 13.
11. Pass
Astroid correctly reports the Pass node's end location as line 3, offset 8.
Comparison with Current PyTA Implementation
My first intuition was to compare astroid's default end location attributes with those produced by PyTA's transformer.
Specifically, I modified set_endings_from_source, inside the factory function end_setter_from_source, so that it simply returns the same node without mutation.
Note: These changes were temporary and only intended to test the end location attributes.
This resulted in 49 out of 58 tests passing.
Reminder: each tuple in the lists corresponds to (node.fromlineno, node.end_lineno, node.col_offset, node.end_col_offset).
1) Fail: test_await (Await node)
Expected :[(5, 5, 4, 27)]
Actual :[(5, 5, 4, 25)]
2) Fail: test_call (Call node)
Expected :[(1, 2, 0, 9)]
Actual :[(1, 2, 0, 6)]
3) Fail: test_comprehension (Comprehension node)
Expected :[(1, 1, 7, 20), (2, 2, 7, 16), (2, 2, 21, 36), (3, 3, 9, 18), (3, 3, 23, 40)]
Actual :[(1, 1, 7, 19), (2, 2, 7, 16), (2, 2, 21, 35), (3, 3, 9, 18), (3, 3, 23, 39)]
4) Fail: test_decorators (decorators node)
Expected :[(1, 2, 0, 27), (6, 6, 0, 9)]
Actual :[(1, 2, 0, 24), (6, 6, 0, 9)]
5) Fail: test_generatorexp (GeneratorExp node)
Expected :[(1, 1, 0, 37), (2, 2, 0, 43)]
Actual :[(1, 1, 0, 35), (2, 2, 0, 39)]
6) Fail: test_list (List node)
Expected :[(1, 1, 0, 2), (2, 2, 0, 9), (3, 3, 0, 6), (4, 9, 0, 1)]
Actual :[(1, 1, 0, 2), (2, 2, 0, 8), (3, 3, 0, 5), (4, 8, 0, 5)]
7) Fail: test_raise (Raise node)
Expected :[(3, 3, 8, 24), (5, 5, 8, 36)]
Actual :[(3, 3, 8, 24), (5, 5, 8, 35)]
8) Fail: test_tuple (Tuple node)
Out of all Tuple nodes in the example file, the 18th node's end location was wrong:
Expected: (21, 21, 0, 2)
Actual: (1, 1, 0, 6)
9) Fail: test_yieldfrom (YieldFrom node)
Expected :[(2, 2, 4, 23)]
Actual :[(2, 2, 4, 22)]
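To see which attribute is responsible for each failure, the expected and actual tuples can be compared field by field. The sketch below is a hypothetical helper (mismatched_fields is not part of PyTA or astroid), applied to the test_list and test_await values copied from the results above.

```python
FIELDS = ("fromlineno", "end_lineno", "col_offset", "end_col_offset")


def mismatched_fields(expected, actual):
    """Return (index, [field names]) for every position where two location lists disagree.

    zip() stops at the shorter list, so a length mismatch has to be checked separately.
    """
    report = []
    for index, (exp, act) in enumerate(zip(expected, actual)):
        differing = [name for name, e, a in zip(FIELDS, exp, act) if e != a]
        if differing:
            report.append((index, differing))
    return report


# Values copied from the test_list and test_await failures.
list_expected = [(1, 1, 0, 2), (2, 2, 0, 9), (3, 3, 0, 6), (4, 9, 0, 1)]
list_actual = [(1, 1, 0, 2), (2, 2, 0, 8), (3, 3, 0, 5), (4, 8, 0, 5)]
await_expected = [(5, 5, 4, 27)]
await_actual = [(5, 5, 4, 25)]

print(mismatched_fields(list_expected, list_actual))
# [(1, ['end_col_offset']), (2, ['end_col_offset']), (3, ['end_lineno', 'end_col_offset'])]
print(mismatched_fields(await_expected, await_actual))
# [(0, ['end_col_offset'])]
```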
Observations
From the failed tests, we can make a key observation: neither end_lineno nor end_col_offset is always accurate. The end_lineno attribute was inaccurate only in test_list, while end_col_offset was inaccurate in every failed test.
It is clear that astroid's current logic for computing end locations is not fully reliable. In particular, it often fails in cases involving whitespace. For example, in test_call, astroid computed end_col_offset as 6 when it should be 9. This happens often enough to matter, especially since the current PyTA implementation passes all tests.
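When checking a reported location by hand, as with the test_call result, it helps to print the exact source text that the four attributes span. The hypothetical helper below does that for any line/column range, assuming the same conventions as Python's ast module (1-indexed lines, 0-indexed end-exclusive columns counted in UTF-8 bytes). On Python 3.8+, the standard library's ast.get_source_segment does the same job for ast nodes; this version takes the raw numbers so it can also be fed expected values from a test.

```python
import ast


def source_segment(source, lineno, end_lineno, col_offset, end_col_offset):
    """Return the source text between (lineno, col_offset) and (end_lineno, end_col_offset)."""
    # Column offsets count UTF-8 bytes, so slice the encoded lines and decode at the end.
    lines = source.encode("utf-8").splitlines(keepends=True)
    if lineno == end_lineno:
        segment = lines[lineno - 1][col_offset:end_col_offset]
    else:
        first = lines[lineno - 1][col_offset:]    # rest of the starting line
        middle = lines[lineno:end_lineno - 1]     # any full lines in between
        last = lines[end_lineno - 1][:end_col_offset]  # ending line up to the exclusive end
        segment = b"".join([first, *middle, last])
    return segment.decode("utf-8")


SOURCE = "foo(1,\n    2)\n"
call = ast.parse(SOURCE).body[0].value
print(repr(source_segment(SOURCE, call.lineno, call.end_lineno, call.col_offset, call.end_col_offset)))
# 'foo(1,\n    2)'
```

Slicing with a suspect end offset instead (for example a smaller end_col_offset) makes it easy to see whether a node's reported end stops at the closing parenthesis or short of it.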
Your intuition was correct: astroid does not consistently evaluate node end locations accurately.
Conclusion
From this investigation, we can conclude that while astroid is usually correct for simpler nodes (such as primitive types),
it struggles to accurately determine the end locations of more complex nodes. This inaccuracy is significant because
PyTA’s current logic already works and passes all tests.
Therefore, it would be best to continue using PyTA’s custom logic rather than relying on astroid’s defaults.