Title: clang_getCursorExtent() crashes on TRANSLATION_UNIT cursor on macOS but works on Linux
Description:
When calling clang_getCursorExtent() on a CursorKind.TRANSLATION_UNIT cursor and then accessing .start or .end of the resulting CXSourceRange, a segmentation fault occurs on macOS.
This happens consistently across:
- Python versions: 3.11, 3.12, 3.13, and 3.14
- Clang versions: 18.x.y and 19.x.y (built from Homebrew or official sources)
- macOS versions: 14.4+ (tested on Apple Silicon and Intel)
Reproducer (Python clang.cindex bindings):
from clang import cindex
cindex.Config.set_library_file("/opt/homebrew/opt/llvm/lib/libclang.dylib") # tried with different versions
index = cindex.Index.create()
tu = index.parse("example.cpp", args=[
    "-isystem/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include",
])
cursor = tu.cursor # this is of kind TRANSLATION_UNIT
# These work:
print(cursor.kind)
print(cursor.spelling)
print(cursor.location)
# This also works (struct is returned)
print(cursor.extent.ptr_data)
print(cursor.extent.begin_int_data)
print(cursor.extent.end_int_data)
# But this crashes:
print(cursor.extent.start) # or `.end` → causes a segmentation fault on macOS
On Linux, this behaves safely: extent.start.file may be None, but no crash occurs.
C-level Reproducer:
CXIndex index = clang_createIndex(0, 0);
CXTranslationUnit tu = clang_parseTranslationUnit(index, "example.cpp", NULL, 0, NULL, 0, CXTranslationUnit_None);
CXCursor cursor = clang_getTranslationUnitCursor(tu);
CXSourceRange range = clang_getCursorExtent(cursor);
CXSourceLocation start = clang_getRangeStart(range); // 💥 Segfaults on macOS
Expected Behavior
clang_getCursorExtent() should never return a CXSourceRange that causes clang_getRangeStart() or clang_getRangeEnd() to crash, even for synthetic cursors like TRANSLATION_UNIT.
If no valid extent exists, it should:
- return a dummy or sentinel range, or
- document that .extent is unsafe to access on certain cursor kinds (not currently documented in libclang)
Notes
- This crash does not occur when calling .extent.start on normal entities like functions, structs, typedefs, etc.
- The bug only affects the TRANSLATION_UNIT cursor, which is returned by clang_getTranslationUnitCursor().
- The Python clang.cindex binding merely exposes the crash; the underlying issue is in clang_getRangeStart() accessing bad memory.
Suggested Fix
Either:
- Have clang_getCursorExtent() return a well-defined dummy CXSourceRange for TRANSLATION_UNIT, or
- Have clang_getRangeStart() gracefully reject invalid or synthetic ranges, or
- Document that TRANSLATION_UNIT has no valid range
EDIT 2025.05.20
It looks like the issues I am having are caused by a likely inconsistent internal state of LLVM/Clang. If I access those attributes immediately after obtaining the translation unit, everything works. However, if I store a Python reference (the Python object) in a dictionary or list and come back to it later, it becomes invalid.
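If the observation above holds (access works right after parsing but fails later), one possible way to sidestep stale wrappers is to copy the needed values into plain Python data while the objects are fresh, and to store only that copy. The following is a minimal sketch under that assumption, not a confirmed fix for the crash. `snapshot_extent` and `_plain_location` are hypothetical helper names; they rely only on attributes that clang.cindex exposes (`kind.name`, `spelling`, `extent.start`, `extent.end`, and `line`, `column`, `offset`, `file.name` on a location).

```python
def _plain_location(loc):
    """Copy a SourceLocation-like object into a plain dict (hypothetical helper)."""
    file_obj = loc.file  # may be None, e.g. for the TRANSLATION_UNIT cursor
    return {
        "file": file_obj.name if file_obj is not None else None,
        "line": loc.line,
        "column": loc.column,
        "offset": loc.offset,
    }


def snapshot_extent(cursor):
    """Hypothetical helper: read everything now and keep no libclang wrappers."""
    extent = cursor.extent  # fetch once and reuse for both ends
    return {
        "kind": cursor.kind.name,
        "spelling": cursor.spelling,
        "start": _plain_location(extent.start),
        "end": _plain_location(extent.end),
    }
```

Calling `snapshot_extent(tu.cursor)` immediately after `index.parse(...)` and storing the returned dict means later processing touches only ints and strings, never a CXSourceRange.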
Why I need this: I am trying to build a full flat AST with all possible objects (Cursors, Types, Tokens) by reading all attributes and calling all possible functions (well, the functions that take no arguments, which already gets a lot done). A recursive traversal would not work, as it would run into cycles. The strategy was to save the objects for later processing, but although the Python objects are alive, the C++ objects behind the scenes somehow end up in an inconsistent state.
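The flattening described above can also be done without recursion and without keeping wrappers around for a second pass. The sketch below uses a worklist plus a set of already-seen cursors, and records only plain tuples. It is an illustration, not the code used in this report: `flatten` and `children_and_reference` are hypothetical names, and using `cursor.hash` as the identity of a cursor is an assumption that ignores possible hash collisions.

```python
from collections import deque


def flatten(root, neighbors):
    """Visit every cursor reachable from root exactly once, without recursion.

    `neighbors` is a caller-supplied function returning the cursors to follow
    from a given cursor (children, referenced declarations, and so on).
    """
    seen = {root.hash}     # hashes of cursors already queued (assumed unique)
    queue = deque([root])
    flat = []              # plain tuples only, no libclang wrappers
    while queue:
        cursor = queue.popleft()
        flat.append((cursor.kind.name, cursor.spelling))
        for nxt in neighbors(cursor):
            if nxt is None or nxt.hash in seen:
                continue   # missing link or already queued: this breaks cycles
            seen.add(nxt.hash)
            queue.append(nxt)
    return flat


def children_and_reference(cursor):
    """Example neighbor function: direct children plus the referenced cursor."""
    yield from cursor.get_children()
    yield cursor.referenced  # may be None when nothing is referenced
```

With clang.cindex this would be called as `flatten(tu.cursor, children_and_reference)` while the translation unit is still in scope. Following `referenced` is what introduces cycles (a call inside a function body refers back to the function), and the `seen` set is what stops them.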
BTW: I am aware that it is possible to generate a JSON representation of the AST, but that does not contain certain information.
Activity
zokrezyl commented on May 20, 2025
Any suggestions for avoiding the bug? How can I check whether a Python-wrapped cindex object is valid?
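One generic way to keep a segfault from taking down the main process is to run the risky access in a child interpreter and inspect its exit status. This does not tell you whether an existing wrapper object is valid; it only contains the crash. The sketch below uses just the standard library. `survives` and `PROBE` are hypothetical names, and the probe is simply the reproducer from this report stored as a string.

```python
import subprocess
import sys

# The risky access, kept as source text so it only ever runs in a child process.
PROBE = """
from clang import cindex
tu = cindex.Index.create().parse("example.cpp")
print(tu.cursor.extent.start)
"""


def survives(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter; return True if it exits cleanly.

    A segfault in the child shows up as a non-zero return code (on POSIX, the
    negated signal number) instead of killing the calling process.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

`survives(PROBE)` would then report whether accessing `.extent.start` on the TRANSLATION_UNIT cursor crashes for the installed libclang, at the cost of re-parsing the file in the child.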
zokrezyl commented on May 20, 2025
Additional info: some similar crashes were noticed when navigating from a cursor to a cindex.Type (type) and accessing attributes of the cindex.Type object.
A typical C call stack looks like
Problem happens apparently