Return toplevel ast from libclang to emacs-lisp #162

ikirill · 2015-01-13T23:51:14Z

I was trying to get irony-eldoc to use libclang's accurate type information, and for that I need to return cursors' type information to emacs-lisp. Since irony-eldoc can be run quite often, I also need to return cursors' information in bulk.

This makes a new server command that returns a list of cursors, with type information, and links to each other within the toplevel cursor surrounding current point. The idea is that emacs-lisp code can then traverse the AST that libclang produces, and that returning more information would easily result in the cursor information becoming stale.

The main entry point in emacs-lisp code is the (irony-ast-async callback), which puts libclang's AST in some overlays on top of the buffer text, and then calls the callback. I included a small debugging mode that highlights current node and siblings in different colours depending on node depth (irony-ast-debug-highlight-mode).

The implementation is fairly straightforward, just basic cursor manipulation and walking over libclang's AST printing sexp representations of useful information, and I left some debugging messages in emacs-lisp code to help reading. These should be turned off later.

I tried to keep to your C++ style, but I'm not sure how close it is.

Please let me know what you think about this. As far as I know, this is the most precise C++ AST one can have directly in emacs, and there would be other uses for it too besides eldoc support.

One note: I used a recent version of libclang while writing this, and some functions I used might be missing on older versions.

Kirill

Sarcasm · 2015-01-16T00:02:11Z

I was trying to get irony-eldoc to use libclang's accurate type information, and for that I need to return cursors' type information to emacs-lisp.

To be really honest, I think I would have preferred a simple command that returns the info at a given cursor.

Since irony-eldoc can be run quite often, I also need to return cursors' information in bulk.

Is this similar to the issue you posted that irony makes unnecessary request sometimes and we shouldn't reparse the buffer when it is not needed? Because I think this is something important to fix first and that may avoid to send a lot of data if requesting the server becomes cheaper. Sending a lot of data is currently a bit slow, a lot of it may be thrown depending on the command.

This makes a new server command that returns a list of cursors, with type information, and links to each other within the toplevel cursor surrounding current point. The idea is that emacs-lisp code can then traverse the AST that libclang produces, and that returning more information would easily result in the cursor information becoming stale.

To be honest, my strategy for now is to send less info, and if needed instead of sending all what libclang can and let the elisp figure it out. Sending a lot of data is significantly slow, so I would prefer not to send things that will be thrown away.

The main entry point in emacs-lisp code is the (irony-ast-async callback), which puts libclang's AST in some overlays on top of the buffer text, and then calls the callback. I included a small debugging mode that highlights current node and siblings in different colours depending on node depth (irony-ast-debug-highlight-mode).

You have some interesting things in your code, things like force a request to be made could be useful for other commands.

I tried to keep to your C++ style, but I'm not sure how close it is.

There are a few differences but that's not does not interfere with readability, so it's okay to understand what's going on here. FYI, irony uses ClangFormat here and there to format the code.

Please let me know what you think about this. As far as I know, this is the most precise C++ AST one can have directly in emacs, and there would be other uses for it too besides eldoc support.

I agree with that, it's probably the most precise but IMHO there is little use for this, we don't want to do semantic syntax coloring or things like that, I'm not sure what's the use-case of providing the whole AST to Emacs.

One note: I used a recent version of libclang while writing this, and some functions I used might be missing on older versions.

I try to support old version of libclang when possible, for example brief comments are used only when available.

So to resume, I think this is too much and not useful as it is. I would prefer just a simple command that returns something useful to eldoc. If there are other uses for the AST, either the command will be extended or another will see the light of day but I don't think sending the full AST is that helpful. Most of the time you want information about the context around the cursor and that's enough. If it happens that there is a good use case for the whole AST we will see at this time.

ikirill · 2015-01-16T00:44:56Z

To be really honest, I think I would have preferred a simple command that returns the info at a given cursor.

I hope I explained why I implemented things the way I did.

Is this similar to the issue you posted that irony makes unnecessary request sometimes and we shouldn't reparse the buffer when it is not needed? Because I think this is something important to fix first and that may avoid to send a lot of data if requesting the server becomes cheaper. Sending a lot of data is currently a bit slow, a lot of it may be thrown depending on the command.

That AST for a single node is not that big, I think. Look, for example at output of clang with -ast-dump, or the process output string actually passed to irony-ast. It's a few thousand characters, maybe.

To be honest, my strategy for now is to send less info, and if needed instead of sending all what libclang can and let the elisp figure it out. Sending a lot of data is significantly slow, so I would prefer not to send things that will be thrown away.

The idea of sending data in bulk for a top-level AST node is that it only becomes stale when it is modified. So for reading code, this is actually quite efficient because data is only sent once. For editing code, libclang doesn't generate AST nodes for invalid expressions, anyway.

You have some interesting things in your code, things like force a request to be made could be useful for other commands.

It's related to the question of what to do when irony's output becomes stale. For example, flycheck-irony doesn't seem to update diagnostics in files not being actively modified, so changing a function definition in one open file will not fix a diagnostic in another open file (but this is probably a trivial issue).

I agree with that, it's probably the most precise but IMHO there is little use for this, we don't want to do semantic syntax coloring or things like that, I'm not sure what's the use-case of providing the whole AST to Emacs.

Actually, why not? The c++-mode syntax coloring is not terribly accurate. And this is just how IDEs do it. I think the main downside would be a possibility of a kind of feature creep of small rarely-used slightly-broken features.

I try to support old version of libclang when possible, for example brief comments are used only when available.

That's actually kind of tricky, for example, getTypeSpelling (which I wouldn't really want to rewrite myself based on tokens or reconstructed from CXType) is not available on older versions. The type spelling is right there in clang's internals, it's exposed to the completion function, but it's not in the (old) API directly so I can't use it.

So to resume, I think this is too much and not useful as it is. I would prefer just a simple command that returns something useful to eldoc. If there are other uses for the AST, either the command will be extended or another will see the light of day but I don't think sending the full AST is that helpful. Most of the time you want information about the context around the cursor and that's enough. If it happens that there is a good use case for the whole AST we will see at this time.

Okay, but in Emacs there's basically no way at all to get an accurate C++ AST, so that is possibly the reason why there's no code that would try to use an accurate AST. Here is a long related thread on emacs-devel.

I think it would be nice to expose libclang's AST even if there is no immediate gain.

If I write a (smaller) PR that only returns type+docstring information for the current point location with no caching, would you be interested then?

Sarcasm · 2015-01-18T14:35:21Z

That AST for a single node is not that big, I think. Look, for example at output of clang with -ast-dump, or the process output string actually passed to irony-ast. It's a few thousand characters, maybe.

I haven't tested the code but what is returned exactly? The whole as for the current file or the current file + the included files? I can be quite big in either case IMHO.

Actually, why not? The c++-mode syntax coloring is not terribly accurate. And this is just how IDEs do it. I think the main downside would be a possibility of a kind of feature creep of small rarely-used slightly-broken features.

I don't think that's how IDE do it, and at least that's not how I want to do it for C++. For example see this thread on the Clang Mailing List: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-July/038345.html
There are some other solution for syntax highlighting considered than using the AST.

That's actually kind of tricky, for example, getTypeSpelling (which I wouldn't really want to rewrite myself based on tokens or reconstructed from CXType) is not available on older versions. The type spelling is right there in clang's internals, it's exposed to the completion function, but it's not in the (old) API directly so I can't use it.

Yeah, if the function is absolutely needed no problem. Sometime you may just simply make the whole irony-server command optional. So irony-mode will not give this functionality to old versions. It really depends on what is the version, if it's something not available in Clang < 3.2 for example I do not care very much to make this mandatory.

I think it would be nice to expose libclang's AST even if there is no immediate gain.

I do not, right now it's easy to modify irony-server inner working (and I'm thinking about it right now) so having a big bulk of code like that with no use-case in the immediate future will just make the maintenance more difficult for me.

If I write a (smaller) PR that only returns type+docstring information for the current point location with no caching, would you be interested then?

I would!

Sorry for the delay, I checked github but was on my phone so I wanted to make a proper reply to your post.

ikirill · 2015-01-20T03:08:15Z

I haven't tested the code but what is returned exactly? ...

The AST for the node surrounding point that is just below the translation unit level. So a function/class definition, etc.

I don't think that's how IDE do it, and at least that's not how I want to do it for C++. For example see this thread on the Clang Mailing List: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-July/038345.html

AFAICT, the readme for that project says it's for tools that have no access to a proper compiler, like for producing html output.

ikirill added 2 commits January 13, 2015 17:58

Fix constness and use inline attribute.

2e101a8

Return libclang AST in emacs-lisp-readable form.

247ecfc

ikirill mentioned this pull request Jan 20, 2015

Eldoc support #166

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return toplevel ast from libclang to emacs-lisp #162

Return toplevel ast from libclang to emacs-lisp #162

ikirill commented Jan 13, 2015

Sarcasm commented Jan 16, 2015

ikirill commented Jan 16, 2015

Sarcasm commented Jan 18, 2015

ikirill commented Jan 20, 2015

Return toplevel ast from libclang to emacs-lisp #162

Are you sure you want to change the base?

Return toplevel ast from libclang to emacs-lisp #162

Conversation

ikirill commented Jan 13, 2015

Sarcasm commented Jan 16, 2015

ikirill commented Jan 16, 2015

Sarcasm commented Jan 18, 2015

ikirill commented Jan 20, 2015