Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading large JSON databases is slow #176

Open
Hylen opened this issue Feb 20, 2015 · 42 comments
Open

Reading large JSON databases is slow #176

Hylen opened this issue Feb 20, 2015 · 42 comments

Comments

@Hylen
Copy link
Contributor

Hylen commented Feb 20, 2015

Hi,

I've found that irony-cdb-autosetup-compile-options can be really slow when reading large JSON compilation databases. I noticed this when using irony-mode on llvm itself. Every time I open a new source file there is a ~5s delay.

Based on my attempts at profiling, it seems most of the time is spent in irony-cdb-json--transform-command. Time is also spent in json-read-file (emacs built-in). I am no elisp expert, but as I understand, this only reads and postprocess the database a little. The database file is ~10000 lines, 2 MB, so I don't understand where the 5s come form. Maybe it is the communication to the server? The elisp profiler may not see this time.

Anyway, I suggest moving the database reading to the server. I might also be willing to do this, if you agree. It would be a good way to read up libclang and one of my favorite emacs extensions :)

/Karl

Profiler output:

- ...                                                            3387  92%
 - irony-cdb--autodetect-compile-options                         2942  80%
  - catch                                                        2942  80%
   - let                                                         2942  80%
    - while                                                      2942  80%
     - let                                                       2942  80%
      - funcall                                                  2942  80%
       - irony-cdb-json                                          2942  80%
        - cond                                                   2942  80%
         - irony-cdb-json--get-compile-options                   2942  80%
          - let                                                  2942  80%
           - if                                                  2942  80%
            - progn                                              2942  80%
             - progn                                             2942  80%
              - let                                              2942  80%
               - irony-cdb-json--load-db                         2942  80%
                - delq                                           2942  80%
                 - mapcar                                        2939  80%
                  + irony-cdb-json--transform-compile-command               1958  53%
                  + json-read-file                                981  26%
   Automatic GC                                                   445  12%
+ command-execute                                                 254   6%
+ timer-event-handler                                              10   0%
@Sarcasm
Copy link
Owner

Sarcasm commented Feb 20, 2015

Hi,

Yeah, I'm aware of the issue and I have some thoughts on that, I'm still not 100% sure about what's the proper solution.

Based on my attempts at profiling, it seems most of the time is spent in irony-cdb-json--transform-command.

I'm not surprised here, one of the function that may be quite inefficient is irony--split-command-line.

Time is also spent in json-read-file (emacs built-in). I am no elisp expert, but as I understand, this only reads and postprocess the database a little.

I think you understand right. The parsing method is not very efficient but I think it may be difficult to get one that scales well in Elisp (I may be wrong).

The database file is ~10000 lines, 2 MB, so I don't understand where the 5s come form. Maybe it is the communication to the server?

That is not the server, for the compilation database everything happens on the Elisp side.

Anyway, I suggest moving the database reading to the server. I might also be willing to do this, if you agree. It would be a good way to read up libclang and one of my favorite emacs extensions :)

That would be a welcomed change. One thing to take care of is to be able to deduce the compile options if the entry is not directly specified in clang_CompilationDatabase_getCompileCommands(). This logic is what the undocumented function irony-cdb-json--guess-flags does. Let me know if you want me to clear that up.

Right now I'm trying to get my head around about a somewhat clean API for the communication with irony-server, so implementing a new commands do not require to copy/pasta the same code/logic every time. So if you happen to implement an irony-server command, try to make things very simple so I can update it easily once I'm satisfied with the irony-server command.

I'm not sure I'm any clear but I would be glad if you work on this, try to do something simple first and I will comment on it.

@Hylen
Copy link
Contributor Author

Hylen commented Feb 23, 2015

Hi again,

I skimmed through the elisp and the server code, and I have some basic ideas. As of now, the whole database is read, processed and then only the options corresponding to the current file are filtered out. Maybe the database could be stored in memory?

That would be a welcomed change. One thing to take care of is to be able to deduce the compile options if the entry is not directly specified in clang_CompilationDatabase_getCompileCommands(). This logic is what the undocumented function irony-cdb-json--guess-flags does. Let me know if you want me to clear that up.

I think I kind of understand what happens here. I will look more closely at elisp later. Didn't find the explanation you referred to, but the function is interesting. If I understand correctly, clang_CompilationDatabase_fromDirectory can be used to read the database on the server. I will experiment with it when I got the time.

Right now I'm trying to get my head around about a somewhat clean API for the communication with irony-server, so implementing a new commands do not require to copy/pasta the same code/logic every time. So if you happen to implement an irony-server command, try to make things very simple so I can update it easily once I'm satisfied with the irony-server command.

Understood, I will not touch the server API too much. I guess that I will need to remove passing of the compile options read from the database, but nothing more than that. I will also keep the user defined options in irony-additional-clang-options.

/Karl

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 24, 2015

I skimmed through the elisp and the server code, and I have some basic ideas. As of now, the whole database is read, processed and then only the options corresponding to the current file are filtered out. Maybe the database could be stored in memory?

Yes, caching the database should help but I would like to wait before doing it. The parsing should be fast anyway, not only starting from the second time.

If I understand correctly, clang_CompilationDatabase_fromDirectory can be used to read the database on the server.

Yes that's the one we want to use.

Understood, I will not touch the server API too much. I guess that I will need to remove passing of the compile options read from the database, but nothing more than that. I will also keep the user defined options in irony-additional-clang-options.

I don't think that's what we want. It is important that Emacs has access to these flags because they may be used for other things than irony-server's commands.

I think the easiest thing to start would be to have a command in irony-server compile-options FILE BUILD-DIR or something like that, that returns the compile options for the current file (using the same algorithms as the elisp one but on the server side).

Then this function should be called in place of irony-cdb-json--get-compile-options. Right now there is not much support for synchronous call in irony--send-request so that may be another problem for you. The refactoring I'm currently doing on the irony-server side should probably take into account the need for synchronous requests.

There will be room for further improvement but that may help already. Then we can improve on this (e.g: caching the compilation database results if the file modification time didn't change).

@Hylen
Copy link
Contributor Author

Hylen commented Feb 25, 2015

I don't think that's what we want. It is important that Emacs has access to these flags because they may be used for other things than irony-server's commands.

I think the easiest thing to start would be to have a command in irony-server compile-options FILE BUILD-DIR or something like that, that returns the compile options for the current file (using the same algorithms as the elisp one but on the server side).

Then this function should be called in place of irony-cdb-json--get-compile-options. Right now there is not much support for synchronous call in irony--send-request so that may be another problem for you. The refactoring I'm currently doing on the irony-server side should probably take into account the need for synchronous requests.

I were not aware of this requirement. Then let's for now create a second executable that can be called to read database synchronously. When the server api supports it, it can be moved inside irony-server. This way I will not have to touch anything in the server API, that you are working on.

/Karl

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 25, 2015

You can implement the command in the irony-server executable but call irony-server synchronously (e.g: with call-process):

~/.emacs.d/irony/bin/irony-server get-compile-options <build-dir>

@Hylen
Copy link
Contributor Author

Hylen commented Feb 25, 2015

Is it ok to include boost as a dependency? It might be the easiest way to make routines such as expand-file-name portable on the server side.

I also have some thoughts on the server API. According to the section on filter functions in the elisp manual, the filter functions can read only when emacs waits. If we want to simulate a synchronous call to irony-server, we might use the asynchronous methods currently used in combination with sit-for or sleep-for. This way, the options can be fetched from the sever and the server and need not be passed the options from elisp. We would have to write the .clang_complete stuff on the server as well, but that is a minor problem. What do you think?

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 25, 2015

Is it ok to include boost as a dependency?

Nope, I understand the issue but I really want irony to be self-contained, apart from the obvious need to have libclang. There is an alternative solution other than to replicate the full logic of the elisp code into C++. Since it's parsing the JSON and parsing the command line that takes time, it would be possible to "compile" the compilation database into an s-expr, just doing a server implementation of irony-cdb-json--load-db.

I also have some thoughts on the server API. According to the section on filter functions in the elisp manual, the filter functions can read only when emacs waits. If we want to simulate a synchronous call to irony-server, we might use the asynchronous methods currently used in combination with sit-for or sleep-for.

That is not the proper way to do waiting on asynchronous processes. One should use accept-process-output. I actually have a comment on irony--parse-buffer-async which reflect the current thought I have on synchronous methods.

This way, the options can be fetched from the sever and the server and need not be passed the options from elisp. We would have to write the .clang_complete stuff on the server as well, but that is a minor problem. What do you think?

We don't want to do this. We really want Emacs to "master" the command line options so that it can interface with:

  • Emacs' built-in modules, like ff-find-other-file (e.g: irony-cdb could set cc-search-directories)
  • external tools, e.g: call a compiler to generate assembly of the current file/function, like disaster
  • or to interface with some other Clang tools (clang-rename, clang-tidy, clang-modernize, ...).

For now I think it's better to optimize the parsing via irony-server but it shouldn't have too much power and state if possible.

@Hylen
Copy link
Contributor Author

Hylen commented Feb 26, 2015

Nope, I understand the issue but I really want irony to be self-contained, apart from the obvious need to have libclang. There is an alternative solution other than to replicate the full logic of the elisp code into C++. Since it's parsing the JSON and parsing the command line that takes time, it would be possible to "compile" the compilation database into an s-expr, just doing a server implementation of irony-cdb-json--load-db.

Even if we only do irony-cdb-json--load-db on the server, it still needs a portable equivalent to expand-file-name. Do you suggest we do this from scratch? If we really want it to be portable, I think this could be kind of involved. Or do you suggest using the preprocessor to call different utility functions on different systems, like realpath?

That is not the proper way to do waiting on asynchronous processes. One should use accept-process-output. I actually have a comment on irony--parse-buffer-async which reflect the current thought I have on synchronous methods.

Alright, hadn't seen that one. Then synchronous communication shouldn't be that hard.

We don't want to do this. We really want Emacs to "master" the command line options so that it can interface with:

Emacs' built-in modules, like ff-find-other-file (e.g: irony-cdb could set cc-search-directories)
external tools, e.g: call a compiler to generate assembly of the current file/function, like disaster
or to interface with some other Clang tools (clang-rename, clang-tidy, clang-modernize, ...).
For now I think it's better to optimize the parsing via irony-server but it shouldn't have too much power and state if possible.

Alright, got it.

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 27, 2015

Even if we only do irony-cdb-json--load-db on the server, it still needs a portable equivalent to expand-file-name. Do you suggest we do this from scratch? If we really want it to be portable, I think this could be kind of involved. Or do you suggest using the preprocessor to call different utility functions on different systems, like realpath?

I think we could start by just having a function that says whether or not "file" is relative, and if that's so prefix it with the directory. I think we will be pretty good with this. Maybe, if we do only irony-cdb-json--load-db, we can still do the expand-file-name on the Elisp side. I think what is important to have is the command line parsing to be done in C++.

Given:

[
  { "directory": "/home/user/llvm/build",
    "command": "/usr/bin/clang++ -Irelative -DSOMEDEF=\"With spaces, quotes and \\-es.\" -c -o file.o file.cc",
    "file": "file.cc" },
  ...
]

We could return:

("file.cpp" "/home/user/llvm/build" '("/usr/bin/clang++" "-Irelative" "-DSOMEDEF=\"With spaces, quotes and \\-es.\"" "-c" "-o" "file.o" "file.cc"))

Hopefully more work can be done but that's a start.

Alright, hadn't seen that one. Then synchronous communication shouldn't be that hard.

Nope, it's not too difficult, irony used to be synchronous a long time ago but asynchronous was better in the end. And now I think both are useful. :)

@Hylen
Copy link
Contributor Author

Hylen commented Feb 27, 2015

I think we could start by just having a function that says whether or not "file" is relative, and if that's so prefix it with the directory. I think we will be pretty good with this. Maybe, if we do only irony-cdb-json--load-db, we can still do the expand-file-name on the Elisp side. I think what is important to have is the command line parsing to be done in C++.

What do you mean with command line parsing exactly? irony-cdb-json--transform-compile-command? I think the most important thing is to single out one compile command on the server. I think for instance that the guess-flags logic have to be implemented on the server side.

I have almost completely written the code in irony-cdb-json--get-compile-options, and it should work for UNIX based systems. Only thing left is the guess-flags routine. When I started writing this, I noticed a problem. libclang doesn't have a way of getting the target file name from c CXCompileCommand. I would be glad for workaround advice here. Only solution I see is to implement the parsing too (since I guess including a JSON parsing library isn't an option).

@Hylen
Copy link
Contributor Author

Hylen commented Feb 27, 2015

How do you feel about switching from libclang to libtooling? I think this would solve the problem above and make the code much simpler. Another upside is that it would be safer and nicer to use the C++ API. I guess the downside is that it is less stable.

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 27, 2015

I have almost completely written the code in irony-cdb-json--get-compile-options, and it should work for UNIX based systems.

Are you using some kind of syscalls for it to be specific to Linux or is is the path parsing methods that are Linux-specific? I would like to avoid relying on syscall which may be slow on some file system when we should be able to do something quite cheap just with path parsing.

I think for instance that the guess-flags logic have to be implemented on the server side.

Maybe that is a good idea but if it's not slow in Elisp and can be cached easily then it is no big deal. It's better if it's done in C++ but if the overall code is way more complex for no visible benefit that may not be worth the maintenance burden.

When I started writing this, I noticed a problem. libclang doesn't have a way of getting the target file name from c CXCompileCommand. I would be glad for workaround advice here. Only solution I see is to implement the parsing too (since I guess including a JSON parsing library isn't an option).

Damn, that's unfortunate. :-( I was really glad that I didn't have to add a JSON parsing library by seeing the compilation database support but I was wrong apparently. A self-hosted JSON parsing library is an option if libclang doesn't provide this information, something like rapidjson.

How do you feel about switching from libclang to libtooling? I think this would solve the problem above and make the code much simpler. Another upside is that it would be safer and nicer to use the C++ API. I guess the downside is that it is less stable.

I prefer to stay with libclang whose exact purpose is to be used by IDE. I like the fact the API is stable, irony-mode support all kind of versions and I think it is a good thing. I wouldn't be against having another program that uses LibTooling but that would be something optional I think.

@Hylen
Copy link
Contributor Author

Hylen commented Feb 28, 2015

Are you using some kind of syscalls for it to be specific to Linux or is is the path parsing methods that are Linux-specific? I would like to avoid relying on syscall which may be slow on some file system when we should be able to do something quite cheap just with path parsing.

Just the parsing, and this should be quite easy to fix.

Maybe that is a good idea but if it's not slow in Elisp and can be cached easily then it is no big deal. It's better if it's done in C++ but if the overall code is way more complex for no visible benefit that may not be worth the maintenance burden.

Damn, that's unfortunate. :-( I was really glad that I didn't have to add a JSON parsing library by seeing the compilation database support but I was wrong apparently. A self-hosted JSON parsing library is an option if libclang doesn't provide this information, something like rapidjson.

It would be nice to have it on the server, since it includes a couple of passes over the whole database. But at the same time, I guess the common case is to get the flags directly without guessing. So maybe a first attempt should be something like:

  1. Implement the exact logic on the server (irony-cdb-json--load-db + irony-cdb-json--exact-flags)
  2. If this fails, load the database again in elisp and do the guessing

Using this method, if you have your file in the database, the loading of the commands should be fast. No dependencies need to be added.

I prefer to stay with libclang whose exact purpose is to be used by IDE. I like the fact the API is stable, irony-mode support all kind of versions and I think it is a good thing. I wouldn't be against having another program that uses LibTooling but that would be something optional I think.

I understand this reason. It is up to you which way we take. For now, I focus on the solution above.

@Sarcasm
Copy link
Owner

Sarcasm commented Feb 28, 2015

I agree with what you are saying, I think it's a first step in the good direction. 👍

@Hylen
Copy link
Contributor Author

Hylen commented Mar 6, 2015

Hi,

Check out the server-database branch in my fork. I've added the the exact flags logic on the server, and made a way of communicating synchronously with irony-server. I still have some stuff I want to do before I send a pull request (documented using TODO comments, and some in my head) but I wanted to here your thoughts before working more on it.

Regards,
Karl

@Sarcasm
Copy link
Owner

Sarcasm commented Mar 8, 2015

Hi,

Glad to have an early look at the implementation, there are some interesting things.

I have a few comments:

  1. I would expect irony--send-request-sync to return the results, no callback should be needed but please, feel free to let it as is, since the way sending request is implemented will change as soon as I get time to finish it up.
  2. the return value of irony-cdb-json--adjust-compile-options isn't used, it should be even if the function modify its argument by side-effects (maybe it shouldn't).
  3. file-truename on the buffer-file-name isn't necessary, one can use buffer-file-truename directly.
  4. when using accept-process-output, C-g should be allowed to not freeze Emacs if something goes wrong, see https://www.gnu.org/software/emacs/manual/html_node/elisp/Timers.html. FYI, a long time ago, irony used to be blocking and I was using this function to wait for server response:
    (defun irony-wait-request-answer (sym)
  5. I prefer the term CompilationDatabase, to Database, for the C++ side.
  6. The command stuff in C++ looks ugly now with readCompileOptions = true in complete and parse, I guess it's a little bit weirdly done. Do not worry about that, it shouldn't be hard to have something less strange in the future.
  7. the irony-server command get-compile-options talks about JSON compilation database. I wouldn't write JSON explicitly in the description as the Clang documentation doesn't limit the functions to JSON but potentially more compilation databases in the future. http://clang.llvm.org/doxygen/group__COMPILATIONDB.html#gae0822a6a54afaea92bad5d3b3256bf26
  8. Do you think you can handle graceful support of older version of Cindex? CXCompilationDatabase is supported since CINDEX_VERSION >= 6 (clang >= 3.2).
  9. Formatting of code is done with Clang-Format, if you cannot use this tool I will point you what are the issues with regard to formatting.
  10. You are calling clang_getCString() without disposing of the returned string. Right now I have a helper function called cxStringToStd that I use sometime but I'm wondering about using unique_ptr with a custom deleter instead.
    static std::string cxStringToStd(CXString cxString) {
  11. Please do not call exit in getFlags. irony-server -i should returns the errors it gets so the user can be informed inside Emacs.
  12. I think getExactFlags should return a list of compile options, to not lose information. The irony-cdb compilation databases already support multiple compile options so there is no reason to skip it.

What do you think about a new compilation database instead, one that would use irony-server instead of modifying the existing one?

Feel free to make a pull request next time even if it's no finished, that will allow to comment the code inline.

Even if I have a few remarks I'm really happy that you did this. One thing that would be interesting is to know if the delay you got when loading an LLVM file has disappeared. :)

@Hylen
Copy link
Contributor Author

Hylen commented Mar 9, 2015

Thanks for the comments, I will have a closer look at them when I have time, and then I'll send a pull request. The delay has dissapeared completely, opening a .cpp file in LLVM feels instantaneous. The problem remains for header files, when the guess logic kicks in. For these files I've measured ~10s delay.

/Karl

@Hylen Hylen mentioned this issue Mar 23, 2015
@Hylen
Copy link
Contributor Author

Hylen commented Mar 23, 2015

  1. I would expect irony--send-request-sync to return the results, no callback should be needed but please, feel free to let it as is, since the way sending request is implemented will change as soon as I get time to finish it up.

I see your point, and leave it as it is for now.

  1. the return value of irony-cdb-json--adjust-compile-options isn't used, it should be even if the function modify its argument by side-effects (maybe it shouldn't).
  2. file-truename on the buffer-file-name isn't necessary, one can use buffer-file-truename directly.

Ok, changed it.

  1. when using accept-process-output, C-g should be allowed to not freeze Emacs if something goes wrong, see https://www.gnu.org/software/emacs/manual/html_node/elisp/Timers.html. FYI, a long time ago, irony used to be blocking and I was using this function to wait for server response:
    (defun irony-wait-request-answer (sym)
  2. I prefer the term CompilationDatabase, to Database, for the C++ side.

I also think CompilationDatabase is a better name, but for clarity I didn't
want a name that could be mixed up with the libTooling class. I change to
CompilationDatabase for now.

  1. The command stuff in C++ looks ugly now with readCompileOptions = true in complete and parse, I guess it's a little bit weirdly done. Do not worry about that, it shouldn't be hard to have something less strange in the future.

The only thing I changed was adding a boolean value. Previously, the code checked if file had been set, which I think was more unclear.

If I designed the Command class, I would probably go for an abstract base class and use polymorphism instead of the switch and parsing callbacks found in CommandParser::parse. In my opinion, it would be clearer and more flexible.

  1. Formatting of code is done with Clang-Format, if you cannot use this tool I will point you what are the issues with regard to formatting.

I tried to keep pretty close to the LLVM style by hand at first. After reading your comment I ran clang-format on the files I had modified. It changed a lot of formatting in Irony.cpp and main.cpp, and I guess that is not wanted. If you have specific complaints on the formatting, please tell me. I kept the clang-formatting in Database.h and Database.cpp.

  1. You are calling clang_getCString() without disposing of the returned string. Right now I have a helper function called cxStringToStd that I use sometime but I'm wondering about using unique_ptr with a custom deleter instead.
    static std::string cxStringToStd(CXString cxString) {

Sloppy of me. Fixed.

  1. Please do not call exit in getFlags. irony-server -i should returns the errors it gets so the user can be informed inside Emacs.
  2. I think getExactFlags should return a list of compile options, to not lose information. The irony-cdb compilation databases already support multiple compile options so there is no reason to skip it.

These two were already on my cleanup list. Fixed.

  1. the irony-server command get-compile-options talks about JSON compilation database. I wouldn't write JSON explicitly in the description as the Clang documentation doesn't limit the functions to JSON but potentially more compilation databases in the future. http://clang.llvm.org/doxygen/group__COMPILATIONDB.html#gae0822a6a54afaea92bad5d3b3256bf26
  2. Do you think you can handle graceful support of older version of Cindex? CXCompilationDatabase is supported since CINDEX_VERSION >= 6 (clang >= 3.2).

What do you think about a new compilation database instead, one that would use irony-server instead of modifying the existing one?

I think you're on to something here. If we do it this way, we can keep the old database for people with libclang < 3.2, and focus on the new compilation database for people with a more current version. Then we might even try libTooling as a way of solving the guess logic (which is quite unbearable as it is now, when opening a header file in a large project). This will be some more work, but I will look over it when I have time.

Feel free to make a pull request next time even if it's no finished, that will allow to comment the code inline.

Okay, here it comes. But perhaps it shouldn't be merged before we've fixed the new database in elisp.

Regards,
Karl

@Hylen Hylen mentioned this issue Apr 3, 2015
@Hylen
Copy link
Contributor Author

Hylen commented Apr 3, 2015

I moved my changes to a new database. Check it out when you got the time.

/Karl

@Sarcasm
Copy link
Owner

Sarcasm commented Apr 3, 2015

Great! Will do, sorry for the silence, it's still on my mind. ;)

@Sarcasm
Copy link
Owner

Sarcasm commented Apr 3, 2015

I caught a glimpse and it looks good! I will try to add my remarks tomorrow.

For the header files I'm still not sure what to do. I know some Clang plugins for vim implement a logic as follow "get the flags from the most recently opened file". We can add some kind of configurable method in irony-mode for that.

What we could do for example is to to the guess logic not on the whole JSON compilation database but on all the compile options already available in irony-mode buffers. That is the matter of another issue though.

@Hylen
Copy link
Contributor Author

Hylen commented Apr 8, 2015

For the header files I'm still not sure what to do. I know some Clang plugins for vim implement a logic as follow "get the flags from the most recently opened file". We can add some kind of configurable method in irony-mode for that.

This is a nice simple idea, that we might try. If the user opens a header file first, I guess it won't work though. Another idea that's been growing in my head is an optional libTooling-based compilation database, mimicking the the guess logic (which I think is nice).

I will have a look at your comments when I have more time, it might be a week or two.

/Karl

@Hylen
Copy link
Contributor Author

Hylen commented Jun 28, 2015

Finally got some time to update according to your comments. Opened a new pull request, #208. Will start looking into another way of dealing with header files next.

@Sarcasm
Copy link
Owner

Sarcasm commented Jun 28, 2015

Looks nice, I have a few comments which are mostly nitpicking about the coding style/convention used by irony. If you think it is time consuming to correct I can correct them myself before merging.

Anyway, that's great news!

Feel free to share the result of your brainstorming about the compile options guessing.

@Hylen
Copy link
Contributor Author

Hylen commented Jul 31, 2015

I've been thinking about the guessing logic and implementation. I think a simple algorithm matching the file name without the extension should work fine for most cases, e.g. Irony.h -> Irony.cpp. We can also provide a command with witch the user can choose compile command from a list of all commands, if the guessing fails.

When it comes to the implementation, the problem is that libclang doesn't provide a way of obtaining the file name. We could use the working directory and compile command to reconstruct all of the file names. Otherwise we could try reading the JSON file a second time, only for obtaining the file names. What do you think?

@Sarcasm
Copy link
Owner

Sarcasm commented Jul 31, 2015

I've been thinking about the guessing logic and implementation. I think a simple algorithm matching the file name without the extension should work fine for most cases, e.g. Irony.h -> Irony.cpp.

I think there is a lot of situations where this isn't working. For example, type Error.h on this page: https://github.com/llvm-mirror/llvm/find/8fea35acadea2e55824682a78bef54639e5ecfce

As a side note, having some functionality to guess (and allow the user to customize) a source file's header and vice-versa could be nice. But I guess this is another subject, it can be hard to get it right, simple. Now that I think about it, I think it is already present in Emacs (ff-find-other-file), irony should probably just feed the proper compile options that will help find them. But then, it won't help us in our case.

Something I don't like much with this method is that it won't work with third-party headers and headers with no associated source files, I don't think these are edge cases sadly.

When it comes to the implementation, the problem is that libclang doesn't provide a way of obtaining the file name.

Yep it's too bad, at one point I was considering adding a JSON parser to irony-server.

What do you think?

I wish for something better, I think it is too fragile to rely on a matching source file, I think we would have better luck finding "any" file that include the header, or just have the header's directory in the search path.

@Hylen
Copy link
Contributor Author

Hylen commented Jul 31, 2015

Okey, argument accepted. But what do you think about the command + working directory => file idea? Should we go with it or implement a second reading of the database file?

If we can manage to get the file names, we will also be able to implement any logic for guessing compilation command. The guessing logic in irony-cdb-json could for instance be reimplemented. The only thing I dislike about this logic is that it doesn't work when headers and cpp files are separated in the source tree. Instead we could rank how similar the paths are from the project root, e.g. src/a/b/c/d.cpp would be a good match for include/a/b/c/d.h.

@Sarcasm
Copy link
Owner

Sarcasm commented Jul 31, 2015

Okey, argument accepted. But what do you think about the command + working directory => file idea? Should we go with it or implement a second reading of the database file?

I'm not sure, what do you mean by:

We could use the working directory and compile command to reconstruct all of the file names.

?

The only thing I dislike about this logic is that it doesn't work when headers and cpp files are separated in the source tree. Instead we could rank how similar the paths are from the project root, e.g. src/a/b/c/d.cpp would be a good match for include/a/b/c/d.h.

Yeah I think it is a downer too. I use this at work, LLVM/Clang uses this too.

@Hylen
Copy link
Contributor Author

Hylen commented Aug 10, 2015

I've uploaded a very early stage of the commit I'm working on, #225.

@Hylen
Copy link
Contributor Author

Hylen commented Dec 30, 2015

I opened a new request #270 for this. If you still decide you don't want anything like this I will break it out as a separate program and maintain it myself.

This version still uses rapidJSON for reading the database. The upstream libclang with the ability of giving the file names as well hasn't been released yet, so I don't think it will be of any use to most people at the moment.

I also looked at the CEDET project you linked. They seem to use the prefix of the path to guess the files, which doesn't give the behaviour we want. They also use json.el to read the database, probably making it as slow as the existing solution in Irony.

I've made an attempt at making the database optional. If boost is not found, it isn't activated at all. Whit this, Irony shouldn't be harder to install than before.

@volca02
Copy link

volca02 commented Feb 4, 2016

Hey,

any progress on this issue? This is the only problem I have with irony so far - opening new file often stalls emacs for multiple seconds.

- command-execute                                                2030  81%
 - call-interactively                                            2030  81%
  - ibuffer-find-file                                            1819  72%
   - find-file                                                   1819  72%
    - find-file-noselect                                         1814  72%
     - find-file-noselect-1                                      1814  72%
      - after-find-file                                          1814  72%
       - normal-mode                                             1801  72%
        - funcall                                                1801  72%
         - #<compiled 0x22c047>                                  1801  72%
          - set-auto-mode                                        1801  72%
           - set-auto-mode-0                                     1798  72%
            - c-mode                                             1782  71%
             - run-mode-hooks                                    1680  67%
              - apply                                            1677  67%
               - run-hooks                                       1677  67%
                - irony-mode                                     1677  67%
                 - run-hooks                                     1677  67%
                  - irony-cdb-autosetup-compile-options               1677  67%
                   - irony-cdb--autodetect-compile-options               1677  67%
                    - byte-code                                  1677  67%
                     - byte-code                                 1677  67%
                      - irony-cdb-json                           1659  66%
                       - irony-cdb-json--get-compile-options               1659  66%
                        - irony-cdb-json--load-db                1600  64%
                         + json-read-file                         816  32%
                         + mapcar                                 780  31%
                        + irony-cdb-json--compute-directory-cdb                 59   2%

@Hylen
Copy link
Contributor Author

Hylen commented Feb 4, 2016

I've got a patch, but were not merging it yet because it adds dependencies that @Sarcasm wants to avoid. If you don't care about that you can always checkout my branch, just take a look at #270.

@itollefsen
Copy link
Contributor

I recently installed irony-mode and a couple of other things. I'm using the desktop package to save and restore my open buffers. After installing irony-mode, it would take several minutes for Emacs to start when it was restoring ~20 buffers (as in > 5 minutes). As it turns out, that was actually flycheck-irony, and not irony-mode itself (and I was able to bring it down quite a bit by simplifying the compilation database).

Anyway, since I thought it was this issue, I did some measuring. Out of the box, irony-0.2.0 took just over 3.3 seconds to parse and apply the compilation database (on startup, when restoring ~20 buffers). I then applied Hylen's changes from pull request #293 to the current version (0.2.0-cvs4). That brought it down to just under 0.3 seconds. So quite the speedup there. However, it had no noticeable impact on per-file loading.

However, I'm also not sure about the Boost dependency.

I'm on a Mac, so if I want Boost, I have to either manually compile/install it or use a third-party package management system. I did the former. CMake didn't pick it up, despite trying to coerce it via different variables. In the end, setting BOOST_ROOT and modifying the CXX flags to pick up the include path did the trick.

Note that this isn't the only problem if you're on a Mac though. You need to download CMake and get a hold of the Clang sources for the include files (Xcode doesn't ship them). It just highlights that dependencies do add to the complexity of the setup.

I guess it comes down to how much speedup you get, which could use some quantification. In my case for instance, I can live with the 3.3. seconds startup time. For others, if the load time per file is taken down from > 5 seconds to 0.2 seconds or something, then that's something else entirely.

@Hylen
Copy link
Contributor Author

Hylen commented May 10, 2016

How large the speedup is depends very much on how large the database is. I do my everyday work in a project that is quite large, and for me the speedup is substantial even on per file loading. I want loading a file to feel instantaneous, as I sometimes open many files every second (when iterating over uses of a function etc.).

I can see that boost isn't perfect for everyone, but I want to point out that my patch doesn't make it harder for anyone to install and use Irony-mode. It helps the people that use a system where installing boost is simple and the people that don't mind installing it even though it's a bit troublesome.

Quantification is always a good thing, I'll post an update at some point in the future with some comparisons.

@jeanerpp
Copy link

I solved this issue simply via "divide and conquer" rules.
A large clang JSON database oftem contains many sub directories, so I wrote a script (splitbeardb) that split DB file into small pieces and put them into related sub-directories. For example:

splitbeardb 2

will split a large compile_commands.json file in current directory, into small pieces and put to level-2 sub-directory.

splitbeardb.txt

@cleeland
Copy link

cleeland commented Mar 1, 2017

What's the current state of speeding up reading json? This just recently started biting me.

@Sarcasm
Copy link
Owner

Sarcasm commented Mar 1, 2017

For headers, I'm working on https://github.com/Sarcasm/compdb but this is not really ready.
But if your issue is the json read in Elisp, the libclang backend should solve this.
So what is your issue?
If you do M-x irony-cdb-menu RET you can see which database is used.

@cleeland
Copy link

cleeland commented Mar 1, 2017

I believe my issue is the ~7500 line compile_commands.json that gets read every time I visit a file. It literally takes 10 seconds between find-file start and actual display. As a workaround I'm paring down compile_commands.json to ~1100 lines and that only takes 2 seconds per file visit.

@wberrier
Copy link

I'm interested in using the libclang backend to speed up loading of files.

The problem is, I can't seem to configure it.

I used customize-group RET irony to select the compilation database. I removed the json version. But, when I load up a file, and look at irony-cdb-menu, it says no compilation database is used.

I couldn't really find any instructions about how to configure the libclang backend.

Help? What am I missing? I'm on fedora 25 with emacs 25.1.1, irony-20170223.515.

@tereshkin
Copy link

Is there anything which can cache JSON compilation database? For the big library and single compile_commands.json file autocompletion is slow. And problems isn't only about autocompletion - Emacs won't allow me to print anything while it's parsing the argument.

P.S.
I'm quite a newbie to Emacs so any help would be appreciated.

@pbarragan
Copy link

@Sarcasm , I know this is an old thread, but I was curious if there was any progress here. I think I'm having the same problem, but I'm not much of an emacs guru and am trying to understand my options from this thread. Just wondering if this was ever sped up in a newer version of irony or some other workaround. Thanks for this awesome project!

@itollefsen
Copy link
Contributor

@pbarragan; Caching was added in pull request #499. While not directly related (the initial loading still takes the same amount of time), it made a huge difference for me since the compilation database is no longer loaded anew for each file I open. So once loaded, it should not need to be reloaded till it changes (or you change compilation database).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants