Reading large JSON databases is slow #176
Comments
Hi, Yeah, I'm aware of the issue and I have some thoughts on that, I'm still not 100% sure about what's the proper solution.
I'm not surprised here, one of the functions that may be quite inefficient is
I think you understand right. The parsing method is not very efficient but I think it may be difficult to get one that scales well in Elisp (I may be wrong).
That is not the server, for the compilation database everything happens on the Elisp side.
That would be a welcome change. One thing to take care of is being able to deduce the compile options if the entry is not directly specified in clang_CompilationDatabase_getCompileCommands(). This logic is what the undocumented function Right now I'm trying to get my head around a somewhat clean API for the communication with irony-server, so that implementing new commands does not require copy-pasting the same code/logic every time. So if you happen to implement an irony-server command, try to make things very simple so I can update it easily once I'm satisfied with the irony-server command. I'm not sure I'm being clear, but I would be glad if you worked on this. Try to do something simple first and I will comment on it. |
Hi again, I skimmed through the elisp and the server code, and I have some basic ideas. As of now, the whole database is read, processed and then only the options corresponding to the current file are filtered out. Maybe the database could be stored in memory?
I think I kind of understand what happens here. I will look more closely at the elisp later. Didn't find the explanation you referred to, but the function is interesting. If I understand correctly, clang_CompilationDatabase_fromDirectory can be used to read the database on the server. I will experiment with it when I get the time.
Understood, I will not touch the server API too much. I guess that I will need to remove passing of the compile options read from the database, but nothing more than that. I will also keep the user defined options in irony-additional-clang-options. /Karl |
Yes, caching the database should help but I would like to wait before doing it. The parsing should be fast anyway, not only starting from the second time.
Yes that's the one we want to use.
I don't think that's what we want. It is important that Emacs has access to these flags because they may be used for other things than irony-server's commands. I think the easiest thing to start would be to have a command in irony-server Then this function should be called in place of There will be room for further improvement but that may help already. Then we can improve on this (e.g: caching the compilation database results if the file modification time didn't change). |
I was not aware of this requirement. Then let's for now create a second executable that can be called to read the database synchronously. When the server API supports it, it can be moved inside irony-server. This way I will not have to touch anything in the server API that you are working on. /Karl |
You can implement the command in the irony-server executable but call irony-server synchronously (e.g: with
|
Is it OK to include Boost as a dependency? It might be the easiest way to make routines such as expand-file-name portable on the server side. I also have some thoughts on the server API. According to the section on filter functions in the Elisp manual, filter functions can read only while Emacs is waiting. If we want to simulate a synchronous call to irony-server, we might use the asynchronous methods currently used in combination with sit-for or sleep-for. This way, the options can be fetched from the server, and the server need not be passed the options from Elisp. We would have to write the .clang_complete stuff on the server as well, but that is a minor problem. What do you think? |
Nope, I understand the issue but I really want irony to be self-contained, apart from the obvious need to have libclang. There is an alternative solution other than to replicate the full logic of the elisp code into C++. Since it's parsing the JSON and parsing the command line that takes time, it would be possible to "compile" the compilation database into an s-expr, just doing a server implementation of
That is not the proper way to do waiting on asynchronous processes. One should use
We don't want to do this. We really want Emacs to "master" the command line options so that it can interface with:
For now I think it's better to optimize the parsing via irony-server but it shouldn't have too much power and state if possible. |
Even if we only do irony-cdb-json--load-db on the server, it still needs a portable equivalent to expand-file-name. Do you suggest we do this from scratch? If we really want it to be portable, I think this could be kind of involved. Or do you suggest using the preprocessor to call different utility functions on different systems, like realpath?
Alright, hadn't seen that one. Then synchronous communication shouldn't be that hard.
Alright, got it. |
I think we could start by just having a function that says whether or not "file" is relative and, if so, prefixes it with the directory. I think we will be pretty good with this. Maybe, if we do only irony-cdb-json--load-db, we can still do the
Given:
[
  { "directory": "/home/user/llvm/build",
    "command": "/usr/bin/clang++ -Irelative -DSOMEDEF=\"With spaces, quotes and \\-es.\" -c -o file.o file.cc",
    "file": "file.cc" },
  ...
]
We could return:
("file.cc" "/home/user/llvm/build" '("/usr/bin/clang++" "-Irelative" "-DSOMEDEF=\"With spaces, quotes and \\-es.\"" "-c" "-o" "file.o" "file.cc"))
Hopefully more work can be done but that's a start.
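The "compile the database into an s-expr" idea above could be sketched on the server side roughly as follows. This is only an illustration, not code from irony-server: the names splitCommand, lispString and entryToSexp are hypothetical, and the sketch assumes the JSON has already been decoded, so it starts from the plain "directory", "command" and "file" strings. It also assumes that only the double quote and the backslash are special in "command", and, unlike the example reply above, it removes the quotes while splitting, the way a shell would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a shell-like command line into arguments. Only '"' and '\' are
// treated as special (an assumption for this sketch): quotes group words
// and are dropped, and a backslash escapes the next character.
std::vector<std::string> splitCommand(const std::string &command) {
  std::vector<std::string> args;
  std::string current;
  bool inQuotes = false;
  bool hasToken = false; // lets "" produce an empty argument

  for (std::size_t i = 0; i < command.size(); ++i) {
    const char c = command[i];
    if (c == '\\' && i + 1 < command.size()) {
      current += command[++i];
      hasToken = true;
    } else if (c == '"') {
      inQuotes = !inQuotes;
      hasToken = true;
    } else if (!inQuotes && (c == ' ' || c == '\t')) {
      if (hasToken) {
        args.push_back(current);
        current.clear();
        hasToken = false;
      }
    } else {
      current += c;
      hasToken = true;
    }
  }
  if (hasToken)
    args.push_back(current);
  return args;
}

// Quote a string as an Emacs Lisp string literal.
std::string lispString(const std::string &s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\')
      out += '\\';
    out += c;
  }
  return out + "\"";
}

// Render one database entry as (FILE DIRECTORY (ARG...)).
std::string entryToSexp(const std::string &file, const std::string &directory,
                        const std::string &command) {
  std::string out = "(" + lispString(file) + " " + lispString(directory) + " (";
  const std::vector<std::string> args = splitCommand(command);
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0)
      out += ' ';
    out += lispString(args[i]);
  }
  return out + "))";
}
```

On the Emacs side the output could then be consumed with the Lisp reader instead of json-read-file, which is where the hoped-for speedup would come from.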
Nope, it's not too difficult, irony used to be synchronous a long time ago but asynchronous was better in the end. And now I think both are useful. :) |
What do you mean by command line parsing exactly? irony-cdb-json--transform-compile-command? I think the most important thing is to single out one compile command on the server. I think, for instance, that the guess-flags logic has to be implemented on the server side. I have almost completely written the code in irony-cdb-json--get-compile-options, and it should work for UNIX-based systems. The only thing left is the guess-flags routine. When I started writing this, I noticed a problem: libclang doesn't have a way of getting the target file name from a CXCompileCommand. I would be glad for workaround advice here. The only solution I see is to implement the parsing too (since I guess including a JSON parsing library isn't an option). |
How do you feel about switching from libclang to libtooling? I think this would solve the problem above and make the code much simpler. Another upside is that it would be safer and nicer to use the C++ API. I guess the downside is that it is less stable. |
Are you using some kind of syscalls that make it specific to Linux, or is it the path parsing methods that are Linux-specific? I would like to avoid relying on syscalls, which may be slow on some file systems, when we should be able to do something quite cheap just with path parsing.
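For illustration, resolving "file" against "directory" with path parsing only, and no syscalls, might look like the sketch below. The function names are hypothetical. It assumes POSIX-style paths and an absolute "directory"; Windows drive letters and UNC paths are not handled, and symlinks are not followed, which is the price of never touching the file system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True for POSIX-style absolute paths. Windows drive letters and UNC
// paths are deliberately not handled in this sketch.
bool isAbsolutePath(const std::string &path) {
  return !path.empty() && path[0] == '/';
}

// Lexically resolve PATH against DIRECTORY (assumed absolute). No stat()
// or realpath() call is made, so "." and ".." are collapsed textually and
// symlinks are not followed.
std::string resolvePath(const std::string &directory, const std::string &path) {
  const std::string full = isAbsolutePath(path) ? path : directory + "/" + path;

  std::vector<std::string> parts;
  std::string component;
  for (std::size_t i = 0; i <= full.size(); ++i) {
    if (i == full.size() || full[i] == '/') {
      if (component == "..") {
        if (!parts.empty())
          parts.pop_back(); // ".." at the root stays at the root
      } else if (!component.empty() && component != ".") {
        parts.push_back(component);
      }
      component.clear();
    } else {
      component += full[i];
    }
  }

  std::string result;
  for (const std::string &part : parts)
    result += "/" + part;
  return result.empty() ? "/" : result;
}
```

For example, resolving "../src/./a.cc" against "/home/user/build" yields "/home/user/src/a.cc" without a single file system access.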
Maybe that is a good idea but if it's not slow in Elisp and can be cached easily then it is no big deal. It's better if it's done in C++ but if the overall code is way more complex for no visible benefit that may not be worth the maintenance burden.
Damn, that's unfortunate. :-( I was really glad that I didn't have to add a JSON parsing library by seeing the compilation database support but I was wrong apparently. A self-hosted JSON parsing library is an option if libclang doesn't provide this information, something like rapidjson.
I prefer to stay with libclang, whose exact purpose is to be used by IDEs. I like the fact that the API is stable; irony-mode supports all kinds of versions and I think that is a good thing. I wouldn't be against having another program that uses LibTooling, but that would be something optional, I think. |
Just the parsing, and this should be quite easy to fix.
It would be nice to have it on the server, since it includes a couple of passes over the whole database. But at the same time, I guess the common case is to get the flags directly without guessing. So maybe a first attempt should be something like:
Using this method, if you have your file in the database, the loading of the commands should be fast. No dependencies need to be added.
I understand this reason. It is up to you which way we take. For now, I focus on the solution above. |
I agree with what you are saying, I think it's a first step in the good direction. 👍 |
Hi, Check out the server-database branch in my fork. I've added the exact flags logic on the server, and made a way of communicating synchronously with irony-server. I still have some stuff I want to do before I send a pull request (documented using TODO comments, and some in my head), but I wanted to hear your thoughts before working more on it. Regards, |
Hi, Glad to have an early look at the implementation, there are some interesting things. I have a few comments:
What do you think about a new compilation database instead, one that would use irony-server, rather than modifying the existing one? Feel free to make a pull request next time even if it's not finished; that will make it possible to comment on the code inline. Even if I have a few remarks, I'm really happy that you did this. One thing that would be interesting to know is whether the delay you got when loading an LLVM file has disappeared. :) |
Thanks for the comments, I will have a closer look at them when I have time, and then I'll send a pull request. The delay has disappeared completely; opening a .cpp file in LLVM feels instantaneous. The problem remains for header files, when the guess logic kicks in. For these files I've measured a ~10s delay. /Karl |
I see your point, and leave it as it is for now.
Ok, changed it.
I also think CompilationDatabase is a better name, but for clarity I didn't
The only thing I changed was adding a boolean value. Previously, the code checked if file had been set, which I think was more unclear. If I designed the Command class, I would probably go for an abstract base class and use polymorphism instead of the switch and parsing callbacks found in CommandParser::parse. In my opinion, it would be clearer and more flexible.
I tried to keep pretty close to the LLVM style by hand at first. After reading your comment I ran clang-format on the files I had modified. It changed a lot of formatting in Irony.cpp and main.cpp, and I guess that is not wanted. If you have specific complaints on the formatting, please tell me. I kept the clang-formatting in Database.h and Database.cpp.
Sloppy of me. Fixed.
These two were already on my cleanup list. Fixed.
I think you're on to something here. If we do it this way, we can keep the old database for people with libclang < 3.2, and focus on the new compilation database for people with a more current version. Then we might even try libTooling as a way of solving the guess logic (which is quite unbearable as it is now, when opening a header file in a large project). This will be some more work, but I will look over it when I have time.
Okay, here it comes. But perhaps it shouldn't be merged before we've fixed the new database in elisp. Regards, |
I moved my changes to a new database. Check it out when you get the time. /Karl |
Great! Will do, sorry for the silence, it's still on my mind. ;) |
I caught a glimpse and it looks good! I will try to add my remarks tomorrow. For the header files I'm still not sure what to do. I know some Clang plugins for vim implement a logic along the lines of "get the flags from the most recently opened file". We can add some kind of configurable method in irony-mode for that. What we could do, for example, is to run the guess logic not on the whole JSON compilation database but on all the compile options already available in irony-mode buffers. That is a matter for another issue though. |
This is a nice, simple idea that we might try. If the user opens a header file first, I guess it won't work though. Another idea that's been growing in my head is an optional libTooling-based compilation database, mimicking the guess logic (which I think is nice). I will have a look at your comments when I have more time; it might be a week or two. /Karl |
Finally got some time to update according to your comments. Opened a new pull request, #208. Will start looking into another way of dealing with header files next. |
Looks nice, I have a few comments which are mostly nitpicking about the coding style/convention used by irony. If you think it is time consuming to correct I can correct them myself before merging. Anyway, that's great news! Feel free to share the result of your brainstorming about the compile options guessing. |
I've been thinking about the guessing logic and implementation. I think a simple algorithm matching the file name without the extension should work fine for most cases, e.g. Irony.h -> Irony.cpp. We can also provide a command with which the user can choose a compile command from a list of all commands, if the guessing fails. When it comes to the implementation, the problem is that libclang doesn't provide a way of obtaining the file name. We could use the working directory and compile command to reconstruct all of the file names. Otherwise we could try reading the JSON file a second time, only for obtaining the file names. What do you think? |
I think there are a lot of situations where this doesn't work. For example, type As a side note, having some functionality to guess (and allow the user to customize) a source file's header and vice versa could be nice. But I guess this is another subject; it can be hard to get it right and keep it simple. Now that I think about it, I think it is already present in Emacs ( Something I don't like much with this method is that it won't work with third-party headers and headers with no associated source files; sadly, I don't think these are edge cases.
Yep it's too bad, at one point I was considering adding a JSON parser to irony-server.
I wish for something better, I think it is too fragile to rely on a matching source file, I think we would have better luck finding "any" file that include the header, or just have the header's directory in the search path. |
Okay, argument accepted. But what do you think about the command + working directory => file idea? Should we go with it or implement a second reading of the database file? If we can manage to get the file names, we will also be able to implement any logic for guessing the compilation command. The guessing logic in irony-cdb-json could for instance be reimplemented. The only thing I dislike about this logic is that it doesn't work when headers and cpp files are separated in the source tree. Instead we could rank how similar the paths are from the project root, e.g. src/a/b/c/d.cpp would be a good match for include/a/b/c/d.h. |
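One way to read the ranking idea above is to count how many trailing path components two files share once the extension is ignored. The sketch below does only that. The names stemComponents, similarity and bestMatch are hypothetical, and the sketch is not the logic in irony-cdb-json. As noted earlier in the thread, any heuristic of this kind still finds nothing for headers that have no matching source file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split PATH on '/' and drop the extension of the last component, so that
// "include/a/d.h" becomes {"include", "a", "d"}.
std::vector<std::string> stemComponents(const std::string &path) {
  std::vector<std::string> parts;
  std::string component;
  for (char c : path) {
    if (c == '/') {
      if (!component.empty())
        parts.push_back(component);
      component.clear();
    } else {
      component += c;
    }
  }
  if (!component.empty()) {
    const std::size_t dot = component.rfind('.');
    if (dot != std::string::npos && dot != 0)
      component.erase(dot);
    parts.push_back(component);
  }
  return parts;
}

// Number of trailing components shared by A and B, extensions ignored.
std::size_t similarity(const std::string &a, const std::string &b) {
  const std::vector<std::string> pa = stemComponents(a);
  const std::vector<std::string> pb = stemComponents(b);
  std::size_t n = 0;
  while (n < pa.size() && n < pb.size() &&
         pa[pa.size() - 1 - n] == pb[pb.size() - 1 - n])
    ++n;
  return n;
}

// Return the source with the highest score, or an empty string when no
// candidate even shares the file stem with HEADER.
std::string bestMatch(const std::string &header,
                      const std::vector<std::string> &sources) {
  std::string best;
  std::size_t bestScore = 0;
  for (const std::string &source : sources) {
    const std::size_t score = similarity(header, source);
    if (score > bestScore) {
      bestScore = score;
      best = source;
    }
  }
  return best;
}
```

With this scoring, include/a/b/c/d.h and src/a/b/c/d.cpp share four trailing components, so a source in a parallel tree outranks an unrelated d.cpp elsewhere in the project.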
I'm not sure, what do you mean by:
?
Yeah I think it is a downer too. I use this at work, LLVM/Clang uses this too. |
I've uploaded a very early stage of the commit I'm working on, #225. |
I opened a new request #270 for this. If you still decide you don't want anything like this I will break it out as a separate program and maintain it myself. This version still uses rapidJSON for reading the database. The upstream libclang with the ability of giving the file names as well hasn't been released yet, so I don't think it will be of any use to most people at the moment. I also looked at the CEDET project you linked. They seem to use the prefix of the path to guess the files, which doesn't give the behaviour we want. They also use json.el to read the database, probably making it as slow as the existing solution in Irony. I've made an attempt at making the database optional. If boost is not found, it isn't activated at all. With this, Irony shouldn't be harder to install than before. |
Hey, any progress on this issue? This is the only problem I have with irony so far: opening a new file often stalls Emacs for multiple seconds.
|
I recently installed irony-mode and a couple of other things. I'm using the desktop package to save and restore my open buffers. After installing irony-mode, it would take several minutes for Emacs to start when it was restoring ~20 buffers (as in > 5 minutes). As it turns out, that was actually flycheck-irony, and not irony-mode itself (and I was able to bring it down quite a bit by simplifying the compilation database). Anyway, since I thought it was this issue, I did some measuring. Out of the box, irony-0.2.0 took just over 3.3 seconds to parse and apply the compilation database (on startup, when restoring ~20 buffers). I then applied Hylen's changes from pull request #293 to the current version (0.2.0-cvs4). That brought it down to just under 0.3 seconds. So quite the speedup there. However, it had no noticeable impact on per-file loading. I'm also not sure about the Boost dependency. I'm on a Mac, so if I want Boost, I have to either manually compile/install it or use a third-party package management system. I did the former. CMake didn't pick it up, despite my trying to coerce it via different variables. In the end, setting BOOST_ROOT and modifying the CXX flags to pick up the include path did the trick. Note that this isn't the only problem if you're on a Mac though. You need to download CMake and get hold of the Clang sources for the include files (Xcode doesn't ship them). It just highlights that dependencies do add to the complexity of the setup. I guess it comes down to how much speedup you get, which could use some quantification. In my case, for instance, I can live with the 3.3 seconds startup time. For others, if the load time per file is taken down from > 5 seconds to 0.2 seconds or something, then that's something else entirely. |
How large the speedup is depends very much on how large the database is. I do my everyday work in a project that is quite large, and for me the speedup is substantial even on per file loading. I want loading a file to feel instantaneous, as I sometimes open many files every second (when iterating over uses of a function etc.). I can see that boost isn't perfect for everyone, but I want to point out that my patch doesn't make it harder for anyone to install and use Irony-mode. It helps the people that use a system where installing boost is simple and the people that don't mind installing it even though it's a bit troublesome. Quantification is always a good thing, I'll post an update at some point in the future with some comparisons. |
I solved this issue simply via "divide and conquer" rules. splitbeardb 2 will split a large compile_commands.json file in the current directory into small pieces and put them into level-2 sub-directories. |
What's the current state of speeding up reading json? This just recently started biting me. |
For headers, I'm working on https://github.com/Sarcasm/compdb but this is not really ready. |
I believe my issue is the ~7500 line |
I'm interested in using the libclang backend to speed up loading of files. The problem is, I can't seem to configure it. I used I couldn't really find any instructions about how to configure the libclang backend. Help? What am I missing? I'm on fedora 25 with emacs 25.1.1, irony-20170223.515. |
Is there anything that can cache the JSON compilation database? For a big library with a single compile_commands.json file, autocompletion is slow. And the problem isn't only about autocompletion - Emacs won't let me type anything while it's parsing the argument. P.S. |
@Sarcasm , I know this is an old thread, but I was curious if there was any progress here. I think I'm having the same problem, but I'm not much of an emacs guru and am trying to understand my options from this thread. Just wondering if this was ever sped up in a newer version of irony or some other workaround. Thanks for this awesome project! |
@pbarragan; Caching was added in pull request #499. While not directly related (the initial loading still takes the same amount of time), it made a huge difference for me since the compilation database is no longer loaded anew for each file I open. So once loaded, it should not need to be reloaded till it changes (or you change compilation database). |
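The "load once, reload only when it changes" behaviour described above can be sketched as a cache keyed on the file's modification time. This is not the code from pull request #499; it is a generic illustration with hypothetical names (DatabaseCache, Loader). To keep the sketch free of file system calls, the caller passes in the modification time, which in practice could come from stat() or std::filesystem::last_write_time.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using CompileOptions = std::vector<std::string>;
// Maps a source file to its compile options.
using Database = std::map<std::string, CompileOptions>;

class DatabaseCache {
public:
  // The loader stands in for the expensive parse of compile_commands.json.
  using Loader = std::function<Database(const std::string &)>;

  explicit DatabaseCache(Loader loader) : loader_(std::move(loader)) {}

  // Return the parsed database for PATH, re-parsing only when MTIME
  // differs from the value recorded at the previous load.
  const Database &get(const std::string &path, std::time_t mtime) {
    auto it = entries_.find(path);
    if (it != entries_.end() && it->second.mtime == mtime)
      return it->second.db; // unchanged since the last load: reuse it

    Entry &entry = entries_[path];
    entry.db = loader_(path);
    entry.mtime = mtime;
    return entry.db;
  }

private:
  struct Entry {
    std::time_t mtime;
    Database db;
  };

  Loader loader_;
  std::map<std::string, Entry> entries_;
};
```

As the comment above says, this does nothing for the first load; it only avoids paying the parsing cost again for every file opened afterwards.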
Hi,
I've found that irony-cdb-autosetup-compile-options can be really slow when reading large JSON compilation databases. I noticed this when using irony-mode on llvm itself. Every time I open a new source file there is a ~5s delay.
Based on my attempts at profiling, it seems most of the time is spent in irony-cdb-json--transform-command. Time is also spent in json-read-file (Emacs built-in). I am no elisp expert, but as I understand it, this only reads and post-processes the database a little. The database file is ~10000 lines, 2 MB, so I don't understand where the 5s come from. Maybe it is the communication with the server? The elisp profiler may not see this time.
Anyway, I suggest moving the database reading to the server. I might also be willing to do this, if you agree. It would be a good way to read up on libclang and one of my favorite Emacs extensions :)
/Karl
Profiler output: