
Use cached build plan mostly instead of Cargo to schedule a multi-target build #462

Merged: 7 commits merged into rust-lang:master from the stage-3 branch on Sep 20, 2017

Conversation

@Xanewok (Member) commented on Aug 24, 2017

WIP: This only needs the logic plugged into build/mod.rs, which is a relatively small amount of work, but I wanted feedback on whether things are going in the right direction.
@nrc could you take a look and tell me if the overall design is good and whether there are any obvious mistakes? I tried to debug and test the intermediate functions and behaviour using the RLS with different crates, and it works as intended, at least from what I tested on single-package projects.

This will probably still need some tweaking on the VFS side, as vfs.get_changes().iter().map(|(k, _)| k.clone()).collect() seems to retrieve not only dirty files but also disk-synced files that are merely open (which might fool the dirty-crate heuristics).

@Xanewok (Member, Author) commented on Aug 25, 2017

Success

It works!

At least partially 😄
The diagnostics work, but the analysis data currently seems to work only for a single (primary?) package. Unfortunately, I believe that was also the case previously, when running only cargo() under workspace_mode.

Now, under workspace_mode, if the build plan has been prepared (which should always be the case after at least one cargo build routine), sequential execution of the cached rustc calls using in-process rustc() is performed and preferred. If, while constructing the dirty unit graph, a custom build script is detected as dirty, a regular cargo() run is performed instead (which also recreates the build plan).

The process roughly looks as follows:

  1. Create a build plan (dependencies, compiler jobs) during cargo() build
  2. When a change has been made, for a given set of modified files, retrieve a set of units (package + crate target kind) that are dirty
  3. For that set of units, walk the reverse dependency graph and transitively mark all required units for rebuilding
  4. Topologically sort those (transitively dirty) units, mapping them to cached compiler calls
  5. Either delegate to cargo() when a TargetKind::CustomBuild unit is dirty, or return the list of cached calls in topological order (a rough sketch of steps 2-4 follows this list)
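For illustration, here is a minimal Rust sketch of steps 2-4, assuming hypothetical Unit and plan types rather than the actual RLS/Cargo data structures:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical stand-ins for Cargo units and their cached rustc invocations.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Unit(usize);

struct CachedPlan {
    // Reverse dependency edges: unit -> units that directly depend on it.
    rev_deps: HashMap<Unit, Vec<Unit>>,
    // Units in a precomputed topological (dependency-first) order.
    topo_order: Vec<Unit>,
    // Cached compiler invocation (a plain command line here) per unit.
    cached_calls: HashMap<Unit, String>,
}

impl CachedPlan {
    // Step 3: transitively mark every unit that depends on a dirty unit.
    fn transitive_dirty(&self, dirty: &[Unit]) -> HashSet<Unit> {
        let mut marked: HashSet<Unit> = dirty.iter().copied().collect();
        let mut queue: VecDeque<Unit> = dirty.iter().copied().collect();
        while let Some(unit) = queue.pop_front() {
            for &dependent in self.rev_deps.get(&unit).into_iter().flatten() {
                if marked.insert(dependent) {
                    queue.push_back(dependent);
                }
            }
        }
        marked
    }

    // Step 4: emit the cached calls for the dirty units in topological order.
    fn dirty_calls(&self, dirty: &[Unit]) -> Vec<String> {
        let marked = self.transitive_dirty(dirty);
        self.topo_order
            .iter()
            .copied()
            .filter(|unit| marked.contains(unit))
            .filter_map(|unit| self.cached_calls.get(&unit).cloned())
            .collect()
    }
}
```

Step 5 would then check whether any marked unit is a custom build and fall back to a full cargo() run in that case.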

There's a minor caveat: with the current implementation it's possible for the user to quickly modify and save a file, resulting in an empty set of dirty files, which in turn creates an empty queue of compiler calls. For now, in this case, we just run cargo() instead.

Current heuristic for detecting a set of dirty units based on modified files is as follows:

  1. If the modified file is a build script, we just add that exact unit
  2. Otherwise, a unit is considered dirty if there is a modified file at the same level as, or below, the unit's main target source file (e.g. lib.rs, main.rs or another custom one) on the file system (see the sketch below)
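A minimal sketch of this second rule, with hypothetical parameter names rather than the actual RLS API:

```rust
use std::path::Path;

// A unit is dirty if any modified file lives at or below the directory
// containing the target's root source file (e.g. src/ for src/lib.rs).
fn is_unit_dirty(target_src: &Path, modified_files: &[&Path]) -> bool {
    let root = target_src.parent().unwrap_or_else(|| Path::new(""));
    modified_files.iter().any(|file| file.starts_with(root))
}
```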

To correctly detect changes based on changed files, the compiler would have to return the set of files required for a given package (if the hierarchy isn't directly mapped to the file system, it depends on the files that are include!d, on files returned by the build script, etc.).
This also backfires when working on a workspace that has its own package but also contains other member packages. When only a member package is changed, the heuristic always flags the workspace package as dirty, since a member package's files are in a subdirectory relative to the workspace package (e.g. workspace/member_package/src/main.rs is under workspace/).

Things to consider:

  • Investigate reloading analysis for every crate kind and making sure the data is properly connected and works for every package
  • Write workspace mode tests (hard to do for now because of nested workspaces problems in CI Workspace tests #455)
  • Improve dirty files -> dirty units detection heuristic
  • Improve detecting dirty files (we should use a modified-since-last-build measure rather than a not-synced-with-disk one)
  • Probably gate it behind unstable_features or similar? The previous iteration was stable, in the sense that it didn't cause any hangs; it just could provide incorrect/empty data. This iteration needs a bit more ironing out and could still be unstable due to its sequential and greedy (in terms of time) nature (should it spawn a thread?)

@Xanewok force-pushed the stage-3 branch 2 times, most recently from e86a393 to ca6fcfe on August 28, 2017 12:28
@Xanewok (Member, Author) commented on Aug 28, 2017

Depends on #466 for compilation fix.

@Xanewok (Member, Author) commented on Aug 28, 2017

Todo:

  • (54ccea6) use the most specific path prefix per package; this should improve the file -> package heuristic sufficiently (a rough sketch follows this list)
  • (9853702) hook up to on_change (per file) and incrementally update the dirty package graph, clearing it only when an actual build is scheduled
  • investigate reload_from_analysis in rls-analysis for loading multiple analyses
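A rough sketch of the "most specific path prefix" idea from the first item, assuming a hypothetical list of (package name, package root) pairs rather than the actual RLS data structures:

```rust
use std::path::{Path, PathBuf};

// Map a modified file to its owning package by picking the package whose root
// is the longest prefix of the file's path, so a file in workspace/member/src/
// maps to `member`, not to the enclosing workspace package.
fn owning_package<'a>(file: &Path, packages: &'a [(String, PathBuf)]) -> Option<&'a str> {
    packages
        .iter()
        .filter(|(_, root)| file.starts_with(root))
        .max_by_key(|(_, root)| root.components().count())
        .map(|(name, _)| name.as_str())
}
```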

@Xanewok (Member, Author) commented on Sep 9, 2017

Introduced tracking of dirty files in the scope of builds in 9853702.
In short,

  1. A file, along with its LSP version, is marked as dirty in on_change
  2. The current dirty files are cloned into a PendingBuild that will be built later. Build requests are scheduled immediately in the action handler, and since actions are processed sequentially as per the LSP spec, no race should occur when accessing the dirty files
  3. Files that are modified while the build containing them is running should stay dirty for the next build, so we remove only those dirty files that were built with a version greater than or equal to the one currently marked as dirty (the logic for retain() is reversed; see the sketch after this list)
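A minimal sketch of the clearing logic in step 3, assuming hypothetical file -> LSP version maps rather than the actual RLS data structures:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// `dirty`: files currently marked dirty, with the latest LSP version seen.
// `built`: files (and the versions) that the just-finished build was run with.
fn clear_built_files(dirty: &mut HashMap<PathBuf, u64>, built: &HashMap<PathBuf, u64>) {
    // Keep a file dirty unless the build already saw a version at least as new
    // as the one currently marked dirty, i.e. the retain predicate is "reversed":
    // files modified after the build snapshot was taken stay dirty.
    dirty.retain(|file, version| {
        built.get(file).map_or(true, |&built_version| built_version < *version)
    });
}
```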

I tested it, and both the regular and the modified-while-building scenarios seem to work correctly; dirty files are tracked as expected.
Something that might not be desirable is copying the dirty files hashmap into PendingBuild every time a build is enqueued, but that would have to be measured; it might not be bad at all in terms of performance.
While the 'prepare work for empty modified files' situation has been fixed now, I encountered a weird behaviour: the LSP client sent a dummy textDocument/didChange with content_changes: [], without advancing the file version, after a proper textDocument/didChange with changes and an updated file version had been sent. This is something to look out for, and we might even want to ignore notifications where an empty content_changes has been received.


EDIT: Actually, a successful build can probably clear the dirty files on another thread while a new build is being scheduled, so more files than necessary would be flagged for rebuilding. While it shouldn't lead to bad behaviour, I'll still have to take a look at it.

One solution would be to fetch them when a pending job is popped off the queue and processed (this guarantees that no other job is executing). This also has the benefit of not having to copy the dirty files hashmap every time a build is requested.

However, this still has a data race problem: a build could be popped off the queue and prepared (with the dirty files and versions collected at that point, later used to clear the final dirty files) while the user still manages to edit a file and bump its version. It's not a major concern, as the only consequence is that the build doesn't clear the dirty state for that file, so it will be pulled into any subsequent build.

In practice this should yield better performance, as it can overshoot only by one file, though it would do so more often. Additionally, the dirty files hashmap would only have to be copied when appropriate, not on every build request (possibly every keystroke?), but this way it's probably more spaghetti code.
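A minimal sketch of the fetch-on-dequeue idea mentioned above, with hypothetical types; the real build queue is of course more involved:

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

struct BuildQueue {
    // Shared file -> LSP version map, updated by on_change on the action thread.
    dirty: Arc<Mutex<HashMap<PathBuf, u64>>>,
}

impl BuildQueue {
    // Called on the build thread right before a build starts (so no other build
    // is running); copying here avoids cloning the map on every build request.
    fn snapshot_for_build(&self) -> HashMap<PathBuf, u64> {
        (*self.dirty.lock().unwrap()).clone()
    }
}
```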

@Xanewok (Member, Author) commented on Sep 10, 2017

I didn't confirm this 100%, but from what I tried while working on loading multiple analyses at the same time, there seem to be two issues:

  1. Different analyses are emitted for the same krate but with different crate types, so these would have to be merged somehow before loading (so that the analysis data has all the symbols for the custom build/lib/bin targets of a given crate; this shows up e.g. when working on a bin/lib project)
  2. @nrc I didn't exactly understand the analysis architecture, but is it possible that there's a concept of a local/central crate for which alone find-all-refs etc. work? Right now I only tried naively modifying rls_analysis and lowering multiple passed data sets. When doing so, I got the feeling that the analyses for the crates reloaded during a multi-target build kept overwriting each other during lowering. It looked like there was only a single 'active' crate for which analysis data was loaded and accessible, depending on which package was built last in the workspace (this was also the case in a single-package bin/lib project, as mentioned above).

@nrc (Member) commented on Sep 12, 2017

@Xanewok

  1. I don't properly understand. Let's say we have a crate with a lib and a bin component; the lib will be a dep of the bin. If we make a change to the lib, both will be re-compiled; if we make a change to the bin, only the bin will be. I would imagine that we have analysis data for everything that is re-compiled and we update the saved analysis with that data only.

Are you saying that if we re-compile the bin, then we also have new data for the lib? Or that the data might be incompatible with the old data for the lib?

Or are you saying that rls-analysis needs some way to distinguish different packages for the same crate? So we need to be able to save data for foo-lib as well as foo-bin?

  2. There is no concept of a primary crate in rls-analysis. All crates are basically equal; there is only a map from crate name to crate analysis. However, what might be happening is that there is an implicit ordering. Consider crates A and B where B depends on A: we build A first, then B, and there can be a reference in B to a def in A, but not the reverse. We take advantage of this when indexing; we assume crates are indexed in order (I think this happens because we sort them by last modification). We only save a reference if there is already a def in our records (we read all defs before refs for an individual crate). If you are somehow reading data out of order, then you might miss out on some refs (a rough sketch of this invariant follows).
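A rough sketch of that defs-before-refs invariant, using hypothetical Id/Index types rather than the real rls-analysis internals:

```rust
use std::collections::HashMap;

type Id = u64;

struct Index {
    defs: HashMap<Id, String>,  // def id -> name (placeholder payload)
    refs: Vec<(Id, String)>,    // (target def id, reference location placeholder)
}

impl Index {
    // Crates must be indexed in dependency order: if crate B (which depends on A)
    // is read before A, references in B to defs in A are dropped here.
    fn record_ref(&mut self, target: Id, location: String) {
        if self.defs.contains_key(&target) {
            self.refs.push((target, location));
        }
        // Otherwise the ref is silently skipped, which is the "missing refs"
        // symptom when data is read out of order.
    }
}
```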

Because of how the cached build plan works, under `workspace_mode` every member package must be forcefully rebuilt to store the arguments Cargo would generate for each target. This is required because initially only a certain member package may be rebuilt, but changes made to another one will pull in a rebuild of a package whose rustc call wasn't cached.

Furthermore, sometimes the user can quickly make changes and save before any build can run. When this happens and there are no modified files, the build plan will return an empty job queue, since no files are dirty at that time. When that happens, it delegates to Cargo, just as when build scripts are modified. In the future, detecting dirty files and guessing which files relate to which rebuilt package is a concrete area for improvement.

Previously, either `rustc` was called on a single file or the whole `cargo` routine was executed to perform the build. Since the build plan cached from the `cargo` run is now used instead under `workspace_mode`, the RLS needs to feed it the files changed since the last build.

Since the user can further modify files while a build of those files is in progress, only files that were modified prior to that build will be marked as clean.
@Xanewok (Member, Author) commented on Sep 17, 2017

Just to answer here: the problem is that the analysis host stores the data per crate, while we compile (crate, crate type) units, and for each of those we have separate Analysis data to load. So if we recompile first the lib and then the bin crate target, then try to load both, the lib will be loaded first, but loading the bin will overwrite the lib's data, since it occupies the same 'crate' slot.

Or are you saying that rls-analysis needs some way to distinguish different packages for the same crate? So we need to be able to save data for foo-lib as well as foo-bin?

Depending on a package means depending on its lib target, so I think it's good that we only save the data for libs. In every other case (build scripts, bins) we build the target and pass the data for it in memory, so I think nothing here needs to change.

The only thing that needs changing, I believe, is to support holding a separate analysis per crate target for each crate; a rough sketch of that idea follows.
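A rough sketch of that suggestion, keying the stored analysis by (crate name, target kind) instead of crate name alone (hypothetical types, not the actual rls-analysis API):

```rust
use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash)]
enum TargetKind { Lib, Bin, CustomBuild }

struct Analysis; // placeholder for the per-target analysis data

struct AnalysisHost {
    // Composite key: one slot per (crate, target kind) instead of per crate.
    per_target: HashMap<(String, TargetKind), Analysis>,
}

impl AnalysisHost {
    fn load(&mut self, krate: String, kind: TargetKind, data: Analysis) {
        // With the crate name alone as the key, this insert would clobber the
        // previously loaded target of the same crate; the composite key keeps both.
        self.per_target.insert((krate, kind), data);
    }
}
```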

@nrc merged commit cc07a61 into rust-lang:master on Sep 20, 2017
@nrc (Member) commented on Sep 20, 2017

All looks good! Sorry it took so long to review.

@Xanewok deleted the stage-3 branch on September 24, 2017, 21:15