
Use cached build plan mostly instead of Cargo to schedule a multi-target build #462

Merged: 7 commits merged into rust-lang:master from the stage-3 branch on Sep 20, 2017

Conversation

@Xanewok (Member) commented on Aug 24, 2017

WIP: This only needs the logic plugged into build/mod.rs, which is a relatively small amount of work, but I wanted feedback on whether things are going in the right direction.
@nrc could you take a look and tell me if the overall design is good and whether there are any obvious mistakes? I tried to debug and test the intermediate functions and behaviour using the RLS with different crates, and it works as intended, at least from what I tested on single-package projects.

This will probably still need some tweaking on the VFS side, as vfs.get_changes().iter().map(|(k, _)| k.clone()).collect() seems to retrieve not only dirty files but also disk-synced files that are merely open (which might fool the dirty-crate heuristics).

@Xanewok (Member, Author) commented on Aug 25, 2017

Success

It works!

At least partially 😄
The diagnostics work, but the analysis data currently seems to work only for a single (primary?) package. Unfortunately, I believe that was also the case previously, when running only cargo() under workspace_mode.

Now, under workspace_mode, if the build plan has been prepared (which should always be the case after at least one cargo build routine), sequential execution of the cached rustc calls using in-process rustc() is performed and preferred. If, while constructing the dirty unit graph, a custom build script is detected as dirty, a regular cargo() run is performed instead (which also recreates the build plan).

The process roughly looks as follows:

  1. Create a build plan (dependencies, compiler jobs) during cargo() build
  2. When a change has been made, for a given set of modified files, retrieve a set of units (package + crate target kind) that are dirty
  3. For that set of units, walk the reverse dependency graph and transitively mark all required units for rebuilding
  4. Topologically sort those (transitively dirty) units, mapping them to cached compiler calls
  5. Either delegate to cargo() when a TargetKind::CustomBuild unit is dirty, or return the list of cached calls in topological order (a rough sketch of steps 2-4 follows this list)
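For illustration, here is a minimal Rust sketch of steps 2-4, assuming hypothetical Unit and plan types rather than the actual RLS/Cargo data structures:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical stand-ins for Cargo units and their cached rustc invocations.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Unit(usize);

struct CachedPlan {
    // Reverse dependency edges: unit -> units that directly depend on it.
    rev_deps: HashMap<Unit, Vec<Unit>>,
    // Units in a precomputed topological (dependency-first) order.
    topo_order: Vec<Unit>,
    // Cached compiler invocation (a plain command line here) per unit.
    cached_calls: HashMap<Unit, String>,
}

impl CachedPlan {
    // Step 3: transitively mark every unit that depends on a dirty unit.
    fn transitive_dirty(&self, dirty: &[Unit]) -> HashSet<Unit> {
        let mut marked: HashSet<Unit> = dirty.iter().copied().collect();
        let mut queue: VecDeque<Unit> = dirty.iter().copied().collect();
        while let Some(unit) = queue.pop_front() {
            for &dependent in self.rev_deps.get(&unit).into_iter().flatten() {
                if marked.insert(dependent) {
                    queue.push_back(dependent);
                }
            }
        }
        marked
    }

    // Step 4: emit the cached calls for the dirty units in topological order.
    fn dirty_calls(&self, dirty: &[Unit]) -> Vec<String> {
        let marked = self.transitive_dirty(dirty);
        self.topo_order
            .iter()
            .copied()
            .filter(|unit| marked.contains(unit))
            .filter_map(|unit| self.cached_calls.get(&unit).cloned())
            .collect()
    }
}
```

Step 5 would then check whether any marked unit is a custom build and fall back to a full cargo() run in that case.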

There's a minor caveat: with the current implementation it's possible for the user to quickly modify and save a file, resulting in an empty set of dirty files, which in turn creates an empty queue of compiler calls. For now, in this case, we just run cargo() instead.

Current heuristic for detecting a set of dirty units based on modified files is as follows:

  1. If the modified file is a build script, we just add that exact unit
  2. Otherwise, a unit is considered dirty if there is a modified file at the same level as, or below, the unit's main target source file (e.g. lib.rs, main.rs or another custom one) on the file system (see the sketch below)
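A minimal sketch of this second rule, with hypothetical parameter names rather than the actual RLS API:

```rust
use std::path::Path;

// A unit is dirty if any modified file lives at or below the directory
// containing the target's root source file (e.g. src/ for src/lib.rs).
fn is_unit_dirty(target_src: &Path, modified_files: &[&Path]) -> bool {
    let root = target_src.parent().unwrap_or_else(|| Path::new(""));
    modified_files.iter().any(|file| file.starts_with(root))
}
```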

To correctly detect changes based on changed files, the compiler would have to return the set of files required for a given package (if the hierarchy isn't directly mapped to the file system, it depends on the files that are include!d, on files returned by the build script, etc.).
This also backfires when working on a workspace that has its own package but also contains other member packages. When only a member package is changed, the heuristic always flags the workspace package as dirty, since a member package's files are in a subdirectory relative to the workspace package (e.g. workspace/member_package/src/main.rs is under workspace/).

Things to consider:

  • Investigate reloading analysis for every crate kind and making sure the data is properly connected and works for every package
  • Write workspace mode tests (hard to do for now because of nested workspaces problems in CI Workspace tests #455)
  • Improve dirty files -> dirty units detection heuristic
  • Improve detecting dirty files (we should use a modified-since-last-build measure rather than a not-synced-with-disk one)
  • Probably gate it behind unstable_features or similar? The previous iteration was stable, in the sense that it didn't cause any hangs; it just could provide incorrect/empty data. This iteration needs a bit more ironing out and could still be unstable due to its sequential and greedy (in terms of time) nature (should it spawn a thread?)

@Xanewok force-pushed the stage-3 branch 2 times, most recently from e86a393 to ca6fcfe on August 28, 2017 12:28
@Xanewok (Member, Author) commented on Aug 28, 2017

Depends on #466 for compilation fix.

@Xanewok (Member, Author) commented on Aug 28, 2017

Todo:

  • (54ccea6) use the most specific path prefix per package; this should improve the file -> package heuristic sufficiently (a rough sketch follows this list)
  • (9853702) hook up to on_change (per file) and incrementally update the dirty package graph, clearing it only when an actual build is scheduled
  • investigate reload_from_analysis in rls-analysis for loading multiple analyses
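A rough sketch of the "most specific path prefix" idea from the first item, assuming a hypothetical list of (package name, package root) pairs rather than the actual RLS data structures:

```rust
use std::path::{Path, PathBuf};

// Map a modified file to its owning package by picking the package whose root
// is the longest prefix of the file's path, so a file in workspace/member/src/
// maps to `member`, not to the enclosing workspace package.
fn owning_package<'a>(file: &Path, packages: &'a [(String, PathBuf)]) -> Option<&'a str> {
    packages
        .iter()
        .filter(|(_, root)| file.starts_with(root))
        .max_by_key(|(_, root)| root.components().count())
        .map(|(name, _)| name.as_str())
}
```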

@Xanewok (Member, Author) commented on Sep 9, 2017

Introduced tracking of dirty files in the scope of builds in 9853702.
In short,

  1. A file, along with its LSP version, is marked as dirty in on_change
  2. The current dirty files are cloned into a PendingBuild that will be built later. Build requests are scheduled immediately in the action handler, and since actions are processed sequentially as per the LSP spec, no race should occur when accessing the dirty files
  3. Files that are modified while the build containing them is running should stay dirty for the next build, so we remove only those dirty files that were built with a version greater than or equal to the one currently marked as dirty (the logic for retain() is reversed; see the sketch after this list)
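A minimal sketch of the clearing logic in step 3, assuming hypothetical file -> LSP version maps rather than the actual RLS data structures:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// `dirty`: files currently marked dirty, with the latest LSP version seen.
// `built`: files (and the versions) that the just-finished build was run with.
fn clear_built_files(dirty: &mut HashMap<PathBuf, u64>, built: &HashMap<PathBuf, u64>) {
    // Keep a file dirty unless the build already saw a version at least as new
    // as the one currently marked dirty, i.e. the retain predicate is "reversed":
    // files modified after the build snapshot was taken stay dirty.
    dirty.retain(|file, version| {
        built.get(file).map_or(true, |&built_version| built_version < *version)
    });
}
```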

I tested it, and both the regular and the modified-while-building scenarios seem to work correctly; dirty files are tracked as expected.
Something that might not be desirable is copying the dirty files hashmap into PendingBuild every time a build is enqueued, but that would have to be measured; it might not be bad at all in terms of performance.
While the 'prepare work for empty modified files' situation has been fixed now, I encountered a weird behaviour: the LSP client sent a dummy textDocument/didChange with content_changes: [], without advancing the file version, after a proper textDocument/didChange with changes and an updated file version had been sent. This is something to look out for, and we might even want to ignore notifications where an empty content_changes has been received.


EDIT: Actually, a successful build can probably clear the dirty files on another thread while a new build is being scheduled, so more files than necessary would be flagged for rebuilding. While it shouldn't lead to bad behaviour, I'll still have to take a look at it.

One solution would be to fetch them when a pending job is popped off the queue and processed (this guarantees that no other job is executing). This also has the benefit of not having to copy the dirty files hashmap every time a build is requested.

However, this still has a data race problem: a build could be popped off the queue and prepared (with the dirty files and versions collected at that point, later used to clear the final dirty files) while the user still manages to edit a file and bump its version. It's not a major concern, as the only consequence is that the build doesn't clear the dirty state for that file, so it will be pulled into any subsequent build.

In practice this should yield better performance, as it can overshoot only by one file, though it would do so more often. Additionally, the dirty files hashmap would only have to be copied when appropriate, not on every build request (possibly every keystroke?), but this way it's probably more spaghetti code.
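A minimal sketch of the fetch-on-dequeue idea mentioned above, with hypothetical types; the real build queue is of course more involved:

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

struct BuildQueue {
    // Shared file -> LSP version map, updated by on_change on the action thread.
    dirty: Arc<Mutex<HashMap<PathBuf, u64>>>,
}

impl BuildQueue {
    // Called on the build thread right before a build starts (so no other build
    // is running); copying here avoids cloning the map on every build request.
    fn snapshot_for_build(&self) -> HashMap<PathBuf, u64> {
        (*self.dirty.lock().unwrap()).clone()
    }
}
```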

@Xanewok (Member, Author) commented on Sep 10, 2017

I didn't confirm this 100%, but from what I tried while working on loading multiple analyses at the same time, there seem to be two issues:

  1. Different analyses are emitted for the same krate but with different crate types, so these would have to be merged somehow before loading (so that the analysis data has all the symbols for the custom build/lib/bin targets of a given crate; this shows up e.g. when working on a bin/lib project)
  2. @nrc I didn't exactly understand the analysis architecture, but is it possible that there's a concept of a local/central crate for which alone find-all-refs etc. work? Right now I only tried naively modifying rls_analysis and lowering multiple passed data sets. When doing so, I got the feeling that the analyses for the crates reloaded during a multi-target build kept overwriting each other during lowering. It looked like there was only a single 'active' crate for which analysis data was loaded and accessible, depending on which package was built last in the workspace (this was also the case in a single-package bin/lib project, as mentioned above).

@nrc (Member) commented on Sep 12, 2017

@Xanewok

  1. I don't properly understand. Let's say we have a crate with a lib and a bin component; the lib will be a dep of the bin. If we make a change to the lib, both will be re-compiled; if we make a change to the bin, only the bin will be. I would imagine that we have analysis data for everything that is re-compiled and we update the saved analysis with that data only.

Are you saying that if we re-compile the bin, then we also have new data for the lib? Or that the data might be incompatible with the old data for the lib?

Or are you saying that rls-analysis needs some way to distinguish different packages for the same crate? So we need to be able to save data for foo-lib as well as foo-bin?

  2. There is no concept of a primary crate in rls-analysis. All crates are basically equal; there is only a map from crate name to crate analysis. However, what might be happening is that there is an implicit ordering. Consider crates A and B where B depends on A: we build A first, then B, and there can be a reference in B to a def in A, but not the reverse. We take advantage of this when indexing; we assume crates are indexed in order (I think this happens because we sort them by last modification). We only save a reference if there is already a def in our records (we read all defs before refs for an individual crate). If you are somehow reading data out of order, then you might miss out on some refs (a rough sketch of this invariant follows).
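A rough sketch of that defs-before-refs invariant, using hypothetical Id/Index types rather than the real rls-analysis internals:

```rust
use std::collections::HashMap;

type Id = u64;

struct Index {
    defs: HashMap<Id, String>,  // def id -> name (placeholder payload)
    refs: Vec<(Id, String)>,    // (target def id, reference location placeholder)
}

impl Index {
    // Crates must be indexed in dependency order: if crate B (which depends on A)
    // is read before A, references in B to defs in A are dropped here.
    fn record_ref(&mut self, target: Id, location: String) {
        if self.defs.contains_key(&target) {
            self.refs.push((target, location));
        }
        // Otherwise the ref is silently skipped, which is the "missing refs"
        // symptom when data is read out of order.
    }
}
```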

Because of how the cached build plan works, under `workspace_mode` every member package must be forcefully rebuilt to store the arguments Cargo would generate for each target. This is required because initially only a certain member package may be rebuilt, but changes made to another one will pull in a rebuild of a package whose rustc call wasn't cached.

Furthermore, sometimes the user can quickly make changes and save before any build can run. When this happens and there are no modified files, the build plan will return an empty job queue, since no files are dirty at that time. When that happens, it delegates to Cargo, just as when build scripts are modified. In the future, detecting dirty files and guessing which files relate to which rebuilt package is a concrete area for improvement.

Previously, either `rustc` was called on a single file or the whole `cargo` routine was executed to perform the build. Since the build plan cached from the `cargo` run is now used instead under `workspace_mode`, the RLS needs to feed it the files changed since the last build.

Since the user can further modify files while a build of those files is in progress, only files that were modified prior to that build will be marked as clean.
@Xanewok (Member, Author) commented on Sep 17, 2017

Just to answer here: the problem is that the analysis host stores the data per crate, while we compile (crate, crate type) units, and for each of those we have separate Analysis data to load. So if we recompile first the lib and then the bin crate target, then try to load both, the lib will be loaded first, but loading the bin will overwrite the lib's data, since it occupies the same 'crate' slot.

Or are you saying that rls-analysis needs some way to distinguish different packages for the same crate? So we need to be able to save data for foo-lib as well as foo-bin?

Depending on a package means depending on its lib target, so I think it's good that we only save the data for libs. In every other case (build scripts, bins) we build the target and pass the data for it in memory, so I think nothing here needs to change.

The only thing that needs changing, I believe, is to support holding a separate analysis per crate target for each crate; a rough sketch of that idea follows.
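A rough sketch of that suggestion, keying the stored analysis by (crate name, target kind) instead of crate name alone (hypothetical types, not the actual rls-analysis API):

```rust
use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash)]
enum TargetKind { Lib, Bin, CustomBuild }

struct Analysis; // placeholder for the per-target analysis data

struct AnalysisHost {
    // Composite key: one slot per (crate, target kind) instead of per crate.
    per_target: HashMap<(String, TargetKind), Analysis>,
}

impl AnalysisHost {
    fn load(&mut self, krate: String, kind: TargetKind, data: Analysis) {
        // With the crate name alone as the key, this insert would clobber the
        // previously loaded target of the same crate; the composite key keeps both.
        self.per_target.insert((krate, kind), data);
    }
}
```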

@nrc merged commit cc07a61 into rust-lang:master on Sep 20, 2017
@nrc (Member) commented on Sep 20, 2017

All looks good! Sorry it took so long to review.

@Xanewok deleted the stage-3 branch on September 24, 2017, 21:15