
Commit c2252da

More improvements to documentation of fine-grained incremental mode (#4446)
Improve documentation of fine-grained dependencies and fine-grained incremental updates.
1 parent 4fda7c4 commit c2252da

8 files changed: 204 additions, 36 deletions

mypy/server/deps.py

Lines changed: 80 additions & 1 deletion
@@ -1,4 +1,83 @@
1-
"""Generate fine-grained dependencies for AST nodes."""
1+
"""Generate fine-grained dependencies for AST nodes, for use in the daemon mode.
2+
3+
Dependencies are stored in a map from *triggers* to *sets of affected locations*.
4+
5+
A trigger is a string that represents a program property that has changed, such
6+
as the signature of a specific function. Triggers are written as '<...>' (angle
7+
brackets). When a program property changes, we determine the relevant trigger(s)
8+
and all affected locations. The latter are stale and will have to be reprocessed.
9+
10+
An affected location is a string that can refer to a *target* (a non-nested
11+
function or method, or a module top level), a class, or a trigger (for
12+
recursively triggering other triggers).
13+
14+
Here's an example representation of a simple dependency map (in format
15+
"<trigger> -> locations"):
16+
17+
<m.A.g> -> m.f
18+
<m.A> -> <m.f>, m.A, m.f
19+
20+
Assuming 'A' is a class, this means that
21+
22+
1) if a property of 'm.A.g', such as the signature, is changed, we need
23+
to process target (function) 'm.f'
24+
25+
2) if the MRO or other significant property of class 'm.A' changes, we
26+
need to process target 'm.f', the entire class 'm.A', and locations
27+
triggered by trigger '<m.f>' (this explanation is a bit simplified;
28+
see below for more details).
29+
30+
The triggers to fire are determined using mypy.server.astdiff.
31+
32+
Examples of triggers:
33+
34+
* '<mod.x>' represents a module attribute/function/class. If any externally
35+
visible property of 'x' changes, this gets fired. For changes within
36+
classes, only "big" changes cause the class to be triggered (such as a
37+
change in MRO). Smaller changes, such as changes to some attributes, don't
38+
trigger the entire class.
39+
* '<mod.Cls.x>' represents the type and kind of attribute/method 'x' of
40+
class 'mod.Cls'. This can also refer to an attribute inherited from a
41+
base class (relevant if it's accessed through a value of type 'Cls'
42+
instead of the base class type).
43+
* '<package.mod>' represents the existence of module 'package.mod'. This
44+
gets triggered if 'package.mod' is created or deleted, or if it gets
45+
changed into something other than a module.
46+
47+
Examples of locations:
48+
49+
* 'mod' is the top level of module 'mod' (doesn't include any function bodies,
50+
but includes class bodies not nested within a function).
51+
* 'mod.f' is function 'f' in module 'mod' (module-level variables aren't separate
52+
locations but are included in the module top level). Functions also include
53+
any nested functions and classes -- such nested definitions aren't separate
54+
locations, for simplicity of implementation.
55+
* 'mod.Cls.f' is method 'f' of 'mod.Cls'. Non-method attributes aren't locations.
56+
* 'mod.Cls' represents each method in class 'mod.Cls' + the top level of the
57+
module 'mod'. (To simplify the implementation, there is no location that only
58+
includes the body of a class without the entire surrounding module top level.)
59+
* Trigger '<...>' as a location is an indirect way of referring to all
60+
locations triggered by the trigger. These indirect locations keep the
61+
dependency map smaller and easier to manage.
62+
63+
Triggers can be triggered by program changes such as these:
64+
65+
* Addition or deletion of an attribute (or module).
66+
* Change of the kind of thing a name represents (such as a change from a function
67+
to a class).
68+
* Change of the static type of a name.
69+
70+
Changes in the body of a function that aren't reflected in the signature don't
71+
cause the function to be triggered. More generally, we trigger only on changes
72+
that may affect type checking results outside the module that contains the
73+
change.
74+
75+
We don't generate dependencies from builtins and certain other stdlib modules,
76+
since these change very rarely, and they would just increase the size of the
77+
dependency map significantly without much benefit.
78+
79+
Test cases for this module live in 'test-data/unit/deps*.test'.
80+
"""
281

382
from typing import Dict, List, Set, Optional, Tuple, Union
483
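
The trigger-to-locations map described in the docstring above can be illustrated with a minimal sketch. This is not mypy's implementation: the helper names `is_trigger` and `affected_locations` and the extra `'<m.f>'` entry are assumptions made for illustration. It only shows how trigger locations can be followed recursively until plain targets and classes are reached.

```python
from typing import Dict, List, Set


def is_trigger(location: str) -> bool:
    """Triggers are written in angle brackets, e.g. '<m.A.g>'."""
    return location.startswith('<') and location.endswith('>')


def affected_locations(deps: Dict[str, Set[str]], fired: Set[str]) -> Set[str]:
    """Return all non-trigger locations reachable from the fired triggers.

    Trigger locations are followed recursively; 'seen' guards against cycles.
    """
    stale = set()  # type: Set[str]
    seen = set()  # type: Set[str]
    worklist = list(fired)  # type: List[str]
    while worklist:
        trigger = worklist.pop()
        if trigger in seen:
            continue
        seen.add(trigger)
        for location in deps.get(trigger, set()):
            if is_trigger(location):
                worklist.append(location)
            else:
                stale.add(location)
    return stale


# The example map from the docstring, plus one hypothetical entry for '<m.f>'.
deps = {
    '<m.A.g>': {'m.f'},
    '<m.A>': {'<m.f>', 'm.A', 'm.f'},
    '<m.f>': {'n.h'},  # hypothetical: target 'n.h' depends on 'm.f'
}

print(sorted(affected_locations(deps, {'<m.A.g>'})))  # ['m.f']
print(sorted(affected_locations(deps, {'<m.A>'})))    # ['m.A', 'm.f', 'n.h']
```

Storing `'<m.f>'` as a location of `'<m.A>'` avoids copying every dependent of `m.f` into the entry for `m.A`, which is the size benefit the docstring attributes to indirect locations.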

mypy/server/update.py

Lines changed: 103 additions & 33 deletions
@@ -1,49 +1,119 @@
1-
"""Update build result by incrementally processing changed modules.
1+
"""Update build by processing changes using fine-grained dependencies.
22
33
Use fine-grained dependencies to update targets in other modules that
44
may be affected by externally-visible changes in the changed modules.
55
6-
Terms:
6+
This forms the core of the fine-grained incremental daemon mode. This
7+
module is not used at all by the 'classic' (non-daemon) incremental
8+
mode.
79
8-
* A 'target' is a function definition or the top level of a module. We
9-
refer to targets using their fully qualified name (e.g. 'mod.Cls.attr').
10-
Targets are the smallest units of processing during fine-grained
11-
incremental checking.
12-
* A 'trigger' represents the properties of a part of a program, and it
13-
gets triggered/activated when these properties change. For example,
14-
'<mod.func>' refers to a module-level function, and it gets triggered
15-
if the signature of the function changes, or if if the function is
16-
removed.
10+
Here is some motivation for this mode:
1711
18-
Some program state is maintained across multiple build increments:
12+
* By keeping program state in memory between incremental runs, we
13+
only have to process changed modules, not their dependencies. The
14+
classic incremental mode has to deserialize the symbol tables of
15+
all dependencies of changed modules, which can be slow for large
16+
programs.
1917
20-
* The full ASTs of all modules in memory all the time (+ type map).
21-
* Maintain a fine-grained dependency map, which is from triggers to
22-
targets/triggers. The latter determine what other parts of a program
23-
need to be processed again due to an externally visible change to a
24-
module.
18+
* Fine-grained dependencies allow processing only the relevant parts
19+
of modules indirectly affected by a change. Say, if only one function
20+
in a large module is affected by a change in another module, only this
21+
function is processed. The classic incremental mode always processes
22+
an entire file as a unit, which is typically much slower.
2523
26-
We perform a fine-grained incremental program update like this:
24+
* It's possible to independently process individual modules within an
25+
import cycle (SCC). Small incremental changes can be fast independent
26+
of the size of the related SCC. In classic incremental mode, any change
27+
within an SCC requires the entire SCC to be processed, which can slow
28+
things down considerably.
29+
30+
Some terms:
31+
32+
* A *target* is a function/method definition or the top level of a module.
33+
We refer to targets using their fully qualified name (e.g.
34+
'mod.Cls.method'). Targets are the smallest units of processing during
35+
fine-grained incremental checking.
36+
37+
* A *trigger* represents the properties of a part of a program, and it
38+
gets triggered/fired when these properties change. For example,
39+
'<mod.func>' refers to a module-level function. It gets triggered if
40+
the signature of the function changes, or if the function is removed,
41+
for example.
42+
43+
Some program state is maintained across multiple build increments in
44+
memory:
45+
46+
* The full ASTs of all modules are stored in memory all the time (this
47+
includes the type map).
48+
49+
* A fine-grained dependency map is maintained, which maps triggers to
50+
affected program locations (these can be targets, triggers, or
51+
classes). The latter determine what other parts of a program need to
52+
be processed again due to a fired trigger.
53+
54+
Here's a summary of how a fine-grained incremental program update happens:
2755
2856
* Determine which modules have changes in their source code since the
29-
previous build.
30-
* Fully process these modules, creating new ASTs and symbol tables
31-
for them. Retain the existing ASTs and symbol tables of modules that
32-
have no changes in their source code.
33-
* Determine which parts of the changed modules have changed. The result
34-
is a set of triggered triggers.
35-
* Using the dependency map, decide which other targets have become
36-
stale and need to be reprocessed.
37-
* Replace old ASTs of the modules that we reprocessed earlier with
38-
the new ones, but try to retain the identities of original externally
39-
visible AST nodes so that we don't (always) need to patch references
40-
in the rest of the program.
41-
* Semantically analyze and type check the stale targets.
42-
* Repeat the previous steps until nothing externally visible has changed.
57+
previous update.
58+
59+
* Process changed modules one at a time. Perform a separate full update
60+
for each changed module, but only report the errors after all modules
61+
have been processed, since the intermediate states can generate bogus
62+
errors due to only seeing a partial set of changes.
63+
64+
* Each changed module is processed in full. We parse the module, and
65+
run semantic analysis to create a new AST and symbol table for the
66+
module. Reuse the existing ASTs and symbol tables of modules that
67+
have no changes in their source code. At the end of this stage, we have
68+
two ASTs and symbol tables for the changed module (the old and the new
69+
versions). The latter AST has not yet been type checked.
70+
71+
* Take a snapshot of the old symbol table. This is used later to determine
72+
which properties of the module have changed and which triggers to fire.
73+
74+
* Merge the old AST with the new AST, preserving the identities of
75+
externally visible AST nodes for which we can find a corresponding node
76+
in the new AST. (Look at mypy.server.astmerge for the details.) This
77+
way all external references to AST nodes in the changed module will
78+
continue to point to the right nodes (assuming they still have a valid
79+
target).
80+
81+
* Type check the new module.
82+
83+
* Take another snapshot of the symbol table of the changed module.
84+
Look at the differences between the old and new snapshots to determine
85+
which parts of the changed modules have changed. The result is a set of
86+
fired triggers.
87+
88+
* Using the dependency map and the fired triggers, decide which other
89+
targets have become stale and need to be reprocessed.
90+
91+
* Create new fine-grained dependencies for the changed module. We don't
92+
garbage collect old dependencies, since extra dependencies are relatively
93+
harmless (they take some memory and can theoretically slow things down
94+
a bit by causing redundant work). This is implemented in
95+
mypy.server.deps.
96+
97+
* Strip the stale AST nodes that we found above. This returns them to a
98+
state resembling the end of semantic analysis pass 1. We'll run semantic
99+
analysis again on the existing AST nodes, and since semantic analysis
100+
is not idempotent, we need to revert some changes made during semantic
101+
analysis. This is implemented in mypy.server.aststrip.
102+
103+
* Run semantic analyzer passes 2 and 3 on the stale AST nodes, and type
104+
check them. We also need to do the symbol table snapshot comparison
105+
dance to find any changes, and we need to merge ASTs to preserve AST node
106+
identities.
107+
108+
* If some triggers have been fired, continue processing and repeat the
109+
previous steps until no triggers are fired.
110+
111+
This module is tested using end-to-end fine-grained incremental mode
112+
test cases (test-data/unit/fine-grained*.test).
43113
44114
Major todo items:
45115
46-
- Support multiple type checking passes
116+
- Fully support multiple type checking passes
47117
"""
48118

49119
import os.path
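
The snapshot comparison and the "repeat until no triggers are fired" loop from the docstring above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in mypy.server.update or mypy.server.astdiff: a snapshot is modelled as a hypothetical dict from fully qualified name to a signature string, the dependency map contains only plain targets, and `fired_triggers`, `propagate`, and `recheck` are made-up names.

```python
from typing import Callable, Dict, List, Set

# Hypothetical snapshot shape: fully qualified name -> signature string.
Snapshot = Dict[str, str]


def fired_triggers(old: Snapshot, new: Snapshot) -> Set[str]:
    """Return a trigger for each name added, removed, or changed in signature."""
    names = set(old) | set(new)
    return {'<%s>' % name for name in names if old.get(name) != new.get(name)}


def propagate(deps: Dict[str, Set[str]],
              fired: Set[str],
              recheck: Callable[[Set[str]], Set[str]]) -> List[Set[str]]:
    """Reprocess stale targets until no triggers are fired.

    'recheck' reprocesses the given targets and returns the triggers fired by
    that work. The result lists the stale targets of each round, in order.
    """
    rounds = []  # type: List[Set[str]]
    while fired:
        stale = set()  # type: Set[str]
        for trigger in fired:
            stale |= deps.get(trigger, set())
        if not stale:
            break
        rounds.append(stale)
        fired = recheck(stale)
    return rounds


# Only the signature of 'm.f' differs; a body-only change would leave the
# snapshot entry untouched and fire nothing.
old = {'m.f': 'def () -> int', 'm.g': 'def ()'}
new = {'m.f': 'def () -> str', 'm.g': 'def ()'}
deps = {'<m.f>': {'n.h'}, '<n.h>': {'p.k'}}


def recheck(stale: Set[str]) -> Set[str]:
    # Pretend that reprocessing 'n.h' changes its externally visible type.
    return {'<n.h>'} if 'n.h' in stale else set()


fired = fired_triggers(old, new)
print(fired)                            # {'<m.f>'}
print(propagate(deps, fired, recheck))  # [{'n.h'}, {'p.k'}]
```

In this sketch the loop terminates only because `recheck` eventually returns an empty set; the real update also merges ASTs and regenerates dependencies between rounds, which is omitted here.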

test-data/unit/deps-classes.test

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
-- Test cases for generating fine-grained dependencies for classes.
22
--
33
-- The dependencies are used for fine-grained incremental checking.
4+
--
5+
-- See the comment at the top of deps.test for more documentation.
46

57
-- TODO: Move class related test cases from deps.test to here
68

test-data/unit/deps-expressions.test

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
-- Test cases for generating fine-grained dependencies for expressions.
22
--
33
-- The dependencies are used for fine-grained incremental checking.
4+
--
5+
-- See the comment at the top of deps.test for more documentation.
46

57
[case testListExpr]
68
def f() -> int: pass

test-data/unit/deps-generics.test

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
-- Test cases for generating fine-grained dependencies involving generics.
22
--
33
-- The dependencies are used for fine-grained incremental checking.
4+
--
5+
-- See the comment at the top of deps.test for more documentation.
46

57
[case testGenericFunction]
68
from typing import TypeVar

test-data/unit/deps-statements.test

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
-- Test cases for generating fine-grained dependencies for statements.
22
--
33
-- The dependencies are used for fine-grained incremental checking.
4+
--
5+
-- See the comment at the top of deps.test for more documentation.
46

57
[case testIfStmt]
68
def f1() -> int: pass

test-data/unit/deps-types.test

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
-- Test cases for generating fine-grained dependencies between types.
22
--
33
-- The dependencies are used for fine-grained incremental checking.
4+
--
5+
-- See the comment at the top of deps.test for more documentation.
46

57
[case testFilterOutBuiltInTypes]
68
class A: pass

test-data/unit/deps.test

Lines changed: 11 additions & 2 deletions
@@ -1,7 +1,16 @@
11
-- Test cases for generating dependencies between AST nodes.
22
--
3-
-- The dependencies are used for fine-grained incremental checking.
4-
3+
-- The dependencies are used for fine-grained incremental checking and
4+
-- the daemon mode.
5+
--
6+
-- The output of each test case includes the dependency map for whitelisted
7+
-- modules (includes the main module and the modules 'pkg' and 'pkg.mod' at
8+
-- least).
9+
--
10+
-- Dependencies are formatted as "<trigger> -> affected locations".
11+
--
12+
-- Look at the docstring of mypy.server.deps for an explanation of
13+
-- how fine-grained dependencies are represented.
514

615
[case testCallFunction]
716
def f() -> None:
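
The "<trigger> -> affected locations" format mentioned in the comment above is simple enough to read back into a data structure. The helper below is a hypothetical sketch and not part of mypy's test framework; it assumes one dependency per line, with locations separated by commas.

```python
from typing import Set, Tuple


def parse_dependency(line: str) -> Tuple[str, Set[str]]:
    """Split '<trigger> -> loc1, loc2' into the trigger and its locations."""
    trigger, _, locations = line.partition(' -> ')
    return trigger.strip(), {loc.strip() for loc in locations.split(',') if loc.strip()}


trigger, locations = parse_dependency('<m.A> -> <m.f>, m.A, m.f')
print(trigger)            # <m.A>
print(sorted(locations))  # ['<m.f>', 'm.A', 'm.f']
```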
