More improvements to documentation of fine-grained incremental mode (#4446)

JukkaL · web-flow · commit c2252dafd423 · 2018-01-10T12:24:07.000Z
Improve documentation of fine-grained dependencies and
fine-grained incremental updates.
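
As an illustration of the dependency map format documented in
mypy/server/deps.py, here is a minimal sketch of how fired triggers could be
expanded into a set of stale locations. The helper names `is_trigger` and
`find_stale_locations` are hypothetical and not part of mypy, and the
`<m.f> -> m.h` entry is invented for the example. The sketch does not expand a
class location such as 'm.A' into its methods and the module top level; the
update logic described in mypy/server/update.py is more involved.

```python
from typing import Dict, Set


def is_trigger(location: str) -> bool:
    """Triggers are written in angle brackets, e.g. '<m.A>'."""
    return location.startswith('<') and location.endswith('>')


def find_stale_locations(deps: Dict[str, Set[str]],
                         fired: Set[str]) -> Set[str]:
    """Return every non-trigger location reachable from the fired triggers.

    Hypothetical helper: a trigger that appears as a location is followed
    recursively, so it stands for all locations that it triggers in turn.
    """
    stale: Set[str] = set()
    seen: Set[str] = set()
    worklist = list(fired)
    while worklist:
        trigger = worklist.pop()
        if trigger in seen:
            continue  # Guard against cycles between triggers.
        seen.add(trigger)
        for location in deps.get(trigger, set()):
            if is_trigger(location):
                worklist.append(location)
            else:
                stale.add(location)
    return stale


# The first two entries mirror the example in the deps.py docstring.
deps = {
    '<m.A.g>': {'m.f'},
    '<m.A>': {'<m.f>', 'm.A', 'm.f'},
    '<m.f>': {'m.h'},  # Assumed extra entry, for illustration only.
}
```

With this map, firing '<m.A.g>' marks only 'm.f' as stale, while firing
'<m.A>' also reaches 'm.A' directly and 'm.h' through the indirect '<m.f>'
location.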
diff --git a/mypy/server/deps.py b/mypy/server/deps.py
@@ -1,4 +1,83 @@
-"""Generate fine-grained dependencies for AST nodes."""
+"""Generate fine-grained dependencies for AST nodes, for use in the daemon mode.
+
+Dependencies are stored in a map from *triggers* to *sets of affected locations*.
+
+A trigger is a string that represents a program property that has changed, such
+as the signature of a specific function. Triggers are written as '<...>' (angle
+brackets). When a program property changes, we determine the relevant trigger(s)
+and all affected locations. The latter are stale and will have to be reprocessed.
+
+An affected location is a string that can refer to a *target* (a non-nested
+function or method, or a module top level), a class, or a trigger (for
+recursively triggering other triggers).
+
+Here's an example representation of a simple dependency map (in format
+"<trigger> -> locations"):
+
+  <m.A.g> -> m.f
+  <m.A> -> <m.f>, m.A, m.f
+
+Assuming 'A' is a class, this means that
+
+1) if a property of 'm.A.g', such as the signature, is changed, we need
+   to process target (function) 'm.f'
+
+2) if the MRO or other significant property of class 'm.A' changes, we
+   need to process target 'm.f', the entire class 'm.A', and locations
+   triggered by trigger '<m.f>' (this explanation is a bit simplified;
+   see below for more details).
+
+The triggers to fire are determined using mypy.server.astdiff.
+
+Examples of triggers:
+
+* '<mod.x>' represents a module attribute/function/class. If any externally
+  visible property of 'x' changes, this gets fired. For changes within
+  classes, only "big" changes cause the class to be triggered (such as a
+  change in MRO). Smaller changes, such as changes to some attributes, don't
+  trigger the entire class.
+* '<mod.Cls.x>' represents the type and kind of attribute/method 'x' of
+  class 'mod.Cls'. This can also refer to an attribute inherited from a
+  base class (relevant if it's accessed through a value of type 'Cls'
+  instead of the base class type).
+* '<package.mod>' represents the existence of module 'package.mod'. This
+  gets triggered if 'package.mod' is created or deleted, or if it gets
+  changed into something other than a module.
+
+Examples of locations:
+
+* 'mod' is the top level of module 'mod' (doesn't include any function bodies,
+  but includes class bodies not nested within a function).
+* 'mod.f' is function 'f' in module 'mod' (module-level variables aren't separate
+  locations but are included in the module top level). Functions also include
+  any nested functions and classes -- such nested definitions aren't separate
+  locations, for simplicity of implementation.
+* 'mod.Cls.f' is method 'f' of 'mod.Cls'. Non-method attributes aren't locations.
+* 'mod.Cls' represents each method in class 'mod.Cls' + the top level of the
+  module 'mod'. (To simplify the implementation, there is no location that only
+  includes the body of a class without the entire surrounding module top level.)
+* Trigger '<...>' as a location is an indirect way of referring to all
+  locations triggered by the trigger. These indirect locations keep the
+  dependency map smaller and easier to manage.
+
+Triggers can be triggered by program changes such as these:
+
+* Addition or deletion of an attribute (or module).
+* Change of the kind of thing a name represents (such as a change from a function
+  to a class).
+* Change of the static type of a name.
+
+Changes in the body of a function that aren't reflected in the signature don't
+cause the function to be triggered. More generally, we trigger only on changes
+that may affect type checking results outside the module that contains the
+change.
+
+We don't generate dependencies from builtins and certain other stdlib modules,
+since these change very rarely, and they would just increase the size of the
+dependency map considerably without much benefit.
+
+Test cases for this module live in 'test-data/unit/deps*.test'.
+"""
 
 from typing import Dict, List, Set, Optional, Tuple, Union
 
diff --git a/mypy/server/update.py b/mypy/server/update.py
@@ -1,49 +1,119 @@
-"""Update build result by incrementally processing changed modules.
+"""Update build by processing changes using fine-grained dependencies.
 
 Use fine-grained dependencies to update targets in other modules that
 may be affected by externally-visible changes in the changed modules.
 
-Terms:
+This forms the core of the fine-grained incremental daemon mode. This
+module is not used at all by the 'classic' (non-daemon) incremental
+mode.
 
-* A 'target' is a function definition or the top level of a module. We
-  refer to targets using their fully qualified name (e.g. 'mod.Cls.attr').
-  Targets are the smallest units of processing during fine-grained
-  incremental checking.
-* A 'trigger' represents the properties of a part of a program, and it
-  gets triggered/activated when these properties change. For example,
-  '<mod.func>' refers to a module-level function, and it gets triggered
-  if the signature of the function changes, or if if the function is
-  removed.
+Here is some motivation for this mode:
 
-Some program state is maintained across multiple build increments:
+* By keeping program state in memory between incremental runs, we
+  only have to process changed modules, not their dependencies. The
+  classic incremental mode has to deserialize the symbol tables of
+  all dependencies of changed modules, which can be slow for large
+  programs.
 
-* The full ASTs of all modules in memory all the time (+ type map).
-* Maintain a fine-grained dependency map, which is from triggers to
-  targets/triggers. The latter determine what other parts of a program
-  need to be processed again due to an externally visible change to a
-  module.
+* Fine-grained dependencies allow processing only the relevant parts
+  of modules indirectly affected by a change. Say, if only one function
+  in a large module is affected by a change in another module, only this
+  function is processed. The classic incremental mode always processes
+  an entire file as a unit, which is typically much slower.
 
-We perform a fine-grained incremental program update like this:
+* It's possible to independently process individual modules within an
+  import cycle (SCC). Small incremental changes can be fast independent
+  of the size of the related SCC. In classic incremental mode, any change
+  within an SCC requires the entire SCC to be processed, which can slow
+  things down considerably.
+
+Some terms:
+
+* A *target* is a function/method definition or the top level of a module.
+  We refer to targets using their fully qualified name (e.g.
+  'mod.Cls.method'). Targets are the smallest units of processing during
+  fine-grained incremental checking.
+
+* A *trigger* represents the properties of a part of a program, and it
+  gets triggered/fired when these properties change. For example,
+  '<mod.func>' refers to a module-level function. It gets triggered if
+  the signature of the function changes, or if the function is removed,
+  for example.
+
+Some program state is maintained across multiple build increments in
+memory:
+
+* The full ASTs of all modules are stored in memory all the time (this
+  includes the type map).
+
+* A fine-grained dependency map is maintained, which maps triggers to
+  affected program locations (these can be targets, triggers, or
+  classes). The latter determine what other parts of a program need to
+  be processed again due to a fired trigger.
+
+Here's a summary of how a fine-grained incremental program update happens:
 
 * Determine which modules have changes in their source code since the
-  previous build.
-* Fully process these modules, creating new ASTs and symbol tables
-  for them. Retain the existing ASTs and symbol tables of modules that
-  have no changes in their source code.
-* Determine which parts of the changed modules have changed. The result
-  is a set of triggered triggers.
-* Using the dependency map, decide which other targets have become
-  stale and need to be reprocessed.
-* Replace old ASTs of the modules that we reprocessed earlier with
-  the new ones, but try to retain the identities of original externally
-  visible AST nodes so that we don't (always) need to patch references
-  in the rest of the program.
-* Semantically analyze and type check the stale targets.
-* Repeat the previous steps until nothing externally visible has changed.
+  previous update.
+
+* Process changed modules one at a time. Perform a separate full update
+  for each changed module, but only report the errors after all modules
+  have been processed, since the intermediate states can generate bogus
+  errors due to only seeing a partial set of changes.
+
+* Each changed module is processed in full. We parse the module, and
+  run semantic analysis to create a new AST and symbol table for the
+  module. Reuse the existing ASTs and symbol tables of modules that
+  have no changes in their source code. At the end of this stage, we have
+  two ASTs and symbol tables for the changed module (the old and the new
+  versions). The latter AST has not yet been type checked.
+
+* Take a snapshot of the old symbol table. This is used later to determine
+  which properties of the module have changed and which triggers to fire.
+
+* Merge the old AST with the new AST, preserving the identities of
+  externally visible AST nodes for which we can find a corresponding node
+  in the new AST. (Look at mypy.server.astmerge for the details.) This
+  way all external references to AST nodes in the changed module will
+  continue to point to the right nodes (assuming they still have a valid
+  target).
+
+* Type check the new module.
+
+* Take another snapshot of the symbol table of the changed module.
+  Look at the differences between the old and new snapshots to determine
+  which parts of the changed modules have changed. The result is a set of
+  fired triggers.
+
+* Using the dependency map and the fired triggers, decide which other
+  targets have become stale and need to be reprocessed.
+
+* Create new fine-grained dependencies for the changed module. We don't
+  garbage collect old dependencies, since extra dependencies are relatively
+  harmless (they take some memory and can theoretically slow things down
+  a bit by causing redundant work). This is implemented in
+  mypy.server.deps.
+
+* Strip the stale AST nodes that we found above. This returns them to a
+  state resembling the end of semantic analysis pass 1. We'll run semantic
+  analysis again on the existing AST nodes, and since semantic analysis
+  is not idempotent, we need to revert some changes made during semantic
+  analysis. This is implemented in mypy.server.aststrip.
+
+* Run semantic analyzer passes 2 and 3 on the stale AST nodes, and type
+  check them. We also need to do the symbol table snapshot comparison
+  dance to find any changes, and we need to merge ASTs to preserve AST node
+  identities.
+
+* If some triggers have been fired, continue processing and repeat the
+  previous steps until no triggers are fired.
+
+This module is tested using end-to-end fine-grained incremental mode
+test cases (test-data/unit/fine-grained*.test).
 
 Major todo items:
 
-- Support multiple type checking passes
+- Fully support multiple type checking passes
 """
 
 import os.path
diff --git a/test-data/unit/deps-classes.test b/test-data/unit/deps-classes.test
@@ -1,6 +1,8 @@
 -- Test cases for generating fine-grained dependencies for classes.
 --
 -- The dependencies are used for fined-grained incremental checking.
+--
+-- See the comment at the top of deps.test for more documentation.
 
 -- TODO: Move class related test cases from deps.test to here
 
diff --git a/test-data/unit/deps-expressions.test b/test-data/unit/deps-expressions.test
@@ -1,6 +1,8 @@
 -- Test cases for generating fine-grained dependencies for expressions.
 --
 -- The dependencies are used for fined-grained incremental checking.
+--
+-- See the comment at the top of deps.test for more documentation.
 
 [case testListExpr]
 def f() -> int: pass
diff --git a/test-data/unit/deps-generics.test b/test-data/unit/deps-generics.test
@@ -1,6 +1,8 @@
 -- Test cases for generating fine-grained dependencies involving generics.
 --
 -- The dependencies are used for fined-grained incremental checking.
+--
+-- See the comment at the top of deps.test for more documentation.
 
 [case testGenericFunction]
 from typing import TypeVar
diff --git a/test-data/unit/deps-statements.test b/test-data/unit/deps-statements.test
@@ -1,6 +1,8 @@
 -- Test cases for generating fine-grained dependencies for statements.
 --
 -- The dependencies are used for fined-grained incremental checking.
+--
+-- See the comment at the top of deps.test for more documentation.
 
 [case testIfStmt]
 def f1() -> int: pass
diff --git a/test-data/unit/deps-types.test b/test-data/unit/deps-types.test
@@ -1,6 +1,8 @@
 -- Test cases for generating fine-grained dependencies between types.
 --
 -- The dependencies are used for fined-grained incremental checking.
+--
+-- See the comment at the top of deps.test for more documentation.
 
 [case testFilterOutBuiltInTypes]
 class A: pass
diff --git a/test-data/unit/deps.test b/test-data/unit/deps.test
@@ -1,7 +1,16 @@
 -- Test cases for generating dependencies between ASTs nodes.
 --
--- The dependencies are used for fined-grained incremental checking.
-
+-- The dependencies are used for fine-grained incremental checking and
+-- the daemon mode.
+--
+-- The output of each test case includes the dependency map for whitelisted
+-- modules (these include at least the main module and the modules 'pkg'
+-- and 'pkg.mod').
+--
+-- Dependencies are formatted as "<trigger> -> affected locations".
+--
+-- Look at the docstring of mypy.server.deps for an explanation of
+-- how fine-grained dependencies are represented.
 
 [case testCallFunction]
 def f() -> None: