Elastic builder (#721)

mjwen · Jason Munro · web-flow · commit 23dd61de15ca · 2023-05-12T09:51:12.000-07:00
* Init elastic doc

* Add typing and description for all fields

* Add builder for elasticity

* Fix task_id from str to int; a working Builder!!!

* Commented out unnecessary functions

* Use elastic doc in building

* Reorganize elastic doc to make derived property a separate field

* Add structure to elastic doc

* Group by lattice and filter incar params

* Add more fields to elasticity doc

* Update ElasticityDoc to standardize fields

* Rename docs

* Add filters to elastic builder

* Finished version of elastic builder

* Use emmet jsanitize instead of that from monty

* Fix typos

* Add state and warnings

* Add origins

* Add connect() in process_item to make `mrun` work

* Move query from tasks of the same material to get_items()

* Move task label below

* Update a few docstrings

* Make `fitting_method` an argument of builder

* Fix mypy warnings

* Split derived properties into multiple subcategories

* Move Status to common.py to share it

* Fix vector and matrix 3d to be tuples

* Move all fitting code from elastic builder to doc

* Update builder to use new doc

* Fix non-invertible compliance tensor

* Fix error message

* Add filter by calc type and filter by incar settings

* Finish builder

* Fix calc_types

* Delete old unused functions

* Gather warnings/critical together

* Put fit tensor and compliance tensor in different try block

* Add optimized structure to elastic doc

* Fix: add tol=0.002 for deform independence, and is_upper_triangular check

* Fix tolerance to 0.002 for deformation comparison

* Fix units for Young's modulus

* Fix group by lattice docs x

* Add total number of strain stress states to fitting data

* Early return if task type does not match

* Major: change to get derived data using strain from deform, to avoid checking upper and lower triangular

* Update warning message

* Better error message

* For failed status ones, set deprecated to True

* Add more docstring

* Add elasticity core test files

* Add tests for elasticity doc

* Fix materials builder to use input.structure for deformation task

* Add elastic builder test

* Cleanup docstring of elastic doc

* Set material id to the id of the optimization task used for fitting the data

* Fix using undeformed structure in material builder

* String format

* Add material_id index to elasticity collection

* Reverse the deformations to get undeformed structure in MaterialBuilder

* Fix ValueEnum import error

* Fix material_id

* Adjust warning messages, specifically remove unnecessary checking for elastic stability

* Fix typing for mypy

* Fix mypy errors

* Average stress using symmops before fitting, to avoid different stress on sym-related strains

* Use 2nd PK stress for fitting, replacing Cauchy stress because we use the work-conjugate Lagrange strain. Influence on the fitted tensor is minimal at small strains

* Add more detailed explanation of the building steps

* Fix docs

* Fix missing List typing in math.py

* Rerun precommit

* Fix mypy types

---------

Co-authored-by: Jason Munro <jmunro@lbl.gov>
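
The stress handling described in the commit message (sign and unit conversion of the VASP stress, then fitting with the second Piola-Kirchhoff stress against its work-conjugate Green-Lagrange strain) can be illustrated with a small plain-Python sketch. It is not the emmet or pymatgen implementation: the helper names and the raw kBar stress values are assumptions made up for illustration, and only the standard relations E = (FᵀF − I)/2 and S = det(F) F⁻¹ σ F⁻ᵀ are used.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def det3(m):
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def inv3(m):
    """Inverse via the adjugate; cyclic indices give signed cofactors directly."""
    d = det3(m)
    cof = [
        [
            m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
            - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)
        ]
        for i in range(3)
    ]
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]


def vasp_stress_to_gpa(stress_kbar):
    """0.1 converts kBar to GPa; the minus sign makes tensile stress positive."""
    return [[-0.1 * x for x in row] for row in stress_kbar]


def green_lagrange_strain(deform):
    """E = (F^T F - I) / 2 for a deformation gradient F."""
    ftf = matmul(transpose(deform), deform)
    return [[0.5 * (ftf[i][j] - (1.0 if i == j else 0.0)) for j in range(3)] for i in range(3)]


def second_pk_stress(cauchy, deform):
    """S = det(F) * F^-1 * sigma * F^-T."""
    f_inv = inv3(deform)
    s = matmul(matmul(f_inv, cauchy), transpose(f_inv))
    j = det3(deform)
    return [[j * x for x in row] for row in s]


# Hypothetical uniaxial stretch chosen so that E_11 is exactly 0.01.
f11 = math.sqrt(1.02)
deform = [[f11, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
raw_vasp_stress = [[-37.96, 0.0, 0.0], [0.0, -12.06, 0.0], [0.0, 0.0, -12.06]]  # kBar, made up

cauchy = vasp_stress_to_gpa(raw_vasp_stress)
strain = green_lagrange_strain(deform)
pk2 = second_pk_stress(cauchy, deform)
```

For a 1% strain the two stress measures differ by roughly 1%, which is consistent with the commit message's remark that the effect on the fitted tensor is small at small strains.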
diff --git a/emmet-builders/emmet/builders/materials/elasticity.py b/emmet-builders/emmet/builders/materials/elasticity.py
@@ -2,16 +2,19 @@
 Builder to generate elasticity docs.
 
 The build proceeds in the below steps:
-1. Use materials builder to group tasks according the formula, space group,
-   structure matching
-2. Filter opt and deform tasks using calc type
-3. Filter opt and deform tasks to match prescribed INCAR params
-4. Group opt and deform tasks by parent lattice, i.e. lattice before deformation
-5. For each group, select the one with the latest completed time (all tasks in a
-   group are regarded as the same after going through all the filters)
-6. For all opt-deform tasks groups with the same parent lattice, select the group with
-   the most number of deformation tasks as the final data fot fitting the elastic tensor
-7. Fit the elastic tensor
+1. Use materials builder to group tasks according to the formula, space group, and
+   structure matching.
+2. Filter opt and deform tasks by calc type.
+3. Filter opt and deform tasks to match prescribed INCAR params.
+4. Group opt tasks by optimized lattice, and, for each group, select the latest task
+   (the one with the latest completion time). This results in a {lat, opt_task} dict.
+5. Group deform tasks by parent lattice (i.e. lattice before a deformation gradient is
+   applied). For each lattice group, then group the tasks by deformation gradient,
+   and select the latest task for each deformation gradient. This results in a {lat,
+   [deform_task]} dict, where [deform_task] are tasks with unique deformation gradients.
+6. Associate opt and deform tasks by matching parent lattice. Then select the one with
+   the most deformation tasks as the final data for fitting the elastic tensor.
+7. Fit the elastic tensor.
 """
 
 from datetime import datetime
@@ -109,7 +112,6 @@ def get_items(
                 "output",
                 "orig_inputs",
                 "completed_at",
-                "last_updated",
                 "transmuter",
                 "task_id",
                 "dir_name",
@@ -170,7 +172,7 @@ def process_item(
         # tasks with the same deformation
         deform_grouped = group_by_parent_lattice(deform_tasks, mode="deform")
         deform_grouped = [
-            (lattice, filter_deform_tasks_by_time(tasks))
+            (lattice, filter_deform_tasks_by_time(tasks, logger=self.logger))
             for lattice, tasks in deform_grouped
         ]
 
@@ -186,18 +188,17 @@ def process_item(
         stresses = []
         deform_task_ids = []
         deform_dir_names = []
-        deform_last_updated = []
         for doc in final_deform:
             deforms.append(
                 Deformation(
                     doc["transmuter"]["transformation_params"][0]["deformation"]
                 )
             )
-            # -0.1 to convert to GPa from kBar and s
+            # 0.1 to convert to GPa from kBar, and the minus sign to flip the stress
+            # direction from compressive as positive (in vasp) to tensile as positive
             stresses.append(-0.1 * Stress(doc["output"]["stress"]))
             deform_task_ids.append(doc["task_id"])
             deform_dir_names.append(doc["dir_name"])
-            deform_last_updated.append(doc["last_updated"])
 
         elasticity_doc = ElasticityDoc.from_deformations_and_stresses(
             structure=Structure.from_dict(final_opt["output"]["structure"]),
@@ -206,7 +207,6 @@ def process_item(
             stresses=stresses,
             deformation_task_ids=deform_task_ids,
             deformation_dir_names=deform_dir_names,
-            deform_last_updated=deform_last_updated,
             equilibrium_stress=-0.1 * Stress(final_opt["output"]["stress"]),
             optimization_task_id=final_opt["task_id"],
             optimization_dir_name=final_opt["dir_name"],
@@ -316,33 +316,17 @@ def filter_opt_tasks_by_time(tasks: List[Dict], logger) -> Dict:
     Filter a set of tasks to select the latest completed one.
 
     Args:
-        tasks: the set of tasks to filer
+        tasks: the set of tasks to filter
         logger:
 
     Returns:
         selected latest task
     """
-    if len(tasks) == 0:
-        raise RuntimeError("Cannot select latest from 0 tasks")
-    elif len(tasks) == 1:
-        return tasks[0]
-    else:
-        completed = [(datetime.fromisoformat(t["completed_at"]), t) for t in tasks]
-        sorted_by_completed = sorted(completed, key=lambda pair: pair[0])
-        latest_pair = sorted_by_completed[-1]
-        selected = latest_pair[1]
-
-        task_ids = [t["task_id"] for t in tasks]
-        logger.warning(
-            f"Select the latest optimization task {selected['task_id']} completed at "
-            f"{selected['completed_at']} from a set of tasks: {task_ids}."
-        )
-
-        return selected
+    return _filter_tasks_by_time(tasks, "optimization", logger)
 
 
 def filter_deform_tasks_by_time(
-    tasks: List[Dict], deform_comp_tol: float = 1e-5
+    tasks: List[Dict], deform_comp_tol: float = 1e-5, logger=None
 ) -> List[Dict]:
     """
     For deformation tasks with the same deformation, select the latest completed one.
@@ -355,21 +339,46 @@ def filter_deform_tasks_by_time(
         filtered deformation tasks
     """
 
-    mapping = TensorMapping(tol=deform_comp_tol, tensors=[], values=[])
+    mapping = TensorMapping(tol=deform_comp_tol)
 
+    # group tasks by deformation
     for doc in tasks:
         # assume only one deformation, should be checked in `filter_deform_tasks()`
         deform = doc["transmuter"]["transformation_params"][0]["deformation"]
 
         if deform in mapping:
-            current = datetime.fromisoformat(doc["completed_at"])
-            exist = datetime.fromisoformat(mapping[deform]["completed_at"])
-            if current > exist:
-                mapping[deform] = doc
+            mapping[deform].append(doc)
         else:
-            mapping[deform] = doc
+            mapping[deform] = [doc]
+
+    # select the latest task for each deformation
+    selected = []
+    for docs in mapping.values():
+        t = _filter_tasks_by_time(docs, "deformation", logger)
+        selected.append(t)
+
+    return selected
+
+
+def _filter_tasks_by_time(tasks: List[Dict], mode: str, logger) -> Dict:
+    """
+    Helper function to filter a set of tasks to select the latest completed one.
+    """
+    if len(tasks) == 0:
+        raise RuntimeError(f"Cannot filter {mode} task from 0 input tasks")
+    elif len(tasks) == 1:
+        return tasks[0]
+
+    completed = [(datetime.fromisoformat(t["completed_at"]), t) for t in tasks]
+    sorted_by_completed = sorted(completed, key=lambda pair: pair[0])
+    latest_pair = sorted_by_completed[-1]
+    selected = latest_pair[1]
 
-    selected = list(mapping.values())
+    task_ids = [t["task_id"] for t in tasks]
+    logger.info(
+        f"Found multiple {mode} tasks {task_ids}; selected the latest task "
+        f"{selected['task_id']} that is completed at {selected['completed_at']}."
+    )
 
     return selected
 
@@ -392,9 +401,9 @@ def select_final_opt_deform_tasks(
     """
 
     # group opt and deform tasks by lattice
-    mapping = TensorMapping(tol=lattice_comp_tol, tensors=[], values=[])
-    for lat, ot in opt_tasks:
-        mapping[lat] = {"opt_task": ot}
+    mapping = TensorMapping(tol=lattice_comp_tol)
+    for lat, opt_t in opt_tasks:
+        mapping[lat] = {"opt_task": opt_t}
 
     for lat, dt in deform_tasks:
         if lat in mapping:
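
The selection logic in this file's diff (group deformation tasks whose deformation gradients agree within a tolerance, then keep the most recently completed task of each group) can be sketched without pymatgen as follows. This is a simplified stand-in, not the builder's code: a plain list of groups replaces `TensorMapping`, logging is omitted, and `latest_task`, `same_matrix`, `latest_per_deformation`, and `make_task` are hypothetical names. The task layout follows the `transmuter.transformation_params[0].deformation` and `completed_at` fields read in the diff.

```python
from datetime import datetime
from typing import Dict, List

Matrix = List[List[float]]


def latest_task(tasks: List[Dict]) -> Dict:
    """Return the task with the newest `completed_at` ISO 8601 timestamp."""
    if not tasks:
        raise RuntimeError("Cannot select the latest task from 0 input tasks")
    return max(tasks, key=lambda t: datetime.fromisoformat(t["completed_at"]))


def get_deformation(task: Dict) -> Matrix:
    """Read the single deformation gradient stored on a deformation task."""
    return task["transmuter"]["transformation_params"][0]["deformation"]


def same_matrix(a: Matrix, b: Matrix, tol: float) -> bool:
    """Element-wise comparison of two matrices within an absolute tolerance."""
    return all(
        abs(x - y) <= tol for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)
    )


def latest_per_deformation(tasks: List[Dict], tol: float = 1e-5) -> List[Dict]:
    """Group tasks by deformation (within `tol`) and keep the latest of each group."""
    groups: List[tuple] = []  # (representative deformation, member tasks)
    for task in tasks:
        deform = get_deformation(task)
        for key, members in groups:
            if same_matrix(key, deform, tol):
                members.append(task)
                break
        else:
            groups.append((deform, [task]))
    return [latest_task(members) for _, members in groups]


def make_task(task_id: int, completed_at: str, deformation: Matrix) -> Dict:
    """Build a minimal, made-up task document for the example."""
    return {
        "task_id": task_id,
        "completed_at": completed_at,
        "transmuter": {"transformation_params": [{"deformation": deformation}]},
    }


stretch = [[1.01, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stretch_rerun = [[1.01 + 1e-7, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shear = [[1.0, 0.0, 0.0], [0.0, 1.0, -0.01], [0.0, 0.0, 1.0]]

tasks = [
    make_task(1, "2021-03-01T12:00:00", stretch),
    make_task(2, "2022-07-15T08:30:00", stretch_rerun),
    make_task(3, "2020-11-20T17:45:00", shear),
]
selected_ids = [t["task_id"] for t in latest_per_deformation(tasks)]
```

Tasks 1 and 2 share a deformation within the tolerance, so only the later one (task 2) survives, while task 3 forms its own group.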
diff --git a/emmet-builders/emmet/builders/vasp/materials.py b/emmet-builders/emmet/builders/vasp/materials.py
@@ -71,7 +71,7 @@ def __init__(
 
     def ensure_indexes(self):
         """
-        Ensures indicies on the tasks and materials collections
+        Ensures indices on the tasks and materials collections
         """
 
         # Basic search index for tasks
@@ -224,7 +224,7 @@ def process_item(self, items: List[Dict]) -> List[Dict]:
 
         Returns:
             ([dict],list): a list of new materials docs and a list of task_ids that
-                were processsed
+                were processed
         """
 
         tasks = [TaskDocument(**task) for task in items]
@@ -330,7 +330,7 @@ def filter_and_group_tasks(
             symprec=self.settings.SYMPREC,
         )
         for group in grouped_structures:
-            grouped_tasks = [filtered_tasks[struc.index] for struc in group]  # type: ignore
+            grouped_tasks = [filtered_tasks[struct.index] for struct in group]  # type: ignore
             yield grouped_tasks
 
 
diff --git a/emmet-core/emmet/core/elasticity.py b/emmet-core/emmet/core/elasticity.py
@@ -244,7 +244,7 @@ def from_deformations_and_stresses(
         p_stresses = stresses
         (
             p_strains,
-            p_pk_stresses,
+            p_2nd_pk_stresses,
             p_task_ids,
             p_dir_names,
         ) = generate_primary_fitting_data(
@@ -256,14 +256,22 @@ def from_deformations_and_stresses(
             d_deforms,
             d_strains,
             d_stresses,
-            d_pk_stresses,
+            d_2nd_pk_stresses,
         ) = generate_derived_fitting_data(structure, p_strains, p_stresses)
 
+        fitting_strains = p_strains + d_strains
+        fitting_stresses = p_2nd_pk_stresses + d_2nd_pk_stresses
+
+        # avoid symmop-related strains having non-symmop-related stresses
+        fitting_stresses = symmetrize_stresses(
+            fitting_stresses, fitting_strains, structure
+        )
+
         # fitting elastic tensor
         try:
             elastic_tensor = fit_elastic_tensor(
-                p_strains + d_strains,
-                p_pk_stresses + d_pk_stresses,
+                fitting_strains,
+                fitting_stresses,
                 eq_stress=equilibrium_stress,
                 fitting_method=fitting_method,
             )
@@ -300,9 +308,8 @@ def from_deformations_and_stresses(
                 derived_props = get_derived_properties(structure, elastic_tensor)
 
                 # check all
-                all_strains = p_strains + d_strains
                 state, warnings = sanity_check(
-                    structure, et_doc, all_strains, derived_props  # type: ignore
+                    structure, et_doc, fitting_strains, derived_props  # type: ignore
                 )
 
             except np.linalg.LinAlgError as e:
@@ -322,7 +329,7 @@ def from_deformations_and_stresses(
             deformations=[x.tolist() for x in p_deforms],  # type: ignore
             strains=[x.tolist() for x in p_strains],  # type: ignore
             cauchy_stresses=[x.tolist() for x in p_stresses],  # type: ignore
-            second_pk_stresses=[x.tolist() for x in p_pk_stresses],  # type: ignore
+            second_pk_stresses=[x.tolist() for x in p_2nd_pk_stresses],  # type: ignore
             deformation_tasks=p_task_ids,  # type: ignore
             deformation_dir_names=p_dir_names,  # type: ignore
             equilibrium_cauchy_stress=eq_stress,
@@ -419,7 +426,7 @@ def generate_derived_fitting_data(
         derived_deforms: derived deformations
         derived_strains: derived strains
         derived_stresses: derived Cauchy stresses
-        derived_pk_stresses: derived second Piola-Kirchhoff stresses
+        derived_2nd_pk_stresses: derived second Piola-Kirchhoff stresses
     """
 
     sga = SpacegroupAnalyzer(structure, symprec=symprec)
@@ -436,7 +443,7 @@ def generate_derived_fitting_data(
     # asymmetry of the deformation gradient.
 
     # generated derived deforms
-    mapping = TensorMapping(tol=tol, tensors=[], values=[])
+    mapping = TensorMapping(tol=tol)
     for i, p_strain in enumerate(strains):
         for op in symmops:
             d_strain = p_strain.transform(op)
@@ -465,7 +472,7 @@ def generate_derived_fitting_data(
     derived_strains = []
     derived_stresses = []
     derived_deforms = []
-    derived_pk_stresses = []
+    derived_2nd_pk_stresses = []
 
     for d_strain, op_set in mapping.items():
         symmops, p_indices = zip(*op_set)
@@ -479,9 +486,48 @@ def generate_derived_fitting_data(
 
         deform = d_strain.get_deformation_matrix()
         derived_deforms.append(deform)
-        derived_pk_stresses.append(d_stress.piola_kirchoff_2(deform))
+        derived_2nd_pk_stresses.append(d_stress.piola_kirchoff_2(deform))
+
+    return derived_deforms, derived_strains, derived_stresses, derived_2nd_pk_stresses
+
+
+def symmetrize_stresses(
+    stresses: List[Stress],
+    strains: List[Strain],
+    structure: Structure,
+    symprec=SETTINGS.SYMPREC,
+    tol: float = 0.002,
+) -> List[Stress]:
+    """
+    Symmetrize stresses by averaging over all symmetry operations.
+
+    Args:
+        stresses: stresses to be symmetrized
+        strains: strains corresponding to the stresses
+        structure: materials structure
+        symprec: symmetry operation precision
+        tol: tolerance for comparing strains and also for determining whether the
+            deformation corresponding to the strain is independent. The elastic workflow
+            uses a minimum strain of 0.005, so the default tolerance of 0.002 should be
+            able to distinguish different strain states.
+
+    Returns: symmetrized stresses
+    """
+    sga = SpacegroupAnalyzer(structure, symprec=symprec)
+    symmops = sga.get_symmetry_operations(cartesian=True)
 
-    return derived_deforms, derived_strains, derived_stresses, derived_pk_stresses
+    # for each strain, get the stresses from other strain states related by symmetry
+    symmmetrized_stresses = []  # type: List[Stress]
+    for strain, stress in zip(strains, stresses):
+        mapping = TensorMapping([strain], [[]], tol=tol)
+        for strain2, stress2 in zip(strains, stresses):
+            for op in symmops:
+                if strain2.transform(op) in mapping:
+                    mapping[strain].append(stress2.transform(op))
+        sym_stress = np.average(mapping[strain], axis=0)
+        symmmetrized_stresses.append(Stress(sym_stress))
+
+    return symmmetrized_stresses
 
 
 def fit_elastic_tensor(
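
The averaging performed by `symmetrize_stresses` above can be illustrated with a plain-Python sketch. It assumes the rotation matrices are supplied directly (here the four-fold rotations about z, including the identity) instead of being obtained from `SpacegroupAnalyzer`, and it replaces `TensorMapping` with an explicit tolerance check; the function names and the stress values are hypothetical. For each strain, every stress whose rotated strain matches it is rotated the same way and included in the average.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def transform(tensor, rot):
    """Rotate a rank-2 tensor: R T R^T."""
    return matmul(matmul(rot, tensor), transpose(rot))


def is_close(a, b, tol):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(3) for j in range(3))


def symmetrize(stresses, strains, rotations, tol=0.002):
    """Average each stress with the rotated stresses of symmetry-related strains.

    Assumes `rotations` contains the identity, so every strain matches itself.
    """
    result = []
    for strain in strains:
        related = []
        for strain2, stress2 in zip(strains, stresses):
            for rot in rotations:
                if is_close(transform(strain2, rot), strain, tol):
                    related.append(transform(stress2, rot))
        n = len(related)
        result.append(
            [[sum(s[i][j] for s in related) / n for j in range(3)] for i in range(3)]
        )
    return result


# Rotations by 0, 90, 180 and 270 degrees about the z axis.
rotations = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 1]],
    [[-1, 0, 0], [0, -1, 0], [0, 0, 1]],
    [[0, 1, 0], [-1, 0, 0], [0, 0, 1]],
]

# Two strains related by a 90 degree rotation, with slightly inconsistent stresses.
strains = [
    [[0.01, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.0]],
]
stresses = [
    [[3.8, 0.0, 0.0], [0.0, 1.2, 0.0], [0.0, 0.0, 1.2]],
    [[1.3, 0.0, 0.0], [0.0, 3.7, 0.0], [0.0, 0.0, 1.2]],
]

averaged = symmetrize(stresses, strains, rotations)
```

After averaging, the two stresses are exact rotated images of each other (diagonals of about 3.75, 1.25, 1.2 and 1.25, 3.75, 1.2), which is the property the commit aims for before fitting.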
diff --git a/emmet-core/emmet/core/vasp/task_valid.py b/emmet-core/emmet/core/vasp/task_valid.py
@@ -98,7 +98,7 @@ class TaskDocument(BaseTaskDocument, StructureMetadata):
     calc_code = "VASP"
     run_stats: Dict[str, RunStatistics] = Field(
         {},
-        description="Summary of runtime statisitics for each calcualtion in this task",
+        description="Summary of runtime statistics for each calculation in this task",
     )
 
     is_valid: bool = Field(
diff --git a/tests/test_files/elasticity/SiC_fitting_data.json b/tests/test_files/elasticity/SiC_fitting_data.json
@@ -0,0 +1 @@
+{"structure": {"@module": "pymatgen.core.structure", "@class": "Structure", "charge": null, "lattice": {"matrix": [[0.0, 2.18908661, 2.18908661], [2.18908661, 0.0, 2.18908661], [2.18908661, 2.18908661, 0.0]], "a": 3.0958359730713423, "b": 3.0958359730713423, "c": 3.0958359730713423, "alpha": 59.99999999999999, "beta": 59.99999999999999, "gamma": 59.99999999999999, "volume": 20.98064470225813}, "sites": [{"species": [{"element": "Si", "occu": 1}], "abc": [0.25, 0.25, 0.25], "xyz": [1.094543305, 1.094543305, 1.094543305], "label": "Si", "properties": {"magmom": -0.0}}, {"species": [{"element": "C", "occu": 1}], "abc": [0.0, 0.0, 0.0], "xyz": [0.0, 0.0, 0.0], "label": "C", "properties": {"magmom": 0.0}}]}, "deformations": [[[1.0, 0.0, 0.0], [0.0, 1.0, -0.01], [0.0, 0.0, 0.9999499987499375]], [[1.0, 0.0, 0.0], [0.0, 1.0, -0.02], [0.0, 0.0, 0.999799979995999]], [[1.0099504938362078, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[1.004987562112089, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[0.99498743710662, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[0.9899494936611666, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]], "stresses": [[[0.003466756, -0.0, -0.0], [-0.0, 0.00175208, -2.4112679420000003], [-0.0, -2.4112679420000003, -0.04647596500000001]], [[-0.0031779110000000003, 0.0, -0.0], [0.0, -0.010187673000000001, -4.822484086], [-0.0, -4.822484086, -0.20312676200000002]], [[3.7955732159999997, 0.0, -0.0], [-0.0, 1.206463684, -0.0], [-0.0, -0.0, 1.206463684]], [[1.9064880160000002, -0.0, 0.0], [-0.0, 0.617022229, -0.0], [0.0, -0.0, 0.617022229]], [[-1.9325755010000003, -0.0, -0.0], [-0.0, -0.6543919500000001, -0.0], [0.0, -0.0, -0.6543919500000001]], [[-3.8838944410000007, -0.0, -0.0], [-0.0, -1.339910387, -0.0], [0.0, -0.0, -1.339910387]]], "equilibrium_stress": [[-0.0047456650000000005, -0.0, -0.0], [-0.0, -0.0047456650000000005, 0.0], [0.0, -0.0, -0.0047456650000000005]]}
diff --git a/tests/test_files/elasticity/SiC_reference_data.json b/tests/test_files/elasticity/SiC_reference_data.json
diff --git a/tests/test_files/elasticity/SiC_tasks.json.gz b/tests/test_files/elasticity/SiC_tasks.json.gz
