manual operations.md
+23
@@ -78,6 +78,27 @@ Neutrino and MinBias are the major categories, and MinBias is particularly large
78
78
79
79
To check if a workflow has secondary inputs, look for `minbias` in the error report, or check `MCPileup` on ReqMgr.
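The check above could be scripted roughly as follows. This is only a sketch: it assumes the ReqMgr request has already been downloaded as a nested dict in which `MCPileup` may appear at any depth, and that the error report is available as plain text. The function name `has_secondary_input` and the dataset name in the usage example are hypothetical.

```python
def has_secondary_input(request, error_report=""):
    """Return True if the request or its error report hints at pileup input.

    `request` is assumed to be the workflow's ReqMgr JSON loaded as a dict;
    the exact layout is an assumption, so every nesting level is searched.
    """
    def has_pileup(node):
        if isinstance(node, dict):
            for key, value in node.items():
                # A non-empty MCPileup entry means a secondary input is set.
                if key == "MCPileup" and value:
                    return True
                if has_pileup(value):
                    return True
        elif isinstance(node, list):
            return any(has_pileup(item) for item in node)
        return False

    # Fall back to the textual hint mentioned above: `minbias` in the report.
    return has_pileup(request) or "minbias" in error_report.lower()


# Hypothetical request fragment, for illustration only.
example = {"Task1": {"MCPileup": "/Hypothetical/Pileup-Dataset/GEN-SIM"}}
print(has_secondary_input(example))
```

Searching recursively avoids hard-coding where `MCPileup` sits in the request, which is not specified here.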
80
80
81
+
# workflow composition
82
+
Workflows are made of **tasks**.
83
+
Each task represents one step in the workflow production chain (e.g. GEN, SIM, DIGI).
84
+
85
+
Tasks are divided into **jobs**, which run on a batch system (e.g. condor).
86
+
87
+
# Computing Resources
88
+
Each workflow can use only a limited amount of computing resources before hitting PerformanceKill limits (exit codes `50664` and `50660`).
89
+
90
+
To better understand how computing resource limitations apply to workflows,
91
+
one first needs to understand the [[manual operations#workflow composition|components of a workflow]].
92
+
93
+
The limits are imposed on condor "slots".
94
+
Each condor slot hosts one job from a task of the workflow.
95
+
Jobs are killed if they exceed the **CPU time** or **memory usage** designated for their condor slot.
96
+
To overcome these limitations, one could:
97
+
- split the jobs into smaller pieces so that each one needs fewer resources to run to completion.
98
+
  - n.b. you can check which splitting algorithm a task uses by looking at `SplittingAlgo` on the ReqMgr/JSON page of the workflow. If it is `EventAwareLumiBased`, splitting will work; if it is `EventBased`, splitting won't work.
99
+
- increase the number of CPU cores and the amount of memory per core.
100
+
  - n.b. as of 2022-04-05, changing nCore does not work in the console.
101
+
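The decision described in this section can be sketched as a small helper. It is an illustration under stated assumptions, not an existing tool: it assumes the workflow's ReqMgr JSON is loaded as a nested dict with `SplittingAlgo` entries somewhere inside, and the names `find_splitting_algos` and `suggest_recovery` are hypothetical. Only the two exit codes and the two splitting algorithms named above are handled.

```python
# Exit codes listed above as PerformanceKill.
PERFORMANCE_KILL_CODES = {50660, 50664}


def find_splitting_algos(request):
    """Collect every `SplittingAlgo` string found in a nested dict or list."""
    found = []
    if isinstance(request, dict):
        for key, value in request.items():
            if key == "SplittingAlgo" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_splitting_algos(value))
    elif isinstance(request, list):
        for item in request:
            found.extend(find_splitting_algos(item))
    return found


def suggest_recovery(exit_code, splitting_algo):
    """Map an exit code and a task's splitting algorithm to a suggested action."""
    if exit_code not in PERFORMANCE_KILL_CODES:
        return "not a PerformanceKill"
    if splitting_algo == "EventAwareLumiBased":
        return "split jobs"
    if splitting_algo == "EventBased":
        # Splitting won't work here, so more resources is the remaining option.
        return "increase cores/memory"
    return "unknown splitting algorithm"


# Hypothetical request fragment, for illustration only.
request = {"Task1": {"SplittingAlgo": "EventAwareLumiBased"}}
for algo in find_splitting_algos(request):
    print(algo, "->", suggest_recovery(50664, algo))
```

Any other splitting algorithm is reported as unknown rather than guessed, since this section only states the behaviour for the two named ones.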
81
102
# Unified status
82
103
## agentfilemismatch
83
104
- a grace period of 2 days before it moves to filemismatch
@@ -96,3 +117,5 @@ To check if a workflow has secondary inputs, look for `minbias` in the error rep
96
117
# Standard procedure for errors
97
118
## `8021-FileReadError` and `8028-FallbackFileOpenError`
98
119
In general, you should check DBS to track down the root cause. The error might just be opportunistic, so an ACDC without excluding the error site might already work.