Commit 5ee8f18 (parent 6419a55)

PaulHigin authored and joeyaiello committed

Accepted RFC0044 for ForEach-Object -Parallel proposal (#194)

* Submit draft of RFC for ForEach-Object -Parallel proposal
* Updated to reflect feed back and explain narrow focus
* Fixed two errors
* Added more implementation details for clarity
* Updated to reflect new parameter set
* Update to clarify and reflect current implementation
* Fix examples to be correct.
* Accept RFC0044 on ForEach-Object -Parallel

1 file changed, +196 -0 lines changed


---
RFC: RFC0044
Author: Paul Higinbotham
Status: Experimental-Accepted
SupercededBy: N/A
Version: 1.0
Area: Engine
Comments Due: July 18, 2019
Plan to implement: Yes
---

# PowerShell ForEach-Object -Parallel Cmdlet

This RFC proposes a new parameter set for the existing ForEach-Object cmdlet that runs script blocks in parallel, instead of sequentially as the cmdlet does now.

## Motivation

As a PowerShell User,
I can execute `ForEach-Object` piped input in script blocks running in parallel threads, either synchronously or asynchronously, while limiting the number of threads running at a given time.

## Specification

A new `-Parallel` parameter set will be added to the existing ForEach-Object cmdlet to support running a provided script block concurrently for each piped input object.

- `-Parallel` parameter takes a script block that is executed in parallel for each piped input object

- `-ThrottleLimit` parameter takes an integer value that determines the maximum number of script blocks running at the same time

- `-TimeoutSeconds` parameter takes an integer that specifies the maximum time, in seconds, to wait for completion before the command is aborted

- `-AsJob` switch parameter indicates that a job is returned, which represents the command running asynchronously

The `ForEach-Object -Parallel` command will stream output as it is produced, until all piped input has been processed.
If the `-AsJob` switch is used, a job object is returned instead, and it remains in the running state while input is being processed.
The returned job object can be used with all PowerShell cmdlets that manipulate jobs.

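As a rough illustration of the parameters above, the following sketch assumes PowerShell 7.0 or later, where `ForEach-Object -Parallel` is available. The doubling script block and the sleep are placeholder work, not part of this proposal.

```powershell
# Requires PowerShell 7.0 or later.
# At most 3 script blocks run at the same time; the command is aborted after 60 seconds.
$doubled = 1..6 | ForEach-Object -ThrottleLimit 3 -TimeoutSeconds 60 -Parallel {
    Start-Sleep -Milliseconds 200   # placeholder for real work
    $_ * 2
}

# Output is streamed in completion order, which may differ from input order.
$doubled | Sort-Object

# The same work as a job: the job object is returned immediately,
# and the results are collected once it finishes.
$job = 1..6 | ForEach-Object -ThrottleLimit 3 -AsJob -Parallel { $_ * 2 }
$fromJob = $job | Wait-Job | Receive-Job
$job | Remove-Job
```

Because results arrive as each script block finishes, callers that need input order have to sort or otherwise re-associate the output themselves.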

### Implementation details

The implementation will be similar to the ThreadJob module, in that each script block execution on a thread will be contained within a `PSThreadChildJob` object.
These jobs will run concurrently on separate runspaces/threads up to the `ThrottleLimit` value, and the remainder will be queued until a runspace/thread becomes available.
The initial implementation will not attempt to reuse threads and runspaces when running queued items, because stale state left over from a previous iteration could break script execution.
For example, PowerShell uses thread-local storage to store the default runspace for each thread.
And although runspaces provide a `ResetRunspaceState` API method, it only resets session variables and the debug/transaction managers.
Imported modules and function definitions are not affected.
So a script that defines a constant function would fail if a previous iteration had already defined that function in the same runspace.
The initial assumption will be that runspace/thread creation time is insignificant compared to the time needed to execute the script block, either because of high compute needs or because of long wait times for results.
If this assumption does not hold, the user should consider batching the workload so that each ForEach-Object iteration does more work, or simply use the sequential (non-parallel) form of the cmdlet.

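A minimal sketch of the batching suggestion above, assuming PowerShell 7.0 or later: the input is split into fixed-size arrays so that each parallel iteration, and therefore each runspace/thread that is created, processes many items. The batch size of 250 and the summing workload are arbitrary placeholders.

```powershell
# Requires PowerShell 7.0 or later.
$items = 1..1000
$batchSize = 250

# Split the input into arrays of up to $batchSize items each.
$batches = for ($i = 0; $i -lt $items.Count; $i += $batchSize) {
    $end = [Math]::Min($i + $batchSize, $items.Count) - 1
    # The unary comma emits the slice as one array object,
    # so it is not unrolled back into individual items.
    , ($items[$i..$end])
}

# One parallel iteration per batch instead of one per item.
$sums = $batches | ForEach-Object -ThrottleLimit 4 -Parallel {
    ($_ | Measure-Object -Sum).Sum
}

$total = ($sums | Measure-Object -Sum).Sum
```

Here four runspaces are created instead of one thousand, at the cost of coarser load balancing between threads.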

The `TimeoutSeconds` parameter will attempt to halt all script block executions after the timeout has elapsed.
However, it may not be immediately successful if a running script is calling a native command or API; in that case, the call must return before the running script can be halted.

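The following sketch, assuming PowerShell 7.0 or later, illustrates the timeout with script blocks that would otherwise run for 30 seconds. `Start-Sleep` responds to the stop signal, so the command should return after roughly the 2-second timeout; a native command or blocking API call might not. Whether the timeout also surfaces as an error record is not specified above, so the sketch suppresses and catches errors.

```powershell
# Requires PowerShell 7.0 or later.
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$output = @()
try {
    $output = 1..3 | ForEach-Object -TimeoutSeconds 2 -Parallel {
        # Start-Sleep honors the stop signal sent when the timeout elapses.
        Start-Sleep -Seconds 30
        "finished $_"
    } 2>$null
}
catch {
    # Tolerate the timeout being reported as a terminating error.
}
$stopwatch.Stop()

"Elapsed: {0:N1} seconds" -f $stopwatch.Elapsed.TotalSeconds
```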

### Variable passing

ForEach-Object -Parallel will support the PowerShell `$_` current piped item variable within each script block.
It will also support the `$using:` directive for passing variables from the calling script's scope into the scope of each parallel script block.
If the passed-in variable is a value type, a copy of the value is passed to the script block.
If the passed-in variable is a reference type, the reference is passed, so each running script block can modify the same object.
Since the script blocks run on different threads, modifying a reference type that is not thread safe will result in undefined behavior.

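A sketch of the value-type and reference-type cases, assuming PowerShell 7.0 or later. `ConcurrentBag[int]` is used only as one example of a thread-safe .NET collection; any type that is safe for concurrent modification would serve.

```powershell
# Requires PowerShell 7.0 or later.
# Value type: each iteration receives its own copy.
$count = 5
# Reference type: every iteration receives the same (thread-safe) object.
$bag = [System.Collections.Concurrent.ConcurrentBag[int]]::new()

$copies = 1..3 | ForEach-Object -Parallel {
    $c = $using:count
    $c++                # changes only this iteration's copy

    $b = $using:bag     # same object as the caller's $bag
    $b.Add($_)

    $c
}
```

After the command finishes, `$count` is still 5 and each iteration returns 6, while `$bag` holds all three piped values.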

ScriptBlock variables are a special case because they have runspace affinity, and cannot be safely passed to script blocks running in other runspaces.
Consequently, an error will be generated if a ScriptBlock object is passed directly through the input pipeline, or is passed to the parallel script block via the `$using:` directive.
However, it is still possible to pass in a ScriptBlock object indirectly, such as through an object method that returns a ScriptBlock.
This is not recommended and will result in undefined behavior.

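One workaround, sketched here under the assumption of PowerShell 7.0 or later, is to pass a script block's source text as a string and rebuild it inside each parallel runspace with `[scriptblock]::Create()`. This only suits self-contained logic: the rebuilt script block does not carry over variables or state from the caller's runspace.

```powershell
# Requires PowerShell 7.0 or later.
$doubler = { param($x) $x * 2 }

# Per this RFC, passing the ScriptBlock object itself is rejected with an error:
#   1..3 | ForEach-Object -Parallel { & $using:doubler $_ }

# Pass the source text instead; a string has no runspace affinity.
$doublerText = $doubler.ToString()

$results = 1..3 | ForEach-Object -Parallel {
    # Build a new ScriptBlock owned by this iteration's runspace.
    $localDoubler = [scriptblock]::Create($using:doublerText)
    & $localDoubler $_
}
```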

### Exceptions

For critical exceptions, such as out of memory or stack overflow, the CLR will crash the process.
Since all parallel script blocks run on different threads in the same process, all running script blocks will terminate, and queued script blocks will never run.
This is different from PowerShell jobs (Start-Job), where each job script runs in a separate child process and is therefore better isolated from crashes.
The lack of process isolation is one of the costs of the better performance gained by using threads for parallelization.

For all other catchable exceptions, PowerShell will catch them on each thread and write them as non-terminating error records to the error data stream.
If the `ErrorAction` parameter is set to `Stop`, the cmdlet will attempt to stop the parallel execution on any error.

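A sketch of both error-handling modes, assuming PowerShell 7.0 or later. The `2>$null` redirection hides the displayed error record, while `-ErrorVariable` still collects it; the failing item is an arbitrary placeholder.

```powershell
# Requires PowerShell 7.0 or later.
# A throw in one iteration becomes a non-terminating error; other iterations still run.
$results = 1..4 | ForEach-Object -Parallel {
    if ($_ -eq 3) { throw "Item $_ failed" }
    $_
} -ErrorVariable parallelErrors 2>$null

# With -ErrorAction Stop, the first error terminates the whole command.
$stoppedByError = $false
try {
    $null = 1..4 | ForEach-Object -Parallel {
        if ($_ -eq 3) { throw "Item $_ failed" }
        $_
    } -ErrorAction Stop
}
catch {
    $stoppedByError = $true
}
```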

### Stop behavior

Whenever a timeout, a terminating error (`-ErrorAction Stop`), or a stop command (Ctrl+C) occurs, a stop signal will be sent to all running script blocks, and any queued script block iterations will be dequeued.
This does not guarantee that a running script will stop immediately if that script is running a native command or making an API call.
So a stop command can be ineffective if one running thread is busy or hung.

We could consider adding some kind of 'forcetimeout' parameter that would kill any threads that did not end within a specified time.

If a job object is returned (`-AsJob`), the child jobs that were dequeued by the stop command will be in the 'NotStarted' state.

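A sketch of stopping a job returned by `-AsJob`, assuming PowerShell 7.0 or later. With `-ThrottleLimit 2`, most of the ten iterations are still queued when `Stop-Job` is called, so per this RFC they would be expected to report `NotStarted`; the sketch prints the child job states for inspection rather than relying on exact counts.

```powershell
# Requires PowerShell 7.0 or later.
# With a throttle limit of 2, only two iterations run at once; the rest are queued.
$job = 1..10 | ForEach-Object -AsJob -ThrottleLimit 2 -Parallel {
    Start-Sleep -Seconds 60
}

Start-Sleep -Seconds 2                  # let the first iterations start
$job | Stop-Job                         # stop running iterations, dequeue the rest
$job | Wait-Job -Timeout 30 | Out-Null  # wait (bounded) for the stop to take effect

# Summarize how many child jobs ended in each state.
$stateSummary = $job.ChildJobs | Group-Object -Property State -NoElement
$stateSummary

$job | Remove-Job -Force
```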

### Data streams

Warning, Error, Debug, and Verbose data streams will be written to the cmdlet's data streams as they are received from each running parallel script block.
Progress data streams will not be supported, but support can be added later if desired.

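A sketch, assuming PowerShell 7.0 or later, of capturing warnings written inside parallel script blocks. The `3>&1` redirection merges the cmdlet's warning stream into its output so the records can be separated from ordinary results afterwards.

```powershell
# Requires PowerShell 7.0 or later.
# 3>&1 merges the warning stream into the output stream so it can be captured.
$captured = 1..2 | ForEach-Object -Parallel {
    Write-Warning "Warning from item $_"
    $_
} 3>&1

$warnings = $captured | Where-Object { $_ -is [System.Management.Automation.WarningRecord] }
$values   = $captured | Where-Object { $_ -is [int] }
```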

### Supported scenarios

```powershell
# Ensure needed module is installed on local system
if (! (Get-Module -Name MyLogsModule -ListAvailable)) {
    Install-Module -Name MyLogsModule -Force
}
```

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
$logs = $computerNames | ForEach-Object -ThrottleLimit 10 -TimeoutSeconds 1800 -Parallel {
    Get-Logs -ComputerName $_
}
```

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
# Pipe the names in: -InputObject would pass the whole array as a single item
$job = $computerNames | ForEach-Object -ThrottleLimit 10 -TimeoutSeconds 1800 -AsJob -Parallel {
    Get-Logs -ComputerName $_
}
$logs = $job | Wait-Job | Receive-Job
```

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
$logNames = 'System','SQL'
$logs = $computerNames | ForEach-Object -Parallel {
    Get-Logs -ComputerName $_ -LogNames $using:logNames
}
```

```powershell
$computerNames = 'computer1','computer2','computer3','computer4','computer5'
$logNames = 'System','SQL','AD','IIS'
$logResults = $computerNames | ForEach-Object -Parallel {
    Get-Logs -ComputerName $_ -LogNames $using:logNames
} | ForEach-Object -Parallel {
    Process-Log $_
}
```

```powershell
$threadSafeDictionary = [System.Collections.Concurrent.ConcurrentDictionary[string,object]]::new()
Get-Process | ForEach-Object -Parallel {
    # This works because the passed in object is a concurrent dictionary that is thread safe
    $dict = $using:threadSafeDictionary
    # TryAdd returns a Boolean ($false for duplicate process names); discard it
    $null = $dict.TryAdd($_.ProcessName, $_)
}
```

### Unsupported scenarios

```powershell
# Variables must be passed in via the $using: keyword
$LogNameToUse = "IISLogs"
$computers | ForEach-Object -Parallel {
    # This will fail because $LogNameToUse has not been defined in this scope
    Get-Logs -ComputerName $_ -LogName $LogNameToUse
}
```

```powershell
# Passed in reference variables should not be assigned to
$MyLogs = @()
$computers | ForEach-Object -Parallel {
    # Throws error, cannot assign to using variable
    $using:MyLogs += Get-Logs -ComputerName $_
}
# At line:3 char:5
# +     $using:MyLogs += Get-Logs -ComputerName $_
# +     ~~~~~~~~~~~~~
# The assignment expression is not valid. The input to an assignment operator must be an object that is able to accept assignments, such as a variable or a property.
#     + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
#     + FullyQualifiedErrorId : InvalidLeftHandSide

$dict = [System.Collections.Generic.Dictionary[string,object]]::New()
$computers | ForEach-Object -Parallel {
    $dict = $using:dict
    $logs = Get-Logs -ComputerName $_
    # Not thread safe, undefined behavior
    $dict.Add($_, $logs)
}
```

```powershell
# Value types are not passed by reference
$count = 0
$computers | ForEach-Object -Parallel {
    # Can't assign to a using variable
    $using:count += 1
    $logs = Get-Logs -ComputerName $_
    return @{
        ComputerName = $_
        Count = $count
        Logs = $logs
    }
}
```

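Instead of assigning to `$using:` variables, one supported pattern is to return a result from each iteration and aggregate in the calling scope after the pipeline completes. A minimal sketch, assuming PowerShell 7.0 or later, with a placeholder computation standing in for the hypothetical `Get-Logs` cmdlet used in these examples:

```powershell
# Requires PowerShell 7.0 or later.
$computers = 'computer1', 'computer2', 'computer3'

# Each iteration returns an object instead of updating shared state.
$results = $computers | ForEach-Object -Parallel {
    [pscustomobject]@{
        ComputerName = $_
        # Placeholder for real work such as the hypothetical Get-Logs cmdlet.
        LogCount     = $_.Length
    }
}

# Aggregate in the caller, on a single thread, after all iterations finish.
$count     = @($results).Count
$totalLogs = ($results | Measure-Object -Property LogCount -Sum).Sum
```

Because only the caller's thread touches `$count` and `$totalLogs`, no thread-safe collection is needed.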

## Alternate Proposals and Considerations

Another option (and a previous RFC proposal) is to resurrect the Windows PowerShell workflow `foreach -parallel` keyword for use in normal PowerShell scripts, to perform parallel execution of foreach loop iterations.
However, the majority of the community felt it would be more useful to update the existing ForEach-Object cmdlet with a `-Parallel` parameter set.
We may want to eventually implement both solutions.

There are currently other proposals to create a more general framework to support running arbitrary scripts and cmdlets in parallel, by marking them as able to support parallelism (see RFC #206).
That is outside the scope of this RFC, which focuses on extending just the ForEach-Object cmdlet to support parallel execution, and is intended to allow users to do parallel script/command execution without having to resort to PowerShell APIs.

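For comparison, here is a minimal sketch of the kind of PowerShell API code that users otherwise have to write: running a script in parallel through a `RunspacePool` directly. It is illustrative only, assuming a trivial script that doubles its argument; a real version would also need error, timeout, and stop handling, which `ForEach-Object -Parallel` is meant to provide.

```powershell
# A pool of at most 3 runspaces shared by all invocations.
$pool = [runspacefactory]::CreateRunspacePool(1, 3)
$pool.Open()
try {
    # Start one asynchronous invocation per input item.
    $tasks = foreach ($n in 1..5) {
        $ps = [powershell]::Create()
        $ps.RunspacePool = $pool
        [void]$ps.AddScript('param($x) $x * 2').AddArgument($n)
        [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
    }

    # Collect results in input order and release each PowerShell instance.
    $results = foreach ($task in $tasks) {
        $task.Shell.EndInvoke($task.Handle)
        $task.Shell.Dispose()
    }
}
finally {
    $pool.Close()
    $pool.Dispose()
}
```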
