|
| 1 | +--- |
| 2 | +RFC: RFC0044 |
| 3 | +Author: Paul Higinbotham |
| 4 | +Status: Experimental-Accepted |
| 5 | +SupercededBy: N/A |
| 6 | +Version: 1.0 |
| 7 | +Area: Engine |
| 8 | +Comments Due: July 18, 2019 |
| 9 | +Plan to implement: Yes |
| 10 | +--- |
| 11 | + |
| 12 | +# PowerShell ForEach-Object -Parallel Cmdlet |
| 13 | + |
| 14 | +This RFC proposes a new parameter set for the existing ForEach-Object cmdlet to parallelize script block executions, instead of running them sequentially as it does now. |
| 15 | + |
| 16 | +## Motivation |
| 17 | + |
| 18 | + As a PowerShell User, |
| 19 | + I can run a script block against each item piped to ForEach-Object on parallel threads, either synchronously or asynchronously, while limiting the number of threads running at a given time. |
| 20 | + |
| 21 | +## Specification |
| 22 | + |
| 23 | +A new `-Parallel` parameter set will be added to the existing ForEach-Object cmdlet that supports running piped input concurrently in a provided script block. |
| 24 | + |
| 25 | +- `-Parallel` parameter takes a script block that is executed in parallel for each piped input object |
| 26 | + |
| 27 | +- `-ThrottleLimit` parameter takes an integer value that determines the maximum number of script blocks running at the same time |
| 28 | + |
| 29 | +- `-TimeoutSeconds` parameter takes an integer that specifies the maximum time, in seconds, to wait for completion before the command is aborted |
| 30 | + |
| 31 | +- `-AsJob` parameter switch indicates that a job is returned, which represents the command running asynchronously |
| 32 | + |
| 33 | +The `ForEach-Object -Parallel` command will stream output to the console until all piped input has been processed. |
| 34 | +If the `-AsJob` switch is used then a job object is returned and remains in the running state while input is being processed. |
| 35 | +The returned job object can be used with all PowerShell cmdlets that manipulate jobs. |
| 36 | + |
| 37 | +### Implementation details |
| 38 | + |
| 39 | +Implementation will be similar to the ThreadJob module in that each script block execution will be contained within a PSThreadChildJob object. |
| 40 | +The jobs will run concurrently on separate runspaces/threads up to the ThrottleLimit value, and the remainder will be queued until a runspace/thread becomes available. |
| 41 | +The initial implementation will not attempt to reuse threads and runspaces when running queued items, because stale state could break script execution. |
| 42 | +For example, PowerShell uses thread-local storage to store per-thread default runspaces. |
| 43 | +Although the runspace API has a 'ResetRunspaceState' method, it resets only session variables and the debug/transaction managers. |
| 44 | +Imported modules and function definitions are not affected. |
| 45 | +A script that defines a constant function would therefore fail if a reused runspace had already defined that function. |
| 46 | +The initial assumption will be that runspace/thread creation time is insignificant compared to the time needed to execute the script block, either because of high compute needs or because of long wait times for results. |
| 47 | +If this assumption does not hold, the user should consider batching the workload within each ForEach-Object iteration, or simply use the sequential (non-parallel) form of the cmdlet. |
| 48 | + |
| 49 | +The 'TimeoutSeconds' parameter will attempt to halt all script block executions once the timeout has elapsed. However, this may not succeed immediately if a running script is calling a native command or API, in which case the call must return before the script can be halted. |
| 50 | + |
| 51 | +### Variable passing |
| 52 | + |
| 53 | +ForEach-Object -Parallel will support the PowerShell `$_` current piped item variable within each script block. |
| 54 | +It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. |
| 55 | +If the passed-in variable is a value type, a copy of the value is passed to the script block. |
| 56 | +If the passed-in variable is a reference type, the reference is passed and each running script block can modify the referenced object. |
| 57 | +Since the script blocks are running in different threads, modifying a reference type that is not thread safe will result in undefined behavior. |
| 58 | + |
| 59 | +ScriptBlock variables are a special case because they have runspace affinity, and cannot be safely passed to other runspace script blocks for parallel execution. |
| 60 | +Consequently, an error will be generated if a ScriptBlock object is passed directly through the input pipeline, or is passed to the parallel script block via the `$using:` directive. |
| 61 | +However, it is still possible to pass in a ScriptBlock object indirectly, such as through an object method that returns a ScriptBlock. |
| 62 | +This is not recommended and will result in undefined behavior. |
| 63 | + |
| 64 | +### Exceptions |
| 65 | + |
| 66 | +For critical exceptions, such as out of memory or stack overflow, the CLR will crash the process. |
| 67 | +Since all parallel running script blocks run in different threads in the same process, all running script blocks will terminate, and queued script blocks will never run. |
| 68 | +This is different from PowerShell jobs (Start-Job), where each job script runs in a separate child process and is therefore better isolated from crashes. |
| 69 | +The lack of process isolation is one of the costs of better performance while using threads for parallelization. |
| 70 | + |
| 71 | +For all other catchable exceptions, PowerShell will catch them from each thread and write them as non-terminating error records to the error data stream. |
| 72 | +If the `ErrorAction` parameter is set to 'Stop', then the cmdlet will attempt to stop the parallel execution on any error. |
| 73 | + |
| 74 | +### Stop behavior |
| 75 | + |
| 76 | +Whenever a timeout, a terminating error (-ErrorAction Stop), or a stop command (Ctrl+C) occurs, a stop signal will be sent to all running script blocks, and any queued script block iterations will be dequeued. |
| 77 | +This does not guarantee that a running script will stop immediately if that script is running a native command or making an API call. |
| 78 | +So it is possible for a stop command to be ineffective if one running thread is busy or hung. |
| 79 | + |
| 80 | +We can consider including some kind of 'forcetimeout' parameter that would kill any threads that did not end in a specified time. |
| 81 | + |
| 82 | +If a job object is returned (-AsJob), the child jobs that were dequeued by the stop command will be in the 'NotStarted' state. |
| 83 | + |
| 84 | +### Data streams |
| 85 | + |
| 86 | +The Warning, Error, Debug, and Verbose data streams will be written to the cmdlet data streams as they are received from each running parallel script block. |
| 87 | +Progress data streams will not be supported, but can be added later if desired. |
| 88 | + |
| 89 | +### Supported scenarios |
| 90 | + |
| 91 | +```powershell |
| 92 | +# Ensure needed module is installed on local system |
| 93 | +if (! (Get-Module -Name MyLogsModule -ListAvailable)) { |
| 94 | + Install-Module -Name MyLogsModule -Force |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +```powershell |
| 99 | +$computerNames = 'computer1','computer2','computer3','computer4','computer5' |
| 100 | +$logs = $computerNames | ForEach-Object -ThrottleLimit 10 -TimeoutSeconds 1800 -Parallel { |
| 101 | + Get-Logs -ComputerName $_ |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +```powershell |
| 106 | +$computerNames = 'computer1','computer2','computer3','computer4','computer5' |
| 107 | +$job = $computerNames | ForEach-Object -ThrottleLimit 10 -TimeoutSeconds 1800 -AsJob -Parallel { |
| 108 | + Get-Logs -ComputerName $_ |
| 109 | +} |
| 110 | +$logs = $job | Wait-Job | Receive-Job |
| 111 | +``` |
| 112 | + |
| 113 | +```powershell |
| 114 | +$computerNames = 'computer1','computer2','computer3','computer4','computer5' |
| 115 | +$logNames = 'System','SQL' |
| 116 | +$logs = $computerNames | ForEach-Object -Parallel { |
| 117 | + Get-Logs -ComputerName $_ -LogNames $using:logNames |
| 118 | +} |
| 119 | +``` |
| 120 | + |
| 121 | +```powershell |
| 122 | +$computerNames = 'computer1','computer2','computer3','computer4','computer5' |
| 123 | +$logNames = 'System','SQL','AD','IIS' |
| 124 | +$logResults = $computerNames | ForEach-Object -Parallel { |
| 125 | + Get-Logs -ComputerName $_ -LogNames $using:logNames |
| 126 | +} | ForEach-Object -Parallel { |
| 127 | + Process-Log $_ |
| 128 | +} |
| 129 | +``` |
| 130 | + |
| 131 | +```powershell |
| 132 | +$threadSafeDictionary = [System.Collections.Concurrent.ConcurrentDictionary[string,object]]::new() |
| 133 | +Get-Process | ForEach-Object -Parallel { |
| 134 | + # This works because the passed in object is a concurrent dictionary that is thread safe |
| 135 | + $dict = $using:threadSafeDictionary |
| 136 | + $dict.TryAdd($_.ProcessName, $_) |
| 137 | +} |
| 138 | +``` |
| 139 | + |
| 140 | +### Unsupported scenarios |
| 141 | + |
| 142 | +```powershell |
| 143 | +# Variables must be passed in via $using: keyword |
| 144 | +$LogNameToUse = "IISLogs" |
| 145 | +$computers | ForEach-Object -Parallel { |
| 146 | + # This will fail because $LogNameToUse has not been defined in this scope |
| 147 | + Get-Logs -ComputerName $_ -LogNames $LogNameToUse |
| 148 | +} |
| 149 | +``` |
| 150 | + |
| 151 | +```powershell |
| 152 | +# Passed in reference variables should not be assigned to |
| 153 | +$MyLogs = @() |
| 154 | +$computers | ForEach-Object -Parallel { |
| 155 | + # Throws error, cannot assign to using variable |
| 156 | + $using:MyLogs += Get-Logs -ComputerName $_ |
| 157 | +} |
| 158 | +# At line:3 char:5 |
| 159 | +# + $using:MyLogs += Get-Logs -ComputerName $_ |
| 160 | +# + ~~~~~~~~~~~~~ |
| 161 | +# The assignment expression is not valid. The input to an assignment operator must be an object that is able to accept assignments, such as a variable or a property. |
| 162 | +# + CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException |
| 163 | +# + FullyQualifiedErrorId : InvalidLeftHandSide |
| 164 | + |
| 165 | +$dict = [System.Collections.Generic.Dictionary[string,object]]::New() |
| 166 | +$computers | ForEach-Object -Parallel { |
| 167 | + $dict = $using:dict |
| 168 | + $logs = Get-Logs -ComputerName $_ |
| 169 | + # Not thread safe, undefined behavior |
| 170 | + $dict.Add($_, $logs) |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +```powershell |
| 175 | +# Value types not passed by reference |
| 176 | +$count = 0 |
| 177 | +$computers | ForEach-Object -Parallel { |
| 178 | + # Can't assign to using variable |
| 179 | + $using:count += 1 |
| 180 | + $logs = Get-Logs -ComputerName $_ |
| 181 | + return @{ |
| 182 | + ComputerName = $_ |
| 183 | + Count = $count |
| 184 | + Logs = $logs |
| 185 | + } |
| 186 | +} |
| 187 | +``` |
| 188 | + |
| 189 | +## Alternate Proposals and Considerations |
| 190 | + |
| 191 | +Another option (and a previous RFC proposal) is to resurrect the Windows PowerShell Workflow `foreach -parallel` keyword so that it can be used in normal PowerShell script to run foreach loop iterations in parallel. |
| 192 | +However, the majority of the community felt it would be more useful to update the existing ForEach-Object cmdlet with a `-Parallel` parameter set. |
| 193 | +We may want to eventually implement both solutions. |
| 194 | + |
| 195 | +There are currently other proposals to create a more general framework to support running arbitrary scripts and cmdlets in parallel, by marking them as able to support parallelism (see RFC #206). |
| 196 | +That is outside the scope of this RFC, which focuses on extending just the ForEach-Object cmdlet to support parallel execution, and is intended to allow users to do parallel script/command execution without having to resort to PowerShell APIs. |
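
The `$_` and `$using:` rules from the "Variable passing" section of this RFC can be illustrated with a short sketch. It assumes the behavior of `ForEach-Object -Parallel` as available in PowerShell 7 or later, which may differ in detail from this proposal. The input range and the names `$prefix` and `$results` are hypothetical and chosen only for illustration.

```powershell
# Sketch only: assumes PowerShell 7 or later, where ForEach-Object -Parallel is available.
# $prefix and $results are hypothetical names used for illustration.
$prefix  = 'item'
$results = [System.Collections.Concurrent.ConcurrentDictionary[int,string]]::new()

1..5 | ForEach-Object -ThrottleLimit 2 -Parallel {
    # $_ is the current piped item; $using: reads variables from the caller's scope.
    $p    = $using:prefix    # string value read from the caller's scope
    $dict = $using:results   # reference type: every thread sees the same dictionary

    # Only a thread-safe collection is mutated from the parallel script blocks.
    $null = $dict.TryAdd($_, "$p-$_")
}

# All five items are present; the order in which they were added is not guaranteed.
$results.Count
$results[3]
```

Results are collected in a concurrent dictionary rather than a plain array or `Dictionary`, because the script blocks run on different threads and a non-thread-safe collection would give undefined behavior, as the RFC notes.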