Add RFC for foreach-parallel feature #174
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
--- | ||
RFC: RFCnnnn | ||
Author: Paul Higinbotham | ||
Status: Draft | ||
SupercededBy: N/A | ||
Version: 1.0 | ||
Area: Engine | ||
Comments Due: July 1, 2019 | ||
Plan to implement: Yes | ||
--- | ||
|
||
# Implement PowerShell language foreach -parallel | ||
|
||
Windows PowerShell currently supports the foreach language keyword with the -parallel switch flag, but only for workflow scripts. | ||
|
||
```powershell | ||
|
||
workflow wf1 { | ||
$list = 1..5 | ||
foreach -parallel -throttlelimit 5 ($item in $list) { | ||
Start-Sleep -Seconds 1 | ||
Write-Output "Output $item" | ||
} | ||
} | ||
|
||
``` | ||
|
||
This will run the script block with each value in the `$list` array, in parallel using workflow jobs. | ||
However, workflow is not supported in PowerShell Core 6, partly because it is a Windows-only solution and partly because it is cumbersome to use. | |
In addition, the workflow implementation is very heavyweight and uses a lot of system resources. | |
|
||
This is a proposal to re-implement `foreach -parallel` in PowerShell Core, using PowerShell's support for concurrency via Runspaces. | ||
It is similar to the [ThreadJob module](https://www.powershellgallery.com/packages/ThreadJob/1.1.2) except that it becomes part of the PowerShell language via `foreach -parallel`. | ||
|
||
## Motivation | ||
|
||
As a PowerShell User, | ||
|
||
I can do simple fan-out concurrency from within the language, without having to obtain and load a separate module or deal with PowerShell jobs. | ||
|
||
## Specification | ||
|
||
The PowerShell `foreach -parallel` language keyword will be re-implemented to invoke script blocks in parallel, similar to how it works for workflow functions, except that script blocks will be invoked on threads within the same process rather than in workflow jobs running in separate processes. | |
The default behavior is to fan-out script block execution to multiple threads, and then wait for all threads to finish. | ||
However, a `-asjob` switch will also be supported that returns a PowerShell job object for asynchronous use. | ||
If the number of foreach iterations exceeds the throttle limit value, then only the throttle limit number of threads are created at a time and the remaining iterations are queued until a running thread becomes available. | |
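
The fan-out and throttling behavior described above can be approximated today with a runspace pool. The following is a minimal sketch under stated assumptions, not the proposed implementation: the names `$throttleLimit`, `$script`, and `$tasks` are illustrative only, and the pool's maximum size stands in for the `-throttlelimit` value.

```powershell
# Sketch only: approximates the described fan-out with a runspace pool.
# $throttleLimit, $script, and $tasks are illustrative names, not part of the RFC.
$throttleLimit = 2
$script = 'param($item) Start-Sleep -Milliseconds 100; "Output $item"'

# The pool never runs more than $throttleLimit runspaces at once;
# additional invocations wait in its queue until a runspace frees up.
$pool = [runspacefactory]::CreateRunspacePool(1, $throttleLimit)
$pool.Open()

# Start one asynchronous invocation per iteration value.
$tasks = foreach ($item in 1..5) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript($script).AddArgument($item)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for every iteration to finish and collect its output.
$results = foreach ($task in $tasks) {
    $task.Shell.EndInvoke($task.Handle)
    $task.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()
$results
```

With five iterations and a limit of two, at most two script blocks run at the same time and the other three wait in the pool's queue.
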
|
||
|
||
### Supported foreach parameters | ||
|
||
- `-parallel` | ||
- `-throttlelimit` | ||
|
||
- `-timeout` | ||
- `-asjob` | ||
|
||
### P0 Features | ||
|
||
- `foreach -parallel` fans out script block execution to threads, passing each invocation a single bound foreach iteration value | |
|
||
- `-throttlelimit` parameter value specifies the maximum number of threads that can run at one time | ||
|
||
- `-timeout` parameter value specifies a maximum time to wait for all iterations to complete, after which 'stop' will be called on all running script blocks to terminate execution | ||
|
||
- `-asjob` switch causes foreach to return a PowerShell job object that is used to asynchronously monitor execution | ||
|
||
- When a job object is returned, it will be compatible with all relevant job cmdlets | ||
|
||
|
||
- All script blocks running in parallel will run isolated from each other. | ||
|
||
Only foreach iteration objects will be passed to the parallel script block. | ||
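
The isolation described in the last item can be illustrated with a separate runspace. This is a sketch under the assumption that each parallel script block runs in its own runspace, as the proposal suggests. The variable `$outer` and the script text are hypothetical.

```powershell
# Sketch only: a script running in its own runspace does not see the
# caller's variables. $outer and the script text are illustrative.
$outer = 'caller value'

# [powershell]::Create() uses a new runspace by default.
$ps = [powershell]::Create()
[void]$ps.AddScript('param($item) "item=$item outer=[$outer]"').AddArgument(42)
$result = $ps.Invoke()
$ps.Dispose()

# Only the argument that was passed in explicitly is visible inside the
# runspace; $outer is undefined there and expands to an empty string.
$result
```

Anything else the script block needs would have to be passed in explicitly, which is why the proposal limits the input to the foreach iteration objects.
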
|
||
### Data stream handling | ||
|
||
`foreach -parallel` will use normal PowerShell pipes to return the various data streams. | |
Data will be returned in the order received, except when the `-asjob` switch is used, in which case a single job object is returned. | |
The returned job object will contain an array of child jobs that represent each iteration of the foreach. | ||
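
As a rough illustration of what "various data streams" means here, the sketch below runs a script in a separate runspace and reads its output, warning, and error streams separately. It only shows how the existing runspace API exposes those streams. How `foreach -parallel` would relay them to the caller is not specified beyond the text above, and the script text is made up for the example.

```powershell
# Sketch only: each stream written inside a runspace is collected separately.
$ps = [powershell]::Create()
[void]$ps.AddScript('Write-Output "data"; Write-Warning "careful"; Write-Error "failed"')

$output   = $ps.Invoke()             # success (output) stream
$warnings = @($ps.Streams.Warning)   # warning stream records
$errors   = @($ps.Streams.Error)     # non-terminating error records
$ps.Dispose()

"Output:   $($output -join ', ')"
"Warnings: $($warnings.Count)"
"Errors:   $($errors.Count)"
```
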
|
||
|
||
### Examples | ||
|
||
```powershell | ||
|
||
$computerNames = 'computer1','computer2','computer3','computer4','computer5' | ||
$logs = foreach -parallel -throttlelimit 10 -timeout 300 ($computer in $computerNames) | |
{ | ||
Get-Logs -ComputerName $computer | ||
} | ||
``` | ||
|
||
```powershell | ||
$computerNames = 'computer1','computer2','computer3','computer4','computer5' | ||
$job = foreach -parallel -asjob ($computer in $computerNames) | ||
{ | ||
Get-Logs -ComputerName $computer | ||
} | ||
$logs = $job | Wait-Job | Receive-Job | ||
``` | ||
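
For comparison, a similar job-based pattern can be written today with the ThreadJob module mentioned earlier. This is a sketch that assumes the module is installed and `Start-ThreadJob` is available. The string returned by the script block is a stand-in for the `Get-Logs` command used in the examples above, and one job is created per item rather than a single parent job with child jobs.

```powershell
# Sketch only: assumes the ThreadJob module (Start-ThreadJob) is available.
$computerNames = 'computer1','computer2','computer3'

# One thread job per computer; at most two run at the same time.
$jobs = foreach ($computer in $computerNames) {
    Start-ThreadJob -ThrottleLimit 2 -ArgumentList $computer -ScriptBlock {
        param($name)
        "logs from $name"   # stand-in for the Get-Logs call in the RFC examples
    }
}

# The standard job cmdlets work on the returned job objects.
$logs = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$logs
```
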
|
||
```powershell | ||
# $params is an array of hashtables; each hashtable is splatted to the script | |
$params = @( | |
    @{ argTitle = "Title1"; argValue = 102 } | |
) | |
foreach -parallel ($param in $params) | ||
{ | ||
c:\scripts\ToRun.ps1 @param | ||
} | ||
``` | ||
|
||
## Alternate Proposals and Considerations | ||
|
||
One alternative is to create a `ForEach-Parallel` cmdlet instead of re-implementing the `foreach -parallel` keyword. | ||
This would work well but would not be as useful as making it part of the PowerShell language. | ||
I think this is maybe worth discussing more, just because the drawbacks aren't as apparent to me. What's the key limitation here? The drawbacks I see with

I don't believe that I also don't believe that we would be able to splat to the

Maybe not, but I agree with @rjmholt here; it makes more syntactic sense to put this into the cmdlet instead. Additionally, I think having this available for pipeline cmdlets would be significantly more valuable than just a foreach loop, which can't be used in a pipeline context.

I agree with @vexx32, and I think it's bad practice to modify foreach in this way.

I am perfectly fine implementing this as a cmdlet rather than a foreach language keyword extension. In fact, it is much easier to implement as a cmdlet. If the community prefers a cmdlet (as it seems from these comments), then I am happy to update this RFC accordingly. But I'll let the PowerShell committee weigh in as well.

Put like that :-) ... there is a case for both, I think the case is stronger for the cmdlet;
I was saying, "Yes and that's why the keyword is less good"

Some scripts and modules need to run on versions from PS 7 down to PS 3 (or even PS 2 in the case of Pester), and not just on Windows. An example is the PowerShellEditorServices (backend to the PowerShell extension for VSCode) startup script; it must run on everything from PS 3 in Windows Server 2012 (in some cases 2008) to PS 7 on Ubuntu 18.04. We took it out, but for example that script used to call Keywords that don't exist in any of those versions can't be used anywhere in that script. We'd have to write a whole new script (slowing down startup, increasing the download size, duplicating the code). Whereas a command parameter can be added to a splat conditionally. PS 7 users would get the parallel speedup, but it still works in PS 3. Another example is the Install-VSCode.ps1 script. It wants to be fast, so in Windows PowerShell it uses You can already prevent downlevel running at parse time with

That startup script is one fugly piece of scripting. :-) But I think you miss @jhoneill's point: a down-level script engine (e.g., PS v3) should generate an error if not gated by a PSVersion check. "foreach -parallel" does not pass execution time checking on PS v3, even though a proper AST is generated, up until the "-parallel" parameter. But if the code is protected by a PSVersion check, all is OK. I wouldn't think that should change, and I don't think that @jhoneill is suggesting otherwise.

I was missing that PS v3 was handling this differently. When I checked it on PS v5.1, I got this message when calling
And that is exactly what I expected to happen on 5.1. Because it was a parse exception, it never even tried to execute. This is also what I expected to happen. I expect that I would need to use it in a workflow on a pre PS v7 script. I am perfectly OK with it as a PS v7 feature that is not compatible with anything lower. If you are targeting a lower version, then you need to not use the newer features. With that said, I am OK with script authors abusing the syntax to write multi-versioned scripts, but I don't think that should dictate the primary design. If Yes, I know some people need to write scripts that target PS v3 and it would be really nice to have a script that just ran faster on PS 7.0 and still worked on PS v3, but it is also perfectly OK for it to be a parse exception in PS v3. We already have things in PS v7 that are a parse exception in PS v5.1

Up to a point ... Having dealt with clients where it is just too painful to get servers updated from PS4 to PS5 and found my scripts used a couple of bits of 5 specific syntax, I'm probably more keen than average that things should work on old versions. This errors without running anything on 5.1
This runs
Now, if everything else were equal (and it's not: the cmdlet can go in a pipeline) I think most people would say the implementation which supports one script for two versions is preferable; it's not mandatory. But here's why breaking can be good. Imagine a college creating a ton of new users at the start of a year. So we do this and it all looks good, but someone says "It makes the new csv real quick but creating the home directories feels like a month", so they add -parallel. Then someone runs the script on another box and the users get created, the file is exported and BANG, error with no home directories set up. We can't run the script again because the users exist, so we have to clean up and it's all horrible.
||
But if re-implementing the foreach keyword becomes problematic, it would be a good fallback solution. |
Regardless of how implemented (as a cmdlet or a foreach extension), I seriously hope you will make this compatible with psexec. When I filed an issue against threadjob noting that it broke psexec, psexec was blamed. I doubt highly that sysinternals/Mark Russinovich would agree with that.