Add RFC for foreach-parallel feature #174
Conversation
## Alternate Proposals and Considerations

One alternative is to create a `ForEach-Parallel` cmdlet instead of re-implementing the `foreach -parallel` keyword. This would work well but would not be as useful as making it part of the PowerShell language.
I think this is maybe worth discussing more, just because the drawbacks aren't as apparent to me. What's the key limitation here?
The drawbacks I see with `foreach () -Parallel` are:
- New syntax means scripts are syntactically backwards incompatible -- they cannot even be successfully parsed by an older PowerShell version. Compare with:
  $foreachParams = @{}
  if ($PSVersionTable.PSVersion.Major -ge 7) {
      $foreachParams += @{ Parallel = $true }
  }
  $workItems | ForEach-Object @foreachParams { Invoke-Processing $_ }
- Assigning from a `foreach` loop (see the sketch below) seems like a relatively unintuitive construction and a bit syntactically off. I know it's already functionality we support and I think it makes sense in the language, but minting it as the primary syntax for parallelism seems to run a bit against the natural style of PowerShell to me.
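For reference, that assignment-from-a-loop construct is already valid PowerShell today; a minimal example:
# A foreach statement used as an expression: each iteration's output
# is collected into $results
$results = foreach ($n in 1..5) {
    $n * 2
}
$results    # 2 4 6 8 10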
I don't believe that `ForEach-Object` was in scope of this RFC. If we get this implemented, I can see that becoming part of the conversation.
I also don't believe that we would be able to splat to the `foreach` operator.
Maybe not, but I agree with @rjmholt here; it makes more syntactic sense to put this into the cmdlet instead. `ForEach-Object -Parallel -AsJob {}` makes a LOT more sense than `foreach -Parallel -AsJob ($a in $b) {}` visually, and with only a handful of exceptions, language keywords don't have parameters like that.
Additionally, having this available for pipeline cmdlets would, I think, be significantly more valuable than having it just on a foreach loop, which can't be used in a pipeline context.
I agree with @vexx32, and I think it's bad practice to modify foreach in this way.
I'm against a "Foreach VS Foreach-Object VS Magic Foreach VS ForEach-Parallel" split.
I vote for `ForEach-Object -Parallel -AsJob {}`.
I am perfectly fine implementing this as a cmdlet rather than a foreach language keyword extension. In fact it is much easier to implement as a cmdlet. If the community prefers a cmdlet (as it seems from these comments) then I am happy to update this RFC accordingly. But I'll let the PowerShell committee weigh in as well.
Put like that :-) ... there is a case for both, but I think the case is stronger for the cmdlet.
When you said
To my understanding, the `foreach -parallel` keyword is only intended to run in parallel for the duration of the loop. It's not supposed to keep running while the rest of the script executes; it simply runs each iteration of the loop in parallel, and waits until all the iterations complete before continuing the script.
I was saying, "Yes, and that's why the keyword is less good."
If your script goes
Get items; do something to each item in its own thread; format output.
then the keyword approach can't start any threads until it has all the items, and won't output anything until all threads have completed. But if they are stages of a pipeline, the threads will be started as the items are fetched, and the output can happen as the threads end. That overlapping of commands in a pipeline makes a big performance difference if the commands on either side of the parallelized one are slow.
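A rough sketch of the two shapes being compared; Get-AllItems, Process-Item and Format-Output are hypothetical placeholder commands, and both parallel syntaxes shown are proposals under discussion, not shipped features:
# Keyword form: every item must be collected before the first thread starts,
# and nothing reaches Format-Output until all threads have finished
$results = foreach -parallel ($i in Get-AllItems) { Process-Item $i }
$results | Format-Output

# Pipeline form: threads start as items stream in, and each result flows
# on to Format-Output as soon as its thread completes
Get-AllItems | ForEach-Object -Parallel { Process-Item $_ } | Format-Output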
I feel like I am missing something here. We actually want a parse error, correct? So that if someone tries to run this on down-level PowerShell, the whole script doesn't run.
Some scripts and modules need to run on versions from PS 7 down to PS 3 (or even PS 2 in the case of Pester), and not just on Windows.
An example is the PowerShellEditorServices (backend to the PowerShell extension for VS Code) startup script; it must run on everything from PS 3 on Windows Server 2012 (in some cases 2008) to PS 7 on Ubuntu 18.04. We took it out, but, for example, that script used to call `chmod` on *nix and `Set-Acl` on Windows. Imagine if you couldn't wrap that in an `if`, but it was a parse error.
Keywords that don't exist in any of those versions can't be used anywhere in that script. We'd have to write a whole new script (slowing down startup, increasing the download size, duplicating the code). Whereas a command parameter can be added to a splat conditionally, as sketched below: PS 7 users would get the parallel speedup, but it still works in PS 3.
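A hedged sketch of that conditional splat, assuming the eventual cmdlet accepts the script block through a -Parallel parameter (Update-Thing is a hypothetical command):
# Parses and runs on PS 3; only PS 7+ opts in to the parallel behaviour
$feParams = @{ Process = { Update-Thing $_ } }
if ($PSVersionTable.PSVersion.Major -ge 7) {
    $feParams = @{ Parallel = { Update-Thing $_ } }
}
$servers | ForEach-Object @feParams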
Another example is the Install-VSCode.ps1 script. It wants to be fast, so in Windows PowerShell it uses `Start-BitsTransfer` since that's available. If that resulted in a parse error, you wouldn't be able to do that. We'd have to either publish two scripts, or settle for `Invoke-WebRequest` (which was sped up considerably in PS 6, btw :)).
You can already prevent down-level running at parse time with `#requires -Version 7.0`. But as someone who maintains several complicated scripts that must work all the way down-level, I'd like the ability to leverage PowerShell's dynamism to get the best everywhere.
That startup script is one fugly piece of scripting. :-)
But I think you miss @jhoneill's point - a down-level script engine (e.g., PS v3) should generate an error if not gated by a PSVersion check. "foreach -parallel" does not pass execution time checking on PS v3, even though a proper AST is generated - up until the "-parallel" parameter.
But if the code is protected by a PSVersion check, all is OK. I wouldn't think that should change - and I don't think that @jhoneill is suggesting otherwise.
I was missing that PS v3 handles this differently. When I checked on PS v5.1, I got this message when calling `foreach -parallel`:
the '-parallel' parameter can be used only within a workflow.
And that is exactly what I expected to happen on 5.1: because it was a parse exception, it never even tried to execute.
I expect that I would need to use it in a workflow in a pre-PS v7 script. I am perfectly OK with it as a PS v7 feature that is not compatible with anything lower. If you are targeting a lower version, then you need to not use the newer features.
With that said, I'm OK with script authors abusing the syntax to write multi-versioned scripts, but I don't think that should dictate the primary design. If `foreach -parallel` should be a thing, the fact that you can't use a PS v7 feature in a PS v3 script should not prevent us from implementing it in the ideal way (if we ever decide what that is).
Yes, I know some people need to write scripts that target PS v3, and it would be really nice to have a script that just ran faster on PS 7.0 and still worked on PS v3, but it is also perfectly OK for it to be a parse exception in PS v3. We already have things in PS v7 that are a parse exception in PS v5.1.
I know some people need to write scripts that target PS v3 and it would be really nice to have a script that just ran faster on PS 7.0 and still worked on PS v3, but it is also perfectly OK for it to be a parse exception in PS v3. We already have things in PS v7 that are a parse exception in PS v5.1
Up to a point ... Having dealt with clients where it was just too painful to get servers updated from PS 4 to PS 5, and having found my scripts used a couple of bits of 5-specific syntax, I'm probably keener than average that things should work on old versions.
This errors without running anything on 5.1:
$lastByte = 1..10
if ($PSVersionTable.PSVersion.Major -lt 7) {
foreach ($b in $lastbyte) {Test-Connection "192.168.0.$b" -Count 1 }
}
else {
foreach ($b in $lastbyte) -parallel {Test-Connection "192.168.0.$b" -Count 1 }
}
This runs:
$lastByte = 1..10
if ($PSVersionTable.PSVersion.Major -lt 7) {
$lastbyte | foreach -Process {Test-Connection "192.168.0.$_" -Count 1 }
}
else {
$lastbyte | foreach -parallel {Test-Connection "192.168.0.$_" -Count 1 }
}
Now, if everything else were equal (and it's not: the cmdlet can go in a pipeline), I think most people would say the implementation which supports one script for two versions is preferable - it's not mandatory.
But here's why breaking can be good. Imagine a college creating a ton of new users at the start of a year.
$newUsers = import-csv new.csv | Add-CustomUser
$newUsers | export-csv created.csv
$newUsers | foreach-object {add-CustomHomeDir $_}
So we do this and it all looks good, but someone says "It makes the new csv real quick but creating the home directories feels like a month", so they add -parallel. Then someone runs the script on another box and the users get created, the file is exported and BANG: error, with no home directories set up. We can't run the script again because the users exist, so we have to clean up and it's all horrible.
Would someone who did the quick conversion think to put #requires at the top of the script in case someone runs it on an old version? A complete fail would save them from themselves; but I would prefer that command line as
import-csv new.csv | Add-CustomUser -OutVariable newUsers | foreach-object {add-CustomHomeDir $_}
Something which didn't let me create home directories until I'd created the last user would be bad.
In the end, I don't feel a strong case was made for making this part of the language instead of just shipping a module.
In addition to the syntax reasons mentioned by others, a cmdlet in a module has the added benefit of being shippable (or already shipped) to the gallery, and the option of being backward compatible ...
This is a proposal to re-implement `foreach -parallel` in PowerShell Core, using PowerShell's support for concurrency via Runspaces. It is similar to the [ThreadJob module](https://www.powershellgallery.com/packages/ThreadJob/1.1.2), except that it becomes part of the PowerShell language via `foreach -parallel`.
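For context, the ThreadJob module mentioned above already provides thread-based parallelism through the standard job cmdlets; a small usage example:
# One thread job per item, limited to 5 concurrent threads;
# results are collected with the usual job cmdlets
$jobs = 1..10 | ForEach-Object {
    Start-ThreadJob -ScriptBlock { param($n) $n * $n } -ArgumentList $_ -ThrottleLimit 5
}
$jobs | Wait-Job | Receive-Job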
Regardless of how implemented (as a cmdlet or a foreach extension), I seriously hope you will make this compatible with psexec. When I filed an issue against threadjob noting that it broke psexec, psexec was blamed. I doubt highly that sysinternals/Mark Russinovich would agree with that.
@Jaykul Putting it into the language makes sense to me. It brings a great workflow feature into the full language. I think it is OK for Pwsh 7 to make changes like this, in the same way Microsoft did with Windows PowerShell 3.0. Yes, you could ship cmdlets to do this, but foreach -parallel is more natural. I could argue you have not made the case for not updating the language.
If we're to add this to the language, I think it ought to be a new keyword. The number of switches we're talking about adding on top of -Parallel makes for some very quirky and clunky syntax. And all of those additional switches come from the job cmdlets; in my opinion, they shouldn't be mirrored on language keywords. If we want those, it makes more sense to stick to cmdlets. Keywords are meant to do ONE thing very well, with minimal or zero configurability. Cmdlets are more appropriate for the PowerShell ecosystem, in my opinion. Doubly so with the job-like additions you want applied, and the ability to return job objects.
@vexx32 I have to agree about the
What happens to discoverability in the case of inclusion in the language?
@doctordns In practice it is similar to thread jobs, but automatically getting the result of each completed one. So it could be
But if the parallelism comes from a statement it isn't as good:
My use case was: get all servers in AD, check if they were still in DNS, and if they were, ping them, and if they answered, query settings on them. As @JamesWTruher asks above, do you extend foreach to also have a -threadCount parameter (do we want thousands of threads if the list contains thousands of items?)? And something running in its own runspace (mine or a thread job) doesn't have access to the variables from whatever launched it, so taking this code and adding -parallel would break the script block, because the hashtable isn't shared.
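A sketch of the isolation problem and a possible escape hatch; whether these parallel iterations would support ThreadJob-style $using: references is an assumption here, not something the RFC specifies:
$ping = @{ Count = 1 }                      # hashtable built in the caller's runspace
$servers | ForEach-Object -Parallel {
    # $ping itself is not visible here: each iteration runs in its own runspace
    $p = $using:ping                        # works only if $using: is supported
    Test-Connection -ComputerName $_ -Count $p.Count
}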
Are we talking about this?
parallel {
    # A script block
}
$job = parallel {
    # A script block
}
@iSazonov yes. I don't think anyone is arguing that there should not be baked-in support for parallelism. The questions are how it should be exposed. The first form has to completely evaluate the statement before it pipes the result into the next command. So the discussion is: keyword or cmdlet? And: switches on an existing command (timeout, number of threads), or a new command? I'd say cmdlet, and a distinct command.
Purely on the point of switches: an alternative to having switches might be preference variables, e.g. $PSForEachP_ThrottleLimit and $PSForEachP_TimeOut?
No, I don't think that's a good idea. That would very quickly get cumbersome in any script which needs to use the keyword frequently.
@doctordns ... so a global variable rather than passing as a parameter?
@PowerShell/powershell-committee discussed this and our recommendation is to implement it as `ForEach-Object -Parallel`.
Yesterday I happened across the GNU utility parallel ("parallel --pipe"). We could also find "Where-Object -Parallel" useful, and "Select-String", and others. So this suggests that maybe we need something more general for pipe parallelization (Parallel-Pipe):
$hugeArray | Parallel-Pipe { Where-Object Name -eq "test" }
$hugeArray | Parallel-Pipe { Select-String -Pattern "test" }
$hugeArray | Parallel-Pipe -AsJob { ... }
@iSazonov A rose by any other name would smell as sweet:
$hugeArray | ForEach-Object -Parallel { $_ | Where-Object Name -eq "test" }
$hugeArray | ForEach-Object -Parallel { $_ | Select-String -Pattern "test" }
$hugeArray | ForEach-Object -Parallel -AsJob { ... }
I think, in the name of single-purpose, composable cmdlets, having a single parallel cmdlet (rather than trying to parallelise each individually) is the right way to go. But I think
@rjmholt There is a problem in depth with "ForEach-Object -Parallel" - there are a lot of parameters, and how would "-Begin/Process/End" etc. work: in the "ForEach-Object -Parallel" context, or in the script block context? I think we would have to duplicate parameters. In that case Parallel-Pipe is simpler and safer:
Parallel-Pipe -InputObject $data
    -Begin { … }
    -Process { ForEach-Object -Begin { … } -Process { … } -End { … } }
    -End { … }
FWIW, One pair, invoked once for each parallel pipeline, is defined as parameters
Another pair is defined in Unlike in
construct than the more popular
Why? Because the script may be defined elsewhere and it can be used for both
@SteveL-MSFT It still needs a @iSazonov
I think that's proposed here as
Ok, I'll re-work this RFC to be a ForEach-Object cmdlet parameter set addition rather than a language keyword change. Thanks for all the responses.
@PaulHigin Maybe open a new PR, because we already have 53 conversations here that slow down the GitHub interface?
I'll create a new RFC PR for the cmdlet version of foreach parallel.
To be honest I really liked the idea of
@PrzemyslawKlys We can eventually do both. But it sounds like the community prefers the ForEach-Object cmdlet solution so that can be implemented first.
Continue in #194
@PaulHigin, I understand. I use the pipeline very rarely, mostly because there is a performance difference. If you can make that difference go away, then it doesn't matter. For example, I am removing Where-Object where speed matters because it's 10x slower than a standard foreach loop. But maybe it's just Where-Object that has to be fixed?
@PrzemyslawKlys If you see performance problems in important scenarios, please report them in new issue(s).
@iSazonov which repo?
@PrzemyslawKlys Tbh designing to prevent use of the pipeline is not a clever thing to do.
because New-Thing can process the first item to come from Get-DataServer before the last one has been returned. What you are advocating is that something which allows you to start pieces of work without waiting for the previous one to finish should be implemented so it can only begin when the whole of the previous task has finished, and should not allow the next one to start until all its parallel parts have finished. That's why the cmdlet is better.
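The contrast being drawn, using the comment's own hypothetical Get-DataServer and New-Thing commands:
# Pipeline: New-Thing starts on the first server while Get-DataServer
# is still producing the rest
Get-DataServer | New-Thing

# Collect-then-loop: nothing starts until Get-DataServer has returned everything
$servers = Get-DataServer
foreach ($s in $servers) { New-Thing $s }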
@jhoneill I don't want to change how things are done. I just want them to be faster in the places I use them. In most of my tests Where-Object was slow, so I started using foreach and dropping the pipeline. In most of my tests ForEach-Object was slower than foreach as well; however, I'm only talking about the cases I've used them in. Now, this is why I actually asked for implementing -parallel on both ForEach-Object and foreach, so that when it makes sense to use the pipeline, you use the pipeline, but where it doesn't, you don't. In the case of AD, for example, using the pipeline is very risky. Sure, you can process objects as they come, but often this would cause context errors. But the very same problems would occur if you tried to introduce runspaces (for example, getting Users, Computers, and Groups at the same time with 3 different queries). I don't want to stop using the pipeline. I'm pro-pipeline. I like it, I like the idea, but it's just not suited for all scenarios, and if I can get -parallel for both, sign me up. As for Where-Object, have a look here: PowerShell/PowerShell#9941
I am not sure that that is correct. Certainly, when I teach PowerShell, I do not get that sense. I think the folks posting here are more aware of the implementation issues and are avoiding the harder one (which is not a bad thing). But standing back, I think implementing foreach as a language feature is the better option, although a close second for me is to do both. I have two reasons why I think that. First, I believe we should have symmetry with the Workflow syntax. Yes, Workflow may have been an awful implementation, but I do like the language feature allowing parallelism. Secondly, it would mean I do not have to rely on $_/$PSITEM in the script blocks. Forcing that is not good practice, in my view. I also think that NOW is the best time to implement both approaches. If the team punts on adding the language feature now, when would be appropriate? PWSH 7 offers a unique opportunity to get some things right (or at least a whole lot better). I fully understand the complexity argument. But if you are going to implement parallelism like this in PowerShell, let's do it completely, let's do it right, and let's do it at PWSH 7.
I think we should allow for the order in which people learned things. I remember that in my early days with PowerShell I first knew about %, then foreach as an alias, and finally the split between the foreach keyword and the ForEach-Object cmdlet. Depending on how a course is structured, people might learn for (which is only for recovering C programmers), while, and foreach before they learn about pipelines. Or they might learn about composing things with a pipeline before they learn the language constructs. People with no exposure to workflow (and I looked at it and couldn't find a use) will have a different view to those with a lot of exposure.
Where-Object is a bad case for the pipeline because the time to do the work is small while the pipeline overhead is comparatively big. But when you test and say "it is 10 times faster", you are not talking about saving seconds, but much less than 1 ms per operation. (See https://jamesone111.wordpress.com/2019/04/06/powershell-functions-and-when-not-to-use-them/ for the similar overhead per function call; it's ~100 microseconds.) It is not visible unless you are working with many, many thousands of items. Try this:
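A minimal sketch matching the behaviour described below, assuming a producer that yields one item per second:
# foreach statement: the slow producer must finish before the loop body runs,
# so nothing appears for ~15 seconds, then 1..15 print together
foreach ($i in (1..15 | ForEach-Object { Start-Sleep -Seconds 1; $_ })) { $i }

# Pipeline: each item prints as it is produced, one per second
1..15 | ForEach-Object { Start-Sleep -Seconds 1; $_ }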
No output from the first for 15 seconds. They take the same time end to end, but with the second I can read the first output while the process is running. The use case for which I wrote Start-Parallel was (roughly):
I understand the use case, and output may vary depending on how you test it. And that's a problem: I shouldn't need to test it. I shouldn't have to worry about stuff like this. In my case, if you have a collection of 10 items, Where-Object will be fine. In the case of my 4412 items, where for each object I did another loop through 4412 objects using Where-Object, it was 7 minutes vs 59 seconds. If this can be optimized in PS 7, I can stop worrying about testing things like this and focus on solving problems. I don't care about micro-optimization like 15 seconds to 5 seconds, but 7 minutes to 1 minute is a big deal. I understand that 19,465,744 loops (4412 * 4412) is actually a lot, and it may take time; the difference is just too big. I use a lot of runspaces in my scripts, but up to a point I was taking Where-Object and a few other things for granted, just to find out they are very slow in comparison to foreach.
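The shape of the comparison being described, shrunk to a size that runs quickly (the data and field names are made up; the gap widens as the item count grows):
$data = 1..500 | ForEach-Object { [pscustomobject]@{ Id = $_; Ref = $_ % 97 } }

# Where-Object inside the outer loop: one full pipeline start-up per outer item
Measure-Command {
    foreach ($d in $data) { $null = $data | Where-Object Ref -eq $d.Ref }
}

# Plain foreach filtering: same result, no per-item pipeline overhead
Measure-Command {
    foreach ($d in $data) {
        $null = foreach ($x in $data) { if ($x.Ref -eq $d.Ref) { $x } }
    }
}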
If you want to see what is wrong with
What is more of an issue (for me) is that Where-Object and ForEach-Object don't expand an array if it is passed as the -InputObject parameter, so when you have multiple objects you are forced to pipe the input in. Other things do expand it.
This is an RFC to implement the `foreach -parallel` language keyword, using PowerShell runspaces/threads to run foreach iterations in parallel.