Description
Problem Statement
We're experiencing a growing number of InsufficientCapacity errors when launching compute nodes. As a workaround, we've configured compute node subnets in multiple Availability Zones (AZs) so that ParallelCluster can retry provisioning in another AZ when capacity is unavailable.
However, we're unable to combine this multi-AZ compute approach with EFS OneZone storage, which offers better performance for many use cases, even when mounted from another AZ. Tools like Posit Workbench recommend OneZone for its performance benefits (https://docs.posit.co/posit-team/index.html).
Current Limitation
ParallelCluster validation prevents configuring OneZone EFS with compute nodes spanning multiple AZs, showing the error:
"EFS OneZone is only supported if all compute nodes and the head node are in the same Availability Zone. EFS OneZone can have only one mount target."
Requested Enhancement
Please modify ParallelCluster to:
- Allow the configuration of EFS OneZone storage with compute nodes distributed across multiple AZs
- Ideally, prioritize launching compute nodes in the same AZ as the EFS OneZone storage
- Fall back to other AZs only when capacity isn't available in the primary AZ
This change would maintain the performance benefits of OneZone EFS while adding resilience against capacity constraints.
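To make the requested behavior concrete, here is a minimal Python sketch of the prioritization and fallback logic. It is not ParallelCluster code: the function names, the subnet-to-AZ mapping, and the `try_launch` callback are all hypothetical and stand in for whatever ParallelCluster uses internally.

```python
def order_subnets_by_efs_az(subnet_azs, efs_az):
    """Return subnet IDs with those in the EFS OneZone AZ first.

    subnet_azs: dict of subnet ID -> AZ name (hypothetical input).
    efs_az: AZ of the OneZone file system.
    sorted() is stable, so the configured order is kept within each group.
    """
    return sorted(subnet_azs, key=lambda subnet: subnet_azs[subnet] != efs_az)


def launch_with_fallback(subnet_azs, efs_az, try_launch):
    """Try subnets in priority order; return the first that succeeds, else None.

    try_launch: hypothetical callback returning True when capacity is
    available in the given subnet and False otherwise.
    """
    for subnet in order_subnets_by_efs_az(subnet_azs, efs_az):
        if try_launch(subnet):
            return subnet
    return None


# Example: the file system lives in us-east-1a, which is out of capacity.
subnet_azs = {
    "subnet-b": "us-east-1b",
    "subnet-a": "us-east-1a",
    "subnet-c": "us-east-1c",
}
chosen = launch_with_fallback(
    subnet_azs, "us-east-1a", try_launch=lambda subnet: subnet != "subnet-a"
)
print(chosen)
```

In this example the subnet in the EFS AZ is tried first, and the launch moves to the next configured subnet only after that attempt fails.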