(3.9.0 - 3.14.0) Performance degradation on tightly coupled workloads at scale

## The issue

Starting with ParallelCluster 3.9.0, tightly coupled MPI workloads running on large clusters may experience performance degradation.
The root cause is a process that we introduced on compute nodes to support in-place cluster updates on compute and login nodes, which allow shared storage to be mounted or unmounted without replacing the nodes. Although the process is lightweight, it runs periodically and may affect the performance of some workloads.

## Affected ParallelCluster versions, OSes and schedulers

All ParallelCluster versions from 3.9.0 to 3.14.0 on all OSes.
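
As a quick way to check whether a given cluster falls in the range above, here is a minimal Python sketch. The helper names `parse_version` and `is_affected` are hypothetical (they are not part of ParallelCluster), and the sketch assumes plain `major.minor.patch` version strings and treats both bounds stated in this issue as inclusive.

```python
# Hypothetical helpers, not part of ParallelCluster: check whether a version
# string falls inside the range reported in this issue (3.9.0 to 3.14.0).

AFFECTED_MIN = (3, 9, 0)   # first affected version, per this issue
AFFECTED_MAX = (3, 14, 0)  # last affected version, per this issue


def parse_version(version):
    """Turn 'major.minor.patch' into a tuple of ints, e.g. '3.10.1' -> (3, 10, 1)."""
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return (major, minor, patch)


def is_affected(version):
    """Return True if the version lies in the affected range (bounds assumed inclusive)."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


if __name__ == "__main__":
    for candidate in ("3.8.0", "3.9.0", "3.10.1", "3.14.0"):
        print(candidate, is_affected(candidate))
```

Versions are compared as integer tuples rather than strings because a plain string comparison would order `3.10.1` before `3.9.0`.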

## Mitigation

A detailed explanation of the problem and its mitigation is available on [this wiki page](https://github.com/aws/aws-parallelcluster/wiki/(3.9.0-%E2%80%90-3.14.0)-Performance-degradation-on-tightly-coupled-workloads-at-scale).

