(3.9.0 - 3.14.0) Performance degradation on tightly coupled workloads at scale #7095

@hgreebe

The issue

Starting with ParallelCluster 3.9.0, tightly coupled MPI workloads can suffer performance degradation on large clusters.
The root cause is a process we introduced on the compute nodes to support in-place cluster updates on compute and login nodes. In-place updates allow shared storage to be mounted or unmounted without replacing the nodes. Although the process is lightweight, it runs periodically and may affect the performance of some specific workloads.
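
To illustrate why a lightweight periodic process can still matter at scale, here is a rough back-of-the-envelope model, not a measurement. It assumes a bulk-synchronous workload where every step waits for the slowest node, and that each node's periodic task starts at an independent, uniformly random time. The function names and all parameters (`step_s`, `period_s`, `busy_s`) are hypothetical and are not taken from ParallelCluster.

```python
def prob_step_delayed(nodes: int, step_s: float, period_s: float, busy_s: float) -> float:
    """Probability that at least one node is interrupted during one synchronized step."""
    # Chance that a single node's periodic task overlaps a given step, assuming
    # task start times are uniform and independent across nodes (an assumption).
    p_single = min(1.0, (step_s + busy_s) / period_s)
    # The step is delayed if any of the nodes is hit.
    return 1.0 - (1.0 - p_single) ** nodes


def expected_step_time(nodes: int, step_s: float, period_s: float, busy_s: float) -> float:
    """Rough expected duration of one synchronized step, in seconds."""
    # A synchronized step finishes only when the slowest rank does, so a single
    # interrupted node delays every rank by up to busy_s.
    return step_s + busy_s * prob_step_delayed(nodes, step_s, period_s, busy_s)
```

Under these assumptions the per-node chance of an interruption stays small, but the chance that some node is interrupted in a given step grows quickly with the node count, which is consistent with the degradation showing up mainly on large clusters.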

Affected ParallelCluster versions, OSes and schedulers

All ParallelCluster versions from 3.9.0 to 3.14.0 on all OSes.
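
As a convenience, the affected range above can be checked with a small helper. This is a minimal sketch: `parse_version` and `is_affected` are hypothetical names, and the version string is assumed to be plain dotted integers such as `3.10.2` supplied by the caller.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.10.2' into a tuple of integers."""
    # Assumes purely numeric components; suffixes like 'b1' would raise ValueError.
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version: str) -> bool:
    """Return True if the version falls in the affected range 3.9.0 to 3.14.0, inclusive."""
    # Tuples compare element by element, so 3.10.x correctly sorts after 3.9.x
    # (a plain string comparison would get this wrong).
    return (3, 9, 0) <= parse_version(version) <= (3, 14, 0)
```

For example, `is_affected("3.10.2")` is true, while `is_affected("3.8.0")` and `is_affected("3.14.1")` are false.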

Mitigation

You can find a detailed explanation and the mitigation of the problem here.
