Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -285,10 +285,10 @@ static double get_cpuload_internal(int which, double *pkernelLoad, CpuLoadTarget
} else if (get_totalticks(which, pticks) < 0) {
failed = 1;
}
// After restoring with CRaC the new process can appear 'younger'
// than last value in counters - we will return -1 (unavailable).
if (!failed && pticks->usedKernel >= tmp.usedKernel) {
Copy link
Collaborator

@TimPushkin TimPushkin Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read the precise definitions of used, usedKernel, total, is checking only usedKernel enough to guarantee the comparison is also valid for the other two values? Shouldn't we check both used and usedKernel (assuming total is exactly their sum, which may not be the case) in case only one of them has gone backwards?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole check is quite indeterministic: if we consider this is called at arbitrary point, the value of usedKernel can be just right and the results would not be correct anyway. I think that it's important that at the end the result is sanitized to a safe range 0 - 1.
If we consider a more realistic use case, e.g. calling this every 5 seconds, I would expect that before checkpoint the application spent significantly more time before checkpoint than it did until the first hit after restore, so this will detect potentially invalid result.
My main motivation was to avoid the crash in debug builds. We could perfect this by adding a hook that will reset the values during C/R but I would opt for a minimalistic change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like we should either check all three used* variables (because they are not fully dependent and the checks are cheap) or not check any of them (because checking does not guarantee correctness anyway)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it won't hurt.


if (!failed) {

assert(pticks->usedKernel >= tmp.usedKernel);
kdiff = pticks->usedKernel - tmp.usedKernel;
tdiff = pticks->total - tmp.total;
udiff = pticks->used - tmp.used;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/*
* Copyright (c) 2025, Azul Systems, Inc. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Azul Systems, 385 Moffett Park Drive, Suite 115, Sunnyvale
* CA 94089 USA or visit www.azul.com if you need additional information or
* have any questions.
*/

import com.sun.management.OperatingSystemMXBean;
import jdk.crac.Core;
import jdk.test.lib.crac.*;

import java.lang.management.ManagementFactory;

import static jdk.test.lib.Asserts.*;

/*
* @test
* @requires (os.family == "linux")
* @library /test/lib
* @build OperatingSystemMxBeanTest
* @run driver jdk.test.lib.crac.CracTest
*/
public class OperatingSystemMxBeanTest implements CracTest {
@Override
public void test() throws Exception {
// The restore must be in a new process => cannot use simengine
new CracBuilder().engine(CracEngine.CRIU).doCheckpointAndRestore();
}

@Override
public void exec() throws Exception {
OperatingSystemMXBean bean = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
System.out.println("System CPU load: " + bean.getCpuLoad());
System.out.println("Process CPU load: " + bean.getProcessCpuLoad());
Core.checkpointRestore();
// We're restoring on the same system, so total CPU load should not have failed
assertLTE(0.0, bean.getCpuLoad());
// Per process ticks should be lower after restore than before checkpoint => load unavailable
assertEquals(-1.0, bean.getProcessCpuLoad());
// Second invocation should work with the updated value => correct
assertLTE(0.0, bean.getProcessCpuLoad());
}
}