Add ability to set thread affinity #51

esyr · 2025-09-25T23:53:20Z

Includes the relevant update to the pkeyread test, as it already tries to report some thread indices in the -v mode.

Sashan

looks good. I could find just a few nits in threads.c that you might want to address.
thanks.

Sashan · 2025-09-30T13:38:34Z

source/perflib/threads.c

+    unsigned int ret = 0;
+
+    for (size_t i = 0; i < sizeof(a) * CHAR_BIT; i++)
+        ret += ((a & (1ULL << i)) == 0);


I think this is what I messed up in my suggested change. we discussed this off-list: we are supposed to count the bits which are set, right? if so, then we need ret += ((a & (1ULL << i)) != 0); here.
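A minimal sketch of the portable loop with the corrected comparison, for illustration only. It assumes `affinity_t` is an unsigned 64-bit mask (the real typedef in threads.c is not shown in this thread), and `popcount_portable` is a hypothetical name rather than the function in the PR.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for the perflib affinity_t type. */
typedef uint64_t affinity_t;

/* Count the bits that are set in a, one bit position at a time. */
static unsigned int popcount_portable(affinity_t a)
{
    unsigned int ret = 0;

    for (size_t i = 0; i < sizeof(a) * CHAR_BIT; i++)
        ret += ((a & ((affinity_t)1 << i)) != 0);

    return ret;
}
```

With `== 0` in place of `!= 0` the same loop would count the cleared bits, which is the mix-up described above.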

Sashan · 2025-09-30T13:40:06Z

source/perflib/threads.c

+        goto err;
    }

+    ta = OPENSSL_malloc(sizeof(*ta) * threadcount);


I think it is good to use OPENSSL_malloc() so the tests work with libraries which don't provide OPENSSL_malloc_array().

Sashan · 2025-09-30T13:40:37Z

source/perflib/threads.c

        args[i].num = i;
-        perflib_run_thread(&threads[i], &args[i]);
+        if (!(run_threads[i] = perflib_run_thread_(&threads[i], &args[i],
+                                                   ta + i)))


can we use &ta[i] here so it is clear we work with an array? thanks.

nhorman · 2025-10-16T15:31:06Z

source/perflib/threads.c

+static ossl_inline unsigned int popcount(affinity_t a)
+{
+    return __builtin_popcountl(a);
+}


do we really need to special case the ability to use a compiler built-in here? It seems like the balance between the ifdeffery here and a single function that counts up to sizeof(unsigned long) * 8 bits is biased in favor of just having one function.

I just don't like the idea of rolling our own implementation when the built-in is right here, but I don't really care here.

nhorman · 2025-10-16T16:04:42Z

source/pkeyread.c

        "\t-v  verbose output, includes min, max, stddev, and median times\n"
-        "\t-T  timeout for each test run in seconds, can be fractional"
+        "\t-T  timeout for each test run in seconds, can be fractional\n"
+        "\t-b  Set CPU affinity for the threads (in round robin fashion)\n"


what about adding this option to all the other tests in the repo?

I was prototyping on pkeyread, but, yeah, adding it to other tests should be trivial.

nhorman · 2025-10-16T16:07:54Z

source/pkeyread.c

 OSSL_TIME max_time;

-int err = 0;
+int error = 0;


Why this change? There's a good portion of this PR dedicated to renaming variables that doesn't really have anything to do with the addition of thread affinity management.

this addresses linker issues. there is a function err() which conflicts with the variable err. the changes in this PR just uncovered this conflict, so the change got included here.

jogme · 2025-10-16T15:41:30Z

source/perflib/perfhelper.c


 #include <string.h>
 #include <openssl/crypto.h>
+#include <openssl/macros.h>


why is this needed? There is no other change in this file

jogme · 2025-10-16T16:57:50Z

source/perflib/err.c

 }

+void
+err(int status, const char *fmt, ...)


why duplicate the errx function? Same for warn and warnx.

err/warn append the output of perror() to the message, while errx/warnx just print the provided string (along with the program name as a prefix).

I see now; sorry for the noise
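To illustrate the err/errx distinction from the reply above, here is a small sketch that builds both message shapes in a buffer. `format_diag` is a hypothetical helper written for this example; it is not the perflib implementation and it neither prints nor exits.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "prog: message" into buf and, when
 * with_errno is non-zero, append ": <strerror(errnum)>" the way
 * err()/warn() do. Returns the message length, or -1 if buf is too small.
 */
static int format_diag(char *buf, size_t len, const char *prog,
                       int with_errno, int errnum, const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(buf, len, "%s: ", prog);
    if (n < 0 || (size_t)n >= len)
        return -1;

    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - n, fmt, ap);
    va_end(ap);
    if (m < 0 || (size_t)m >= len - n)
        return -1;
    n += m;

    if (with_errno) {
        m = snprintf(buf + n, len - n, ": %s", strerror(errnum));
        if (m < 0 || (size_t)m >= len - n)
            return -1;
        n += m;
    }

    return n;
}
```

Calling it with `with_errno` set to 0 gives the errx/warnx shape; passing 1 and the saved `errno` gives the err/warn shape.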

npajkovsky · 2025-10-16T20:44:07Z

source/perflib/err.h

+#  include <err.h>
+
+# else /* _WIN32 */
+


We don't use new lines around include in #if.

npajkovsky · 2025-10-16T21:02:57Z

The work is ok, but I'm a little bit lost why the work is needed.

Sashan · 2025-10-17T09:52:28Z

> The work is ok, but I'm a little bit lost why the work is needed.

my understanding is that you want to pin a thread to a CPU so the scheduler does not migrate the thread which runs the performance test around the system. I think this is not a problem on systems with a low number of cores. it becomes more important on large multicore systems.

Sashan · 2025-10-17T12:11:06Z

source/perflib/threads.c

+static ossl_inline unsigned int popcount(affinity_t a)
+{
+    return __builtin_popcountl(a);
+}


if we want to use compiler built-ins can we also enable them for clang?

diff --git a/source/perflib/threads.c b/source/perflib/threads.c
index 8cf3a76..4f9187c 100644
--- a/source/perflib/threads.c
+++ b/source/perflib/threads.c
@@ -22,7 +22,7 @@
 /** affinity_t-typed value with nth bit set. */
 #define AFFINITY_BIT(n) ((affinity_t)1U << (n))
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__clang__)
 static ossl_inline unsigned int popcount(affinity_t a)
 {
@@ -41,7 +41,7 @@ static ossl_inline unsigned int popcount(affinity_t a)
     return ret;
 }
-#endif /* __GNUC__ */
+#endif /* __GNUC__ or __clang__ */
 int perflib_roundrobin_affinity(affinity_t *cpu_set_bits, size_t cpu_set_size, size_t num, size_t cnt, void *arg)

to be honest I'm with Neal here. My reasoning is that the perftools need to be portable to as many platforms/compilers as (conveniently) possible. you are rolling your own implementation anyway, so using a builtin one here does not buy as much.

on the other hand, if we limit ourselves to clang and GCC tools, then I'm fine with going with the builtin only.

the true reason I don't like the if/else here is that it leaves dead/untested code behind. In my opinion the true choice here should be:

being portable, then roll your own
or

let's rely on the compiler, then the code will work on platforms where the builtin is provided

in my view the perftools are a roll-your-own case.

Sashan · 2025-10-17T12:44:19Z

source/pkeyread.c

        "\t-v  verbose output, includes min, max, stddev, and median times\n"
-        "\t-T  timeout for each test run in seconds, can be fractional"
+        "\t-T  timeout for each test run in seconds, can be fractional\n"
+        "\t-b  Set CPU affinity for the threads (in round robin fashion)\n"


I think I understand Nikola's question better now, and I think he is making a good point. let me ask the question a different way: what is the difference between running the test using the command:

./pkeyread -f all -k all -b 16

and

taskset 0xffff ./pkeyread -f all -k all 16

If I understand things right, then the -b is a shortcut so people don't need to think of using taskset(1). is my understanding correct?

nhorman · 2025-10-17T13:03:07Z

> I think I understand Nikola's question better now. and I think he is making a good point. let me ask the question different way: what is a difference between running the test using the command:

I think the difference between:

./pkeyread -f all -k all -b 16

and

taskset  0xffff ./pkeyread -f all -k all 16

Is that in the latter case we rely on the OS scheduler to place threads on unique cores.

In the former case thread 1 is guaranteed to have an affinity of 0x1, thread 2 an affinity of 0x2, thread 3 an affinity of 0x4, etc.

In the latter all threads can run on any core in the affinity set. Will they likely be scheduled to unique cores? Probably. Are they guaranteed to be? No.

I guess the question to ask is "Does that matter to us?", and honestly, I'm not sure of the answer there.
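The two cases compared above can be sketched as plain mask arithmetic. This is only an illustration: `shared_mask` and `roundrobin_mask` are hypothetical names, `uint64_t` stands in for `affinity_t`, and the behaviour of the PR's `perflib_roundrobin_affinity()` beyond what is described in this thread is not assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for the perflib affinity_t type. */
typedef uint64_t affinity_t;

/*
 * taskset-style mask: every thread may run on any of the first ncpus
 * CPUs. ncpus is expected to be in the range 1..64.
 */
static affinity_t shared_mask(unsigned int ncpus)
{
    return ncpus >= 64 ? UINT64_MAX : (((affinity_t)1 << ncpus) - 1);
}

/*
 * Round-robin mask: the thread with 0-based index idx may run on exactly
 * one CPU, wrapping around once idx reaches ncpus (ncpus in 1..64).
 */
static affinity_t roundrobin_mask(size_t idx, unsigned int ncpus)
{
    return (affinity_t)1 << (idx % ncpus);
}
```

For 16 CPUs, `shared_mask(16)` is the `0xffff` passed to taskset, while `roundrobin_mask()` yields 0x1, 0x2, 0x4 and so on for the first, second and third thread.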

esyr · 2025-10-17T13:22:16Z

> The work is ok, but I'm a little bit lost why the work is needed.

So, the original reason I ended up writing that is that while working on x509storeissuer updates, I started seeing some anomalous results, and wanted to exclude that aspect from the list of possible factors. In general, pinning threads helps with the following:

it minimises noise from rescheduling and from performance differences between specific CPU cores across test runs;
it allows referring to threads by number (which is sometimes useful when some of them show anomalous performance), as thread numbers then correlate with CPU cores;
it allows mapping threads onto the system's topology in specific ways, which is useful in conjunction with other aspects of test runs, such as the way some resources are shared across threads, the way some threads perform work, and/or the CPU mask set for the whole test.

All those factors are predominantly relevant only when running on NUMA systems, naturally.

Sashan · 2025-10-17T15:52:41Z

> All those factors are predominantly relevant only when running on NUMA systems, naturally.

understood. my preference here is to get away with taskset(1) (if possible). also it looks like Windows offers a similar mechanism, according to Stack Overflow. taskset seems to be available on FreeBSD. Solaris has psrset(1M) to set affinity for a process. I believe other systems which can manage thread affinity expose their own command line tooling.

In my opinion the less we do here the better.

esyr requested review from Sashan and jogme September 25, 2025 23:53

Sashan requested changes Sep 30, 2025

esyr linked an issue Sep 30, 2025 that may be closed by this pull request

[perftools] Add support for setting thread affinity in tests openssl/project#1660

Open

Sashan and others added 2 commits October 16, 2025 11:36

s/err/error where apropriate easiest way to fix liner issues on windows

d568228

Use perflib/err.h unconditionally

caed5ee

Signed-off-by: Eugene Syromiatnikov <[email protected]>

esyr force-pushed the esyr/thread-affinity branch 2 times, most recently from efab928 to c033606 Compare October 16, 2025 12:18

esyr and others added 8 commits October 16, 2025 15:15

perflib/err.c: use program_invocation_name on glibc

393f457

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib: add vwarn/err/warn

953a33f

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib/err.h: add WARN/WARNX/ERR/ERRX

e78cd6a

Signed-off-by: Eugene Syromiatnikov <[email protected]>

perflib: add ability to set thread affinity

1b1e38c

Co-Authored-by: Alexandr Nedvedicky <[email protected]> Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: output counts array allocation error to stderr

2484c08

Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: tfix

0d9c3c8

Signed-off-by: Eugene Syromiatnikov <[email protected]>

README.md: update pkeyread documentation

a1b53d4

Signed-off-by: Eugene Syromiatnikov <[email protected]>

pkeyread: add an option to bind threads to cores

30f2a1f

Signed-off-by: Eugene Syromiatnikov <[email protected]>

esyr force-pushed the esyr/thread-affinity branch from c033606 to 30f2a1f Compare October 16, 2025 13:16

esyr requested a review from Sashan October 16, 2025 13:30

esyr marked this pull request as ready for review October 16, 2025 13:31

vavroch2010 requested review from nhorman and npajkovsky October 16, 2025 15:03

nhorman requested changes Oct 16, 2025

jogme suggested changes Oct 16, 2025

npajkovsky reviewed Oct 16, 2025

Sashan requested changes Oct 17, 2025
