-
Notifications
You must be signed in to change notification settings - Fork 11
Add ability to set thread affinity #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good I could find just few nits in threads.c you might want to address.
thanks.
source/perflib/threads.c
Outdated
unsigned int ret = 0; | ||
|
||
for (size_t i = 0; i < sizeof(a) * CHAR_BIT; i++) | ||
ret += ((a & (1ULL << i)) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this what I've messed up in my sgguested change. we discussed this off-llist we are supposed to count bits which are set, right? if so then we need ret += ((a & (1ULL << i)) != 0);
here.
goto err; | ||
} | ||
|
||
ta = OPENSSL_malloc(sizeof(*ta) * threadcount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to use OPENSSL_malloc() so tests work with libraries which don't provide OPENSSL_malloc_array()
args[i].num = i; | ||
perflib_run_thread(&threads[i], &args[i]); | ||
if (!(run_threads[i] = perflib_run_thread_(&threads[i], &args[i], | ||
ta + i))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use &ta[i]
here so it is clear we work with array, thanks.
Signed-off-by: Eugene Syromiatnikov <[email protected]>
efab928
to
c033606
Compare
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Co-Authored-by: Alexandr Nedvedicky <[email protected]> Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Signed-off-by: Eugene Syromiatnikov <[email protected]>
c033606
to
30f2a1f
Compare
static ossl_inline unsigned int popcount(affinity_t a) | ||
{ | ||
return __builtin_popcountl(a); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to special case the ability to use a compiler built in here? It seems like the balance between the ifdeffery here and a single function that counts up to sizeof(unsigned long) * 8 bits is biased in favor of just having one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't like the idea of rolling own implementation when the built-in is right here, but I don't really care here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to use compiler built-ins can we also enable them for clang?
diff --git a/source/perflib/threads.c b/source/perflib/threads.c
index 8cf3a76..4f9187c 100644
--- a/source/perflib/threads.c
+++ b/source/perflib/threads.c
@@ -22,7 +22,7 @@
/** affinity_t-typed value with nth bit set. */
#define AFFINITY_BIT(n) ((affinity_t)1U << (n))
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__clang__)
static ossl_inline unsigned int popcount(affinity_t a)
{
@@ -41,7 +41,7 @@ static ossl_inline unsigned int popcount(affinity_t a)
return ret;
}
-#endif /* __GNUC__ */
+#endif /* __GNUC__ or __clang__ */
int perflib_roundrobin_affinity(affinity_t *cpu_set_bits, size_t cpu_set_size,
size_t num, size_t cnt, void *arg)
to be honest I'm with Neal here. My reasoning is the peftools need to be portable to as many platforms/compilers as (conveniently) possible. you are rolling the builtin implementation anway so using a bultinn one here does not buy as much.
on the other hand if limit ourselves to clang and GCC tools, then I'm fine with going to bultin only one.
the true reason I don't like the if/else here is it leaves a dead/untested code behind. In my opinion the true choice here should be:
- being portable, then roll your own
or - let's rely on compiler then code will work on platforms where bultiin is provided
in my view the perftools are roll your own case.
"\t-v verbose output, includes min, max, stddev, and median times\n" | ||
"\t-T timeout for each test run in seconds, can be fractional" | ||
"\t-T timeout for each test run in seconds, can be fractional\n" | ||
"\t-b Set CPU affinity for the threads (in round robin fashion)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about adding this option to all the other tests in the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was prototyping on pkeyread
, but, yeah, adding it to other tests should be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand Nikola's question better now. and I think he is making a good point. let me ask the question different way: what is a difference between running the test using the command:
./pkeyread -f all -k all -b 16
and
taskset 0xffff ./pkeyread -f all -k all 16
If I understand things right, then th -b
is a shortcut so people don't need to think of using a taskset(1)
is my understanding correct?
OSSL_TIME max_time; | ||
|
||
int err = 0; | ||
int error = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Theres a good portion of this PR dedicated to renaming variables that doesn't really have anything to do with the addition of thread affinity management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this address linker issues. there is function err()
which conflicts with variables err
. the changes in this PR just discovered this conflict. so the change got included here.
|
||
#include <string.h> | ||
#include <openssl/crypto.h> | ||
#include <openssl/macros.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? There is no other change in this file
} | ||
|
||
void | ||
err(int status, const char *fmt, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why to duplicate errx
function? Same for warn
and warnx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
/warn
append the output of perror()
to the message, while errx
/warnx
just print the provided string (along with the program name as a prefix).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
/warn
append the output of perror()
to the message, while errx
/warnx
just print the provided string (along with the program name as a prefix).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now; sorry for the noise
# include <err.h> | ||
|
||
# else /* _WIN32 */ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use new lines around include in #if.
The work is ok, but I'm a little bit lost why the work is needed. |
my understanding is you want to pin a thread to CPU so scheduler does not migrate the thread which runs performance test around the system. I think this does not present on system with low number of cores. it becomes more important on large multicore systems. |
static ossl_inline unsigned int popcount(affinity_t a) | ||
{ | ||
return __builtin_popcountl(a); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to use compiler built-ins can we also enable them for clang?
diff --git a/source/perflib/threads.c b/source/perflib/threads.c
index 8cf3a76..4f9187c 100644
--- a/source/perflib/threads.c
+++ b/source/perflib/threads.c
@@ -22,7 +22,7 @@
/** affinity_t-typed value with nth bit set. */
#define AFFINITY_BIT(n) ((affinity_t)1U << (n))
-#if defined(__GNUC__)
+#if defined(__GNUC__) || defined(__clang__)
static ossl_inline unsigned int popcount(affinity_t a)
{
@@ -41,7 +41,7 @@ static ossl_inline unsigned int popcount(affinity_t a)
return ret;
}
-#endif /* __GNUC__ */
+#endif /* __GNUC__ or __clang__ */
int perflib_roundrobin_affinity(affinity_t *cpu_set_bits, size_t cpu_set_size,
size_t num, size_t cnt, void *arg)
to be honest I'm with Neal here. My reasoning is the peftools need to be portable to as many platforms/compilers as (conveniently) possible. you are rolling the builtin implementation anway so using a bultinn one here does not buy as much.
on the other hand if limit ourselves to clang and GCC tools, then I'm fine with going to bultin only one.
the true reason I don't like the if/else here is it leaves a dead/untested code behind. In my opinion the true choice here should be:
- being portable, then roll your own
or - let's rely on compiler then code will work on platforms where bultiin is provided
in my view the perftools are roll your own case.
"\t-v verbose output, includes min, max, stddev, and median times\n" | ||
"\t-T timeout for each test run in seconds, can be fractional" | ||
"\t-T timeout for each test run in seconds, can be fractional\n" | ||
"\t-b Set CPU affinity for the threads (in round robin fashion)\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand Nikola's question better now. and I think he is making a good point. let me ask the question different way: what is a difference between running the test using the command:
./pkeyread -f all -k all -b 16
and
taskset 0xffff ./pkeyread -f all -k all 16
If I understand things right, then th -b
is a shortcut so people don't need to think of using a taskset(1)
is my understanding correct?
I think the difference between:
and
Is that in the latter case we rely on the OS scheduler to place threads on unique cores. In the former case thread 1 is guaranteed to have an affinity of 0x1, thread 2 an affinity of 0x2, thread 3 an affinity of 0x4, etc. In the latter all threads can run on any ore in the affinity set. Will they likely be scheduled to unique cores? Probably. Are they guaranteed to be? No. I guess the question to ask is "Does that matter to us?", and honestly, I'm not sure of the answer there. |
So, the original reason I ended up writing that is that while working on
All those factors are predominantly relevant only when running on NUMA systems, naturally. |
> All those factors are predominantly relevant only when running on NUMA systems, naturally.
understood. my preference here is to get away with In my opinion the less we do here the better. |
Includes the relevant update to the
pkeyread
test, as it already tries to report some thread indices in the-v
mode.