-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "delay" and "delayType" options to secnetperf tool #4807
base: main
Are you sure you want to change the base?
Conversation
- Default/fixed delay introduces specific microseconds of inline busy-wait delay when the stream is established - Variable delay introduces an exponetially distributed random busy-wait delay. This is capped at 1millisecond inline sleep. - Documentation/help text updated
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #4807 +/- ##
==========================================
- Coverage 87.02% 86.88% -0.15%
==========================================
Files 56 56
Lines 17422 17422
==========================================
- Hits 15162 15137 -25
- Misses 2260 2285 +25 ☔ View full report in Codecov by Sentry. |
@@ -71,6 +73,24 @@ PerfServer::Init( | |||
} | |||
} | |||
|
|||
_Benign_race_begin_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no real race possible, right? This is only the analysis being confused because it can't know initialization is always complete before the stream starts?
_No_competing_thread_
might be more adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although initialization is run only once, I am not sure if we are enforcing it. There are competing threads reading the variable and potential competing thread that can write to it. It should not matter in either case, hence the benign race tagging. I am not sure if this will help or hurt the readability here but just keeping it around while I work on developing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove all these. We do guarantee all initialization happens before all the callbacks below.
@@ -29,6 +29,8 @@ ecn | `-ecn:<0,1>` | Enables sender-side ECN support. | |||
exec | `-exec:<lowlat,maxtput,scavenger,realtime>` | The execution profile used for the application. | |||
pollidle | `-pollidle:<time_us>` | The time, in microseconds, to poll while idle before sleeping (falling back to interrupt-driven IO). | |||
stats | `-stats:<0,1>` | Prints out statistics at the end of each connection. | |||
delay | `[-delay:<microseconds>]` | Optional delay in microseconds to be introduced before the server responds to a request. | |||
delayType | `[-delayType:<fixed|variable>]` | Optional "delay type" can be specified in conjunction with the delay argument. "fixed" delay type introduces the specified delay before the server responds to any request and this is the default behavior. "variable" delay type introduces a statistical variability to the specified delay. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sample run with cmd .\secnetperf.exe -target:127.0.0.1 -down:1KB -ptput:1 -streams:10
Without server delays:
Result: Download 1000 bytes @ 2569 kbps (3.113 ms).
Result: Download 1000 bytes @ 2457 kbps (3.255 ms).
Result: Download 1000 bytes @ 2399 kbps (3.334 ms).
Result: Download 1000 bytes @ 2316 kbps (3.453 ms).
Result: Download 1000 bytes @ 2251 kbps (3.553 ms).
Result: Download 1000 bytes @ 2200 kbps (3.636 ms).
Result: Download 1000 bytes @ 2162 kbps (3.699 ms).
Result: Download 1000 bytes @ 2130 kbps (3.755 ms).
Result: Download 1000 bytes @ 2096 kbps (3.816 ms).
Result: Download 1000 bytes @ 2065 kbps (3.874 ms).
With 500us fixed server delay:
Result: Download 1000 bytes @ 1015 kbps (7.875 ms).
Result: Download 1000 bytes @ 1000 kbps (7.996 ms).
Result: Download 1000 bytes @ 992 kbps (8.062 ms).
Result: Download 1000 bytes @ 985 kbps (8.121 ms).
Result: Download 1000 bytes @ 977 kbps (8.185 ms).
Result: Download 1000 bytes @ 970 kbps (8.245 ms).
Result: Download 1000 bytes @ 962 kbps (8.309 ms).
Result: Download 1000 bytes @ 956 kbps (8.367 ms).
Result: Download 1000 bytes @ 948 kbps (8.431 ms).
Result: Download 1000 bytes @ 942 kbps (8.492 ms).
With 500us variable server-side delay:
Result: Download 1000 bytes @ 368 kbps (21.682 ms).
Result: Download 1000 bytes @ 366 kbps (21.856 ms).
Result: Download 1000 bytes @ 364 kbps (21.939 ms).
Result: Download 1000 bytes @ 363 kbps (22.000 ms).
Result: Download 1000 bytes @ 362 kbps (22.057 ms).
Result: Download 1000 bytes @ 361 kbps (22.111 ms).
Result: Download 1000 bytes @ 360 kbps (22.169 ms).
Result: Download 1000 bytes @ 360 kbps (22.221 ms).
Result: Download 1000 bytes @ 359 kbps (22.278 ms).
Result: Download 1000 bytes @ 358 kbps (22.331 ms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra delay seems to depend on a few factors including the client cmd parameters and the variable delay. The test was run on an ordinary VM.
@@ -15,6 +15,8 @@ | |||
#include "PerfServer.cpp.clog.h" | |||
#endif | |||
|
|||
#include <random> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, since this code is cross-compiled for user mode and kernel mode, we cannot reference the standard library functions/headers in any file in the lib
directory,
@@ -71,6 +73,24 @@ PerfServer::Init( | |||
} | |||
} | |||
|
|||
_Benign_race_begin_ | |||
if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, we have TryGetVariableUnitValue
in perfclient.cpp
that allows for the app to specify its own time unit, such as -delayType:2ms
or -delayType:100us
. I think it would be useful here.
@@ -71,6 +73,24 @@ PerfServer::Init( | |||
} | |||
} | |||
|
|||
_Benign_race_begin_ | |||
if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) { | |||
// todo: Impose a limit on delay? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say no. The goal would be to leverage this to mimic a production scenario's workload. If partner X takes 2 seconds to generate responses for requests, we shouldn't limit that just because we think it's too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing unexpected behavior with larger delays - likely due to where the delay is introduced. Got this part about having no upper limit on the delay.
if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) { | ||
// todo: Impose a limit on delay? | ||
const char* DelayTypeString = nullptr; | ||
this->DelayType = SYNTHETIC_DELAY_FIXED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: We don't use this->
in the msquic code base. Please just reference the variables directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not very productive to argue about style in an established codebase. My take is that code becomes hard to follow when local variables and class members are used within a method without an obvious way to distinguish between them. I also see the use of This
in this file, but it is not my preferred notation. I have used m_
or simply _
prefix on class members in the past and it makes it easy to follow (and some people hate that notation).
this->DelayType = SYNTHETIC_DELAY_FIXED; | ||
|
||
if (TryGetValue(argc, argv, "delayType", &DelayTypeString)) { | ||
if (DelayTypeString != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TryGetValue
only returns true if it outputs something, so there's no need to check for nullptr
.
{ | ||
uint64_t Start = CxPlatTimeUs64(); | ||
uint64_t Now = Start; | ||
WriteOutput("%d\n", DelayUs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this. We will want to run tests that might generate millions of requests, each with some fixed delay, and we don't want a console output for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just for a draft change. Will replace it with a different output subsequently.
do { | ||
|
||
// Do some stalling work | ||
InterlockedIncrement64(&this->WorkCounter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for this. Simply querying the time (which uses QPC on Windows) is enough work. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fenced access will synchronize the waiting threads and will create further contention down the line, making it a much richer simulation of the delays seen on real applications.
PerfServer::CalculateVariableDelay(double lambda) | ||
{ | ||
lambda = abs(lambda); | ||
std::mt19937 random_generator(CxPlatTimeUs32()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this won't compile for kernel mode. So, we need to either (a) remove it or (b) special case this to user mode only.
} | ||
|
||
double lambda = ((double)1) / DelayUs; | ||
// Mean value of VariableDelay is expected to be DelayUs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: The comment style in the msquic code base is:
//
// Mean value of VariableDelay is expected to be DelayUs
//
@@ -205,6 +298,7 @@ PerfServer::StreamCallback( | |||
break; | |||
case QUIC_STREAM_EVENT_SEND_COMPLETE: | |||
Context->OutstandingBytes -= ((QUIC_BUFFER*)Event->SEND_COMPLETE.ClientContext)->Length; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: revert
@@ -79,6 +79,10 @@ PrintHelp( | |||
" -port:<####> The UDP port of the server. Ignored if \"bind\" is passed. (def:%u)\n" | |||
" -serverid:<####> The ID of the server (used for load balancing).\n" | |||
" -cibir:<hex_bytes> A CIBIR well-known idenfitier.\n" | |||
" -delay:<microseconds> Optional delay in microseconds to be introduced before the server responds to a request.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the pattern we use for runtime
below? Let the app specify a unit.
@@ -79,6 +79,10 @@ PrintHelp( | |||
" -port:<####> The UDP port of the server. Ignored if \"bind\" is passed. (def:%u)\n" | |||
" -serverid:<####> The ID of the server (used for load balancing).\n" | |||
" -cibir:<hex_bytes> A CIBIR well-known idenfitier.\n" | |||
" -delay:<microseconds> Optional delay in microseconds to be introduced before the server responds to a request.\n" | |||
" -delayType:<fixed|variable> Optional ""delay type"" can be specified in conjunction with the delay argument.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" -delayType:<fixed|variable> Optional ""delay type"" can be specified in conjunction with the delay argument.\n" | |
" -delayType:<type> Optional delay type can be specified in conjunction with the 'delay' argument.\n" |
" ""fixed"" delay type introduces the specified delay before the server responds to any request and this is the default behavior.\n" | ||
" ""variable"" delay type introduces a statistical variability to the specified delay.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" ""fixed"" delay type introduces the specified delay before the server responds to any request and this is the default behavior.\n" | |
" ""variable"" delay type introduces a statistical variability to the specified delay.\n" | |
" 'fixed' - a constant time delay for each request.\n" | |
" 'variable' - a statistical variability to the specified delay.\n" |
void SimulateDelay(uint32_t DelayUs, SYNTHETIC_DELAY_TYPE DelayType); | ||
|
||
|
||
volatile int64_t WorkCounter = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary, since we can just use querying time as the delay's workload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Querying time is an unfenced read, unlike an interlocked operation. The latter creates synchronization between different threads produces interesting pattern of delays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doing the minimal amount of work in a tight loop gives you the best accuracy to hit a specific delay, say 3us
. And IMHO, it's also simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will implement this version in my next iteration.
I still think the value in simulation is to create contention for a resource with the possibility of lining up threads for simultaneous completion. Perhaps async threads will give us the needed contention by design.
uint32_t DelayMicroseconds = 0; | ||
SYNTHETIC_DELAY_TYPE DelayType = SYNTHETIC_DELAY_FIXED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move up to with all the other variables.
This not quite the same as the feature request described in issue #1524, but it achieves some of the goals. The stream processing on the server side is already multithreaded and the use of exponential probability distribution to determine the delay for the current stream simulates variable processing time for a request on the server side.
We can introduce background thread processing as suggested in the issue description to simulate further processing delays on the server side to enable a richer simluation.
Description
Add the ability to introduce server-side delays into data streams in the secnetperf tool.
Testing
TBD
Do any existing tests cover this change? Are new tests needed?
Documentation
Documentation has been updated.