Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "delay" and "delayType" options to secnetperf tool #4807

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sm-msft
Copy link
Contributor

@sm-msft sm-msft commented Feb 12, 2025

This not quite the same as the feature request described in issue #1524, but it achieves some of the goals. The stream processing on the server side is already multithreaded and the use of exponential probability distribution to determine the delay for the current stream simulates variable processing time for a request on the server side.

We can introduce background thread processing as suggested in the issue description to simulate further processing delays on the server side to enable a richer simluation.

Description

Add the ability to introduce server-side delays into data streams in the secnetperf tool.

  • Default/fixed delay introduces specific microseconds of inline busy-wait delay when the stream is established
  • Variable delay introduces an exponetially distributed random busy-wait delay. This is capped at 1millisecond inline sleep.
  • Documentation/help text updated

Testing

TBD
Do any existing tests cover this change? Are new tests needed?

Documentation

Documentation has been updated.

- Default/fixed delay introduces specific microseconds of inline busy-wait delay when the stream is established
- Variable delay introduces an exponetially distributed random busy-wait delay. This is capped at 1millisecond inline sleep.
- Documentation/help text updated
@sm-msft sm-msft assigned sm-msft and nibanks and unassigned sm-msft Feb 12, 2025
Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.88%. Comparing base (8936fdc) to head (c20f38a).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4807      +/-   ##
==========================================
- Coverage   87.02%   86.88%   -0.15%     
==========================================
  Files          56       56              
  Lines       17422    17422              
==========================================
- Hits        15162    15137      -25     
- Misses       2260     2285      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/perf/lib/PerfServer.cpp Show resolved Hide resolved
@@ -71,6 +73,24 @@ PerfServer::Init(
}
}

_Benign_race_begin_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no real race possible, right? This is only the analysis being confused because it can't know initialization is always complete before the stream starts?
_No_competing_thread_ might be more adapted.

Copy link
Contributor Author

@sm-msft sm-msft Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although initialization is run only once, I am not sure if we are enforcing it. There are competing threads reading the variable and potential competing thread that can write to it. It should not matter in either case, hence the benign race tagging. I am not sure if this will help or hurt the readability here but just keeping it around while I work on developing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove all these. We do guarantee all initialization happens before all the callbacks below.

src/perf/lib/PerfServer.cpp Show resolved Hide resolved
@@ -29,6 +29,8 @@ ecn | `-ecn:<0,1>` | Enables sender-side ECN support.
exec | `-exec:<lowlat,maxtput,scavenger,realtime>` | The execution profile used for the application.
pollidle | `-pollidle:<time_us>` | The time, in microseconds, to poll while idle before sleeping (falling back to interrupt-driven IO).
stats | `-stats:<0,1>` | Prints out statistics at the end of each connection.
delay | `[-delay:<microseconds>]` | Optional delay in microseconds to be introduced before the server responds to a request.
delayType | `[-delayType:<fixed|variable>]` | Optional "delay type" can be specified in conjunction with the delay argument. "fixed" delay type introduces the specified delay before the server responds to any request and this is the default behavior. "variable" delay type introduces a statistical variability to the specified delay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sample run with cmd .\secnetperf.exe -target:127.0.0.1 -down:1KB -ptput:1 -streams:10
Without server delays:
Result: Download 1000 bytes @ 2569 kbps (3.113 ms).
Result: Download 1000 bytes @ 2457 kbps (3.255 ms).
Result: Download 1000 bytes @ 2399 kbps (3.334 ms).
Result: Download 1000 bytes @ 2316 kbps (3.453 ms).
Result: Download 1000 bytes @ 2251 kbps (3.553 ms).
Result: Download 1000 bytes @ 2200 kbps (3.636 ms).
Result: Download 1000 bytes @ 2162 kbps (3.699 ms).
Result: Download 1000 bytes @ 2130 kbps (3.755 ms).
Result: Download 1000 bytes @ 2096 kbps (3.816 ms).
Result: Download 1000 bytes @ 2065 kbps (3.874 ms).

With 500us fixed server delay:
Result: Download 1000 bytes @ 1015 kbps (7.875 ms).
Result: Download 1000 bytes @ 1000 kbps (7.996 ms).
Result: Download 1000 bytes @ 992 kbps (8.062 ms).
Result: Download 1000 bytes @ 985 kbps (8.121 ms).
Result: Download 1000 bytes @ 977 kbps (8.185 ms).
Result: Download 1000 bytes @ 970 kbps (8.245 ms).
Result: Download 1000 bytes @ 962 kbps (8.309 ms).
Result: Download 1000 bytes @ 956 kbps (8.367 ms).
Result: Download 1000 bytes @ 948 kbps (8.431 ms).
Result: Download 1000 bytes @ 942 kbps (8.492 ms).

With 500us variable server-side delay:
Result: Download 1000 bytes @ 368 kbps (21.682 ms).
Result: Download 1000 bytes @ 366 kbps (21.856 ms).
Result: Download 1000 bytes @ 364 kbps (21.939 ms).
Result: Download 1000 bytes @ 363 kbps (22.000 ms).
Result: Download 1000 bytes @ 362 kbps (22.057 ms).
Result: Download 1000 bytes @ 361 kbps (22.111 ms).
Result: Download 1000 bytes @ 360 kbps (22.169 ms).
Result: Download 1000 bytes @ 360 kbps (22.221 ms).
Result: Download 1000 bytes @ 359 kbps (22.278 ms).
Result: Download 1000 bytes @ 358 kbps (22.331 ms).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra delay seems to depend on a few factors including the client cmd parameters and the variable delay. The test was run on an ordinary VM.

@@ -15,6 +15,8 @@
#include "PerfServer.cpp.clog.h"
#endif

#include <random>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, since this code is cross-compiled for user mode and kernel mode, we cannot reference the standard library functions/headers in any file in the lib directory,

@@ -71,6 +73,24 @@ PerfServer::Init(
}
}

_Benign_race_begin_
if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we have TryGetVariableUnitValue in perfclient.cpp that allows for the app to specify its own time unit, such as -delayType:2ms or -delayType:100us. I think it would be useful here.

@@ -71,6 +73,24 @@ PerfServer::Init(
}
}

_Benign_race_begin_
if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) {
// todo: Impose a limit on delay?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no. The goal would be to leverage this to mimic a production scenario's workload. If partner X takes 2 seconds to generate responses for requests, we shouldn't limit that just because we think it's too long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was seeing unexpected behavior with larger delays - likely due to where the delay is introduced. Got this part about having no upper limit on the delay.

if (TryGetValue(argc, argv, "delay", &this->DelayMicroseconds)) {
// todo: Impose a limit on delay?
const char* DelayTypeString = nullptr;
this->DelayType = SYNTHETIC_DELAY_FIXED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: We don't use this-> in the msquic code base. Please just reference the variables directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not very productive to argue about style in an established codebase. My take is that code becomes hard to follow when local variables and class members are used within a method without an obvious way to distinguish between them. I also see the use of This in this file, but it is not my preferred notation. I have used m_ or simply _ prefix on class members in the past and it makes it easy to follow (and some people hate that notation).

this->DelayType = SYNTHETIC_DELAY_FIXED;

if (TryGetValue(argc, argv, "delayType", &DelayTypeString)) {
if (DelayTypeString != nullptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryGetValue only returns true if it outputs something, so there's no need to check for nullptr.

{
uint64_t Start = CxPlatTimeUs64();
uint64_t Now = Start;
WriteOutput("%d\n", DelayUs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this. We will want to run tests that might generate millions of requests, each with some fixed delay, and we don't want a console output for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just for a draft change. Will replace it with a different output subsequently.

do {

// Do some stalling work
InterlockedIncrement64(&this->WorkCounter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for this. Simply querying the time (which uses QPC on Windows) is enough work. 😄

Copy link
Contributor Author

@sm-msft sm-msft Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fenced access will synchronize the waiting threads and will create further contention down the line, making it a much richer simulation of the delays seen on real applications.

PerfServer::CalculateVariableDelay(double lambda)
{
lambda = abs(lambda);
std::mt19937 random_generator(CxPlatTimeUs32());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this won't compile for kernel mode. So, we need to either (a) remove it or (b) special case this to user mode only.

}

double lambda = ((double)1) / DelayUs;
// Mean value of VariableDelay is expected to be DelayUs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: The comment style in the msquic code base is:

//
// Mean value of VariableDelay is expected to be DelayUs
//

@@ -205,6 +298,7 @@ PerfServer::StreamCallback(
break;
case QUIC_STREAM_EVENT_SEND_COMPLETE:
Context->OutstandingBytes -= ((QUIC_BUFFER*)Event->SEND_COMPLETE.ClientContext)->Length;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: revert

@@ -79,6 +79,10 @@ PrintHelp(
" -port:<####> The UDP port of the server. Ignored if \"bind\" is passed. (def:%u)\n"
" -serverid:<####> The ID of the server (used for load balancing).\n"
" -cibir:<hex_bytes> A CIBIR well-known idenfitier.\n"
" -delay:<microseconds> Optional delay in microseconds to be introduced before the server responds to a request.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the pattern we use for runtime below? Let the app specify a unit.

@@ -79,6 +79,10 @@ PrintHelp(
" -port:<####> The UDP port of the server. Ignored if \"bind\" is passed. (def:%u)\n"
" -serverid:<####> The ID of the server (used for load balancing).\n"
" -cibir:<hex_bytes> A CIBIR well-known idenfitier.\n"
" -delay:<microseconds> Optional delay in microseconds to be introduced before the server responds to a request.\n"
" -delayType:<fixed|variable> Optional ""delay type"" can be specified in conjunction with the delay argument.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" -delayType:<fixed|variable> Optional ""delay type"" can be specified in conjunction with the delay argument.\n"
" -delayType:<type> Optional delay type can be specified in conjunction with the 'delay' argument.\n"

Comment on lines +84 to +85
" ""fixed"" delay type introduces the specified delay before the server responds to any request and this is the default behavior.\n"
" ""variable"" delay type introduces a statistical variability to the specified delay.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" ""fixed"" delay type introduces the specified delay before the server responds to any request and this is the default behavior.\n"
" ""variable"" delay type introduces a statistical variability to the specified delay.\n"
" 'fixed' - a constant time delay for each request.\n"
" 'variable' - a statistical variability to the specified delay.\n"

void SimulateDelay(uint32_t DelayUs, SYNTHETIC_DELAY_TYPE DelayType);


volatile int64_t WorkCounter = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary, since we can just use querying time as the delay's workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying time is an unfenced read, unlike an interlocked operation. The latter creates synchronization between different threads produces interesting pattern of delays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doing the minimal amount of work in a tight loop gives you the best accuracy to hit a specific delay, say 3us. And IMHO, it's also simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will implement this version in my next iteration.

I still think the value in simulation is to create contention for a resource with the possibility of lining up threads for simultaneous completion. Perhaps async threads will give us the needed contention by design.

Comment on lines +184 to +185
uint32_t DelayMicroseconds = 0;
SYNTHETIC_DELAY_TYPE DelayType = SYNTHETIC_DELAY_FIXED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move up to with all the other variables.

@nibanks nibanks changed the title - Add "delay" and "delayType" options to secnetperf tool Add "delay" and "delayType" options to secnetperf tool Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants