Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart Server on Each NetPerf Test Run #4050

Merged
merged 56 commits into from
Jan 16, 2024
Merged

Conversation

nibanks
Copy link
Member

@nibanks nibanks commented Jan 12, 2024

Description

Improvements to the NetPerf automation to restart the server for each test, but also includes several other fixes found along the way:

  • Refactoring the helper to have a "run" of a test to run the server only for the time it takes to do the couple of tries for that test.
  • Fix log collections (client and server).
  • Collect crash dumps (Windows only for now) on client and server.
  • Fix a bug in the perf server code that wasn't passing the context pointer in for a callback.
  • Improvements (but not a complete fix yet) of the TCP cleanup path to handle some race conditions.

Testing

NetPerf Automation

Documentation

N/A

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f0b34fe) 87.10% compared to head (7cedf5d) 86.03%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4050      +/-   ##
==========================================
- Coverage   87.10%   86.03%   -1.07%     
==========================================
  Files          56       56              
  Lines       16938    16938              
==========================================
- Hits        14753    14573     -180     
- Misses       2185     2365     +180     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nibanks nibanks marked this pull request as ready for review January 15, 2024 18:59
@nibanks nibanks requested a review from a team as a code owner January 15, 2024 18:59
@nibanks nibanks added the Bug: Test/Tool Code bug in the test or tool specific code. label Jan 16, 2024
param ($Session)
if ($isWindows) {
Invoke-Command -Session $Session -ScriptBlock {
$DumpDir = "C:/_work/quic/artifacts/crashdumps"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this common to create on top directory? how about using working directory? or C:/_work/quic is already commonly used for any purpose by msquic perf test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "common" working directory for our tests. It's used throughout.

param ($Session, $OutputDir)
if ($isWindows) {
$DumpFiles = Invoke-Command -Session $Session -ScriptBlock {
Get-ChildItem "C:/_work/quic/artifacts/crashdumps" | Where-Object { $_.Extension -eq ".dmp" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use globally defined const name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for our perf machines, this is a fixed path.

Comment on lines +158 to +162
$metric = "throughput-download"
if ($exeArgs.Contains("plat:1")) {
$metric = "latency"
} elseif ($exeArgs.Contains("prate:1")) {
$metric = "hps"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these "plat" and "prate" associated with each metric name? Looks really high context naming

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comment above, this stuff is fragile and needs to be improved in future PRs.

static CXPLAT_THREAD_CALLBACK(WatchdogThreadCallback, Context) {
auto This = (CxPlatWatchdog*)Context;
if (!This->ShutdownEvent.WaitTimeout(This->TimeoutMs)) {
#ifndef _KERNEL_MODE // Not supported in kernel mode
if (This->WriteToConsole) {
printf("Error: Watchdog timeout fired!\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If error, how about fprintf(stderr, ) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing to error stream causes other complexity at the PowerShell level if you decide you want to ignore certain errors (temporarily) like we are doing currently. So I prefer to keep it this way.

@nibanks nibanks merged commit b0edc58 into main Jan 16, 2024
359 of 368 checks passed
@nibanks nibanks deleted the nibanks/netperf-restart-server branch January 16, 2024 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Automation Area: Performance Bug: Test/Tool Code bug in the test or tool specific code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants