-
Notifications
You must be signed in to change notification settings - Fork 43
Fix some tests #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some tests #16
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @steveklabnik! Welcome, and thanks for taking a look. I've got some comments below. Let me know if you want to chat about it.
So, one additional wrinkle here: this doesn't actually get the tests passing yet. There was actually a panic while drop, which caused an abort. I decided to check that out this morning, and it's https://github.com/oxidecomputer/oxide-api-prototype/blob/57ee07e59c28a9dd025b1ed511a001b1af6479fe/dropshot/src/logging.rs#L400, which causes a panic while a panic is occuring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again. Capturing some points below from private discussion. Sorry for the mess here!
Pushing some partial work here; I still have some changes to make around the original tests, but I did some work to get better error handling in place. That also revealed a few other test failures. Stuff left to do:
for the first checkbox, these pass locally but fail in CI with
This is the exact platform divergence that caused the issues in the first place. I'm gonna knock out this change in the morning for a nice quick win :) |
I've posted a question to the Rust user's forum asking for advice about the TOML and paths https://users.rust-lang.org/t/escaping-windows-paths-for-toml/44432 |
Implemented the solution from the users's thread. @davepacheco big remaining question here is the time not moving forward. I am finding that very confusing, though I haven't dug into details yet. I'm about to eat some lunch and walk my dog, but if you have any hypotheses here, that would be quite helpful! |
Hmm. The background here (possibly obvious, but just so it's written down) is that I opted for These would be my next questions:
I'm happy to help. |
I am glad you wrote that out; this is what I assumed, but I wasn't totally sure.
So, as a first run:
The debug representation of a DateTime involves the native local time + the offset. Without the offset:
So yeah, I'm seeing not round numbers in either timestamp.
I ran sets of 6, in debug mode:
Some interesting weirdness here: why did that first run have no 00s? Regardless, most of them have different ending timestamps, except the very last two there. Regardless, I see the test failing every time, and this is only an occasional issue when I'm printing stuff out like this, so I doubt that's the issue. Here's the impl of UTC::now:
So, looks like Windows is only granular to 100-nanoseconds. Which was one of your guesses! To triple check, I changed my code, and ran it in release this time: fn main() {
let a = chrono::Utc::now();
let b = chrono::Utc::now();
dbg!(dbg!(a) == dbg!(b));
let a = chrono::Utc::now();
let b = chrono::Utc::now();
dbg!(dbg!(a) == dbg!(b));
let a = chrono::Utc::now();
let b = chrono::Utc::now();
dbg!(dbg!(a) == dbg!(b));
let a = chrono::Utc::now();
let b = chrono::Utc::now();
dbg!(dbg!(a) == dbg!(b));
let a = chrono::Utc::now();
let b = chrono::Utc::now();
dbg!(dbg!(a) == dbg!(b));
} (this dbg stuff is gross but that's part of the fun)
so yeah. That is what's going on, it seems. And indeed, changing the three So, I guess the question is: how important is that not equals? Side note:
You may have seen this, you may have not, but I like to share this really epic comment from the standard library: https://github.com/rust-lang/rust/blob/2935d294ff862fdf96578d0cbbdc289e8e7ba81c/src/libstd/time.rs#L205-L232 |
(I pushed a commit with the three |
...
Bummer. Also, if I'm reading that output right, all of those timestamps in your first block of output are intervals of 1000 nanoseconds, which is pretty round for a supposedly nanosecond timestamp. As for the first one from your test program
Yeah, that's pretty compelling.
The underlying question is how can know that timestamps generated by the system are correct? I think that's pretty hard to answer. Thinking out loud, here are some properties I would expect to be true of our system:
Here are some realistic ways I could imagine the timestamps becoming broken:
We'd catch (1) because we compare timestamps reported by the server to timestamps generated in the test suite using Assuming we still want to try to catch these (and I'd like to), two options I see are to use a higher resolution timestamp or to have the test suite pause for N nanoseconds (N being whatever the minimum timestamp resolution is) before any operation whose timestamp it's going to check. That'd be unfortunate for a few reasons. Is there any way in Rust to get a higher resolution timestamp from Windows?
Heh, yeah, I remember reading that one. That's for monotonic time, which really, really shouldn't ever go backwards, and it's disappointing on how many systems you found that happening! It's usually wrong to rely on wall time never going backwards like we do here, but in this case I think it's necessary to test what we want to test, and the cases where that's invalid aren't that big a deal here. If you happen to adjust your clock while running the test suite, you just have to run it again. Thanks for digging into this! |
Hmm. It looks like only a recent change that lets us get 100ns resolution. (@pfmooney was here!) |
Based on this article and the fact that the 100-ns resolution function is already called "precise", I'm starting to worry that Windows doesn't have a way to get a more precise wall timestamp, in which case we'd have to go with the other approach. Maybe we don't need to do it everywhere we check a timestamp, but we could pause for, say, 200ns between a couple of operations, use |
So, I think the first question is "what does Windows want you to do?" From the doc I linked above:
So, seems like adding |
Based on the PR I linked to, I think it's already using |
That PR wasn't merged, so my reading of it was the opposite. Regardless, it seems like updating upstream is gonna be way more work than makes sense to do here, so
This seems great, and I'll implement that 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of small feedback below. Let me know if I can help more.
src/oxide_controller/config.rs
Outdated
|
||
impl fmt::Display for LoadError { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
// default to debug for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new enum, but how about keeping the error messages more aimed at operators (i.e., not Rust debug output, but "read "$file_name": $the_underlying_error")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds great, i had done this purely as a placeholder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit that implements this. let me know what you think; i feel 50/50 about it to be honest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a commit to address some of these comments, will take care of the others in the morning :)
and gotta clean up this git history, whew
src/oxide_controller/config.rs
Outdated
|
||
impl fmt::Display for LoadError { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
// default to debug for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds great, i had done this purely as a placeholder
550e72c
to
8d636e3
Compare
Okay, so I've cleaned this up into two commits: one with all of the main work, and one with just the error handling change, to print out the path. Hopefully that will make it a bit easier to review just those error changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! I just had a few nits -- up to you on these.
0d345a7
to
b86e422
Compare
Great! This should be all good then. I don't know how strong commit discipline is here; this is one commit, but I could also see the argument for breaking it up into its four constituent changes. If you'd like me to do that, let me know, otherwise, this should be good to go! |
For reference, we've been squashing and landing, though we haven't codified that yet. |
Cool, then unless you have anything else for me, this can be merged! :D Thanks again for all the help in review; getting into a new codebase always takes a bit of time. |
There was this last thing: #16 (comment) (It's totally fine to skip it but it sounded like you intended to change it.) |
This commit fixes the test suite on Windows, but in order to do so in a reasonable manner, makes some additional changes: * Turns some String errors into more structured errors * Escapes paths properly on Windows; this was the main issue with the suite failing * moves Utc::now inline, so it's more clear that no caching is happening * Sleeps between actions on some tests on Windows; Chrono only supports times in 100-nanosecond granularity, and so we need to ensure enough time passes to observe the forward motion of time.
These tests would fail on Windows, because Windows paths contain \s,
which are interpreted by TOML as Unicode escapes. This commit fixes that
issue, as well as removes the dependence on the specific text of the
error. These strings are platform dependent, and so these tests will be
very fragile. Because the errors are only Strings anyway, I changed the
tests to verify an Err was returned, but not which one. A more robust
test would require a more robust error type, and I wasn't sure that was
worth actually doing yet.