Skip to content

Optimize fill_bytes_via_* and prepare rand_core 0.1 #397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 15, 2018

Conversation

pitdicker
Copy link
Contributor

@pitdicker pitdicker commented Apr 13, 2018

Benchmark before:

test gen_bytes_xorshift    ... bench:   1,131,259 ns/iter (+/- 2,399) = 905 MB/s

After:

test gen_bytes_xorshift    ... bench:     284,238 ns/iter (+/- 906) = 3602 MB/s

I updated the version number, changelog and readme of rand_core for a 0.1.0 release. I hope #383 and #396 can me merged first. The date in the changelog is set to 2018-04-15, this Sunday. @dhardy Do you think that is realistic?

@pitdicker
Copy link
Contributor Author

Please ignore for now, I messed things up this morning.

};
left.copy_from_slice(&chunk[..n]);
if remainder > 0 {
left.copy_from_slice(&chunk[..remainder]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case chunk gets used twice (unless len == 0).

You also have three conditional jumps, so I'm not sure your code reduction attempt really pays off here?

@pitdicker
Copy link
Contributor Author

So I had multiple attempts to improve the performance of fill_via_u* lying around, and forgot which one it was that did the trick. And of course I took the one with the typo that was incorrect and because of that seemingly 4x faster...

Got the real one, which is less spectacular but still ~2,5x faster (on both x86 and x86_64).
It uses both next_u64 and next_u32, so I just renamed it to fill_via_next.

test gen_bytes_xorshift    ... bench:     391,250 ns/iter (+/- 27,374) = 2617 MB/s
test gen_u32_xorshift      ... bench:         860 ns/iter (+/- 36) = 4651 MB/s
test gen_u64_xorshift      ... bench:       1,722 ns/iter (+/- 273) = 4645 MB/s

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this version looks sensible!

I'm not quite sure about the rand-core release; either we just make a 0.1.0-pre.1 for now or we wait a few more days (preferably with separate PR).

l.copy_from_slice(&chunk);
}
let n = left.len();
if n >= 4 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n > 4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not my day...

@pitdicker pitdicker force-pushed the rand_core_0.1 branch 3 times, most recently from 8273840 to 19e5187 Compare April 13, 2018 13:13
@pitdicker
Copy link
Contributor Author

I'm not quite sure about the rand-core release; either we just make a 0.1.0-pre.1 for now or we wait a few more days (preferably with separate PR).

I am a bit inpatient for a release to be honest. But I suppose it is also my fault as I keep making changes to rand_core...

I am not really a fan of a second pre-release, but will leave things to you. Should I separate out the version bump from this PR?

@dhardy
Copy link
Member

dhardy commented Apr 13, 2018

I seem to remember you were not a fan of the 0.5 pre-release either, yet we learned quite a bit since then...

Yes, separate the PR then we can merge the first bit and sit on the potential release a couple of days (maybe do that Monday).

@pitdicker
Copy link
Contributor Author

I seem to remember you were not a fan of the 0.5 pre-release either, yet we learned quite a bit since then...

True, I'll keep a bit more quite 😄.

But for rand_core I don't remember much more than #354 and #378, which both seem minor documentation issues to me.

@pitdicker
Copy link
Contributor Author

Ok, I reset the version to 0.1.0-pre.0.

@@ -50,6 +50,9 @@ Due to [rust-lang/cargo#1596](https://github.com/rust-lang/cargo/issues/1596),
unioned across the whole dependency tree, any crate using `rand` with its
default features will also enable `std` support in `rand_core`.

The `serde1` feature can be used to derive `Serialize` and `Deserialize` for RNG
implementations that use the `BockRng` or `BlockRng64` wrappers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Bock"!

CHANGELOG.md Outdated
- Revise the `SeedableRng` trait. (#233)
- Remove default implementations for `RngCore::next_u64` and `RngCore::fill_bytes`. (#288)
- Add `RngCore::try_fill_bytes`. (#225)
- Add `RngCore::bytes_per_round` optimization hint. (#396)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not merged yet (and I'm still not convinced about it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reasonable. I've removed this line from both changelogs, lets keep the for that just in #396.

@dhardy dhardy merged commit 979306d into rust-random:master Apr 15, 2018
@pitdicker pitdicker deleted the rand_core_0.1 branch April 15, 2018 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants