ARMv8 Cryptography Extensions support #250

Merged
merged 1 commit into from
May 17, 2021

Conversation

@tarcieri tarcieri commented May 14, 2021

Adds a new backend which uses the ARMv8 Cryptography Extensions. The required intrinsics are currently unstable (nightly-only), so support is gated under a newly added armv8 crate feature.

These extensions are supported on both 32-bit and 64-bit ARM targets; however, the current implementation is gated on aarch64, as that's the only architecture it has been tested on so far.

  • AES-128
  • AES-192
  • AES-256
  • Check handling of non-aligned inputs

Closes #10.
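
As a rough illustration of the gating described above, the sketch below selects a backend at compile time only when both the target architecture is aarch64 and the armv8 feature is enabled. The function name `backend_name` is hypothetical and is not taken from the PR; only the `armv8` feature name and the `aarch64` gate come from the description.

```rust
/// Hypothetical helper: reports which backend was selected at compile time.
/// The ARMv8 path is only compiled when targeting aarch64 AND the `armv8`
/// crate feature is enabled, mirroring the gating described in this PR.
#[cfg(all(target_arch = "aarch64", feature = "armv8"))]
pub fn backend_name() -> &'static str {
    "armv8"
}

/// Fallback for every other configuration: the portable software backend.
#[cfg(not(all(target_arch = "aarch64", feature = "armv8")))]
pub fn backend_name() -> &'static str {
    "soft"
}
```

With a layout like this, building without `--features armv8` (or on a non-aarch64 target) never compiles the intrinsics-based code, so the unstable parts stay opt-in.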

@tarcieri tarcieri requested a review from newpavlov May 14, 2021 04:27
@tarcieri tarcieri force-pushed the aes/armv8 branch 3 times, most recently from a1e14b5 to b393f8b Compare May 14, 2021 04:40
aes/src/armv8.rs Outdated

/// AES key expansion
#[inline]
pub fn expand_key<const N: usize>(key: &[u8; 16]) -> [[u8; 16]; N] {
@tarcieri tarcieri May 14, 2021

Note: the implementation uses const generics. So far it only uses features available on Rust 1.51+, and it's an internal implementation detail, so I figured why not.

For comparison, the corresponding AES-NI implementation contains a lot of code duplication.
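
To make the const generics point concrete, here is a minimal portable sketch of an AES key schedule that covers all three key sizes with one function. It is not the PR's code (which uses ARMv8 intrinsics and, in the excerpt above, a single `N` parameter); the second const parameter `L`, the `sbox` helper, and the software S-box are assumptions made for illustration.

```rust
/// Builds the AES S-box from its definition (inverse in GF(2^8) followed by
/// an affine transform) so this sketch needs no 256-entry literal table.
fn sbox() -> [u8; 256] {
    let mut table = [0u8; 256];
    let (mut p, mut q) = (1u8, 1u8);
    loop {
        // Multiply p by 3 in GF(2^8).
        let reduce = if p & 0x80 != 0 { 0x1b } else { 0 };
        p = p ^ (p << 1) ^ reduce;
        // Divide q by 3, so q stays the multiplicative inverse of p.
        q ^= q << 1;
        q ^= q << 2;
        q ^= q << 4;
        if q & 0x80 != 0 {
            q ^= 0x09;
        }
        // Affine transform of the inverse.
        let x = q ^ q.rotate_left(1) ^ q.rotate_left(2) ^ q.rotate_left(3) ^ q.rotate_left(4);
        table[p as usize] = x ^ 0x63;
        if p == 1 {
            break;
        }
    }
    table[0] = 0x63; // zero has no inverse and is handled separately
    table
}

/// Hypothetical generic key schedule: `L` is the key length in bytes
/// (16, 24, or 32) and `N` the number of round keys (11, 13, or 15).
pub fn expand_key<const L: usize, const N: usize>(key: &[u8; L]) -> [[u8; 16]; N] {
    let nk = L / 4; // key length in 32-bit words
    assert!(L % 4 == 0 && matches!(nk, 4 | 6 | 8) && N == nk + 7, "unsupported sizes");
    let sbox = sbox();
    let sub_word = |w: [u8; 4]| w.map(|b| sbox[b as usize]);

    // 60 words is enough for the largest case (AES-256, 15 round keys).
    let mut words = [[0u8; 4]; 60];
    for (word, chunk) in words.iter_mut().zip(key.chunks_exact(4)) {
        word.copy_from_slice(chunk);
    }

    let mut rcon = 1u8;
    for i in nk..4 * N {
        let mut temp = words[i - 1];
        if i % nk == 0 {
            temp.rotate_left(1); // RotWord
            temp = sub_word(temp); // SubWord
            temp[0] ^= rcon;
            let reduce = if rcon & 0x80 != 0 { 0x1b } else { 0 };
            rcon = (rcon << 1) ^ reduce; // next round constant
        } else if nk > 6 && i % nk == 4 {
            temp = sub_word(temp); // extra SubWord step used only by AES-256
        }
        for j in 0..4 {
            words[i][j] = words[i - nk][j] ^ temp[j];
        }
    }

    // Pack groups of four words into 16-byte round keys.
    let mut round_keys = [[0u8; 16]; N];
    for (rk, group) in round_keys.iter_mut().zip(words.chunks_exact(4)) {
        for (dst, word) in rk.chunks_exact_mut(4).zip(group) {
            dst.copy_from_slice(word);
        }
    }
    round_keys
}
```

A caller picks the variant through the types alone, for example `let rk: [[u8; 16]; 11] = expand_key(&key128);`, which is the kind of deduplication const generics make possible compared with one hand-written function per key size.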

@tarcieri

Tests are confirmed passing on an Apple M1.

@tarcieri tarcieri force-pushed the aes/armv8 branch 14 times, most recently from 0a8b89f to 15c5f1b Compare May 14, 2021 21:54
tarcieri commented May 14, 2021

Some preliminary benchmarks on an M1 Mac mini:

soft backend

test aes128_decrypt  ... bench:         268 ns/iter (+/- 2) = 59 MB/s
test aes128_decrypt8 ... bench:         523 ns/iter (+/- 5) = 244 MB/s
test aes128_encrypt  ... bench:         262 ns/iter (+/- 2) = 61 MB/s
test aes128_encrypt8 ... bench:         513 ns/iter (+/- 10) = 249 MB/s
test aes192_decrypt  ... bench:         311 ns/iter (+/- 1) = 51 MB/s
test aes192_decrypt8 ... bench:         611 ns/iter (+/- 4) = 209 MB/s
test aes192_encrypt  ... bench:         308 ns/iter (+/- 2) = 51 MB/s
test aes192_encrypt8 ... bench:         603 ns/iter (+/- 9) = 212 MB/s
test aes256_decrypt  ... bench:         357 ns/iter (+/- 1) = 44 MB/s
test aes256_decrypt8 ... bench:         704 ns/iter (+/- 5) = 181 MB/s
test aes256_encrypt  ... bench:         350 ns/iter (+/- 1) = 45 MB/s
test aes256_encrypt8 ... bench:         689 ns/iter (+/- 8) = 185 MB/s

ARMv8 intrinsics

test aes128_decrypt  ... bench:          12 ns/iter (+/- 0) = 1333 MB/s
test aes128_decrypt8 ... bench:          32 ns/iter (+/- 0) = 4000 MB/s
test aes128_encrypt  ... bench:          12 ns/iter (+/- 0) = 1333 MB/s
test aes128_encrypt8 ... bench:          31 ns/iter (+/- 0) = 4129 MB/s
test aes192_decrypt  ... bench:          13 ns/iter (+/- 0) = 1230 MB/s
test aes192_decrypt8 ... bench:          21 ns/iter (+/- 0) = 6095 MB/s
test aes192_encrypt  ... bench:          13 ns/iter (+/- 0) = 1230 MB/s
test aes192_encrypt8 ... bench:          22 ns/iter (+/- 0) = 5818 MB/s
test aes256_decrypt  ... bench:          16 ns/iter (+/- 1) = 1000 MB/s
test aes256_decrypt8 ... bench:          20 ns/iter (+/- 0) = 6400 MB/s
test aes256_encrypt  ... bench:          16 ns/iter (+/- 0) = 1000 MB/s
test aes256_encrypt8 ... bench:          20 ns/iter (+/- 0) = 6400 MB/s
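
For anyone reading the tables above, the MB/s column can be reproduced from the ns/iter column. The sketch below assumes that the plain benchmarks process one 16-byte block per iteration and the `8` variants process eight blocks (128 bytes), which is consistent with the reported figures; the function name is hypothetical.

```rust
/// AES block size in bytes.
const BLOCK_BYTES: u64 = 16;

/// Throughput in MB/s (10^6 bytes per second), truncated to an integer the
/// same way the bench output appears to be. Bytes per nanosecond equals
/// GB/s, so multiplying by 1000 gives MB/s.
fn throughput_mb_s(blocks_per_iter: u64, ns_per_iter: u64) -> u64 {
    blocks_per_iter * BLOCK_BYTES * 1_000 / ns_per_iter
}
```

For example, `throughput_mb_s(1, 268)` matches the 59 MB/s soft-backend figure for aes128_decrypt and `throughput_mb_s(8, 31)` matches the 4129 MB/s figure for aes128_encrypt8 with intrinsics. By the same numbers, single-block AES-128 encryption drops from 262 ns to 12 ns, roughly a 22x speedup.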

@tarcieri tarcieri force-pushed the aes/armv8 branch 5 times, most recently from 88f2a98 to 3eb1e34 Compare May 15, 2021 14:57
@tarcieri tarcieri force-pushed the aes/armv8 branch 2 times, most recently from 465cca1 to fc260d0 Compare May 15, 2021 18:11
@tarcieri tarcieri force-pushed the aes/armv8 branch 4 times, most recently from b3ec618 to e5bcf77 Compare May 15, 2021 20:49
@tarcieri tarcieri changed the title [WIP] ARMv8 Cryptography Extensions support ARMv8 Cryptography Extensions support May 15, 2021
@tarcieri tarcieri marked this pull request as ready for review May 15, 2021 20:58
@tarcieri

Removing WIP.

I'd call this complete except for pipelining. It implements the following:

  • AES-128/AES-192/AES-256 encryption/decryption
  • Runtime CPU feature detection on Linux and macOS targets
  • FIPS 197 test vectors for the 128/192/256-bit key schedules
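
The runtime detection item above could look roughly like the following. This is a sketch using the standard library's `is_aarch64_feature_detected!` macro, which may differ from the mechanism this PR actually uses (the thread does not show it); the `Backend` enum and both function names are hypothetical.

```rust
/// Hypothetical backend selector used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Armv8,
    Soft,
}

/// On aarch64, ask the OS/CPU at runtime whether the AES instructions exist.
#[cfg(target_arch = "aarch64")]
pub fn has_hw_aes() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}

/// On every other architecture the ARMv8 backend is never available.
#[cfg(not(target_arch = "aarch64"))]
pub fn has_hw_aes() -> bool {
    false
}

/// Picks the hardware backend when available, else the software fallback.
pub fn select_backend() -> Backend {
    if has_hw_aes() {
        Backend::Armv8
    } else {
        Backend::Soft
    }
}
```

A real implementation would likely cache the detection result rather than querying it on every call, since the answer cannot change while the process runs.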

I will look at pipelining, with an eye on what improves performance on the M1 (since that's the most powerful ARMv8 CPU I have access to).

In the meantime I would love it if anyone could benchmark it on other 64-bit ARMv8 platforms. I'll leave this PR open for a while to invite review.

@tarcieri tarcieri force-pushed the aes/armv8 branch 2 times, most recently from 0e53607 to 9f81b5c Compare May 17, 2021 15:48
@tarcieri

Implemented pipelining, which operates on 8 blocks at a time. Saw some pretty nice performance gains on the Apple M1 (reaching nearly 10 GB/s on AES-128!)

test aes128_decrypt  ... bench:          12 ns/iter (+/- 0) = 1333 MB/s
test aes128_decrypt8 ... bench:          13 ns/iter (+/- 0) = 9846 MB/s
test aes128_encrypt  ... bench:          12 ns/iter (+/- 0) = 1333 MB/s
test aes128_encrypt8 ... bench:          13 ns/iter (+/- 0) = 9846 MB/s
test aes192_decrypt  ... bench:          14 ns/iter (+/- 0) = 1142 MB/s
test aes192_decrypt8 ... bench:          15 ns/iter (+/- 0) = 8533 MB/s
test aes192_encrypt  ... bench:          14 ns/iter (+/- 0) = 1142 MB/s
test aes192_encrypt8 ... bench:          15 ns/iter (+/- 0) = 8533 MB/s
test aes256_decrypt  ... bench:          16 ns/iter (+/- 0) = 1000 MB/s
test aes256_decrypt8 ... bench:          17 ns/iter (+/- 0) = 7529 MB/s
test aes256_encrypt  ... bench:          16 ns/iter (+/- 0) = 1000 MB/s
test aes256_encrypt8 ... bench:          17 ns/iter (+/- 0) = 7529 MB/s
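
The idea behind the 8-blocks-at-a-time pipelining can be sketched in portable Rust as a loop interchange: instead of running every round on one block before moving to the next, each round is applied to all eight independent blocks in turn, which gives the CPU independent instructions to overlap. The round function below is a placeholder and is not AES; all names are hypothetical, and the PR's real code uses ARMv8 intrinsics instead.

```rust
/// Number of blocks processed together (8 in this PR).
const PAR_BLOCKS: usize = 8;
type Block = [u8; 16];

/// Placeholder round function standing in for one hardware AES round.
/// It is NOT AES; it only needs to depend on the block and the round key.
fn toy_round(block: &mut Block, round_key: &Block) {
    for (b, k) in block.iter_mut().zip(round_key) {
        *b = (*b ^ *k).rotate_left(1).wrapping_add(0x3d);
    }
}

/// Block-major order: each block's rounds form one long dependency chain.
fn encrypt_serial(blocks: &mut [Block; PAR_BLOCKS], round_keys: &[Block]) {
    for block in blocks.iter_mut() {
        for rk in round_keys {
            toy_round(block, rk);
        }
    }
}

/// Round-major order: consecutive operations touch different blocks, so
/// they are independent and can be in flight at the same time.
fn encrypt_interleaved(blocks: &mut [Block; PAR_BLOCKS], round_keys: &[Block]) {
    for rk in round_keys {
        for block in blocks.iter_mut() {
            toy_round(block, rk);
        }
    }
}
```

Both orderings compute the same result because the blocks never depend on each other; only the instruction scheduling differs. That is consistent with the tables above, where the 8-block benchmarks take barely longer than the single-block ones.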

Adds a new nightly-only backend which uses ARMv8 Cryptography Extensions
gated under the newly introduced `armv8` crate feature.

Support is provided for AES-128, AES-192, and AES-256, with runtime CPU
feature detection on Linux and macOS targets.

These extensions are supported on both 32-bit and 64-bit ARM targets.
The current implementation is gated on `aarch64`, as that's the only
architecture it's been tested on so far, but it could easily be
extended to 32-bit ARMv8 targets as well.
@tarcieri

Going to go ahead and land this. At this point I'd say it's the best tested of all of the backends.

@tarcieri tarcieri merged commit 8569b1c into master May 17, 2021
@tarcieri tarcieri deleted the aes/armv8 branch May 17, 2021 16:19
Successfully merging this pull request may close these issues.

Hardware accelerated AES for ARM