-
Notifications
You must be signed in to change notification settings - Fork 228
avr: implement __[u]divmod(h|q)i4 #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -43,6 +43,153 @@ intrinsics! { | |||
|
||||
((rem as u64) << 32) | (div as u64) | ||||
} | ||||
|
||||
#[naked] | ||||
pub unsafe extern "C" fn __udivmodqi4() { | ||||
// Returns unsigned 8-bit `n / d` and `n % d`. | ||||
// | ||||
// Note: GCC implements a [non-standard calling convention](https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention) for this function. | ||||
// Derived from: https://github.com/gcc-mirror/gcc/blob/95f10974a9190e345776604480a2df0191104308/libgcc/config/avr/lib1funcs.S#L1365 | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These implementations derive from libgcc where the originating file is marked GPLv3+. IANAL and unsure of the implications from including it here. It would be interesting to attempt to port these functions to Rust. I think a similar implementation may be possible using some of the standard library's u8/u16 methods. I think that would require us to teach There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe instead of pulling GCC's impl we could simply use the naked function as a trampoline? i.e. like: #[naked]
pub unsafe extern "C" fn __divmodhi4() {
let args;
naked_asm!(/* move numbers from funny, arbitrary input registers into `args` */);
let out = /* call native-Rust implementation using `args` */;
naked_asm!(/* move `out` to funny, arbitrary output registers */)
} This would be less efficient and we have to be careful not to overwrite registers we shouldn't touch, but it should be good enough to get started. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I replied to this in another comment but unfortunately we can't take these as-is, ports are considered derivative work. Mentioned here #711 (comment) but adding a new ABI to rustc would be overkill for four functions in the whole ecosystem, maybe not possible based on LLVM IR limitations. You could probably use Rust as a start though, emit the optimized asm and manually adjust for the ABI. Or emit the optimized LLVM IR and adjust the signature to match the ABI if possible (this is effectively what compiler support in rustc would do), and compile that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wanted to go the trampoline route, but wasn't sure how to map the registers. I'm guessing it needs to move the input parameters into the expected registers, call a regular Rust function, then move the return arguments back to the special registers. All while save and restoring those trampled registers. It would be a start, but likely too much overhead for a long-term solution. I'll try implementing an algorithm in Rust and see what the optimizer spits out. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to be very cautious with licensing here; the GCC sources are GPL but this crate needs to stay MIT or Apache-2.0, so unfortunately anything directly derived needs to be removed. If you write it by hand and get similar results that's fine, but these look pretty line-for-line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. agreed. That's why I wanted to be extraordinarily clear that these are lifted from gcc and likely problematic here. It's a starting point. Unfortunately, at the end of the day it's "just math", so re-coding would likely require us to use a different algorithm. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it's necessary to come up with a different algorithm, as you say there are only so many ways to do small integer division. Our code doesn't need to be different from GCC just to say it's different and overlap is going to happen for these tiny routines, but the way to turn a given algorithm into assembly has to be done from scratch. I'm not an expert here though, @joshtriplett could probably clarify things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We definitely shouldn't have any routines taken directly from GCC, or written by someone who is working directly from the code in GCC as an example; that's extremely likely to make them effectively derived from GCC. Is there an implementation in LLVM or similar that we could draw from? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wikipedia has a nice entry on division algorithms. I implemented "long division" and after optimizing variable usage came up with pub fn long_div_u8(mut num_quo: u8, divisor: u8) -> (u8, u8) {
let mut remainder = 0u8;
for _ in 0..8u8 {
let overflow;
(num_quo, overflow) = num_quo.overflowing_add(num_quo);
(remainder, _) = remainder.carrying_add(remainder, overflow);
if let Some(rem) = remainder.checked_sub(divisor) {
remainder = rem;
num_quo += 1;
}
}
(num_quo, remainder)
} I thought the
The GCC assembly looks almost exactly like what I would expect the Rust code to generate. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also tried implementing the "non-restoring division" but some of the math requires an extra bit that I couldn't get working in 8-bits. Here's what I've got thus far: pub fn non_restoring_u8(numerator: u8, divisor: u8) -> (u8, u8) {
const N: u8 = 8;
let m = divisor;
let mut a = 0u16; // <<=== would like to be u8
let mut q = numerator;
for _i in (0..N).rev() {
let over;
(q, over) = q.overflowing_add(q);
(a, _) = a.carrying_add(a, over);
if a & 0x200 != 0 {
a = a.wrapping_add(u16::from(m));
} else {
a = a.wrapping_sub(u16::from(m));
}
if a & 0x100 == 0 {
q |= 1;
}
}
if a & 0x100 != 0 {
a = a.wrapping_add(u16::from(m));
}
(q, u8::try_from(a).unwrap())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_non_restoring_u8() {
for n in 0..u8::MAX {
for d in 1..u8::MAX {
//println!("n: {n}, d: {d}");
assert_eq!(non_restoring_u8(n, d), (n / d, n % d));
}
}
}
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do have some generic algorithms in the crate, at https://github.com/rust-lang/compiler-builtins/blob/9978a8b06b7c1b53a6c503a2bfe7aea9ba6ca98b/compiler-builtins/src/int/specialized_div_rem/mod.rs and
Rustc doesn't do this kind of optimization, it would be on the LLVM end. If you can reproduce the problem in LLVM IR, it would be filing a missed optimization bug. You can just replace those couple of lines with inline asm. Unfortunately it's probably going to be pretty tough to get something that optimizes super well here without handwritten assembly. For this initial PR just doing a naked trampoline to a suboptimal Rust function would be fine, it can be further optimized later. We have a couple of generic |
||||
|
||||
// r25: remainder | ||||
// r24: dividend, quotient | ||||
// r22: divisor | ||||
// r23: loop counter | ||||
core::arch::naked_asm!( | ||||
"clr r25", // clear remainder | ||||
"ldi r23, 8", // init loop counter | ||||
"lsl r24", // shift dividend | ||||
"1:", | ||||
"rol r25", // shift dividend into remainder | ||||
"cp r25, r22", // compare remainder & divisor | ||||
"brcs 2f", // REMAINder <= divisor | ||||
"sub r25, r22", // restore remainder | ||||
"2:", | ||||
"rol r24", // shift dividend (with CARRY) | ||||
"dec r23", // decrement loop counter | ||||
"brne 1b", | ||||
"com r24", // complement result (C flag was complemented in loop) | ||||
"ret", | ||||
); | ||||
} | ||||
|
||||
#[naked] | ||||
pub unsafe extern "C" fn __divmodqi4() { | ||||
// Returns signed 8-bit `n / d` and `n % d`. | ||||
// | ||||
// Note: GCC implements a [non-standard calling convention](https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention) for this function. | ||||
// Derived from: https://github.com/gcc-mirror/gcc/blob/95f10974a9190e345776604480a2df0191104308/libgcc/config/avr/lib1funcs.S#L1385 | ||||
|
||||
// r25: remainder | ||||
// r24: dividend, quotient | ||||
// r22: divisor | ||||
// r23: loop counter | ||||
core::arch::naked_asm!( | ||||
"bst r24, 7", // store sign of dividend | ||||
"mov r0, r24", | ||||
"eor r0, r22", // r0.7 is sign of result | ||||
"sbrc r24, 7", | ||||
"neg r24", // dividend negative : negate | ||||
"sbrc r22, 7", | ||||
"neg r22", // divisor negative : negate | ||||
// TODO: "call" => instruction requires a CPU feature not currently enabled | ||||
// TODO: gcc checks for __AVR_HAVE_JMP_CALL__ | ||||
"rcall __udivmodqi4", // do the unsigned div/mod | ||||
"brtc 1f", | ||||
"neg r25", // correct remainder sign | ||||
"1:", | ||||
"sbrc r0, 7", | ||||
"neg r24", // correct result sign | ||||
"ret", | ||||
); | ||||
} | ||||
|
||||
#[naked] | ||||
pub unsafe extern "C" fn __udivmodhi4() { | ||||
// Returns unsigned 16-bit `n / d` and `n % d`. | ||||
// | ||||
// Note: GCC implements a [non-standard calling convention](https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention) for this function. | ||||
// Derived from: https://github.com/gcc-mirror/gcc/blob/95f10974a9190e345776604480a2df0191104308/libgcc/config/avr/lib1funcs.S#L1427 | ||||
|
||||
// r26: remainder (low) | ||||
// r27: remainder (high) | ||||
// r24: dividend (low) | ||||
// r25: dividend (high) | ||||
// r22: divisor (low) | ||||
// r23: divisor (high) | ||||
// r21: loop counter | ||||
core::arch::naked_asm!( | ||||
"sub r26, r26", | ||||
"sub r27, r27", // clear remainder and carry | ||||
"ldi r21, 17", // init loop counter | ||||
"rjmp 2f", // jump to entry point | ||||
"1:", | ||||
"rol r26", // shift dividend into remainder | ||||
"rol r27", | ||||
"cp r26, r22", // compare remainder & divisor | ||||
"cpc r27, r23", | ||||
"brcs 2f", // remainder < divisor | ||||
"sub r26, r22", // restore remainder | ||||
"sbc r27, r23", | ||||
"2:", | ||||
"rol r24", // shift dividend (with CARRY) | ||||
"rol r25", | ||||
"dec r21", // decrement loop counter | ||||
"brne 1b", | ||||
"com r24", | ||||
"com r25", | ||||
// TODO: "movw" => instruction requires a CPU feature not currently enabled | ||||
// TODO: gcc checks for __AVR_HAVE_MOVW__ | ||||
"mov r22, r24", | ||||
"mov r23, r25", | ||||
"mov r24, r26", | ||||
"mov r25, r27", | ||||
"ret", | ||||
); | ||||
} | ||||
|
||||
#[naked] | ||||
pub unsafe extern "C" fn __divmodhi4() { | ||||
// Returns signed 16-bit `n / d` and `n % d`. | ||||
// | ||||
// Note: GCC implements a [non-standard calling convention](https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention) for this function. | ||||
// Derived from: https://github.com/gcc-mirror/gcc/blob/95f10974a9190e345776604480a2df0191104308/libgcc/config/avr/lib1funcs.S#L1457 | ||||
|
||||
// r26: remainder (low) | ||||
// r27: remainder (high) | ||||
// r24: dividend (low) | ||||
// r25: dividend (high) | ||||
// r22: divisor (low) | ||||
// r23: divisor (high) | ||||
// r21: loop counter | ||||
core::arch::naked_asm!( | ||||
"bst r25, 7", // store sign of dividend | ||||
"mov r0, r23", | ||||
"brtc 0f", | ||||
"com r0", // r0.7 is sign of result | ||||
"rcall 1f", // dividend negative : negate | ||||
"0:", | ||||
"sbrc r23, 7", | ||||
"rcall 2f", // divisor negative : negate | ||||
// TODO: "call" => instruction requires a CPU feature not currently enabled | ||||
// TODO: gcc checks for __AVR_HAVE_JMP_CALL__ | ||||
"rcall __udivmodhi4", // do the unsigned div/mod | ||||
"sbrc r0, 7", | ||||
"rcall 2f", // correct remainder sign | ||||
"brtc 3f", | ||||
"1:", | ||||
"com r25", // correct dividend/remainder sign | ||||
"neg r24", | ||||
"sbci r25, 0xFF", | ||||
"ret", | ||||
"2:", | ||||
"com r23", // correct divisor/result sign | ||||
"neg r22", | ||||
"sbci r23, 0xFF", | ||||
"3:", | ||||
"ret", | ||||
); | ||||
} | ||||
} | ||||
|
||||
intrinsics! { | ||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to meaningfully add tests as these intrinsics are specific to AVR. I have don't my best to compare the
--release
codegen with the GCC output, but haven't tried running the code on real hardware. I'm hoping @Patryk27 might be able to run them through his AVR math fuzzer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!