Better instruction sequence for CRC calculations #7

Open
Domkeykong opened this issue May 22, 2023 · 3 comments


General CRC computation

I have figured out a really good way to do the CRC update in 6 fully compressible instructions!

// Register Mappings:
//        a0 = current bit value
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
        c.xor  a0, a3
        c.slli a0, 31 // LSB -> MSB
        c.srai a0, 31 // Copy MSB into all other bits
        c.and  a0, a4
        c.srli a3, 1
        c.xor  a3, a0
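
As a sanity check, the six instructions can be modelled in C and compared against the textbook bit-at-a-time update for a reflected (LSB-first) CRC. This is only a sketch: the function names and the `srai32` helper are made up for illustration, and `poly` stands for whatever CRCPOLY holds in a4.

```c
#include <stdint.h>

/* Portable model of the RV32 srai instruction (arithmetic shift right, n in 0..31).
   Hypothetical helper, written this way because >> on negative signed values
   is implementation-defined in C. */
static uint32_t srai32(uint32_t v, unsigned n)
{
    uint32_t sign_fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (v >> n) | sign_fill;
}

/* Model of the proposed sequence: bit plays a0, crc plays a3, poly plays a4. */
static uint32_t crc_step_proposed(uint32_t crc, uint32_t bit, uint32_t poly)
{
    uint32_t a0 = bit;
    a0 ^= crc;            /* c.xor  a0, a3 */
    a0 <<= 31;            /* c.slli a0, 31   LSB -> MSB */
    a0 = srai32(a0, 31);  /* c.srai a0, 31   MSB copied into every bit */
    a0 &= poly;           /* c.and  a0, a4 */
    crc >>= 1;            /* c.srli a3, 1 */
    crc ^= a0;            /* c.xor  a3, a0 */
    return crc;
}

/* Textbook reflected CRC update for a single input bit. */
static uint32_t crc_step_reference(uint32_t crc, uint32_t bit, uint32_t poly)
{
    if ((crc ^ bit) & 1u)
        return (crc >> 1) ^ poly;
    return crc >> 1;
}
```

Only the LSB of `bit` matters in the model, because the left shift by 31 discards every other bit before the mask is built.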

The current approach (shown below) assumes that the bit value is bitwise negated, which means it doesn't work for sending data:

#define HANDLE_CRC \
c.xor a0, a3; \
c.andi a0, 1; \
c.addi a0, -1; \
and a0, a0, t0; \
c.srli a3, 1; \
c.xor a3, a0

// Handle CRC.
c.xor a3, a2;
c.andi a3, 1;
c.addi a3, -1;
and a3, a3, t0;
c.srli a2, 1;
c.xor a2, a0
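
To make the negation issue concrete, the HANDLE_CRC macro above can be modelled in C. In this sketch (function names are illustrative, not taken from the repository) the mask becomes all ones when the LSB of `bit ^ crc` is 0, so the result should match the standard update only when the input bit is fed in inverted.

```c
#include <stdint.h>

/* Model of HANDLE_CRC: bit plays a0, crc plays a3, poly plays t0. */
static uint32_t crc_step_current(uint32_t crc, uint32_t bit, uint32_t poly)
{
    uint32_t a0 = bit;
    a0 ^= crc;    /* c.xor  a0, a3 */
    a0 &= 1u;     /* c.andi a0, 1 */
    a0 -= 1u;     /* c.addi a0, -1   1 -> 0, 0 -> 0xFFFFFFFF */
    a0 &= poly;   /* and    a0, a0, t0 */
    crc >>= 1;    /* c.srli a3, 1 */
    crc ^= a0;    /* c.xor  a3, a0 */
    return crc;
}

/* Standard reflected CRC update for a single input bit, for comparison. */
static uint32_t crc_step_standard(uint32_t crc, uint32_t bit, uint32_t poly)
{
    if ((crc ^ bit) & 1u)
        return (crc >> 1) ^ poly;
    return crc >> 1;
}
```

Under these assumptions, `crc_step_current(crc, bit, poly)` equals `crc_step_standard(crc, bit ^ 1, poly)`, which is why the existing sequence only works where the bit arrives inverted.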

Bit-specific CRC computation

I also created instruction sequences for the case where we already know which bit value we are currently handling.
Each takes only 5 instructions, at the cost of one uncompressed (32-bit) instruction at the beginning.

// Register Mappings:
//        a0 = Temp
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
do1_crc:
	andi   a0, a3, 1 
	c.srli a3, 1
	c.addi a0, -1
	c.and  a0, a4
	c.xor  a3, a0

This one removes the need for the neg instruction, so one more instruction can be compressed:

do0_crc:
	slli a0,a3,31   // Put a3's LSB into a0's MSB
	c.srai a0,31    // Copy MSB into all other bits
	c.srli a3,1
	c.and  a0,a4
	c.xor  a3,a0

// Handle CRC
andi a0, a3, 1
neg a0, a0
c.and a0, a4
c.srli a3, 1
c.xor a3, a0

The main trick I used here is to shift left and then shift right arithmetically, which copies the LSB into all the other bit positions.
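
The specialized sequences can be checked in C as well. The sketch below models the LSB-broadcast trick and the do0_crc and do1_crc sequences; the helper and function names are illustrative, and the register-register AND is written as `c.and` in the comments since `c.andi` only accepts an immediate.

```c
#include <stdint.h>

/* Portable model of RV32 srai (arithmetic shift right, n in 0..31).
   Hypothetical helper; >> on negative signed values is implementation-defined in C. */
static uint32_t srai_model(uint32_t v, unsigned n)
{
    uint32_t sign_fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (v >> n) | sign_fill;
}

/* Shift left 31, then arithmetic shift right 31: every bit becomes a copy of the LSB. */
static uint32_t lsb_to_mask(uint32_t x)
{
    return srai_model(x << 31, 31);
}

/* Model of do0_crc (input bit known to be 0): crc plays a3, poly plays a4. */
static uint32_t do0_crc_model(uint32_t crc, uint32_t poly)
{
    uint32_t a0 = lsb_to_mask(crc);  /* slli a0, a3, 31 ; c.srai a0, 31 */
    crc >>= 1;                       /* c.srli a3, 1 */
    a0 &= poly;                      /* c.and  a0, a4 */
    return crc ^ a0;                 /* c.xor  a3, a0 */
}

/* Model of do1_crc (input bit known to be 1). */
static uint32_t do1_crc_model(uint32_t crc, uint32_t poly)
{
    uint32_t a0 = crc & 1u;  /* andi   a0, a3, 1 */
    crc >>= 1;               /* c.srli a3, 1 */
    a0 -= 1u;                /* c.addi a0, -1   1 -> 0, 0 -> 0xFFFFFFFF */
    a0 &= poly;              /* c.and  a0, a4 */
    return crc ^ a0;         /* c.xor  a3, a0 */
}
```

In this model the zero-bit case applies the polynomial when the CRC's LSB is 1, and the one-bit case applies it when the LSB is 0, which is what the general update gives with the bit fixed to 0 or 1.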

duk-37 (Contributor) commented May 23, 2023

the sign-preserving right shift trick for the zero bit case should help here, thanks! the generic case is also useful but probably less so given the send logic already specializes for zero- and one- bit cases. also that crc code is misplaced/breaking stuff at the moment anyways

crc1 here is equivalent to what we already have but with the shift reordered

cnlohr (Owner) commented May 23, 2023

@duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to take stream time to further optimize things, but it would be fun to do before a supercut.

duk-37 (Contributor) commented May 23, 2023

> @duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to take stream time to further optimize things, but it would be fun to do before a supercut.

Sure, I can take a look! Will also be a lot easier once we know more about the chip internals (#5 and Macyler's work)
