Better instruction sequence for CRC calculations #7

Domkeykong · 2023-05-22T23:48:26Z

General CRC computation

I have figured out a really good way to do CRC in 6 fully compressible instructions!

// Register Mappings:
//        a0 = current bit value
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
        c.xor  a0, a3
        c.slli a0, 31 // LSB -> MSB
        c.srai a0, 31 // Copy MSB into all other bits
        c.and  a0, a4
        c.srli a3, 1
        c.xor  a3, a0
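
For checking the sequence above off-target, here is a minimal C sketch that mirrors the six instructions one by one and compares them with a textbook LSB-first CRC step. The polynomial value `0xA001` (reflected CRC-16), the helper `srai32`, and all function names are assumptions made for illustration; they are not taken from the firmware.

```c
#include <stdint.h>

/* Assumption for illustration: reflected CRC-16 polynomial. */
#define CRCPOLY 0xA001u

/* Portable model of the RV32 arithmetic right shift (srai):
 * shift right and replicate the sign bit into the vacated bits. */
static uint32_t srai32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? (uint32_t)~(UINT32_MAX >> n) : 0u;
    return (x >> n) | fill;
}

/* One CRC step, following the six instructions line by line. */
static uint32_t crc_step_seq(uint32_t crc, uint32_t bit)
{
    uint32_t a0 = bit, a3 = crc, a4 = CRCPOLY;
    a0 ^= a3;             /* c.xor  a0, a3 */
    a0 <<= 31;            /* c.slli a0, 31 : LSB -> MSB */
    a0 = srai32(a0, 31);  /* c.srai a0, 31 : MSB copied into all bits */
    a0 &= a4;             /* c.and  a0, a4 */
    a3 >>= 1;             /* c.srli a3, 1  */
    a3 ^= a0;             /* c.xor  a3, a0 */
    return a3;
}

/* Textbook LSB-first CRC step, used only as a reference. */
static uint32_t crc_step_ref(uint32_t crc, uint32_t bit)
{
    uint32_t feedback = (crc ^ bit) & 1u;
    crc >>= 1;
    if (feedback)
        crc ^= CRCPOLY;
    return crc;
}

/* Returns 1 if both steps agree for every 16-bit CRC and both bit values. */
static int crc_steps_agree(void)
{
    for (uint32_t crc = 0; crc <= 0xFFFFu; crc++)
        for (uint32_t bit = 0; bit <= 1u; bit++)
            if (crc_step_seq(crc, bit) != crc_step_ref(crc, bit))
                return 0;
    return 1;
}
```

Calling `crc_steps_agree()` from a small `main` is enough to exercise the sketch; it only models the data flow, not instruction timing or encoding.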

The current code assumes that the bit value is bitwise negated, which means it doesn't work for sending data.

rv003usb/firmware/rv003usb.S

Lines 207 to 213 in 4cfe820

    
        #define HANDLE_CRC \
        	c.xor a0, a3; \
        	c.andi a0, 1; \
        	c.addi a0, -1; \
        	and a0, a0, t0; \
        	c.srli a3, 1; \
        	c.xor a3, a0

rv003usb/firmware/rv003usb.S

Lines 460 to 466 in 4cfe820

    
        // Handle CRC.
        c.xor a3, a2;
        c.andi a3, 1;
        c.addi a3, -1;
        and a3, a3, t0;
        c.srli a2, 1;
        c.xor a2, a0
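
To illustrate why the existing `HANDLE_CRC` macro only works on a negated bit value, the following C sketch models its data flow and compares it with a textbook LSB-first step. It assumes that `t0` holds the polynomial and uses `0xA001` (reflected CRC-16) as an example value; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption for illustration: reflected CRC-16 polynomial, held in t0. */
#define CRCPOLY 0xA001u

/* Model of the existing HANDLE_CRC macro. */
static uint32_t crc_step_existing(uint32_t crc, uint32_t a0)
{
    uint32_t a3 = crc, t0 = CRCPOLY;
    a0 ^= a3;   /* c.xor  a0, a3 */
    a0 &= 1u;   /* c.andi a0, 1  */
    a0 -= 1u;   /* c.addi a0, -1 : 0 -> all ones, 1 -> 0 */
    a0 &= t0;   /* and a0, a0, t0 */
    a3 >>= 1;   /* c.srli a3, 1  */
    a3 ^= a0;   /* c.xor  a3, a0 */
    return a3;
}

/* Textbook LSB-first CRC step, used only as a reference. */
static uint32_t crc_step_ref(uint32_t crc, uint32_t bit)
{
    uint32_t feedback = (crc ^ bit) & 1u;
    crc >>= 1;
    if (feedback)
        crc ^= CRCPOLY;
    return crc;
}

/* Returns 1 if the macro matches the reference only when it is fed the
 * inverted bit, for every 16-bit CRC and both bit values. */
static int existing_needs_inverted_bit(void)
{
    for (uint32_t crc = 0; crc <= 0xFFFFu; crc++) {
        for (uint32_t bit = 0; bit <= 1u; bit++) {
            uint32_t ref = crc_step_ref(crc, bit);
            if (crc_step_existing(crc, bit ^ 1u) != ref)
                return 0;   /* inverted input should match */
            if (crc_step_existing(crc, bit) == ref)
                return 0;   /* plain input should not match */
        }
    }
    return 1;
}
```

The `addi -1` step turns a feedback bit of 0 into an all-ones mask, which is the opposite polarity of the reference step. That is why the input bit has to arrive inverted.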

Bit specific CRC computation

I also created instruction sequences for when we already know which bit value we are currently handling.
These take only 5 instructions each, at the cost of one uncompressed (32-bit) instruction at the start.

// Register Mappings:
//        a0 = Temp
//        a3 = running CRC
//        a4 = CRCPOLY
// Clobbers: a0
do1_crc:
	andi   a0, a3, 1 
	c.srli a3, 1
	c.addi a0, -1
	c.and  a0, a4
	c.xor  a3, a0

This one removes the need for the neg instruction, which allows one more instruction to be compressed.

do0_crc:
	slli   a0, a3, 31 // Put a3's LSB into a0's MSB
	c.srai a0, 31     // Copy MSB into all other bits
	c.srli a3, 1
	c.and  a0, a4
	c.xor  a3, a0

rv003usb/firmware/rv003usb.S

Lines 249 to 254 in 4cfe820

    
        // Handle CRC
        andi a0, a3, 1
        neg a0, a0
        c.and a0, a4
        c.srli a3, 1
        c.xor a3, a0

The main trick I used here is to shift left and then do an arithmetic shift right, which copies the LSB into all the other bit positions.
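
As a rough check of the bit-specific sequences, this C sketch models `do1_crc`, `do0_crc`, and the existing `andi`/`neg` variant, and compares them with a textbook LSB-first step. The register-register AND is modelled as `c.and`. The polynomial `0xA001`, the helper `srai32`, and the names `do0_crc_neg`, `crc_step_ref`, and `bit_specific_agree` are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumption for illustration: reflected CRC-16 polynomial. */
#define CRCPOLY 0xA001u

/* Portable model of the RV32 arithmetic right shift (srai). */
static uint32_t srai32(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? (uint32_t)~(UINT32_MAX >> n) : 0u;
    return (x >> n) | fill;
}

/* do1_crc: the data bit is known to be 1. */
static uint32_t do1_crc(uint32_t a3)
{
    uint32_t a4 = CRCPOLY;
    uint32_t a0 = a3 & 1u;   /* andi   a0, a3, 1 */
    a3 >>= 1;                /* c.srli a3, 1     */
    a0 -= 1u;                /* c.addi a0, -1    */
    a0 &= a4;                /* c.and  a0, a4    */
    a3 ^= a0;                /* c.xor  a3, a0    */
    return a3;
}

/* do0_crc: the data bit is known to be 0, using the shift trick. */
static uint32_t do0_crc(uint32_t a3)
{
    uint32_t a4 = CRCPOLY;
    uint32_t a0 = a3 << 31;  /* slli   a0, a3, 31 */
    a0 = srai32(a0, 31);     /* c.srai a0, 31     */
    a3 >>= 1;                /* c.srli a3, 1      */
    a0 &= a4;                /* c.and  a0, a4     */
    a3 ^= a0;                /* c.xor  a3, a0     */
    return a3;
}

/* Existing zero-bit code: builds the same mask with andi + neg. */
static uint32_t do0_crc_neg(uint32_t a3)
{
    uint32_t a4 = CRCPOLY;
    uint32_t a0 = a3 & 1u;   /* andi a0, a3, 1 */
    a0 = 0u - a0;            /* neg  a0, a0    */
    a0 &= a4;                /* c.and  a0, a4  */
    a3 >>= 1;                /* c.srli a3, 1   */
    a3 ^= a0;                /* c.xor  a3, a0  */
    return a3;
}

/* Textbook LSB-first CRC step, used only as a reference. */
static uint32_t crc_step_ref(uint32_t crc, uint32_t bit)
{
    uint32_t feedback = (crc ^ bit) & 1u;
    crc >>= 1;
    if (feedback)
        crc ^= CRCPOLY;
    return crc;
}

/* Returns 1 if all variants agree with the reference for every 16-bit CRC. */
static int bit_specific_agree(void)
{
    for (uint32_t crc = 0; crc <= 0xFFFFu; crc++) {
        if (do0_crc(crc) != crc_step_ref(crc, 0u)) return 0;
        if (do0_crc(crc) != do0_crc_neg(crc))      return 0;
        if (do1_crc(crc) != crc_step_ref(crc, 1u)) return 0;
    }
    return 1;
}
```

Shifting left by 31 and then arithmetically right by 31 yields the same all-ones or all-zeros mask as `andi` followed by `neg`, so the two zero-bit variants compute identical results in this model.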


duk-37 · 2023-05-23T04:33:50Z

the sign-preserving right shift trick for the zero bit case should help here, thanks! the generic case is also useful but probably less so given the send logic already specializes for zero- and one- bit cases. also that crc code is misplaced/breaking stuff at the moment anyways

crc1 here is equivalent to what we already have but with the shift reordered

cnlohr · 2023-05-23T17:56:59Z

@duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to take stream time to further optimize things, but it would be fun to do before a supercut.

duk-37 · 2023-05-23T22:15:25Z

> @duk-37 any chance you would be interested in reworking some of the assembly once I get a fully working stack? I don't think I want to take stream time to further optimize things, but it would be fun to do before a supercut.

Sure, I can take a look! Will also be a lot easier once we know more about the chip internals (#5 and Macyler's work)

duk-37 mentioned this issue May 23, 2023

c.sari for crc0 code #6

Closed
