Skip to content

[bitmanip] Remove redundant logic #2296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ESE-2019
Copy link

Remove the redundant logic & ~bitcnt_mask_op, as explained in issue #2294 .

    always_comb begin
      unique case (1'b1)
        zbe_op:      bitcnt_bits = operand_b_i;
        // bitcnt_cz:   bitcnt_bits = bitcnt_bit_mask & ~bitcnt_mask_op; // clz / ctz
        bitcnt_cz:   bitcnt_bits = bitcnt_bit_mask; // clz / ctz
        default:     bitcnt_bits = operand_a_i; // cpop
      endcase
    end

@rswarbrick
Copy link
Contributor

I think this is a sensible change, thanks. Would you mind expanding the commit message to explain what's going on? It's always easier to understand the history of the repository if it's not dependant on GitHub. With my "maths hat on", I'm always enthusiastic about giving a minimal "proof" of why a commit is the right thing to do.

Working out how to write such a commit message sometimes makes me adjust the code as well, which I think is probably a good thing. I'd also be tempted to extend this commit slightly to get rid of the slightly silly loops for replicating the bit. Is that easy? (And does it work? I only suggested this from reading the existing code...)

@ESE-2019
Copy link
Author

I think that comparing the PPA of different schemes is an important method for selecting among different options. Therefore, I used Design Compiler to synthesize three implementations (including only clz, ctz, and cpop), using the NanGate45 standard cell library and a 1GHz frequency constraint. All three designs achieved the same timing slack.

The original version had an area of 287.546; Scheme 1 (with & ~bitcnt_mask_op removed) had an area of 292.866; and Scheme 2 (new commit in the PR, i.e., (bitcnt_mask_op - 32'h1) & ~bitcnt_mask_op) had an area of 268.926.

Interestingly, despite removing redundant logic, Scheme 1 resulted in a larger area. In contrast, Scheme 2 achieved about a 7% area reduction.

I added comments in the code to explain why the method makes sense.

    // Bitcnt_bits generation for clz and ctz:
    // The operand of ctz can be divided into 3 parts: the don't-care bits in the high-order bits,
    // the lowest-order set bit, and the trailing zeros. Bitcnt_bit_mask is obtained by subtracting 1
    // from bitcnt_mask_op, which clears the lowest-order set bit and sets all the trailing zeros to 1.
    // Then, bitcnt_bit_mask & ~bitcnt_mask_op clears the don't-care bits.
    // At this point, counting the number of 1s gives the number of trailing zeros.
    // In order to create the bit mask for leading zeros, the input operand needs to be reversed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants