@jgw0915 jgw0915 commented Dec 18, 2025

Fix false load-use hazards by gating rs1/rs2 usage in ID stage

Background

While analyzing control hazard waveforms, I discovered that the current load-use hazard detection logic can spuriously trigger stalls for certain instruction patterns.

Specifically, the control logic in Control.scala compares rd_ex against both rs1_id and rs2_id unconditionally. However, for several instruction types, the bit positions corresponding to rs1 or rs2 do not represent architectural source registers.

Root Cause

[Image: Snipaste_2025-12-15_23-58-59]
  • For I-type load instructions (e.g., lw a0, 16(a5)), only rs1 is architecturally used.

  • rs2_id is still wired directly from instruction bits [24:20], which encode imm[4:0] for I-type instructions.

  • When the low five bits of the immediate coincide with the destination register of the load in EX (e.g., imm[4:0] = 16 and rd_ex = x16), the condition

    rd_ex == rs2_id
    

    may incorrectly evaluate to true.

  • This causes io_pc_stall, io_if_stall, and io_id_flush to assert, inserting an unnecessary bubble even though no real data dependency exists.

This behavior is technically consistent with the existing implementation but reflects a decode-level limitation: the hazard unit does not know whether the ID-stage instruction actually uses rs1 or rs2.
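
To make the aliasing concrete, here is a minimal plain-Scala sketch (a software illustration, not Chisel and not code from this PR; the object and helper names are hypothetical). It encodes lw a0, 16(a5) with the standard RV32I I-type layout and then slices bits [19:15] and [24:20] the way an unconditional rs1/rs2 wire would.

```scala
// Plain-Scala illustration only (not Chisel, not code from the PR).
// All names here are hypothetical helpers for this example.
object ITypeAliasDemo {
  // Standard RV32I I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode
  def encodeIType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    ((imm & 0xfff) << 20) | ((rs1 & 0x1f) << 15) | ((funct3 & 0x7) << 12) |
      ((rd & 0x1f) << 7) | (opcode & 0x7f)

  // Raw field slices: what a wire taken straight from the instruction bits sees.
  def rs1Field(instr: Int): Int = (instr >>> 15) & 0x1f
  def rs2Field(instr: Int): Int = (instr >>> 20) & 0x1f

  // lw a0, 16(a5): opcode LOAD = 0x03, funct3 = 2 (word), a0 = x10, a5 = x15
  val lwA0: Int = encodeIType(imm = 16, rs1 = 15, funct3 = 2, rd = 10, opcode = 0x03)

  def main(args: Array[String]): Unit = {
    println(f"lw a0, 16(a5) = 0x$lwA0%08x")
    println(s"rs1 field    = x${rs1Field(lwA0)}")
    println(s"bits [24:20] = ${rs2Field(lwA0)}")
  }
}
```

The bits [24:20] slice evaluates to 16 here. An ungated comparison therefore treats it as x16 (a6), even though a load has no second source register.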

Fixes in this PR

Explicit rs1 / rs2 usage signals

  • Added decode-time signals to indicate whether an instruction uses rs1 and/or rs2
    • Identified that jal, auipc, and lui do not use rs1 (the same bit positions encode immediates)
    • rs2 is only considered for instruction classes that architecturally use a second source register, including:
      • R-type ALU instructions (e.g., add, sub, and, or)
      • S-type store instructions (e.g., sw)
      • B-type branch instructions (e.g., beq, bne)
  • These signals follow the added design hint mentioned in CA25 Exercise 19
  • Enables the control logic to reason about architectural source operands correctly
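
The classification above can be sketched as a plain-Scala model keyed on the RV32I major opcode. This is a software illustration under stated assumptions, not the Chisel decode logic from the PR. The object and function names are hypothetical, and it covers only the instruction classes named in this description; SYSTEM/CSR and fence encodings would need separate handling.

```scala
// Plain-Scala model of the decode-time usage flags (hypothetical names).
// Assumption: only the classes listed in the PR description are modelled.
object SourceUsageDemo {
  // RV32I major opcodes (instruction bits [6:0])
  val Lui    = 0x37
  val Auipc  = 0x17
  val Jal    = 0x6f
  val Jalr   = 0x67
  val Branch = 0x63
  val Load   = 0x03
  val Store  = 0x23
  val OpImm  = 0x13
  val Op     = 0x33

  // jal, auipc, and lui reuse the rs1 bit positions for immediate bits.
  private val noRs1 = Set(Lui, Auipc, Jal)
  // Only R-type ALU, S-type store, and B-type branch read a second source.
  private val hasRs2 = Set(Op, Store, Branch)

  def opcodeOf(instr: Int): Int = instr & 0x7f
  def usesRs1(opcode: Int): Boolean = !noRs1.contains(opcode)
  def usesRs2(opcode: Int): Boolean = hasRs2.contains(opcode)
}
```

For a load, this model gives usesRs1 = true and usesRs2 = false, so the immediate bits in [24:20] never take part in the hazard comparison.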

Result

Snipaste_2025-12-16_01-02-28
  • Eliminates false-positive load-use stalls
  • Preserves correct handling of genuine hazards
  • Improves correctness and precision of control hazard detection
  • Waveform behavior after the fix matches architectural expectations and avoids unnecessary pipeline stalls

The full implementation is available in Control.scala in my GitHub repo.

Contributor

@jserv jserv left a comment

The pull request adds uses_rs1_id and uses_rs2_id signals and wires them to Control, but Control.scala still has ? placeholders. The hazard detection logic is not modified to use these signals.
Looking at the author's full implementation in their fork, the intended fix is:

  val hazard_ex_rs1 = io.uses_rs1_id && (io.rd_ex === io.rs1_id)
  val hazard_ex_rs2 = io.uses_rs2_id && (io.rd_ex === io.rs2_id)

  when(
    ((io.memory_read_enable_ex || io.jump_instruction_id) &&
     (io.rd_ex =/= 0.U) &&
     (hazard_ex_rs1 || hazard_ex_rs2))
    ||
    (io.jump_instruction_id &&
     io.memory_read_enable_mem &&
     (io.rd_mem =/= 0.U) &&
     ((io.uses_rs1_id && (io.rd_mem === io.rs1_id)) ||
      (io.uses_rs2_id && (io.rd_mem === io.rs2_id))))
  ) { ... }

The PR needs to include this Control.scala change or the fix is non-functional.
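
For readers who want to experiment with the condition outside the simulator, the following plain-Scala sketch mirrors the boolean structure of the snippet above on ordinary values. It is a software model, not Chisel. The case class, field names, and the `gated` switch are hypothetical, and the ungated variant simply assumes both source fields are always compared.

```scala
// Plain-Scala model of the stall condition (hypothetical names, not the PR's code).
object StallModel {
  // Snapshot of the Control inputs used by the condition.
  final case class Inputs(
      rs1Id: Int,
      rs2Id: Int,
      usesRs1Id: Boolean,
      usesRs2Id: Boolean,
      jumpInstructionId: Boolean,
      rdEx: Int,
      memoryReadEnableEx: Boolean,
      rdMem: Int = 0,
      memoryReadEnableMem: Boolean = false)

  // gated = false models the old behaviour (assumption: both fields always compared).
  def stall(in: Inputs, gated: Boolean = true): Boolean = {
    val u1 = !gated || in.usesRs1Id
    val u2 = !gated || in.usesRs2Id
    val hazardEx  = (u1 && in.rdEx == in.rs1Id) || (u2 && in.rdEx == in.rs2Id)
    val hazardMem = (u1 && in.rdMem == in.rs1Id) || (u2 && in.rdMem == in.rs2Id)
    ((in.memoryReadEnableEx || in.jumpInstructionId) && in.rdEx != 0 && hazardEx) ||
      (in.jumpInstructionId && in.memoryReadEnableMem && in.rdMem != 0 && hazardMem)
  }

  // Load with rd = x16 in EX; I-type load in ID with rs1 = x15 and bits [24:20] = 16.
  val falseHazard: Inputs = Inputs(
    rs1Id = 15, rs2Id = 16, usesRs1Id = true, usesRs2Id = false,
    jumpInstructionId = false, rdEx = 16, memoryReadEnableEx = true)

  // Same pipeline state, but the ID instruction really reads x16 as rs2.
  val realHazard: Inputs = falseHazard.copy(usesRs2Id = true)
}
```

With these inputs, `StallModel.stall(falseHazard)` is false while `StallModel.stall(falseHazard, gated = false)` is true. `StallModel.stall(realHazard)` stays true, so the genuine load-use case is still stalled.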

@jserv jserv changed the title from "Fix/false load-use stall by gating rs1 amd rs2 harzard check" to "Fix/false load-use stall by gating rs1 amd rs2 hazard check" Dec 19, 2025
Contributor

jserv commented Dec 19, 2025

Exercise 19 Intent: This is part of CA25 Exercise 19. The PR adds the usage signals which is good guidance, but should:

  1. Keep the exercise structure intact (placeholders remain for students)
  2. Add the signals as "hints" without solving the exercise completely

Since this PR is meant to be a complete fix, the Control.scala hazard logic needs updating.

Contributor

jserv commented Dec 19, 2025

Consider contributing a test case demonstrating that the false stall is eliminated:

lw  x16, 16(x0)   # imm[4:0] = 16 = x16
add x1, x2, x3    # rs2 = x3, should NOT stall despite x16 match

Author

jgw0915 commented Dec 24, 2025

So, in the requested change, I should put my full implementation of Control.scala in this PR (replace placeholders), modify camelCase to snake_case, append missing edge cases, and contribute false stall test cases, right?

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from b7ace8c to 0b2c4ff Compare December 24, 2025 09:36
Contributor

jserv commented Dec 25, 2025

> So, in the requested change, I should put my full implementation of Control.scala in this PR (replace placeholders), modify camelCase to snake_case, append missing edge cases, and contribute false stall test cases, right?

Yes, that is the purpose of review.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from d148958 to bbd344d Compare December 25, 2025 09:30
jserv

This comment was marked as outdated.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch 3 times, most recently from a6d445e to 81839a6 Compare December 25, 2025 10:19
Author

jgw0915 commented Dec 25, 2025

I have added a test case for the false load-use stall in Section 11 of hazard_extended.S. Section 11 checks mem[0x38] == 3 because the test is designed to detect only the presence of an extra (spurious) stall, not to count absolute clock cycles.

In hazard_extended.S, Section 11 measures a local cycle window using two csrr cycle instructions surrounding two back-to-back lws:

csrr a2, cycle
lw   a6, 0(a5)
lw   a7, 16(a5)
csrr a3, cycle
sub  a3, a3, a2    # a3 = a3 - a2

Assume the first csrr a2, cycle reads cycle = 1000.

Between the two CSR reads, there are two intervening instructions (lw a6, 0(a5) and lw a7, 16(a5)), which execute in the next two cycles:

  • after first lw: cycle = 1001
  • after second lw: cycle = 1002

The second csrr a3, cycle itself executes in the following cycle and therefore reads cycle = 1003.

Thus, the measured delta is:

a3 - a2 = 1003 - 1000 = 3

This value (3) represents the expected baseline when no extra bubble is inserted.

The purpose of this test is to detect whether the hazard unit inserts an additional stall cycle due to a false-positive load-use hazard. If rs2 is incorrectly considered for I-type instructions (i.e., imm[4:0] is treated as rs2), the control logic asserts pc_stall / if_stall / id_flush, inserting one extra bubble. In that case, the second csrr is delayed by one more cycle and reads 1004, yielding:

a3 - a2 = 4

Therefore:

  • mem[0x38] == 3 ⇒ no false stall (correct behavior)
  • mem[0x38] == 4 ⇒ one spurious stall inserted (buggy behavior)

Section 11 is thus a regression test that isolates decode-level false hazard detection, rather than a test of absolute cycle timing.
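
The arithmetic behind the 3-versus-4 check can be written as a small plain-Scala sketch. It is an illustration only, with hypothetical names. It assumes one instruction per cycle inside the measured window and no stalls other than the bubble under test.

```scala
// Plain-Scala sketch of the cycle-window arithmetic (hypothetical names).
// Assumption: one instruction per cycle and no unrelated stalls in the window.
object CycleWindowDemo {
  // Delta seen by the second csrr: the intervening instructions,
  // plus one cycle for the second csrr itself, plus any inserted bubbles.
  def measuredDelta(interveningInstructions: Int, bubbles: Int): Int =
    interveningInstructions + 1 + bubbles

  // Interpretation of the value stored at mem[0x38] by Section 11.
  def classify(delta: Int): String = delta match {
    case 3 => "no false stall"
    case 4 => "one spurious stall"
    case _ => "unexpected"
  }
}
```

With two intervening lw instructions, `measuredDelta(2, 0)` gives the baseline of 3 and `measuredDelta(2, 1)` gives 4, matching the two cases listed above.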

Here is the waveform observed without gating on uses_rs1 and uses_rs2; the false stall occurs at 301 to 305 ps:

    ((io.memory_read_enable_ex || io.jump_instruction_id) && // Either:
      // - Jump in ID needs register value, OR
      // - Load in EX (load-use hazard)
      (io.rd_ex =/= 0.U) &&                                 // Destination is not x0
      ((io.rd_ex === io.rs1_id) || (io.rd_ex === io.rs2_id))) // Destination matches ID source
image

After implementing the gating with uses_rs1 and uses_rs2, the false stall is successfully eliminated in the test case.

    ((io.memory_read_enable_ex || io.jump_instruction_id) && // Either:
      // - Jump in ID needs register value, OR
      // - Load in EX (load-use hazard)
      (io.rd_ex =/= 0.U) &&                                 // Destination is not x0
      ((io.uses_rs1_id && (io.rd_ex === io.rs1_id) || io.uses_rs2_id && (io.rd_ex === io.rs2_id)))) // Destination matches ID source
image

@jgw0915 jgw0915 requested a review from jserv December 25, 2025 12:52
jserv

This comment was marked as outdated.

Normalize camelCase naming, add missing uses_rs1 edge cases, and fix
ID-stage rs1/rs2 read-address assignment to match expected read logic.

Introduce a regression test to ensure the pipeline does not insert
a stall when a load-use dependency is incorrectly detected in the
EX stage.

Update forwarding/hazard detection in the five-stage pipeline Control module
to correctly detect EX/MEM dependencies and handle stalls for load-use
and jump instructions. This commit refactors condition expressions to:

- Include memory-read enable and jump flags when checking for hazards.
- Consolidate rs1/rs2 use checks against rd in EX and MEM stages.
- Ensure that load-use hazards in MEM correctly contribute to stall
  conditions.
- Simplify and make hazard conditions more readable and consistent.
@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from 81839a6 to e49f798 Compare December 25, 2025 15:33
@jgw0915 jgw0915 requested a review from jserv December 25, 2025 15:34
Contributor

@jserv jserv left a comment

Run 'git rebase -i' to squash commits and enforce the rules described in https://cbea.ms/git-commit/.

Read the above carefully.
