README generated by Copilot (Claude Haiku 4.5), with a few manual edits.
SiPi is a Verilog implementation of a 6-stage pipelined, MIPS-based processor. The design supports load word (lw), store word (sw), jump (j), bitwise OR (or), branch-if-equal (beq), branch-if-not-equal (bne), and AND-immediate (andi) instructions, and it manages hazards through stalling and forwarding.
Key Features:
- 6-stage instruction pipeline architecture (IF, ID, RR, EX, MEM, WB)
- Support for lw, sw, j, or, beq, bne, and andi instructions
- Data forwarding (bypass) unit for reducing stalls
- Hazard detection unit with load-use hazard detection and control signal flushing
- Pipeline registers between all adjacent stages (IF/ID, ID/RR, RR/EX, EX/MEM, MEM/WB)
The SiPi processor implements a 6-stage pipeline, where each stage handles a specific portion of instruction execution:
Instruction Fetch (IF)
- Fetches the current instruction from instruction memory using the program counter (PC)
- Increments the PC for the next sequential instruction; the increment is gated by the PC_Write signal from the hazard detection unit
- Stores the fetched instruction and the incremented PC in the IF/ID pipeline register
- Handles branch and jump target updates when control hazards are resolved
- Module: Instruction_fetch (in if.v)
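The PC update above can be modeled behaviorally. This is a minimal Python sketch, not the Verilog in if.v; the function name next_pc, the byte-addressed +4 increment, and letting a taken branch/jump redirect override a stall are all assumptions for illustration.

```python
MASK32 = 0xFFFF_FFFF  # the PC is modeled as an unsigned 32-bit value


def next_pc(pc: int, pc_write: int, redirect: int = 0, target: int = 0) -> int:
    """Return the PC value latched at the next clock edge.

    Hypothetical model: assumes a byte-addressed PC (sequential fetch adds 4)
    and gives a taken branch/jump redirect priority over a stall.
    """
    if redirect:
        return target & MASK32  # load the resolved branch/jump target
    if not pc_write:
        return pc  # PC_Write == 0: hold the PC so IF refetches the same instruction
    return (pc + 4) & MASK32  # normal sequential fetch, wrapping at 32 bits
```

With PC_Write held at 0 for one cycle, the same address is fetched twice, which is how the IF stage takes part in a load-use stall.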
Instruction Decode (ID)
- Decodes the fetched instruction to identify the operation type and operand fields
- Generates the control signals that guide the instruction through the remaining pipeline stages
- Extracts the register addresses (rs, rt, rd) from the instruction
- Performs sign extension on immediate values
- Modules: Instruction_decode and control (in id.v)
- Key control signals: RegDst, Jump, MemRead, MemWrite, MemToReg, ALUSrc, RegWrite, BranchEQ, BranchNEQ, ALUOp
- Hazard detection integration: the disable_control signal from the hazard detection unit can zero out all control signals, which implements the stall for load-use hazards
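The decode step can be sketched in Python as follows. The field positions and opcode numbers are the standard MIPS encodings; the control-table values (in particular the ALUOp codes) and the names decode, Control, and CONTROL_TABLE are hypothetical and may not match id.v.

```python
from dataclasses import asdict, dataclass

# Standard MIPS primary opcodes; "or" is an R-type instruction (opcode 0, funct 0x25).
OPCODES = {0x00: "rtype", 0x23: "lw", 0x2B: "sw", 0x02: "j",
           0x04: "beq", 0x05: "bne", 0x0C: "andi"}


@dataclass(frozen=True)
class Control:
    RegDst: int = 0
    Jump: int = 0
    MemRead: int = 0
    MemWrite: int = 0
    MemToReg: int = 0
    ALUSrc: int = 0
    RegWrite: int = 0
    BranchEQ: int = 0
    BranchNEQ: int = 0
    ALUOp: int = 0  # 2-bit code; the values used below are assumptions


# Hypothetical textbook-style control table, not copied from id.v.
CONTROL_TABLE = {
    "rtype": Control(RegDst=1, RegWrite=1, ALUOp=0b10),
    "lw": Control(MemRead=1, MemToReg=1, ALUSrc=1, RegWrite=1, ALUOp=0b00),
    "sw": Control(MemWrite=1, ALUSrc=1, ALUOp=0b00),
    "beq": Control(BranchEQ=1, ALUOp=0b01),
    "bne": Control(BranchNEQ=1, ALUOp=0b01),
    "andi": Control(ALUSrc=1, RegWrite=1, ALUOp=0b11),
    "j": Control(Jump=1),
}


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits to a 32-bit unsigned bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF_0000) if (value & 0x8000) else value


def decode(instr: int, disable_control: bool = False):
    """Split a 32-bit instruction into fields and generate its control signals."""
    fields = {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "funct": instr & 0x3F,
        # The README says immediates are sign-extended; note that the MIPS ISA
        # zero-extends the andi immediate, which this sketch does not model.
        "imm": sign_extend16(instr),
        "target": instr & 0x03FF_FFFF,
    }
    name = OPCODES.get(fields["opcode"])
    # disable_control forces every control bit to 0 (a bubble);
    # unknown opcodes are also treated as bubbles here.
    ctrl = Control() if disable_control or name is None else CONTROL_TABLE[name]
    return name, fields, asdict(ctrl)
```

For example, 0x8C410064 encodes lw R1, 100(R2): decode returns rs = 2, rt = 1, imm = 100 with MemRead set, and the same word decoded with disable_control=True yields all-zero controls.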
Register Read (RR)
- Reads two source operands from the register file based on the instruction's register fields
- Module: Read_register (in rr.v)
- Register file: 32 x 32-bit registers with combinational (not sequential) read ports, so read values propagate immediately, and a write port that updates on the clock edge
- Performs early hazard detection to identify data dependencies
- Stores register values, control signals, and immediate values in the RR/EX pipeline register
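The read/write timing of the register file can be sketched in Python as below. The class and method names are hypothetical; the README does not say whether register 0 is hard-wired to zero, so the sketch does not special-case it.

```python
class RegisterFile:
    """32 x 32-bit register file: combinational reads, writes on a clock edge."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs: int, rt: int) -> tuple:
        # Combinational: returns the current contents immediately.
        return self.regs[rs & 0x1F], self.regs[rt & 0x1F]

    def clock_edge(self, reg_write: int, write_reg: int, write_data: int) -> None:
        # Synchronous: state changes only when a clock edge is simulated,
        # and only if RegWrite is asserted.
        if reg_write:
            self.regs[write_reg & 0x1F] = write_data & 0xFFFF_FFFF
```

Separating read() from clock_edge() mirrors the hardware split: RR can sample values at any time, while WB changes state only at the edge.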
Execute (EX)
- Performs arithmetic and logic operations using the ALU
- Calculates memory addresses for load/store instructions using the Address Calculator
- Evaluates branch conditions for conditional branches (beq, bne) using the zero flag
- Computes jump targets
- Passes ALU results to the EX/MEM pipeline register
- Modules: Execution, alu_control, Address_calculator (in ex.v)
- Forwarding integration: the ALU operands come from forwardA_result and forwardB_result, the outputs of the forwarding multiplexers, which supply the correct values when data hazards exist
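The ALU and its control can be sketched in Python as follows. The four operations (AND, OR, ADD, SUB) and the zero flag come from this README; the ALUOp encoding and the funct values are a hypothetical textbook-style mapping, not taken from ex.v.

```python
MASK32 = 0xFFFF_FFFF

# R-type funct codes from the MIPS ISA; only these four ops are modeled.
FUNCT_TO_OP = {0x20: "ADD", 0x22: "SUB", 0x24: "AND", 0x25: "OR"}


def alu_control(alu_op: int, funct: int) -> str:
    """Map a (hypothetical) 2-bit ALUOp and the funct field to an ALU operation."""
    if alu_op == 0b00:
        return "ADD"  # lw/sw: base + offset
    if alu_op == 0b01:
        return "SUB"  # beq/bne: compare by subtracting
    if alu_op == 0b11:
        return "AND"  # andi
    return FUNCT_TO_OP[funct]  # 0b10: R-type, decode funct


def alu(op: str, a: int, b: int) -> tuple:
    """Return (32-bit result, zero flag)."""
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError(f"unsupported ALU op: {op}")
    result &= MASK32  # wrap to 32 bits, as the hardware would
    return result, result == 0
```

A beq compares by subtracting: equal operands give a zero result, which sets the zero flag used for PCSrc_EX.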
Memory (MEM)
- Performs memory operations for load and store instructions
- Prepares the memory address and write-data signals
- Reads or writes data memory (DMEM); each location holds 8 bits, so a 32-bit value occupies 4 consecutive locations in big-endian order
- Stores results in the MEM/WB pipeline register
- Module: Memory (in mem.v)
- For non-memory instructions, data passes through this stage unchanged
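The big-endian word layout in the byte-wide DMEM can be sketched in Python as below. The class and method names are hypothetical, and the sketch does not enforce word alignment because the README does not say whether mem.v does.

```python
class DataMemory:
    """Byte-addressed DMEM: 8-bit locations, 32-bit words stored big-endian."""

    def __init__(self, size: int = 32) -> None:
        self.mem = bytearray(size)  # default 32 x 8-bit, as listed for DMEM

    def _check(self, addr: int) -> None:
        # A word access touches addr .. addr + 3; reject anything out of range.
        if addr < 0 or addr + 4 > len(self.mem):
            raise IndexError(f"word access at address {addr} is out of range")

    def store_word(self, addr: int, value: int) -> None:
        self._check(addr)
        # Most significant byte goes to the lowest address (big-endian).
        self.mem[addr:addr + 4] = (value & 0xFFFF_FFFF).to_bytes(4, "big")

    def load_word(self, addr: int) -> int:
        self._check(addr)
        return int.from_bytes(self.mem[addr:addr + 4], "big")
```

Storing 0x11223344 at address 4 places 0x11, 0x22, 0x33, 0x44 at addresses 4 through 7.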
Write Back (WB)
- Writes the final result back to the destination register in the register file:
  - For ALU operations, writes the computed result
  - For load instructions, writes the data fetched from memory
- Module: Write_back (in wb.v)
- Completes instruction execution
Hazard Detection and Resolution:
Hazard Detection Unit: identifies data dependencies between pipeline stages (in hazard_detection.v)
- Location: operates at the ID/RR boundary to detect conflicts early
- Detection logic: identifies load-use hazards only
- Condition: ID_EX_MemRead == 1 AND (ID_EX_Rt == IF_ID_Rs OR ID_EX_Rt == IF_ID_Rt)
  - This detects a load in the RR stage (its fields held in the ID/RR pipeline register) whose destination register rt is read by the next instruction, which is still in the ID stage
- Signals generated:
  - PC_Write = 0: prevents the PC from incrementing (stalls the IF stage)
  - IF_ID_Write = 0: prevents the IF/ID pipeline register from updating (stalls the ID stage)
  - disable_control = 1: zeroes all control signals from the control unit, so a bubble (no-op) enters the pipeline in place of the stalled instruction
- Effect: inserts one bubble (stall cycle) so that the load completes its MEM stage before the dependent instruction consumes the value in EX
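The detection condition translates directly into a small function. This Python sketch keeps the README's signal names as parameter and field names; the function and type names themselves are hypothetical.

```python
from typing import NamedTuple


class StallSignals(NamedTuple):
    PC_Write: int
    IF_ID_Write: int
    disable_control: int


def detect_load_use(id_ex_mem_read: int, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> StallSignals:
    """Load-use check: a load writes rt and the next instruction reads it."""
    hazard = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    if hazard:
        # Freeze PC and IF/ID, and turn the decoded instruction into a bubble.
        return StallSignals(PC_Write=0, IF_ID_Write=0, disable_control=1)
    return StallSignals(PC_Write=1, IF_ID_Write=1, disable_control=0)
```

For LW R1, 100(R2) followed by OR R3, R1, R5, detect_load_use(1, 1, 1, 5) returns (0, 0, 1); once the bubble occupies ID/RR its MemRead is 0, so the signals return to (1, 1, 0).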
Forwarding Unit (Bypass Logic): reduces stalls by forwarding intermediate results (in hazard_detection.v)
- Monitors the EX/MEM and MEM/WB pipeline registers for results that have not yet been written back to the register file
- Operand A forwarding:
  - If EX_MEM_RegWrite == 1 and EX_MEM_Rd == ID_EX_Rs: forward from EX/MEM (forwardA = 2'b10)
  - Else if MEM_WB_RegWrite == 1 and MEM_WB_Rd == ID_EX_Rs: forward from MEM/WB (forwardA = 2'b01)
  - Else: use the original register value (forwardA = 2'b00)
- Operand B forwarding:
  - If EX_MEM_RegWrite == 1 and EX_MEM_Rd == ID_EX_Rt: forward from EX/MEM (forwardB = 2'b10)
  - Else if MEM_WB_RegWrite == 1 and MEM_WB_Rd == ID_EX_Rt: forward from MEM/WB (forwardB = 2'b01)
  - Else: use the original register value (forwardB = 2'b00)
- Multiplexer logic (forwardA_mux and forwardB_mux in hazard_detection.v):
  - Select 2'b00: ID/EX operand (original register file value)
  - Select 2'b01: MEM/WB result (write_data_WB from the Write_back stage)
  - Select 2'b10: EX/MEM result (alu_result_MEM)
- Forwarding connections (in pipelined_HZD_FWD.v):
  - The ALU's operand A receives forwardA_result instead of RREX_rd1
  - The ALU's operand B receives forwardB_result instead of RREX_rd2
- Limitation: forwarding alone cannot resolve a load-use hazard, because the loaded value is produced only at the end of the MEM stage, one cycle after an immediately following instruction needs it in EX; a stall is still required in that case
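The select logic and 3-to-1 mux above can be sketched in Python as follows (function names are hypothetical). Checking EX/MEM first matters: when both EX/MEM and MEM/WB write the same register, EX/MEM holds the more recent result. Textbook MIPS forwarding also excludes register 0 (Rd != 0); the README's conditions do not mention that check, so this sketch follows the README.

```python
def forward_select(ex_mem_reg_write: int, ex_mem_rd: int,
                   mem_wb_reg_write: int, mem_wb_rd: int, src_reg: int) -> int:
    """2-bit mux select for one ALU operand (src_reg is ID_EX_Rs or ID_EX_Rt)."""
    if ex_mem_reg_write and ex_mem_rd == src_reg:
        return 0b10  # newest value: the instruction one stage ahead
    if mem_wb_reg_write and mem_wb_rd == src_reg:
        return 0b01  # older value: the instruction two stages ahead
    return 0b00  # no hazard: use the register file value


def forward_mux(select: int, reg_value: int, write_data_wb: int,
                alu_result_mem: int) -> int:
    """3-to-1 operand mux in the style of forwardA_mux / forwardB_mux."""
    return {0b00: reg_value, 0b01: write_data_wb, 0b10: alu_result_mem}[select]
```

The same two functions serve both operands; only src_reg changes (rs for forwardA, rt for forwardB).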
Branch/Jump Resolution: branch target addresses and jump targets are computed in the EX stage
- Branch condition evaluation: PCSrc_EX = (RREX_BranchEQ & zero_flag) | (RREX_BranchNEQ & (~zero_flag))
- The address calculator computes the branch target: resolved_address = (sign_extended_address << 2) + current_PC (in Verilog, + binds more tightly than <<, so the parentheses are required)
- The jump address is formed from the instruction field: PC_correct = {PC_msb_4, jump_address_resolved << 2}
- Pipeline flushing: when a branch is taken or a jump is executed, the wrong-path instructions in earlier pipeline stages are discarded as the correct address propagates
- Supported instructions: beq (branch if equal), bne (branch if not equal), and j (jump)
- Impact: each taken branch or jump costs three cycles of throughput, because the wrong-path instructions in the IF, ID, and RR stages are discarded
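The three formulas above can be checked with a small Python sketch (function names are hypothetical). In the MIPS ISA the branch base is the address of the instruction after the branch (PC + 4); the README calls it current_PC, so the sketch leaves that choice to the caller via base_pc. The jump target is the standard MIPS form {PC[31:28], target, 2'b00}.

```python
MASK32 = 0xFFFF_FFFF


def pc_src(branch_eq: int, branch_neq: int, zero_flag: int) -> int:
    """PCSrc_EX = (BranchEQ & zero) | (BranchNEQ & ~zero), with 0/1 inputs."""
    return (branch_eq & zero_flag) | (branch_neq & (zero_flag ^ 1))


def branch_target(sign_extended_offset: int, base_pc: int) -> int:
    """(offset << 2) + base_pc, wrapped to 32 bits.

    Accepts either a negative Python int or a 32-bit two's-complement pattern.
    """
    return ((sign_extended_offset << 2) + base_pc) & MASK32


def jump_target(pc: int, target26: int) -> int:
    """Top 4 PC bits followed by the 26-bit field shifted left by 2."""
    return (pc & 0xF000_0000) | ((target26 & 0x03FF_FFFF) << 2)
```

For example, an offset pattern of 0xFFFFFFFF (that is, -1) with base 0x100 yields 0xFC, one word back.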
- Pipeline Register Architecture: Each inter-stage pipeline register (IF/ID, ID/RR, RR/EX, EX/MEM, MEM/WB) captures all necessary data and control signals
- No Resource Conflicts: The 6-stage design avoids structural hazards by giving conflicting accesses separate ports or memories:
- Register File: Designed with separate read ports (combinational) and write port (synchronous) to avoid conflicts between stages
- Memory: Separate instruction memory and data memory prevent resource conflicts
Instruction Fetch (IF)
- PC management and increment logic (controlled by PC_Write signal)
- Instruction memory interface (32 locations, 32-bit instructions)
- Branch/Jump target handling through PC multiplexers
Instruction Decode (ID)
- Control signal generation based on opcode
- Immediate value extraction and sign extension
- Instruction field decoding (opcode, rs, rt, rd)
- Control signal disabling via disable_control input
Register Read (RR)
- Register file read operations (combinational)
- 32 x 32-bit register file with synchronous write
- Stores register values and control signals in RR/EX pipeline register
Execute (EX)
- ALU for arithmetic and logical operations (AND, OR, ADD, SUB)
- Address calculation for memory operations and branches
- Branch/Jump condition evaluation and target computation
- Forwarding unit integration (receives forwardA and forwardB results)
- PC multiplexers for branch and jump target selection
Memory (MEM)
- Memory address setup and validation
- Data memory read/write operations
- Data latching and alignment (8-bit DMEM with big-endian 32-bit storage)
- 32 x 8-bit data memory
Write Back (WB)
- Register file write logic
- Result multiplexing (ALU result vs. memory data)
Hazard Detection Unit
- Dependency checking between the instruction in ID (IF/ID register) and a preceding load (ID_EX_MemRead, ID_EX_Rt)
- Stall signal generation (PC_Write, IF_ID_Write, disable_control)
- Load-use hazard identification
Forwarding Unit
- Bypass path multiplexing for two ALU operands
- Result forwarding from EX and MEM stages
- Operand selection logic (3-to-1 mux for each operand)
Pipeline registers store intermediate data between stages:
- IF/ID: Instruction (32-bit), incremented PC (32-bit)
  - Updated every cycle when IF_ID_Write == 1; frozen when IF_ID_Write == 0
- ID/RR: Opcode (6-bit), rs/rt/rd addresses (5-bit each), sign-extended immediate (32-bit), all 10 control signals
  - Always updated on the clock edge; its control fields are zeroed via the disable_control signal in the control unit
- RR/EX: Read data 1 and 2 (32-bit each), rs/rt/rd addresses (5-bit each), sign-extended immediate (32-bit), all control signals
  - Carries register file outputs and control bits to the EX stage
- EX/MEM: ALU result (32-bit), write data (32-bit, from the forwarding mux), write register (5-bit), zero flag (1-bit), MemRead, MemWrite, MemToReg, RegWrite, BranchEQ, BranchNEQ
  - Passes ALU results and memory control signals to the MEM stage
- MEM/WB: Data read from memory (32-bit), ALU result (32-bit), write register (5-bit), MemToReg, RegWrite
  - Carries both possible write-back sources (memory data and ALU result)
Example: OR R3, R1, R2 with no hazards
Clock 1: OR R3, R1, R2 → IF stage (fetch instruction)
Clock 2: OR R3, R1, R2 → ID stage (decode instruction)
Clock 3: OR R3, R1, R2 → RR stage (read R1 and R2)
Clock 4: OR R3, R1, R2 → EX stage (compute R1 | R2)
Clock 5: OR R3, R1, R2 → MEM stage (no memory operation)
Clock 6: OR R3, R1, R2 → WB stage (write result to R3)
Example: LW followed by a dependent OR (load-use hazard)
Clock 1: LW R1, 100(R2) → IF
Clock 2: LW R1, 100(R2) → ID | OR R3, R1, R5 → IF
Clock 3: LW R1, 100(R2) → RR | OR R3, R1, R5 → ID (stall triggered by hazard detection)
(IDRR_MemRead=1, IDRR_rt=R1, IFID_Rs=R1 → hazard detected)
(Signals: PC_Write=0, IF_ID_Write=0, disable_control=1)
Clock 4: LW R1, 100(R2) → EX | bubble → RR | OR R3, R1, R5 → ID (stalled, waits for R1)
(PC and IF/ID were frozen; the bubble carries all-zero control signals)
(The bubble in ID/RR has MemRead=0, so PC_Write and IF_ID_Write return to 1 and disable_control returns to 0)
Clock 5: LW R1, 100(R2) → MEM | OR R3, R1, R5 → RR
(LW reads data memory; the loaded value is latched into MEM/WB at the end of this cycle)
Clock 6: LW R1, 100(R2) → WB | OR R3, R1, R5 → EX (R1 available via forwarding)
(forwardA = 2'b01: R1 is OR's rs operand, forwarded from MEM/WB)
Clock 7: OR R3, R1, R5 → MEM
Clock 8: OR R3, R1, R5 → WB
The stall is necessary because the value loaded by LW is available only at the end of its MEM stage, while OR needs it at the start of its EX stage. One stall cycle delays OR so that it enters EX while LW is in WB, and the loaded value can then be forwarded from the MEM/WB register.
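To check the timeline arithmetic, the following Python sketch computes the clock in which each instruction enters each stage of an in-order 6-stage pipeline, where a stall holds an instruction in ID (as PC_Write = 0 and IF_ID_Write = 0 do) while a bubble moves on. It models timing only, not data; the names schedule and stage_at and the stall model are hypothetical.

```python
from typing import Dict, List, Optional

STAGES = ["IF", "ID", "RR", "EX", "MEM", "WB"]


def schedule(n_instr: int, stalls: Optional[Dict[int, int]] = None) -> List[List[int]]:
    """Return entry[i][k]: the clock in which instruction i enters stage k.

    stalls maps an instruction index to the extra cycles it is held in ID.
    """
    stalls = stalls or {}
    entry: List[List[int]] = []
    for i in range(n_instr):
        row: List[int] = []
        for k in range(len(STAGES)):
            if k == 0:
                t = 1  # earliest possible fetch; successors are pushed back below
            else:
                t = row[k - 1] + 1
                if k == STAGES.index("RR"):
                    t += stalls.get(i, 0)  # leave ID late after a stall
            if i > 0:
                prev = entry[i - 1]
                # In-order pipeline: a stage frees up when the predecessor leaves it.
                frees_at = prev[k + 1] if k + 1 < len(STAGES) else prev[k] + 1
                t = max(t, frees_at)
            row.append(t)
        entry.append(row)
    return entry


def stage_at(row: List[int], clock: int) -> Optional[str]:
    """Name of the stage an instruction occupies in a given clock, if any."""
    for k, name in enumerate(STAGES):
        leave = row[k + 1] if k + 1 < len(STAGES) else row[k] + 1
        if row[k] <= clock < leave:
            return name
    return None


if __name__ == "__main__":
    names = ["LW R1, 100(R2)", "OR R3, R1, R5"]
    rows = schedule(len(names), stalls={1: 1})  # one-cycle load-use stall on OR
    for clock in range(1, rows[-1][-1] + 1):
        cells = [f"{n} -> {stage_at(r, clock)}" for n, r in zip(names, rows)
                 if stage_at(r, clock) is not None]
        print(f"Clock {clock}: " + " | ".join(cells))
```

Without the stall, OR would reach WB in clock 7; the single stall cycle moves it to clock 8, matching the one bubble inserted by the hazard detection unit.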
The repository includes testbenches in the Testbench directory for validating:
- Instruction execution (lw, sw, j, or, beq, bne, andi)
- Data forwarding correctness
- Hazard detection and stalling
- Pipeline flushing on branch/jump
- Pipeline register state transitions
- Memory operations with 8-bit DMEM
Repository structure:
SiPi/
├── Modules/ # Core processor modules (Verilog)
├── Testbench/ # Test benches for validation
└── README.md # This file