
CPU Pipeline Stages

Modern CPUs don't execute one instruction completely before starting the next. Instead, they overlap execution using a pipeline — the same way an assembly line lets a car factory build multiple cars simultaneously, each at a different station.

The Classic 5-Stage RISC Pipeline

Most computer architecture courses teach the five-stage pipeline popularized by MIPS and DLX:

| Stage | Abbreviation | What Happens |
|---|---|---|
| Instruction Fetch | IF | Read the next instruction from memory using the Program Counter (PC) |
| Instruction Decode | ID | Decode the opcode; read source register values |
| Execute | EX | ALU performs the operation or calculates a memory address |
| Memory Access | MEM | Reads (load) or writes (store) data memory |
| Write Back | WB | Writes the result back to the destination register |

Space-Time Diagram

The diagram tracks five instructions through IF → ID → EX → MEM → WB over nine clock cycles (CC1 to CC9). Each instruction enters the pipeline one cycle after the one before it, so in CC1 only ADD is active (in IF), and in CC5 all five stages are busy. The timing shown is the ideal case, with no stalls.

| Instruction | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 | CC9 |
|---|---|---|---|---|---|---|---|---|---|
| ADD R1, R2, R3 | IF | ID | EX | MEM | WB | | | | |
| LW R4, 0(R1) | | IF | ID | EX | MEM | WB | | | |
| SUB R5, R4, R2 | | | IF | ID | EX | MEM | WB | | |
| SW R5, 4(R1) | | | | IF | ID | EX | MEM | WB | |
| BEQ R1, R0, +8 | | | | | IF | ID | EX | MEM | WB |
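
The staggering in a space-time diagram follows one rule: instruction i (counting from 0) is in stage number (cycle − 1 − i). The following is a minimal sketch of that rule, assuming an ideal pipeline with no stalls; the function names `stage_at` and `space_time_diagram` are illustrative, not part of any library.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in clock
    cycle `cycle` (1-based), or None if it is not in the pipeline.
    Assumes an ideal pipeline: one stage per cycle, no stalls."""
    offset = cycle - 1 - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def space_time_diagram(program):
    """One row per instruction, one column per clock cycle."""
    total_cycles = len(program) + len(STAGES) - 1
    return [
        [stage_at(i, c) or "" for c in range(1, total_cycles + 1)]
        for i in range(len(program))
    ]


program = [
    "ADD R1, R2, R3",
    "LW R4, 0(R1)",
    "SUB R5, R4, R2",
    "SW R5, 4(R1)",
    "BEQ R1, R0, +8",
]

for name, row in zip(program, space_time_diagram(program)):
    print(f"{name:<16}" + "".join(f"{cell:<5}" for cell in row))
```

Five instructions through five stages need 5 + 5 − 1 = 9 columns, which is why the diagram spans CC1 to CC9.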

Why Pipeline?

Without pipelining, each instruction takes 5 cycles. With a full pipeline, one instruction completes every cycle once the pipeline is filled:

No pipeline:   I1: 5 cycles, I2 starts at cycle 6, I3 at cycle 11 ...
With pipeline: I1 completes at cycle 5, I2 at cycle 6, I3 at cycle 7 ...

Throughput approaches 1 instruction per cycle (IPC = 1). Real processors exceed this with superscalar execution: multiple pipelines let them issue more than one instruction per cycle.
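
The cycle counts above generalize to n instructions on a k-stage pipeline: n × k cycles without pipelining, and k + (n − 1) cycles with it. Here is a small sketch of that arithmetic, assuming one cycle per stage, the same clock period in both designs, and no hazards; the function names are illustrative.

```python
def cycles_unpipelined(n_instructions, n_stages=5):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=5):
    """The first instruction takes n_stages cycles to complete;
    every later instruction completes one cycle after the previous one."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages=5):
    """Ratio of unpipelined to pipelined cycle counts (ideal case)."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(
        n_instructions, n_stages
    )


print(cycles_unpipelined(3), cycles_pipelined(3))  # 15 7
print(round(speedup(1000), 2))                     # about 4.98
```

For long instruction streams the speedup approaches the number of stages, since the k − 1 cycles spent filling the pipeline are amortized over all n instructions.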

Pipeline Hazards

Pipelining introduces hazards — situations where the next instruction can't start in the immediately following cycle.

Structural Hazard

Two instructions need the same hardware resource at the same time.

Example: A CPU with a single memory port can't fetch a new instruction (IF stage) while another instruction is reading from memory (MEM stage) in the same cycle.

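Fix: Separate instruction memory (I-cache) from data memory (D-cache) so that IF and MEM can both access memory in the same cycle. Where a resource can't be duplicated, stall one of the instructions.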
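
To see where such a conflict would arise, the sketch below scans a program for cycles in which a single-port unified memory would be needed by both IF and MEM. It assumes the ideal five-stage timing (instruction i, counting from 0, is in IF at cycle i + 1 and in MEM at cycle i + 4) and treats only LW and SW as memory instructions; the helper name is hypothetical.

```python
MEMORY_OPS = {"LW", "SW"}  # assumption: the only opcodes that use the MEM stage


def memory_port_conflicts(program):
    """Return (cycle, fetched_instr, memory_instr) for each cycle in which
    one instruction is in MEM while another is in IF, using the ideal
    (unstalled) schedule of a five-stage pipeline."""
    conflicts = []
    for i, instr in enumerate(program):
        opcode = instr.split()[0]
        if opcode not in MEMORY_OPS:
            continue
        mem_cycle = i + 4            # IF at i+1, ID at i+2, EX at i+3, MEM at i+4
        fetch_index = mem_cycle - 1  # the instruction whose IF falls in that cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, program[fetch_index], instr))
    return conflicts


program = [
    "ADD R1, R2, R3",
    "LW R4, 0(R1)",
    "SUB R5, R4, R2",
    "SW R5, 4(R1)",
    "BEQ R1, R0, +8",
]

for cycle, fetched, mem_instr in memory_port_conflicts(program):
    print(f"cycle {cycle}: IF of '{fetched}' collides with MEM of '{mem_instr}'")
```

With split instruction and data memories, none of these collisions occur, because IF and MEM use different ports.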


Data Hazard (RAW — Read After Write)

An instruction reads a register that a previous instruction hasn't written yet.

ADD R1, R2, R3    # writes R1 (available after WB at cycle 5)
SUB R4, R1, R5    # reads R1 (needs it in ID at cycle 3) ← RAW hazard!

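Fix 1 (Stalling): Insert NOP bubbles until the value is ready. This is simple but wastes cycles.

Fix 2 (Forwarding/Bypassing): Route the EX output directly back to the EX input of the next instruction instead of waiting for WB. This removes the stall in most cases. The exception is a load followed immediately by an instruction that uses the loaded value: the data is not available until the end of MEM, so one stall cycle is still needed.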
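
The cost of each fix can be estimated from the distance between the producing and consuming instructions. The sketch below is a simplified model, not a description of any particular CPU. It assumes the classic five-stage timing, and the `split_cycle_regfile` flag is an added assumption: when true, the register file is written in the first half of a cycle and read in the second half, so ID may overlap the producer's WB.

```python
def raw_stall_cycles(distance, producer_is_load=False,
                     forwarding=True, split_cycle_regfile=True):
    """Stall cycles needed by a consumer that issues `distance`
    instructions after the producer of one of its source registers.

    distance=1 means the consumer immediately follows the producer.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # An ALU result is ready at the end of EX; a loaded value
        # only at the end of MEM, one cycle later.
        needed_gap = 2 if producer_is_load else 1
    else:
        # The consumer's ID must not come before the producer's WB
        # (same cycle is allowed only with a split-cycle register file).
        needed_gap = 3 if split_cycle_regfile else 4
    return max(0, needed_gap - distance)


# ADD R1, R2, R3 followed directly by SUB R4, R1, R5
print(raw_stall_cycles(1, forwarding=False))       # 2
print(raw_stall_cycles(1, forwarding=True))        # 0

# LW R4, 0(R1) followed directly by SUB R5, R4, R2 (load-use)
print(raw_stall_cycles(1, producer_is_load=True))  # 1
```

Under this model, placing even one independent instruction between a load and its first use removes the remaining load-use stall.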


Control Hazard (Branch Hazard)

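A branch instruction changes the PC, but by the time its outcome is known the pipeline has already fetched the instructions that follow it, possibly from the wrong path. How many depends on the stage in which the branch is resolved: in this five-stage pipeline, with the outcome known at the end of EX, two instructions have been fetched and must be flushed if the branch is taken.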

BEQ R1, R0, target    # branch result known at end of EX
<fetched speculatively — may need to flush>

Fix: Branch prediction (see the Branch Prediction page).
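
A rough way to see why prediction matters is to fold the flush cost into the average cycles per instruction (CPI). The sketch below assumes a base CPI of 1, a flush penalty of 2 cycles (a branch resolved at the end of EX, with the correct target fetched in the next cycle), and no other stalls. The branch fraction and misprediction rates in the usage lines are hypothetical values chosen for illustration, not measurements.

```python
def effective_cpi(branch_fraction, mispredict_rate,
                  flush_penalty=2, base_cpi=1.0):
    """Average CPI when a fraction of instructions are branches and a
    fraction of those branches are mispredicted, each costing
    `flush_penalty` wasted cycles."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches.
no_prediction = effective_cpi(0.20, 1.0)    # every branch pays the penalty
with_prediction = effective_cpi(0.20, 0.10)  # 10% of branches mispredicted

print(round(no_prediction, 2))    # about 1.4
print(round(with_prediction, 2))  # about 1.04
```

Deeper pipelines resolve branches later, so `flush_penalty` grows and accurate prediction becomes correspondingly more important.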


Key Insight

The pipeline is a throughput optimization, not a latency one. Each instruction still takes 5 cycles; you just get one completing per cycle at steady state. Hazards break this ideal and are the reason CPU microarchitecture is complex.

Further Reading