RISC (Reduced Instruction Set Computer)
Here's the basic point-form text, at least most of what I wrote on the
chalkboard.
Goals
- speed
- Given modern sophisticated compilers, and the fact that people basically
  do not write directly in assembly language anymore, we make the machine
  language geared towards the hardware, not the programmer. In fact we may
  even be willing to end up with an assembly language which would be really
  painful for people to try to write in; that doesn't matter, because we'll
  use high-level languages and compilers.
- The goal of the following design choices is SPEED.
- reduce the number of cycles per instruction.
  Goal: execute one instruction per cycle
Basic design ideas
- no microcode
- pipelining
- Any operation that cannot be completed in one cycle is not included in
the instruction set.
- three-register instruction format
- lots of registers
- overlapping register windows
Pipelining
Instructions X, Y, and Z which do not affect control flow:
addr | instr |
2 | X |
3 | Y |
4 | JUMP 8 |
5 | |
6 | |
7 | |
8 | Z |
Phases of instruction execution and some nicknames for them:
- fetch & decode instruction (FI)
- fetch operands (FO)
- store result (S)
Traditionally, execute each phase in turn for each instruction.
- FIX
- FOX
- SX
- FIY
- FOY
- SY
- FIJUMP
- FOJUMP
- SJUMP
- FIZ
- FOZ
- SZ
But consider that once we've finished fetching the X instruction, during the
decoding we're not using the memory path (MAR/MDR). We may or may not need
the memory path to fetch the operands; let's assume we don't need it here.
Then there's no reason we couldn't do "FOX" and "FIY" simultaneously.
- FIX
- FOX and FIY (simultaneously!)
- SX and FOY and FIJUMP
- SY and FOJUMP and FI5 (well, we don't know we're jumping yet)
- SJUMP (oops that changes the PC, start over)
- FIZ
- FOZ and FI9
- SZ and FO9 and FI10
- ...
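The overlap and the squash above can be simulated. Here is a minimal Python sketch of the idea (the three phases and the addresses come from the example above; nothing here models a real CPU), in which a JUMP is only recognized at its S phase, so anything fetched behind it gets squashed:

```python
# Toy three-phase pipeline: FI (fetch instruction), FO (fetch operands),
# S (store result). One instruction enters FI per cycle; a JUMP is only
# recognized at its S phase, at which point the in-flight work is squashed.

def run(program, start, max_cycles=50):
    """program: {addr: ('OP', jump_target_or_None)}.
    Returns the addresses of the instructions that completed, in order."""
    pc, pipe, done = start, [], []  # pipe entries: [addr, phase]; 0=FI 1=FO 2=S
    for _ in range(max_cycles):
        squashed = False
        for addr, phase in pipe:
            if phase == 2:                    # S phase: instruction completes
                done.append(addr)
                op, target = program[addr]
                if op == 'JUMP':
                    pc, pipe, squashed = target, [], True  # squash in-flight work
        if not squashed:
            pipe = [[a, p + 1] for a, p in pipe if p < 2]  # advance the pipeline
        if pc in program:
            pipe.append([pc, 0])              # FI of the next instruction
            pc += 1
        if not pipe and pc not in program:
            break
    return done

# The example above: X, Y, JUMP 8 at addresses 2..4, Z at address 8.
program = {2: ('X', None), 3: ('Y', None), 4: ('JUMP', 8),
           5: ('A', None), 6: ('B', None), 7: ('C', None), 8: ('Z', None)}
```

Running run(program, 2) completes only the instructions at addresses 2, 3, 4, and 8; the instructions at 5 and 6 are fetched but squashed, which is exactly the lost time discussed below.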
Pipelining raises lots of issues:
- branch invalidates work in progress
- time lost (threatens one instruction per cycle!)
- have to reverse any operand-fetch side-effects (e.g. auto-increment)! (is this even always possible?)
- multiple phases may need MAR/MDR
- "load" (move memory to register) may require extra cycle
- By the time we'd have fetched any other kind of operand, for a load we've
  only got the effective address. We now need to put this into the MAR
  and do a Read, etc.
- problem if an operand is the result of the previous instruction
Some typical features of RISC CPUs (not all ubiquitous)
- no complex addressing
- load/store instruction set
- separate instruction/data paths to memory
- delayed load
- operand forwarding
- delayed branch
Delayed load
If a "LOAD" requires an extra cycle, we say that we simply may not
access the load target in the next instruction!
We want:
ADD R1, R3
LOAD 500, R4
ADD R4, R3
we can't access R4 on that third line (it's the LOAD's target as well as an
operand of the ADD)
could insert a NOP
better yet, rewrite:
LOAD 500, R4
ADD R1, R3
ADD R4, R3
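A compiler (or assembler) can do this rewriting mechanically. Here is a Python sketch of the idea, assuming an invented (op, sources, destination) encoding for instructions; a real scheduler checks far more than this:

```python
# Toy delayed-load scheduler. If the instruction after a LOAD touches the
# load target, try to swap the LOAD with an independent preceding
# instruction; failing that, insert a NOP. Encoding is invented:
# ('ADD', ('R1', 'R3'), 'R3') means ADD R1, R3 (result into R3).

def schedule(instrs):
    out = list(instrs)
    i = 0
    while i < len(out):
        op, srcs, dst = out[i]
        if op == 'LOAD' and i + 1 < len(out):
            _, nsrcs, ndst = out[i + 1]
            if dst in nsrcs or dst == ndst:          # next instr touches target
                prev_independent = (
                    i > 0
                    and out[i - 1][0] != 'LOAD'      # don't create a new hazard
                    and dst not in out[i - 1][1]     # prev doesn't read target
                    and out[i - 1][2] != dst         # ...and doesn't write it
                    and out[i - 1][2] not in srcs    # LOAD doesn't depend on prev
                )
                if prev_independent:
                    out[i - 1], out[i] = out[i], out[i - 1]   # hoist the LOAD
                else:
                    out.insert(i + 1, ('NOP', (), None))      # fall back to NOP
        i += 1
    return out
```

On the sequence from the text it produces exactly the rewrite shown above (the LOAD first, then the two ADDs); when no independent instruction is available, it falls back to a NOP.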
Delayed branch
like delayed load but more bizarre
time | fetch | execute |
0 | X | |
1 | Y | X |
2 | JUMP | Y |
3 | [5] | JUMP |
4 | Z | |
5 | [9] | Z |
At time 4, no instruction is executed. At time 3, we do a useless fetch, and
then we squash it.
There are pipelined architectures in which we do fetch instruction Z at time
3...
But there's a better way, which is a more typical RISC approach. It involves
no extra circuitry, in fact even less than squashing requires. But it's quite
bizarre from the assembly-language programming point of view, and is another
good reason to write in high-level languages only.
This is the delayed branch rule: the instruction following a branch, which
will already have been fetched and decoded, is EXECUTED, even if the branch is
taken! So in this example, the instruction at address 5 WILL be executed.
As with the delayed load, we can just put a NOP there, but more usually we try
to be clever.
If that jump op is not conditional on the results of instruction Y, we can
write:
addr | instr |
2 | X |
3 | JUMP 8 |
4 | Y |
5 | |
6 | |
7 | |
8 | Z |
and the instructions executed will be X, Y, Z.
This special memory location following a branch is known as the "branch delay
slot", a place where you put one additional instruction to be executed.
If the jump IS conditional on the result of instruction Y, perhaps we can do
Y, then the conditional jump, then X.
Failing everything, we use a NOP.
But assuming we usually manage to fill the delayed branch slot with a useful
instruction, we're managing to avoid squashing the pipeline on every jump,
which saves a lot of time.
Note that we probably still have to squash upon interrupt.
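To make the rule concrete, here is a tiny Python sketch of an executor that obeys the delayed branch rule; the program encoding (a dict from address to op, with an invented HALT to stop) is purely for illustration:

```python
# Toy executor obeying the delayed branch rule: the instruction in the slot
# after a JUMP executes even though the jump is taken. Encoding is invented:
# {addr: ('OP', jump_target_or_None)}; 'HALT' stops execution.

def execute(program, start):
    trace, pc = [], start
    while program.get(pc, ('HALT', None))[0] != 'HALT':
        op, arg = program[pc]
        if op == 'JUMP':
            slot_op = program.get(pc + 1, ('NOP', None))[0]
            if slot_op != 'NOP':
                trace.append(slot_op)   # the delay-slot instruction executes!
            pc = arg
        else:
            trace.append(op)
            pc += 1
    return trace

# The rewritten example from the text: X, JUMP 8, Y (in the delay slot), Z.
program = {2: ('X', None), 3: ('JUMP', 8), 4: ('Y', None),
           8: ('Z', None), 9: ('HALT', None)}
```

execute(program, 2) yields the trace X, Y, Z, matching the text: Y sits after the jump yet still runs.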
Three-register instruction format
Typical                 | Typical RISC    |
ADD R1, R2              | ADD R1, R2, R2  |
MOV R1, R3; ADD R2, R3  | ADD R1, R2, R3  |
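The saving in the second row can be seen with a small Python sketch, assuming the convention used in this document that the last register named is the destination:

```python
# Three-register vs two-operand ADD. Register names are illustrative.

def add3(regs, a, b, dst):       # RISC: ADD a, b, dst
    regs[dst] = regs[a] + regs[b]

def add2(regs, a, b):            # typical two-operand: ADD a, b  (b gets a+b)
    regs[b] = regs[a] + regs[b]

regs = {'R1': 5, 'R2': 7, 'R3': 0}
add3(regs, 'R1', 'R2', 'R3')     # one instruction
assert regs['R3'] == 12

regs = {'R1': 5, 'R2': 7, 'R3': 0}
regs['R3'] = regs['R1']          # MOV R1, R3: an extra instruction is needed
add2(regs, 'R2', 'R3')           # ADD R2, R3
assert regs['R3'] == 12          # same result, but two instructions
```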
Lots of registers
Extra space on chip: no microprogram, and most circuitry is simpler.
Registers really speed things up.
A digression on the use of registers by compilers is required here:
Programs in
high-level languages do not refer to registers; compilers can decide to put
variables in registers.
Of course we also need registers to be accumulators, that is, to do arithmetic
in, and to store intermediate results. We also use them to pass parameters.
The following:
i = x * y;
j = x + 2;
can be compiled to something like:
LOAD X, R1
LOAD Y, R2
MULT R1, R2
STORE R2, I
ADDI 2, R1
STORE R1, J
A naive compilation of the high-level language code would result in two loads
of x; but as shown above, we can keep x stashed in R1 and avoid re-loading it.
We can also identify common
sub-expressions (i.e. larger than a single variable)
and use registers to keep their value for subsequent reuse.
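The saving is easy to count. Here is a Python sketch in which each statement is modeled only by the variables it reads (an encoding invented purely for illustration):

```python
# Count memory loads with and without keeping variables in registers.
# Each statement is modeled only by the tuple of variables it reads.

def count_loads(statements, keep_in_registers):
    loads, loaded = 0, set()
    for reads in statements:
        for var in reads:
            if keep_in_registers and var in loaded:
                continue          # value is already sitting in a register
            loads += 1
            loaded.add(var)
    return loads

# i = x * y reads (x, y); j = x + 2 reads only (x,)
stmts = [('x', 'y'), ('x',)]
assert count_loads(stmts, False) == 3   # naive: x loaded twice
assert count_loads(stmts, True) == 2    # x kept in R1, as in the code above
```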
All these uses of registers make a program run faster. The more registers you
have, the more speedups a compiler (or human programmer)
can produce by clever use of registers.
Register windows
A huge number of registers, only some visible at a time. See the example
language below. JSR slides the window one way; RTS slides it back.
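A Python sketch of the mechanism (all sizes invented; real designs pick their own window and overlap sizes): only a window-sized slice of a large register file is visible, and the slide on JSR makes the caller's top registers overlap the callee's bottom ones, so parameters pass without touching memory.

```python
# Toy overlapping register windows. Sizes (64 registers, window of 8,
# overlap of 2) are invented for illustration.

class WindowedRegisters:
    def __init__(self, total=64, window=8, overlap=2):
        self.file = [0] * total
        self.slide = window - overlap   # how far JSR/RTS move the window
        self.base = 0
    def __getitem__(self, r):           # visible register r -> physical cell
        return self.file[self.base + r]
    def __setitem__(self, r, value):
        self.file[self.base + r] = value
    def jsr(self):
        self.base += self.slide         # slide the window forward on call
    def rts(self):
        self.base -= self.slide         # and back on return

regs = WindowedRegisters()
regs[6] = 42          # caller puts a parameter in one of its top two registers
regs.jsr()
assert regs[0] == 42  # callee sees the same cell as its register 0
regs[1] = 99          # callee's result goes in the overlap too
regs.rts()
assert regs[7] == 99  # caller reads it back; no memory traffic at all
```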
Summary
The points we discussed are:
- increase speed (design strategy #1 is the idea that speed is the most
important thing, not ease of programming)
- simple instructions
- fixed-format instructions
- no microcode
- pipelining
- no complex addressing
- load/store architecture
- three-register instruction format
- register windows
- other (a great many consequences of the above, e.g. pipelining leads to
delayed branch)
Not all of these ideas are necessarily found in any one RISC CPU, but a CPU
called a RISC will probably have most if not all of the above ideas.
In the other direction, just about any modern CPU does some pipelining,
even if it can't do some of these other things because it has a
backward-compatible instruction set.
Some of these ideas are arguably just as applicable to non-RISC CPUs, e.g.
register windows.
But register windowing is more likely to be found in a RISC because it's more
likely to have more registers, because more chip space is available due to the
lack of microcode and other complexities.
Example language
Here is a description of a hypothetical RISC machine language with some
example code.
I designed it to be severe along these lines. In practice, modern RISC
machine languages are normally not quite so severe, but exhibit some of
these strange characteristics.
References to "assignment three" in that document are actually a previous
year's
assignment three, but the examples are just as good even if they weren't
assignment questions, I think. I can translate some of this year's assignment
three solutions to this RISC machine language if people ask.