This CPU has 120 general-purpose registers (GPRs), only 15 of which are visible at any one time; the visible registers are numbered R1 through R15. A JSR advances the register window by 7 and an RTS moves it back by 7, so consecutive windows overlap by 15 - 7 = 8 registers.
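The window arithmetic can be sketched in Python as a mapping from a visible register at a given call depth to a physical GPR. This is a minimal sketch under assumptions the text does not state: the GPRs are numbered 0 to 119, the outermost window starts at GPR 0, and running out of registers simply raises an error, because the real overflow behaviour is not described. `physical_register` is a hypothetical helper.

```python
# Sketch of the register-window mapping. ASSUMPTIONS (not in the text):
# GPRs are numbered 0..119, the outermost window starts at GPR 0, and a
# window that would run past the last GPR raises an error, since the real
# CPU's overflow behaviour is not described.

NUM_GPRS = 120     # total general-purpose registers
WINDOW_SIZE = 15   # R1..R15 are visible at any one time
WINDOW_STEP = 7    # JSR advances the window by 7; RTS moves it back by 7


def physical_register(depth: int, r: int) -> int:
    """Physical GPR index of visible register R<r> at call depth <depth>."""
    if not 1 <= r <= WINDOW_SIZE:
        raise ValueError(f"R{r} is not a visible register")
    base = depth * WINDOW_STEP
    if depth < 0 or base + WINDOW_SIZE > NUM_GPRS:
        raise OverflowError(f"depth {depth} does not fit in {NUM_GPRS} GPRs")
    return base + (r - 1)


# The caller's R8 and the callee's R1 name the same physical register.
print(physical_register(0, 8), physical_register(1, 1))  # prints: 7 7
```

Under these assumptions the deepest window that fits is depth 15 (GPRs 105 to 119); what the CPU actually does on deeper calls is not covered here.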
R0 always reads as zero; it is not a real register. For example, "MOV R3, R4" must be written "ADD R0, R3, R4", because there is no MOV instruction.
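A tiny Python model of the zero register shows why no MOV is needed. The dictionary register file and the `read`/`add` helpers are hypothetical, and word-size overflow is ignored.

```python
# Minimal sketch of "R0 is zero": reading R0 always yields 0, so
# "ADD R0, R3, R4" (R4 <- R0 + R3) does the job of the missing MOV.
# The dict register file and the helper names are hypothetical.

regs = {r: 0 for r in range(1, 16)}   # visible registers R1..R15


def read(r: int) -> int:
    """R0 is not a register; it always reads as zero."""
    return 0 if r == 0 else regs[r]


def add(src1: int, src2: int, dst: int) -> None:
    """ADD src1, src2, dst: dst <- src1 + src2 (word-size wraparound not modelled)."""
    if dst != 0:                       # R0 cannot hold a value
        regs[dst] = read(src1) + read(src2)


regs[3] = 42
add(0, 3, 4)     # ADD R0, R3, R4, i.e. "MOV R3, R4"
print(regs[4])   # prints: 42
```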
This means that a subroutine leaves its return value in R1: the subroutine's R1, which is the caller's R8. Parameters are passed in the caller's R8 through R15 (the callee's R1 through R8); parameters after the eighth, which are very rare, are passed on the stack. The called subroutine may then use these registers freely; it does not have to restore any of them, even those that were not used for parameters.
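The parameter mapping follows from the window step of 7 and can be tabulated with a small hypothetical Python helper, `param_location`. It reports where a parameter sits in the caller's and the callee's numbering; for the stack case it only returns the parameter's position among the stack parameters, since the stack layout is not described.

```python
# Sketch of the calling convention implied by the register window.
# param_location is a hypothetical helper; the stack layout for parameters
# beyond the eighth is not described, so only their ordinal is returned.

WINDOW_STEP = 7     # JSR advances the window by 7
FIRST_ARG_REG = 8   # the first parameter goes in the caller's R8
MAX_REG_ARGS = 8    # parameters 1..8 use the caller's R8..R15


def param_location(n: int) -> tuple:
    """Where parameter n (1-based) lives, in the caller's and the callee's numbering."""
    if n < 1:
        raise ValueError("parameters are numbered from 1")
    if n > MAX_REG_ARGS:
        return ("stack", n - MAX_REG_ARGS)   # position among stack parameters
    caller_reg = FIRST_ARG_REG + (n - 1)
    return ("reg", f"R{caller_reg}", f"R{caller_reg - WINDOW_STEP}")


for n in (1, 8, 9):
    print(n, param_location(n))
# prints:
# 1 ('reg', 'R8', 'R1')
# 8 ('reg', 'R15', 'R8')
# 9 ('stack', 1)
```

The return value travels the other way through the same overlap: the callee's R1 is the caller's R8, which is also where the first parameter arrived.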
R0 can also be given as the target register when the result is to be thrown away; thus VELMA's "CMP R2, R1" is written "SUB R1, R2, R0". Note the subtraction operand order as well: it is always the first operand minus the second.
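The same kind of toy model shows R0 as a throwaway target and the first-minus-second operand order. How a branch would then test the discarded difference (for instance through condition codes) is not described in this section, so the hypothetical `sub` helper just returns the value it computed.

```python
# Sketch of R0 as a throwaway target and of SUB's operand order.
# How a later branch tests the discarded result (e.g. condition codes) is not
# described in the text, so sub() simply returns the difference it computed.
# The helper names are hypothetical.

regs = {r: 0 for r in range(1, 16)}   # visible registers R1..R15


def read(r: int) -> int:
    return 0 if r == 0 else regs[r]


def write(r: int, value: int) -> None:
    if r != 0:          # a result written to R0 is thrown away
        regs[r] = value


def sub(src1: int, src2: int, dst: int) -> int:
    """SUB src1, src2, dst: dst <- src1 - src2 (always first minus second)."""
    result = read(src1) - read(src2)
    write(dst, result)
    return result


regs[1], regs[2] = 10, 3
print(sub(1, 2, 0))   # SUB R1, R2, R0 (VELMA's CMP R2, R1): prints 7, no register changes
print(sub(2, 1, 5))   # SUB R2, R1, R5: prints -7 and stores it in R5
```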
There are three instruction formats (actually, categories #2 and #3 have the same bit layout):
The IND instruction is used for indirect addressing, possibly with an offset (indexing). Its semantics are: reg2 <- M[reg0+reg1]. After an IND, reg2 may not be used or modified in the next instruction.
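A Python sketch of IND follows, reading reg0 and reg1 as the two address operands and reg2 as the destination; that reading is an interpretation of the notation above. The checker flags an instruction that touches reg2 right after the IND. The instruction tuples and helper names are hypothetical, and the checker assumes every operand of the following instruction is a register.

```python
# Sketch of IND's semantics, reg2 <- M[reg0 + reg1], reading reg0 and reg1 as
# the two address operands and reg2 as the destination (our interpretation).
# Memory is a plain list of words; instructions handed to the checker are
# hypothetical tuples (opcode, reg, reg, ...) whose operands are all registers.

def ind(regs: dict, memory: list, reg0: int, reg1: int, reg2: int) -> None:
    """Indirect load with optional offset: reg2 <- M[reg0 + reg1]."""
    base = 0 if reg0 == 0 else regs[reg0]     # R0 reads as zero
    offset = 0 if reg1 == 0 else regs[reg1]   # R0 here means "no offset"
    if reg2 != 0:
        regs[reg2] = memory[base + offset]


def breaks_ind_rule(ind_target: int, next_instr: tuple) -> bool:
    """True if the instruction right after an IND uses or modifies its target."""
    _opcode, *operands = next_instr
    return ind_target != 0 and ind_target in operands


regs = {r: 0 for r in range(1, 16)}
memory = [0] * 16
memory[10] = 99
regs[1], regs[2] = 8, 2
ind(regs, memory, 1, 2, 3)                    # R3 <- M[R1 + R2] = M[10]
print(regs[3])                                # prints: 99
print(breaks_ind_rule(3, ("ADD", 3, 0, 4)))   # True: reads R3 too early
print(breaks_ind_rule(3, ("ADD", 1, 2, 4)))   # False
```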
There is a delayed branch rule and a delayed load rule.
Here is a simple example: a sequence of instructions to compute C := A + B. There is nothing very interesting to put in the delayed load slots, so we fill them with NOPs, except for the last one.
    LOAD  A, R1
    NOP              ; can't do a load in the delayed load slot, either...
    LOAD  B, R2
    NOP
    ADD   R1, R2, R2
    NOP              ; and you can't do STORE R2 immediately after ADD ,,R2
    STORE R2, C
    HALT             ; ha ha! Store will complete while the CPU is halted!
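The rules that the listing's comments state can be checked mechanically. The Python sketch below encodes only those comment-stated rules (no LOAD in a load slot, the loaded register untouched in its slot, no STORE of an ADD's target right after the ADD), then runs the program on a dictionary memory. The tuple encoding is hypothetical and timing is not modelled, so the STORE here completes before HALT rather than during it.

```python
# Sketch: check the C := A + B listing against the slot rules stated in its
# comments, then run it. The rules are inferred from those comments only:
#   1. the instruction after a LOAD may not be another LOAD, nor use or
#      modify the register just loaded;
#   2. the instruction after an ADD may not STORE the ADD's target register.
# Instruction tuples and memory names are hypothetical; timing is not modelled.

PROGRAM = [
    ("LOAD", "A", 1),    # LOAD  A, R1
    ("NOP",),
    ("LOAD", "B", 2),    # LOAD  B, R2
    ("NOP",),
    ("ADD", 1, 2, 2),    # ADD   R1, R2, R2
    ("NOP",),
    ("STORE", 2, "C"),   # STORE R2, C
    ("HALT",),
]


def registers_of(instr: tuple) -> set:
    """Registers an instruction reads or writes (R0 excluded: it is not a register)."""
    op = instr[0]
    if op == "LOAD":
        used = {instr[2]}
    elif op == "STORE":
        used = {instr[1]}
    elif op == "ADD":
        used = set(instr[1:])
    else:
        used = set()
    return used - {0}


def check_slots(program: list) -> list:
    """Describe every slot-rule violation; i is the index of the offending instruction."""
    problems = []
    for i, (cur, nxt) in enumerate(zip(program, program[1:]), start=1):
        if cur[0] == "LOAD":
            if nxt[0] == "LOAD":
                problems.append(f"index {i}: LOAD in a delayed load slot")
            elif cur[2] in registers_of(nxt):
                problems.append(f"index {i}: R{cur[2]} touched in its load slot")
        if cur[0] == "ADD" and nxt[0] == "STORE" and nxt[1] == cur[3]:
            problems.append(f"index {i}: STORE R{nxt[1]} right after ADD ,,R{cur[3]}")
    return problems


def run(program: list, memory: dict) -> dict:
    """Execute the program sequentially on a dict memory and return the memory."""
    regs = {r: 0 for r in range(1, 16)}

    def read(r: int) -> int:
        return 0 if r == 0 else regs[r]

    for instr in program:
        op = instr[0]
        if op == "LOAD" and instr[2] != 0:
            regs[instr[2]] = memory[instr[1]]
        elif op == "ADD" and instr[3] != 0:
            regs[instr[3]] = read(instr[1]) + read(instr[2])
        elif op == "STORE":
            memory[instr[2]] = read(instr[1])
        elif op == "HALT":
            break
    return memory


print(check_slots(PROGRAM))                  # prints: []
print(run(PROGRAM, {"A": 2, "B": 3})["C"])   # prints: 5
```

Dropping the NOP after the ADD (`PROGRAM[:5] + PROGRAM[6:]`) makes `check_slots` report the STORE at index 5, matching the second comment in the listing.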
More interesting code will be discussed in the final tutorial.