This is part of the "microprogramming notes" web pages, which contain the following subtopics:

Simple one-bus CPU architecture
Register in / out connections
Microcode examples <-- you are here
ALU architecture (insides)
ALU interface (outsides)
Control circuitry
Simple two-bus CPU architecture
Simple three-bus CPU architecture (used in RISCs)

Microcode examples

Here are a few microcode examples.

For the simplest control sequence example, let's do a transfer from register R0 to R1, like the PDP-11 instruction "MOV R0, R1". This is a single step:

0. R0_out, R1_in

Another control sequence example:
R2 <- R1+R2

This will illustrate the use of the Y and Z registers. The ALU does its operation on whatever's on the bus and something in a special "other ALU operand" register, which is Y. The ALU output goes into a special result register, namely Z. This is how we use the ALU:

0. R1_out, Y_in
1. R2_out, Add, Z_in
2. Z_out, R2_in

Step 0 transfers (copies) the value in R1 into Y. Step 1 puts the R2 value on the bus, and does an Add, thus adding R1+R2. This result is loaded into Z. Step 2 transfers (copies) the value in Z into R2.
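The three steps above can be traced with a small simulation. This is only a sketch under stated assumptions: the 16-bit word size, the function name `run`, and the convention of reading register names off the `_out` / `_in` suffixes are all inventions for illustration, not part of the hardware described here.

```python
# Minimal model of the one-bus datapath: one value on the bus per step.
MASK = 0xFFFF  # assumed 16-bit word size


def run(steps, regs):
    """Execute a list of steps; each step is a set of asserted control signals."""
    regs = dict(regs)
    for signals in steps:
        # Exactly one register may drive the single bus in a step.
        drivers = [s[:-4] for s in signals if s.endswith("_out")]
        assert len(drivers) == 1, "one bus: exactly one _out per step"
        bus = regs[drivers[0]]
        # The ALU combines Y (first operand) with the bus (second operand).
        alu = (regs["Y"] + bus) & MASK if "Add" in signals else None
        # Every register whose _in line is asserted loads at the end of the step.
        # Z loads from the ALU output; every other register loads from the bus.
        updates = {}
        for s in signals:
            if s.endswith("_in"):
                name = s[:-3]
                updates[name] = alu if name == "Z" else bus
        regs.update(updates)
    return regs


add_r1_r2 = [
    {"R1_out", "Y_in"},         # 0. copy R1 into Y
    {"R2_out", "Add", "Z_in"},  # 1. Z <- Y + R2
    {"Z_out", "R2_in"},         # 2. copy Z into R2
]
result = run(add_r1_r2, {"R1": 5, "R2": 7, "Y": 0, "Z": 0})
print(result["R2"])  # 12
```

Each step is written as a set, so nothing in the model depends on the order in which the signals are listed.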

The meaning of these "microinstruction" steps listed above is that all control signals which we do not list for each step are turned off. All control signals we do list are turned on.

Thus the order in which we list these signals is unimportant.

This is crucial, and perhaps subtle.

This notation is like an assembly notation of sorts. The actual meaning is that these control signals are on at the given time and all others not listed are off. The order in which you write them is irrelevant. The sequence is from step to step, not from left to right.
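One way to see why the order within a step is irrelevant is to picture each step as a control word with one bit per control line. The sketch below assumes a hypothetical bit assignment (the `SIGNALS` list); a real control store would have its own layout.

```python
# Hypothetical bit positions for a handful of control lines.
SIGNALS = ["R1_out", "R2_out", "Z_out", "Y_in", "Z_in", "R2_in", "Add"]


def control_word(asserted):
    """Return the control word: listed signals are 1, all others are 0."""
    word = 0
    for name in asserted:
        word |= 1 << SIGNALS.index(name)
    return word


a = control_word(["R2_out", "Add", "Z_in"])
b = control_word(["Z_in", "R2_out", "Add"])
print(a == b)  # True: the same lines are on, whatever order we wrote them in

# The sequence of steps, on the other hand, does matter.
prog_1 = [control_word(["R1_out", "Y_in"]), a]
prog_2 = [a, control_word(["R1_out", "Y_in"])]
print(prog_1 == prog_2)  # False
```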

Why have a control for Z_in? It's not a connection to the bus; who cares if Z is just getting set by the ALU all the time even if we don't want the result?

Well, since Z is a register, this additional control line means we don't have to pick up the Z value right away. For example, we could add three numbers, P+Q+R, by leaving the sum of the first two in Z while we set up the third:

0. P_out, Y_in
1. Q_out, Add, Z_in
2. R_out, Y_in
3. Z_out, Add, Z_in

This only works if Z is a master-slave flip-flop register, so that it can be loaded in the same cycle in which its previous value is used. But we will assume that all of our registers are master-slave unless otherwise specified.
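The P+Q+R sequence can be sketched as follows. The model imitates master-slave behaviour by reading the bus value and computing the ALU output first, and only then loading the destination registers, so step 3 sees the old Z on the bus while the new Z is being loaded. The function name `add_three` and the tuple encoding of the steps are assumptions made for this illustration.

```python
def add_three(p, q, r):
    """Trace the four-step P+Q+R sequence and return the final contents of Z."""
    regs = {"P": p, "Q": q, "R": r, "Y": 0, "Z": 0}
    program = [
        ("P", None, ["Y"]),   # 0. P_out, Y_in
        ("Q", "Add", ["Z"]),  # 1. Q_out, Add, Z_in
        ("R", None, ["Y"]),   # 2. R_out, Y_in
        ("Z", "Add", ["Z"]),  # 3. Z_out, Add, Z_in
    ]
    for out_reg, op, in_regs in program:
        # Read phase: the bus carries the value the register held before this step.
        bus = regs[out_reg]
        alu = regs["Y"] + bus if op == "Add" else None
        # Write phase: registers load only after the old values have been used.
        for name in in_regs:
            regs[name] = alu if name == "Z" else bus
    return regs["Z"]


print(add_three(1, 2, 3))  # 6
```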

Note that this logic is rife with considerations of clock timing being "long enough" for things to propagate, the ALU to settle down, etc. This puts a cap on CPU clock speed. Each step is one clock cycle, but we may have different parts of each step happen at slightly different times during the clock cycle. It depends on the CPU design.

A full example

Example: the execution of the entire instruction "ADD something, R1".

ADD 1000, R1
means R1 <- M[1000]+R1

0. PC_out, MAR_in, Read, Zero A, Set Carry-In, Add, Z_in
1. Z_out, PC_in, Wait MFC
2. MDR_out, IR_in
3. AddressFieldOfIR_out, MAR_in, Read
4. R1_out, Y_in, Wait MFC
5. MDR_out, Add, Z_in, Set CC
6. Z_out, R1_in, End

Detailed description:

PC_out, MAR_in, Read: These initiate the fetch of the instruction. We put the desired memory address into the MAR and assert the Read control line. We want to read from the memory address which is the contents of the PC register because the PC register (which may or may not be R7; we're not designing control sequences specifically for the PDP-11 here) contains the address of the next instruction to execute; that's what the PC register is for.
Zero A: We'll want a control line which uses 0 rather than Y for A. Thus we can perform ALU operations, so long as they involve 0 as the first operand, without an extra cycle to set up Y. We normally design the control sequences and the hardware in tandem. You can see how to implement a "Zero A" control line -- it is inverted, and ANDed with all the lines coming out of Y. Thus when it is 0, we AND with 1, which does nothing; but when it is 1, we AND with 0, which sets all of them to zero. But "Zero A" does not affect the value stored in Y, it just gives us a zero instead of the value of Y for the ALU's first operand.
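Zero A, Set Carry-In, Add, Z_in: We have to increment the PC, so that it contains the address of the next instruction for the next time we do this. With the PC value on the bus, Zero A supplying 0 as the first operand, and the carry-in set to 1, the Add computes 0+PC+1, which we load into Z. This whole sequence will be in a big loop.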
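The "Zero A" gating and the PC increment can be sketched at the bit level. The 16-bit width and the function names below are assumptions for illustration; the gating itself follows the description above (the control line is inverted and ANDed with every line coming out of Y).

```python
WIDTH = 16                 # assumed word size
MASK = (1 << WIDTH) - 1


def alu_a_input(y, zero_a):
    """Value seen at the ALU's first operand input."""
    # Inverted Zero A, replicated across all bits: all 1s when off, all 0s when on.
    gate = 0 if zero_a else MASK
    return y & gate


def alu_add(a, b, carry_in):
    """Adder with a carry-in, truncated to the word size."""
    return (a + b + carry_in) & MASK


y = 0x1234    # whatever happens to be in Y; it is not modified
pc = 0x0040   # value placed on the bus by PC_out
a = alu_a_input(y, zero_a=1)
z = alu_add(a, pc, carry_in=1)
print(hex(z))  # 0x41, i.e. PC+1
print(hex(y))  # 0x1234, Y itself is untouched
```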

Note that we are simultaneously fetching the instruction and adding 1 to the PC. We do as much as possible in a single step.

Now, for step 1:

Z_out, PC_in: This is a bus transfer from Z to PC. Z already contains the value PC+1. We copy this into the PC register, thus completing the increment of the PC.
Wait MFC: Recall that a memory operation (the Read in step 0) may take longer than one CPU cycle. We haven't needed the result yet, but we want to use the result (i.e. the contents of the MDR) in step 2. So we have to say that we should pause after step 1 until we have the MFC (memory function complete) signal from the main memory unit. We have a control signal to accomplish this, whose implementation we will examine later in this document.
Step 2. MDR_out, IR_in: Now that we have the instruction, we transfer it to the IR, which is where we put the instruction we're currently executing. The IR has extra circuitry attached which decodes the instruction whenever we do a transfer into the IR.

These three steps are common to the execution of any instruction. They're the "instruction fetch" and "increment PC".

Note how we do as many things simultaneously as possible. We did the memory read request in step 0, but while waiting for the result to come back, we stored the incremented value into the PC. Perhaps the result came back before our step 1 completed. If so, no loss. But if not, we managed to sneak in some extra processing for free.
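A toy cycle count may make the benefit concrete. This is a rough timing model, not a description of any real memory system: `memory_latency` is a hypothetical parameter giving the number of cycles the memory needs after the step that asserted Read, and the "no overlap" case imagines a design that sits idle until MFC before updating the PC.

```python
def fetch_cycles(memory_latency, overlap):
    """Count clock cycles for the three fetch steps in a toy timing model."""
    issue = 1      # step 0: PC_out, MAR_in, Read, ...
    pc_update = 1  # step 1: Z_out, PC_in
    transfer = 1   # step 2: MDR_out, IR_in
    if overlap:
        # Step 1 runs while memory works; Wait MFC stalls only for the remainder.
        return issue + max(pc_update, memory_latency) + transfer
    # Without overlap: idle until MFC, then update the PC, then transfer.
    return issue + memory_latency + pc_update + transfer


for latency in (0, 1, 3):
    print(latency, fetch_cycles(latency, True), fetch_cycles(latency, False))
```

In this model the overlapped sequence is never slower, and it saves one cycle whenever the memory takes at least one cycle to respond.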

The above may give some additional insight into why, on the PDP-11 for example, the various offsets are relative to the new PC after the fetch. We haven't started specifically executing the ADD instruction yet, we've just done the fetch, and already the PC is incremented.

Now, continuing on specifically with the ADD instruction:

3. AddressFieldOfIR_out, MAR_in, Read: The decoding circuitry extracts the Address field in the instruction, which in this case is the value 1000. We are assuming a far simpler instruction set than the PDP-11 addressing modes; the bits for that 1000 value are right there in the one-word instruction, and it is an absolute address, not a relative one. Coming out of the decoding circuitry are some lines representing only the address field of the IR's contents, and these are tri-stated onto the bus. Thus they have an "out" line.
And this is the address from which we want to read to do the operand fetch, so we transfer it into the MAR and assert the Read control line.
4. R1_out, Y_in, Wait MFC: One of the operands is the contents of R1. For addition, it doesn't matter which operand is which; so again we follow our strategy of trying to get other stuff done while the Read operation might still be in progress. One of the two operands for the ALU Add operation should be on the bus, the other in the Y register. We put the contents of R1 into the Y register, rather than the contents of M[1000], because the contents of R1 are already handy and it gives us something productive to do while perhaps still waiting for the memory operation to complete. However, we do need the result of step 3's memory operation for step 5, so we also include the Wait MFC control line at this point.
5. MDR_out, Add, Z_in, Set CC: With the MDR contents on the bus, we can do an Add and put the result into Z. We can't put the result directly on the bus, because the contents of MDR are already there. So we don't even have a control line to put the ALU result directly on the bus, because we could never use it -- the ALU's second operand is already there.
The "Set CC" control line causes the condition codes to be affected by the ALU operation. We want this because they should be set according to what happens with this addition; this is the addition which is the one of interest from the point of view of the machine-language programmer. The earlier addition, in the standard fetch sequence, is not something which should change the condition codes. So we didn't list "Set CC" on that line. Unless we list "Set CC", it is off, and it controls the LOAD line for the condition code register, so without Set CC, the condition codes stay as they were despite whatever use we make of the ALU.
6. Z_out, R1_in, End: Transfer the Z contents to R1, thus completing the operation. This step also introduces the "End" control line, which indicates the end of this instruction execution. Thus the End control line means to go to step 0, although perhaps for a different instruction next time. (We'll clarify this below.)
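The whole seven-step sequence can be traced with a short simulation. Everything not named in the text is an assumption made for the sketch: the 16-bit word size, memory that responds instantly (so Wait MFC needs no action), an instruction stored as a Python tuple whose second element is the address field, control-signal names written without spaces, and the particular condition-code names `N`, `Zflag` and `C`.

```python
MASK = 0xFFFF  # assumed 16-bit words

ADD_MICROROUTINE = [
    {"PC_out", "MAR_in", "Read", "ZeroA", "SetCarryIn", "Add", "Z_in"},
    {"Z_out", "PC_in", "WaitMFC"},
    {"MDR_out", "IR_in"},
    {"AddressFieldOfIR_out", "MAR_in", "Read"},
    {"R1_out", "Y_in", "WaitMFC"},
    {"MDR_out", "Add", "Z_in", "SetCC"},
    {"Z_out", "R1_in", "End"},
]


def execute(program, regs, mem):
    """Run one microroutine; memory is modelled as instantaneous."""
    r = dict(regs)
    for sig in program:
        # Bus source: the IR's address field, or a whole register.
        if "AddressFieldOfIR_out" in sig:
            bus = r["IR"][1]
        else:
            bus = r[next(s[:-4] for s in sig if s.endswith("_out"))]
        # ALU: first operand is Y, or 0 when Zero A is asserted.
        alu = total = None
        if "Add" in sig:
            a = 0 if "ZeroA" in sig else r["Y"]
            total = a + bus + (1 if "SetCarryIn" in sig else 0)
            alu = total & MASK
        # Load every register whose _in line is asserted.
        for s in sig:
            if s.endswith("_in"):
                dst = s[:-3]
                r[dst] = alu if dst == "Z" else bus
        # Condition codes change only when Set CC is asserted.
        if "SetCC" in sig:
            r["N"] = alu >> 15
            r["Zflag"] = int(alu == 0)
            r["C"] = int(total > MASK)
        # Read uses the address just loaded into the MAR.
        if "Read" in sig:
            r["MDR"] = mem[r["MAR"]]
    return r


memory = {100: ("ADD", 1000), 1000: 37}
start = {"PC": 100, "R1": 5, "Y": 0, "Z": 0, "MAR": 0, "MDR": 0,
         "IR": None, "N": 1, "Zflag": 1, "C": 1}
final = execute(ADD_MICROROUTINE, start, memory)
print(final["R1"], final["PC"])  # 42 101
```

The condition codes start out at arbitrary values and are untouched by the PC increment in step 0; they change only in step 5, where Set CC is listed.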

Can we save a step anywhere here? If we could, we would speed up every single ADD instruction in the computer! But no: note that every step has a register "out" control on. With only one bus, only one register can drive it in any given step, so no two of these steps can be merged.

Another example:

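The PDP-11 BLT X instruction, where X is an offset. Branch if condition code bit N is 1. (Strictly speaking, the real PDP-11 BLT tests N xor V, and BMI is the instruction that tests N alone; we use the simpler N test here.) Unlike with VELMA, the X value is added to the current contents of the PC register.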

0, 1, 2. standard fetch sequence
3. PC_out, Y_in, If N=0 then End
4. AddressFieldOfIR_out, Add, Z_in
5. Z_out, PC_in, End

The 'If' there is done in hardware, with control lines. The method is described in the "control circuitry" section later in these web pages.

The Address Field of the IR in this case is the value 'X', not actually an address.

In step 3, we basically do the "If N=0 then End". However, we also try to get some other work done in the event that we are going to have to do the branch. If N is 0, we're done, and it doesn't matter what else we do so long as we don't break anything. So we're putting something into Y, but if N is 0 this is not going to have any lasting effect.

The rationale for transferring the contents of the PC register into Y only becomes apparent in the case where N is 1, and we don't abort the microroutine there. In this case, we need to add that value X to the PC value, since X is an offset. That is, the branch target is PC+X, where X is a certain range of bits of the machine language instruction stored in the IR.

For simplicity, let's assume that the branch instructions' offset is in the same range of bits as the data operations instructions' memory address field. So AddressFieldOfIR_out gives us our X value on the bus, which we can add to the PC value to get the appropriate branch target.

In step 5, we put this calculated value into the PC register, thus accomplishing the branch, and we assert the End control line. End resets the µPC, restarting the standard fetch sequence, which will fetch and execute the instruction at the address now in the PC register.
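Steps 3 to 5 of the branch can be sketched as a small function. It follows the description above: the microroutine ends at step 3 when N is 0, and otherwise adds X to the PC value saved in Y. The function name, the 16-bit word size, and the returned step count are assumptions made for illustration.

```python
MASK = 0xFFFF  # assumed 16-bit words


def branch_on_n(pc_after_fetch, x, n):
    """Trace steps 3-5 of the branch; return (new PC, number of steps used)."""
    # 3. PC_out, Y_in, and End here if N is 0.
    y = pc_after_fetch
    if n == 0:
        # Y was loaded for nothing, which is harmless; the PC is unchanged.
        return pc_after_fetch, 1
    # 4. AddressFieldOfIR_out, Add, Z_in: the bus carries X, so Z <- Y + X.
    z = (y + x) & MASK
    # 5. Z_out, PC_in, End: the branch target becomes the new PC.
    return z, 3


print(branch_on_n(0x0102, 0x10, n=1))  # (274, 3): taken, PC = 0x0112
print(branch_on_n(0x0102, 0x10, n=0))  # (258, 1): not taken, PC stays 0x0102
```

Note that the offset is added to the PC value after the fetch sequence has already incremented it.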
