In this section, we’ll give a brief introduction to one of the applications of graphs to modelling computer programs. Back in Chapter 16, you learned about abstract syntax trees, which we can use to represent program code. Now, we’re going to look at how to represent not just program code, but also the different possible execution paths through a program. This type of modelling is an extremely powerful tool, and is used by both code checking tools like PythonTA as well as compilers for various programming languages to generate efficient machine instructions from program code.
Intuitively, a control flow graph is a representation of the different blocks of code in a Python program, and the different paths that the Python interpreter can take through the code. To get a clearer sense of what this means, let’s introduce one foundational definition.
A basic block is a sequence of non-compound statements and expressions in a program’s code that are guaranteed to execute together, one after the other.
Here are some examples and non-examples of basic blocks.
# A single statement is a basic block.
= 1
x
# A sequence of multiple statements and function calls is a basic block.
= 5
x = x + 2
y = f(x, y)
z print(x + y + z)
# A basic block can end with a return or raise statement.
= 5
x = x + 2
y return f(x, y)
# But a sequence of statements with a return/raise in the middle is
# NOT a basic block, since the statements after the return/raise aren't
# going to execute.
= 5
x return x
= x + 2 # Will never execute!
y
# An if statement is not a basic block, since it is a compound statement.
# The statements it contains aren't guaranteed to execute one after the other.
if x > 5:
= 3
y else:
= 4 y
Typically we treat basic blocks as being maximal, i.e., as
large as possible. So if we have a sequence of assignment statements
(x = 5
, y = x + 2
, etc.), we treat them as one
big block rather than consisting of multiple single-statement
blocks.
Now let’s look at that if statement example in more detail. We can
divide it up into three basic blocks: one for the condition
(x > 5
), then one for the if branch (y = 3
)
and one for the else branch (y = 4
). When we first learned
about if statements in Section 3.4, we drew simple
diagrams to show the possible ways the Python interpreter could take
through our code. We can now formalize this idea, and extend it to other
kinds of control flow statements like loop.
A control-flow graph (CFG) of a program is a graph \(G = (V, E)\) where:
\(V\) is the set of all (maximal) basic blocks in the program code, plus one special elements representing the end of a program. We typically restrict \(V\) to only include reachable basic blocks, i.e., code that can actually be executed at some point by the Python interpreter.
\(E\) is the set of edges, where:
return
or
raise
statement.Unlike the graphs we’ve seen so far, a control flow graph’s edges are directed, meaning order matters! Formally, we can represent each edge as a tuple rather than a set, where \((b_1, b_2)\) means there’s an edge from \(b_1\) to \(b_2\). This is different from the tuple \((b_2, b_1)\)—it is possible (and common) for an edge \((b_1, b_2)\) to exist, but not \((b_2, b_1)\).
Because our edges are now directed, we will draw them using arrows rather than simple lines. Here are two examples of control flow graph diagrams.
Program code | Control flow graph |
---|---|
|
|
|
One subtlety with our second example is that the if condition
x > 5
doesn’t appear in its own basic block! Instead,
it’s merged with the previous block, since it is always executed
immediately after the print(x)
call. This illustrates what
we mean by treating basic blocks as maximal: while we could have used a
separate block for the x > 5
, it would have been
redundant to do so.
Next, let’s look at how a while loop is represented using a control flow graph. Here’s a simple example to start with:
Program code | Control flow graph |
---|---|
|
This diagram is a bit more compleicated than the one before. The
while loop condition (x < 10
) is its own basic block,
and has two different blocks that can execute after it: the body of the
while loop and the statement after the loop (y = x + 3
).
After the body of the while loop executes, the while loop condition
executes again, which is why there’s an edge from the basic block
representing the loop body back to the loop condition.
As we learned when deadling with abstract syntax tree, program code is naturally recursive: inside the body of the while loop, we aren’t just limited to simple statements. We can have compound statements like if statements, or evey other while loops. Here’s a second example program, and corresponding control flow graph:
Program code | Control flow graph |
---|---|
|
The subgraph contained in the red region is a subgraph
representing the loop body. If the while loop condition evaluates to
True, then the program flow goes “into” this subgraph. Notice that the
“end” of the body is represented by the basic block x += 1
in the subgraph; after this point is reached, the while loop condition
is executed again.
Finally, let’s consider how return statements are represented in our control flow graphs. As we know, when a return statement is executed, the function body immediately terminates. This means that when a basic block contains a return statement, that block must have a single edge leaving it, that goes to the special vertex representing the end of the program. Here’s an example of a control flow graph for a function:
Program code | Control flow graph |
---|---|
|
Now, control flow graphs are not merely of theoretical interest! Like abstract syntax trees, they are used as a form of static analysis when compiling program code or analysing it for errors. To wrap up this section, we’ll see how PythonTA itself uses control flow graphs to perform two different checks on your code.
Tools like PythonTA that perform static analysis on python source code rely heavily on control-flow graphs for some of its checks. Let’s look at two checks PythonTA does, and understand how it uses control-flow graphs to reason about the code.
Here is a common error that students often make when implementing a search through a list: Even though we would normally implement this using a for loop, we’re using a while loop here to make the translation to control flow graphs easier.
def has_even(numbers: list[int]) -> bool:
"""Return whether numbers has an even number.
"""
= 0
i while i < len(lst):
if lst[i] % 2 == 0:
return True
else:
return False
+= 1 i
Because the if statement inside the while loop has a return statement
in both the if and else branches, this loop will only every run for one
iteration, despite the i += 1
at the bottom of the loop
body. This means the return value will be based only on the first value
in the list, rather than checking all items (in case an even number
appears later in the
list). There’s actually another bug, when the list is empty.
We’re ignoring this bug here—one is enough!
Let’s see how this problem manifests itself in a control flow graph. Here’s the control flow graph for the above function:
If you compare this to our control flow graph of the while loop example from above, something should jump out at you: none the basic blocks for the loop body connect back to the loop condition, and instead end at the “end block”. And this is how we know that there’s only going to be one iteration of the loop!
In other words, we can reframe the question of whether a loop will always have a single iteration into a question about the corresponding control flow graph: is the basic block of the loop condition part of a cycle consisting of itself and the basic blocks representing the loop’s body? If we can find such a cycle, the loop can have more than one iteration (in at least some cases); if we can’t, then the loop is guaranteed to only execute for one iteration. This is actually the algorithm that PythonTA implements for its “one iteration” checker!
The second check we’ll study detects variables that many be used before they are defined. Consider, for example, the following Python code:
= 0
x if is_special(x):
= 5
x else:
= 5
y print(x + y)
At the last line print(x + y)
, the variable
y
might not be defined, if is_special(x)
returns True
and the if branch executes. Intuitively, a
variable is possibly undefined when it is possible to execute
the program and reach that variable being evaluated before it is
assigned.
Let’s see how we can represent this property using control flow graphs. Here’s a control flow graph for the above code.
Our intuitive notion of “reaching a variable before it is assigned”
is reflected in the control flow graph as a path from the
starting basic block to the basic block calling
print(x + y)
that doesn’t contain any assignment statements
to y
. So in general, PythonTA checks for possibly undefined
variables by searching the possible paths through the program’s control
flow graph, looking for the existence of paths where a variable is used
without an assignment statement for that variable occurring earlier in
the path. Awesome!