So far, all of the function bodies we’ve written have consisted of a sequence of statements that always execute one after the other. This kind of function body is sometimes called a “straight line program”, since the statements form a linear sequence from one to the next. But sometimes we want to execute a statement or block of statements only some of the time, based on some condition.
This is similar to the implication operator we saw when discussing propositional logic. The implication \(p \Rightarrow q\) states that whenever \(p\) is True, \(q\) must also be True. In Python, what we would like to express is an instruction of the form “Whenever \(p\) is True, then the block of code block1 must be executed”. To do so, we’ll introduce a new type of Python statement that plays a role analogous to \(\Rightarrow\) in propositional logic.
Python uses the if statement to express conditional execution of code. An if statement is our first example of a compound statement, meaning it contains other statements within it. Analogously, an expression like 3 + 4 is a compound expression, since it consists of smaller expressions (3 and 4). Here is our first syntax for an if statement:
if <condition>:
    <statement>
    ...
else:
    <statement>
    ...
The if statement uses two keywords, if and else. Careful: we saw the if keyword used earlier to express conditions in comprehensions. The use of if here is conceptually similar, but quite different in how Python interprets it. The <condition> following if must be an expression that evaluates to a boolean, called the if condition. This expression plays a role analogous to the hypothesis of an implication.
The statements on the lines after the if and else are indented to indicate that they are part of the if statement, similar to how a function docstring and body are indented relative to the function header. We call the statements under the if the if branch and the statements under the else the else branch.
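As a minimal sketch of this layout, here is a small function with one if branch and one else branch. The name describe_temperature and its threshold are hypothetical, chosen only to illustrate the indentation; they are not part of the airport example that follows.

```python
def describe_temperature(celsius: int) -> str:
    """Return 'Freezing' if celsius is at most 0, and 'Above freezing' otherwise.

    Hypothetical example used only to show how the branches are laid out.
    """
    if celsius <= 0:
        # if branch: every statement indented under the `if` line
        label = 'Freezing'
        return label
    else:
        # else branch: every statement indented under the `else` line
        label = 'Above freezing'
        return label
```

Each branch here contains two statements, and both are indented by the same amount relative to the if and else lines.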
When an if statement is executed, the following happens:

1. First, the if condition is evaluated, producing a boolean value.
2. If the condition evaluates to True, then the statements in the if branch are executed. But if the condition evaluates to False, then the statements in the else branch are executed instead.

Let us consider an example. Suppose Toronto Pearson Airport (YYZ) has hired us to develop some software. The first feature they want is to show their clients whether a flight is on time or delayed. The airport will provide us with both the time a flight is scheduled to depart and an estimated departure time based on the plane’s current GPS location. Our task is to report a status (as a string) to display. Here is the function header and docstring:
def get_status(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is either 'On time' or 'Delayed'.

    >>> get_status(10, 10)
    'On time'
    >>> get_status(10, 12)
    'Delayed'
    """
Now, if we only needed to compute a bool for whether the flight is on time, this function would be very straightforward: simply return estimated <= scheduled, i.e., whether the estimated departure time is before or at the scheduled departure time. Boolean expressions like this are often useful first steps in implementing functions to determine different “cases” of inputs, but they aren’t the only step.
Instead, we use if statements to execute different code based on these cases. Here’s our implementation of get_status:
def get_status(scheduled: int, estimated: int) -> str:
    """..."""
    if estimated <= scheduled:
        return 'On time'
    else:
        return 'Delayed'
Our if statement uses the boolean expression we identified earlier (estimated <= scheduled) to trigger different return statements to return the correct string.
One useful tool for understanding if statements is drawing control flow diagrams to visualize the order in which statements execute. For example, here is a simple diagram for our get_status function above:
[Figure: control flow diagram for get_status]
An if statement introduces a “fork in the path” of a function’s control flow, and this is why we use the term branch for each of the if and else blocks of code.
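One way to observe this fork directly is to add print calls to each branch. The sketch below is a hypothetical variant of get_status (the name get_status_traced and the printed messages are our own additions for illustration) that reports which path a given call takes.

```python
def get_status_traced(scheduled: int, estimated: int) -> str:
    """Return the same status as get_status, printing which branch is taken.

    Hypothetical helper for illustration only.
    """
    print('Evaluating the if condition: estimated <= scheduled')
    if estimated <= scheduled:
        print('Taking the if branch')
        return 'On time'
    else:
        print('Taking the else branch')
        return 'Delayed'
```

Calling get_status_traced(10, 12) prints the condition message followed by 'Taking the else branch', and never prints 'Taking the if branch': only one side of the fork is executed on any single call.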
Now suppose Toronto Pearson Airport has changed the requirements of our feature. They’ve noticed that whenever a flight is delayed by more than four hours, the airline cancels the flight. They would like our get_status function to accommodate this change, so that the set of possible outputs is now {'On time', 'Delayed', 'Cancelled'}.
def get_status_v2(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is 'On time', 'Delayed', or 'Cancelled'.

    >>> get_status_v2(10, 10)
    'On time'
    >>> get_status_v2(10, 12)
    'Delayed'
    >>> get_status_v2(10, 15)
    'Cancelled'
    """
Let’s consider what’s changed between this version and our previous one. If the estimated time is before or at the scheduled time, nothing’s changed, and 'On time' should still be returned. But when the estimated time is after the scheduled time, we now need to distinguish between two separate subcases, based on the difference in time. We can express these subcases using nested if statements, i.e., one if statement contained in a branch of another:
def get_status_v2(scheduled: int, estimated: int) -> str:
    """..."""
    if estimated <= scheduled:
        return 'On time'
    else:
        if estimated - scheduled <= 4:
            return 'Delayed'
        else:
            return 'Cancelled'
This function body is correct, but just like with expressions,
excessive nesting of statements can make code difficult to read and
understand. So instead of using a nested if statement, we’ll introduce a
new form of if statement that makes use of the elif
keyword, which is short for “else if”.
if <condition1>:
    <statement>
    ...
elif <condition2>:
    <statement>
    ...
...  # [any number of elif conditions and branches]
else:
    <statement>
    ...
When this form of if statement is executed, the following happens.

1. First, the if condition (<condition1>) is evaluated, producing a boolean value.
2. If the condition evaluates to True, then the statements in the if branch are executed. If the condition evaluates to False, then the next elif condition is evaluated, producing a boolean.
3. If that condition evaluates to True, then the statements in that elif branch are executed. If that condition evaluates to False, then the next elif condition is evaluated. This step repeats until either one of the elif conditions evaluates to True, or all of the elif conditions have evaluated to False.
4. If neither the if condition nor any of the elif conditions evaluates to True, then the else branch executes.

Here is how we can use elif to rewrite get_status without nested if statements.
def get_status_v3(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is 'On time', 'Delayed', or 'Cancelled'.

    >>> get_status_v3(10, 10)
    'On time'
    >>> get_status_v3(10, 12)
    'Delayed'
    >>> get_status_v3(10, 15)
    'Cancelled'
    """
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled <= 4:
        return 'Delayed'
    else:
        return 'Cancelled'
This code is logically equivalent to the previous version, but it’s easier to read because there’s no more nesting! In this version, the visual structure makes clear the exact three possible paths of execution for this function.
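Because the inputs are restricted to hours between 0 and 23, we can check this equivalence by brute force. The sketch below repeats both function bodies so that it runs on its own; the helper versions_agree is a hypothetical name introduced only for this check.

```python
def get_status_v2(scheduled: int, estimated: int) -> str:
    """Nested version from above."""
    if estimated <= scheduled:
        return 'On time'
    else:
        if estimated - scheduled <= 4:
            return 'Delayed'
        else:
            return 'Cancelled'


def get_status_v3(scheduled: int, estimated: int) -> str:
    """elif version from above."""
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled <= 4:
        return 'Delayed'
    else:
        return 'Cancelled'


def versions_agree() -> bool:
    """Return whether the two versions return the same status for every pair of hours 0-23.

    Hypothetical helper for illustration only.
    """
    return all([get_status_v2(s, e) == get_status_v3(s, e)
                for s in range(0, 24)
                for e in range(0, 24)])
```

Calling versions_agree() returns True. Such a check is only feasible here because there are just 24 × 24 = 576 possible input pairs.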
Adding branching to our control flow makes our functions more complex, and so we need to pay attention to how we test our code. With functions that contain if statements, any one particular input we give can only test one possible execution path, so we need to design our unit tests so that each possible execution path is used at least once. This form of test design is called white box testing, because we “see through the box” and therefore can design tests based on the source code itself. In contrast, black box tests are created without any knowledge of the source code (so no knowledge of the different paths the code can take).
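As a sketch of what white box tests for the elif version of get_status might look like, here is one test function per execution path. The test function names are hypothetical, and the function under test is repeated so the example runs on its own.

```python
def get_status_v3(scheduled: int, estimated: int) -> str:
    """Return 'On time', 'Delayed', or 'Cancelled' for the given departure times."""
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled <= 4:
        return 'Delayed'
    else:
        return 'Cancelled'


def test_on_time_path() -> None:
    """Exercise the if branch: estimated is not after scheduled."""
    assert get_status_v3(10, 10) == 'On time'


def test_delayed_path() -> None:
    """Exercise the elif branch: delayed by at most four hours."""
    assert get_status_v3(10, 12) == 'Delayed'


def test_cancelled_path() -> None:
    """Exercise the else branch: delayed by more than four hours."""
    assert get_status_v3(10, 15) == 'Cancelled'
```

We chose each input by reading the conditions in the function body, which is exactly what makes these white box tests.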
In our doctests for get_status_v3, we chose three different examples, each corresponding to a different possible case of the if statement. This was pretty straightforward because the code is relatively simple, but later we’ll study examples of more complex control flow where it won’t be so simple to design test cases to cover each branch. In fact, the percentage of lines of program code that are executed when a set of tests for that program is run is called code coverage, and is a metric used to assess the quality of tests. While a set of tests may strive for 100% code coverage, this does not always occur as our programs grow in complexity. The concept of code coverage and other metrics used to evaluate tests is something we’ll only touch on in this course, but in future courses you’ll learn about this in more detail and even use some automated tools for calculating these metrics. In particular, even though code coverage is a commonly used metric, it is also criticized for giving a false sense of quality of a test suite. Just because all lines of code are executed at least once does not actually mean that the tests chosen cover all possible cases to consider for a program. We’ll see a simple example of this in the following section.
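To make the limitation of code coverage concrete, here is a small sketch of our own. The function get_status_off_by_one is a hypothetical buggy variant that uses < 4 where <= 4 is intended. Our three original examples execute every line of it and all pass, yet a flight delayed by exactly four hours is reported incorrectly.

```python
def get_status_off_by_one(scheduled: int, estimated: int) -> str:
    """Hypothetical buggy variant of get_status_v3, for illustration only."""
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled < 4:  # Bug: a delay of exactly 4 hours should be 'Delayed'
        return 'Delayed'
    else:
        return 'Cancelled'


# The three original examples reach every branch, and all of them pass.
assert get_status_off_by_one(10, 10) == 'On time'
assert get_status_off_by_one(10, 12) == 'Delayed'
assert get_status_off_by_one(10, 15) == 'Cancelled'

# A boundary example exposes the bug: the result is 'Cancelled', not 'Delayed'.
boundary_result = get_status_off_by_one(10, 14)
```

Full line coverage did not catch the mistake; an example at the boundary between the 'Delayed' and 'Cancelled' cases does.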
Toronto Pearson Airport is beginning to trust us with more data, and is requesting more complex features as a result. They now want us to write a function that determines how many flights are cancelled in a day. The airport will provide us with the data as a dictionary (i.e., dict), where the keys are unique flight numbers and the value for each flight number is a two-element list. The first element is the scheduled time and the second element is the estimated time. More succinctly, the data is a mapping of the form: { flight_number: [scheduled, estimated] }.
Unlike earlier, when our function input was only two integers, we are now working with a collection of data. Before we start trying to solve the problem, let’s create some example data in the Python console. Specifically, we’ll create a dictionary with records for three different Air Canada flight numbers.
>>> flights = {'AC110': [10, 12], 'AC321': [12, 19], 'AC999': [1, 1]}
We know that in this dictionary, each key is a string whose associated value is a list of integers, and we can index the list to retrieve the integers representing the individual times. Specifically, in each list index 0 refers to the flight’s scheduled time and index 1 refers to the estimated time. Since this data has some nested elements, we’ll demo how to take our flights dictionary and extract the relevant data step by step, until we can call get_status_v3 on the individual parts:
>>> flights['AC110']  # Produce the list associated with flight 'AC110'
[10, 12]
>>> flights['AC110'][0]  # Produce just the scheduled time for flight 'AC110'
10
>>> flights['AC110'][1]  # Produce just the estimated time for flight 'AC110'
12
>>> get_status_v3(flights['AC110'][0], flights['AC110'][1])  # Call get_status_v3 on the two numbers
'Delayed'
We’re making great progress! Instead of specifying the flight number ourselves (i.e., 'AC110'), we would instead like to substitute in different flight numbers based on the data we receive from the airport. We can do that using comprehensions. Let’s explore and see what we can get:
>>> [k for k in flights]  # Produces each *key* (flight number) in flights
['AC110', 'AC321', 'AC999']
>>> [flights[k] for k in flights]  # Produces each *associated value* in flights
[[10, 12], [12, 19], [1, 1]]
>>> [get_status_v3(flights[k][0], flights[k][1]) for k in flights]  # Produces each *status* of the flights
['Delayed', 'Cancelled', 'On time']
>>> [get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled' for k in flights]  # Produces whether each flight was 'Cancelled'
[False, True, False]
This is a really good example of building up a complex comprehension step by step. We started with just the keys of the dictionary flights, and from there accessed the corresponding values, then used our previous work to calculate the status of each flight! And finally, we were able to compute a list of booleans telling us whether each flight was cancelled or not. But remember that our bosses at Pearson International Airport only want to know how many flights were cancelled: a single integer value. Currently, we have a list of boolean values. Of course we can simply count the flights in this case because there are only three values in total, but we want a computational solution!
There are many ways of producing a single integer from this work using what we’ve previously learned. One approach is to modify our comprehension to perform a filtering computation, moving the get_status_v3(...) == 'Cancelled' part into the condition of the comprehension:
>>> [k for k in flights if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
['AC321']
We now have a list of flight numbers that were cancelled. To convert this into an integer, we can call the built-in len function on the list.
>>> cancelled_flights = [k for k in flights if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
>>> len(cancelled_flights)
1
Excellent! So are we done? Well, we’ve certainly answered the question for this one particular dictionary flights, but we’d like to generalize this approach to work on any dictionary with this structure. To do so, we can define a function to perform this computation. Here is our final result:
def count_cancelled(flights: dict) -> int:
    """Return the number of cancelled flights for the given flight data.

    flights is a dictionary where each key is a flight number,
    and whose corresponding value is a list of two numbers, where the first is
    the scheduled departure time and the second is the estimated departure time.

    >>> count_cancelled({'AC110': [10, 12], 'AC321': [12, 19], 'AC999': [1, 1]})
    1
    """
    cancelled_flights = [k for k in flights
                         if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
    return len(cancelled_flights)
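To see that the function generalizes beyond our one example, here is a self-contained sketch that calls it on two other dictionaries. The data in no_flights and other_flights is made up for illustration and does not come from the airport.

```python
def get_status_v3(scheduled: int, estimated: int) -> str:
    """Return 'On time', 'Delayed', or 'Cancelled' for the given departure times."""
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled <= 4:
        return 'Delayed'
    else:
        return 'Cancelled'


def count_cancelled(flights: dict) -> int:
    """Return the number of cancelled flights for the given flight data."""
    cancelled_flights = [k for k in flights
                         if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
    return len(cancelled_flights)


# Hypothetical flight data, made up for illustration.
no_flights = {}
other_flights = {'AC200': [8, 14], 'AC201': [9, 9], 'AC202': [5, 12]}

empty_count = count_cancelled(no_flights)      # No keys, so the comprehension is empty
other_count = count_cancelled(other_flights)   # 'AC200' and 'AC202' are over four hours late
```

Here empty_count is 0 and other_count is 2, so the same function body handles dictionaries of different sizes, including an empty one.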
Let’s review what we learned in this example:
- We can use functions we have already defined (like get_status_v3) to help implement other functions.