\( \newcommand{\NOT}{\neg} \newcommand{\AND}{\wedge} \newcommand{\OR}{\vee} \newcommand{\XOR}{\oplus} \newcommand{\IMP}{\Rightarrow} \newcommand{\IFF}{\Leftrightarrow} \newcommand{\TRUE}{\text{True}\xspace} \newcommand{\FALSE}{\text{False}\xspace} \newcommand{\IN}{\,{\in}\,} \newcommand{\NOTIN}{\,{\notin}\,} \newcommand{\TO}{\rightarrow} \newcommand{\DIV}{\mid} \newcommand{\NDIV}{\nmid} \newcommand{\MOD}[1]{\pmod{#1}} \newcommand{\MODS}[1]{\ (\text{mod}\ #1)} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cL}{\mathcal L} \newcommand{\cK}{\mathcal K} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cS}{\mathcal S} \newcommand{\cT}{\mathcal T} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cZ}{\mathcal Z} \newcommand{\emp}{\emptyset} \newcommand{\bs}{\backslash} \newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor} \newcommand{\ceil}[1]{\left \lceil #1 \right \rceil} \newcommand{\abs}[1]{\left | #1 \right |} \newcommand{\xspace}{} \newcommand{\proofheader}[1]{\underline{\textbf{#1}}} \)

3.4 If Statements: Conditional Code Execution

So far, all of the function bodies we’ve written have consisted of a sequence of statements that always execute one after the other. This kind of function body is sometimes called a “straight line program”, since the statements form a linear sequence from one to the next. But sometimes we want to execute a statement or block of statements only some of the time, based on some condition.

This is similar to the implication operator we saw when discussing propositional logic. The implication \(p \Rightarrow q\) states that whenever \(p\) is True, \(q\) must also be True. In Python, what we would like to express is an instruction of the form “Whenever \(p\) is True, then the block of code block1 must be executed”. To do so, we’ll introduce a new type of Python statement that plays a role analogous to \(\Rightarrow\) in propositional logic.

The if statement

Python uses the if statement to express conditional execution of code. An if statement is our first example of a compound statement, meaning it contains other statements within it. Analogously, a expression like 3 + 4 is a compound expression, since it consists of smaller expressions (3 and 4). Here is our first syntax for an if statement:

if <condition>:
    <statement>
    ...
else:
    <statement>
    ...

The if statement uses two keywords, if and else. Careful: we saw the if keyword used earlier to express conditions in comprehensions. The use of if here is conceptually similar, but quite different in how Python interprets it. The <condition> following if must be an expression that evaluates to a boolean, called the if condition. This expression plays a role analogous to the hypothesis of an implication.

The statements on the lines after the if and else are indented to indicate that they are part of the if statement, similar to how a function docstring and body are indented relative to the function header. We call the statements under the if the if branch and the statements under the else the else branch.

When an if statement is executed, the following happens:

  1. First, the if condition is evaluated, producing a boolean value.
  2. If the condition evaluates to True, then the statements in the if branch are executed. But if the condition evaluates to False, then the statements in the else branch are executed instead.

Example: reporting flight status

Photo of airport flight status display. Photo by LT Chan: https://www.pexels.com/photo/flight-schedule-screen-turned-on-2833379/

Let us consider an example. Suppose Toronto Pearson Airport (YYZ) has hired us to develop some software. The first feature they want is to show their clients if a flight is on time or delayed. The airport will provide us with both the time a flight is scheduled to depart and an estimated departure time based on the plane’s current GPS location. Our task is to report a status (as a string) to display a string. Here is the function header and docstring:

def get_status(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is either 'On time' or 'Delayed'.

    >>> get_status(10, 10)
    'On time'
    >>> get_status(10, 12)
    'Delayed'
    """

Now, if we only needed to compute a bool for whether the flight is delayed, this function would be very straightforward: simply return estimated <= scheduled, i.e., whether the estimated departure time is before or at the scheduled departure time. Boolean expressions like this are often useful first steps in implementing functions to determine different “cases” of inputs, but they aren’t the only step.

Instead, we use if statements to execute different code based on these cases. Here’s our implementation of get_status:

def get_status(scheduled: int, estimated: int) -> str:
    """..."""
    if estimated <= scheduled:
        return 'On time'
    else:
        return 'Delayed'

Our if statement uses the boolean expression we identified earlier (estimated <= scheduled) to trigger different return statements to return the correct string.

A simple control flow diagram

One useful tool for understanding if statements is drawing control flow diagrams to visualize the order in which statements execute. For example, here is a simple diagram for our get_status function above:

get status control flow diagram

An if statement introduces a “fork in path” of a function’s control flow, and this is why we use the term branch for each of the if and else blocks of code.

Code with more than two cases

Now suppose Toronto Pearson Airport has changed the requirements of our feature. They’ve noticed that whenever a flight is delayed by more than four hours, the airline cancels the flight. They would like our get_status function to accommodate this change, so that the set of possible outputs is now {'On time', 'Delayed', 'Cancelled'}.

def get_status_v2(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is 'On time', 'Delayed', or 'Cancelled'.

    >>> get_status_v2(10, 10)
    'On time'
    >>> get_status_v2(10, 12)
    'Delayed'
    >>> get_status_v2(10, 15)
    'Cancelled'
    """

Let’s consider what’s changed between this version and our previous one. If the estimated time is before the scheduled time, nothing’s changed, and 'On time' should still be returned. But when the estimated time is after the scheduled time, we now need to distinguish between two separate subcases, based on the difference in time. We can express these subcases using nested if statements, i.e., one if statement contained in a branch of another:

def get_status_v2(scheduled: int, estimated: int) -> str:
    """..."""
    if estimated <= scheduled:
        return 'On time'
    else:
        if estimated - scheduled <= 4:
            return 'Delayed'
        else:
            return 'Cancelled'

This function body is correct, but just like with expressions, excessive nesting of statements can make code difficult to read and understand. So instead of using a nested if statement, we’ll introduce a new form of if statement that makes use of the elif keyword, which is short for “else if”.

if <condition1>:
    <statement>
    ...
elif <condition2>:
    <statement>
    ...
... # [any number of elif conditions and branches]
else:
    <statement>
    ...

When this form of if statement is executed, the following happens.

  1. First, the if condition (<condition1>) is evaluated, producing a boolean value.
  2. If the condition evaluates to True, then the statements in the if branch are executed. If the condition evaluates to False, then next elif condition is evaluated, producing a boolean.
  3. If that condition evaluates to True, then the statements in that elif branch are executed. If that condition evaluates to False, then the next elif condition is evaluated. This step repeats until either one of the elif conditions evaluate to True, or all of the elif conditions have evaluated to False.
  4. If neither the if condition nor any of the elif conditions evaluate to True, then the else branch executes.

Here is how we can use elif to rewrite get_status without nested if statements.

def get_status_v3(scheduled: int, estimated: int) -> str:
    """Return the flight status for the given scheduled and estimated departure times.

    The times are given as integers between 0 and 23 inclusive, representing
    the hour of the day.

    The status is 'On time', 'Delayed', or 'Cancelled'.

    >>> get_status_v3(10, 10)
    'On time'
    >>> get_status_v3(10, 12)
    'Delayed'
    >>> get_status_v3(10, 15)
    'Cancelled'
    """
    if estimated <= scheduled:
        return 'On time'
    elif estimated - scheduled <= 4:
        return 'Delayed'
    else:
        return 'Cancelled'

This code is logically equivalent to the previous version, but it’s easier to read because there’s no more nesting! In this version, the visual structure makes clear the exact three possible paths of execution for this function.

Testing all the branches

Adding branching to our control flow makes our functions more complex, and so we need to pay attention to how we test our code. With functions that contain if statements, any one particular input we give can only test one possible execution path, so we need to design our unit tests so that each possible execution path is used at least once. This form of test design is called white box testing, because we “see through the box” and therefore can design tests based on the source code itself. In contrast, black box testing are tests created without any knowledge of the source code (so no knowledge of the different paths the code can take).

In our doctests for get_status_v3, we chose three different examples, each corresponding to a different possible case of the if statement. This was pretty straightforward because the code is relatively simple, but we’ll study later example of more complex control flow where it won’t be so simple to design test cases to cover each branch. In fact, the percentage of lines of program code that are executed when a set of tests for that program is called code coverage, and is a metric used to assess the quality of tests. While a set of tests may strive for 100% code coverage, this does not always occur as our programs grow in complexity. The concept of code coverage and other metrics used to evaluate tests is something we’ll only touch on in this course, but in future courses you’ll learn about this in more detail and even use some automated tools for calculating these metrics. In particular, even though code coverage is a commonly used metric, it is also criticized for giving a false sense of quality of a test suite. Just because all lines of code are executed at least once does not actually mean that the tests chosen cover all possible cases to consider for a program. We’ll see a simple example of this in the following section.

Extending our example: from one flight to many

Toronto Pearson Airport is beginning to trust us with more data, and are requesting more complex features as a result. They now want us to write a function that determines how many flights are cancelled in a day. The airport will provide us with the data as a dictionary (i.e., dict), where the keys are unique flight numbers and the values for each flight number is a two-element list. The first element is the scheduled time and the second element is the estimated time. More succinctly, the data is a mapping of the form: { flight_number: [scheduled, estimated] }.

Unlike earlier, when our function input was only two integers, we are now working with a collection of data. Before we start trying to solve the problem, let’s create some example data in the Python console. Specifically, we’ll create a dictionary with records for three different Air Canada flight numbers.

>>> flights = {'AC110': [10, 12], 'AC321': [12, 19], 'AC999': [1, 1]}

We know that in this dictionary, each key is a string whose associated value is a list of integers, and we can index the list to retrieve the integers representing the individual times. Specifically, in each list index 0 refers to the flight’s scheduled time and index 1 refers to the estimated time. Since this data has some nested elements, we’ll demo how to take our flights dictionary and extract the relevant data step by step, until we can call get_status_v3 on the individual parts:

>>> flights['AC110']  # Produce the list associated with flight 'AC110'
[10, 12]
>>> flights['AC110'][0]  # Produce just the scheduled time for flight 'AC110'
10
>>> flights['AC110'][1]  # Produce just the estimated time for flight 'AC110'
12
>>> get_status_v3(flights['AC110'][0], flights['AC110'][1])  # Call get_status_v3 on the two numbers
'Delayed'

We’re making great progress! Instead of specifying the flight number ourselves (i.e., 'AC110'), we would instead like to substitute in different flight numbers based on the data we receive from the airport. We can do that using comprehensions. Let’s explore and see what we can get:

>>> [k for k in flights]
['AC110', 'AC321', 'AC999']          # Produces each *key* (flight number) in flights

>>> [flights[k] for k in flights]
[[10, 12], [12, 19], [1, 1]]         # Produces each *associated value* in flights

>>> [get_status_v3(flights[k][0], flights[k][1]) for k in flights]
['Delayed', 'Cancelled', 'On time']  # Produces each *status* of the flights

>>> [get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled' for k in flights]
[False, True, False]                 # Produces whether each flight was "Cancelled"

This is a really good example of building up a complex comprehension step by step. We started with just the keys of the dictionary flights, and from there accessed the corresponding values, then used our previous work to calculate the status of each flight! And finally, we were able to compute a list of booleans telling us whether each flight was cancelled or not. But remember that our bosses at Pearson International Airport only want to know how many flights were cancelled: a single integer value. Currently, we have a list of boolean values. Of course we can simply count the flights in this case because there’s only three values in total, but we want a computational solution!

There are many ways of producing a single integer from this work using what we’ve previously learned. One approach is to modify our comprehension to perform a filtering computation, moving the get_status_v3(...) == 'Cancelled' part into the condition of the comprehension:

>>> [k for k in flights if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
['AC321']

We now have a list of flight numbers that were cancelled. To convert this into an integer, we can call the built-in len function on the list.

>>> cancelled_flights = [k for k in flights if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
>>> len(cancelled_flights)
1

Excellent! So are we done? Well, we’ve certainly answered the question for this one particular dictionary flights, but we’d like to generalize this approach to work on any dictionary with this structure. To do so, we can define a function to perform this computation. Here is our final result:

def count_cancelled(flights: dict) -> int:
    """Return the number of cancelled flights for the given flight data.

    flights is a dictionary where each key is a flight number,
    and whose corresponding value is a list of two numbers, where the first is
    the scheduled departure time and the second is the estimated departure time.

    >>> count_cancelled({'AC110': [10, 12], 'AC321': [12, 19], 'AC999': [1, 1]})
    1
    """
    cancelled_flights = [k for k in flights
                         if get_status_v3(flights[k][0], flights[k][1]) == 'Cancelled']
    return len(cancelled_flights)

Let’s review what we learned in this example: