To wrap up the discussion of linked lists, we return to our original motivation for studying them: improving the efficiency of some of the basic list operations.
We have already discussed the running times of three operations on array-based lists:

- Looking up an element by its index (lst[i]) takes constant time, i.e., is independent of the length of the list, or even which index we're looking up. In the language of Big-Oh notation, we write O(1) to represent this time.
- Inserting or removing an element at index i (0 ≤ i < n) in a list of length n takes time proportional to n − i, which is the number of list elements that need to be shifted when this operation occurs. Remember that Big-Oh notation is used to describe "proportional to" relationships, and so we write that this operation takes time O(n − i).
In particular, if we only consider inserting/removing at the front of an array-based list (so i = 0), this takes time linear in the length of the list: O(n). On the other hand, if we only consider inserting/removing at the end of such a list (i = n), this is a constant time operation: O(1). You might note that mathematically, n − i = 0 if i = n. However, every operation takes at least one step to run, and so there’s an implicit "max(1, ___)" whenever we write a Big-Oh expression to capture the fact that the running time can’t drop below 1.
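To see this difference for Python's built-in (array-based) lists, here is a rough timing sketch using the standard timeit module. The list size and repetition count are arbitrary choices, and the absolute numbers will vary by machine:

import timeit

# Build a list of 100,000 elements before timing each statement.
setup = 'lst = list(range(100_000))'

# Inserting at index 0 shifts every existing element: roughly O(n) per call.
front = timeit.timeit('lst.insert(0, -1)', setup=setup, number=1000)

# Appending at the end shifts nothing: roughly O(1) per call.
back = timeit.timeit('lst.append(-1)', setup=setup, number=1000)

print(front, back)  # expect the first number to be far larger than the second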
What about the corresponding operations for LinkedList? Let's study our code for LinkedList.insert, first looking at the special cases of inserting into the front and end of a linked list.
def insert(self, index: int, item: Any) -> None:
    # Create a new node
    new_node = _Node(item)

    # Need to do something special if we insert into the first position.
    # In this case, self._first *must* be updated.
    if index == 0:
        new_node.next = self._first
        self._first = new_node
    else:
        # Get the node at position (index - 1)
        curr_index = 0
        curr = self._first
        while curr is not None and curr_index < index - 1:
            curr = curr.next
            curr_index = curr_index + 1

        if curr is None:
            raise IndexError
        else:
            # At this point, curr refers to the node at position (index - 1)
            curr.next, new_node.next = new_node, curr.next
We insert into the front of the linked list by calling insert with an index argument of 0. In this case, the if branch executes, which takes constant time: neither assignment statement depends on the length of the list.
On the other hand, suppose we want to insert an item at the end of the linked list, and there's at least one element already in the linked list. The else branch executes, and the loop must iterate until it reaches the end of the list, which takes time linear in the length of the list. Note that the body of the loop, which again consists of assignment statements, takes constant time.
In other words, linked lists have the exact opposite running times as array-based lists for these two operations! Inserting into the front of a linked list takes O(1) time, and inserting into the back of a linked list takes O(n) time, where n is the length of the list.
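To make this concrete, here is a small usage sketch. It assumes a LinkedList class like the one in these notes, whose initializer creates an empty list (i.e., sets _first to None) and which has the insert method shown above:

linky = LinkedList()  # an empty linked list: _first is None
linky.insert(0, 'b')  # front insertion: if branch, two assignments -> O(1)
linky.insert(0, 'a')  # still O(1), no matter how long the list already is
linky.insert(2, 'c')  # end insertion: the loop walks past every node -> O(n)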
This may seem disappointing, because now it isn’t clear which list implementation is “better.” But in fact this is pretty typical of computer science: when creating multiple implementations of a public interface, each implementation will often be good at some operations, but worse at others. In practice, it is up to the programmer who is acting as a client of the interface to decide which implementation to use, based on how they prioritize the efficiency of certain operations over others.
Despite our discussion above, we haven't yet finished the analysis of linked list insert. We've really only looked at two special cases: when index is 0, and when index is the length of the linked list. What about all the other numbers?
The very first thing we need to look at is the running time of each individual line of code. In this case, each individual line (like curr = self._first) takes constant time, i.e., doesn't depend on the size of the inputs. This means that the overall running time depends on the number of lines that execute, and this in turn depends on the number of times the loop runs.
curr_index = 0
curr = self._first
while curr is not None and curr_index < index - 1:
    curr = curr.next
    curr_index = curr_index + 1
So how many times does the loop run? There are two possibilities for when it stops: when curr is None, or when curr_index == index - 1.
- The first case (curr is None) happens after n iterations, where n is the length of the list (each iteration, the curr variable advances by one node).
- The second case (curr_index == index - 1) happens after index - 1 iterations, since curr_index starts at 0 and increases by 1 per iteration.

Since the loop stops when one of the conditions is false, the number of iterations is the minimum of these two possibilities: min(n, index − 1).
Since the total number of steps is proportional to the number of loop iterations, we can conclude that the asymptotic running time of this method is O(min(n, index − 1)), where n is the size of self. But because Big-Oh notation is used to simplify our running-time expressions by dropping smaller terms, we can drop the "− 1" and simply write that the Big-Oh running time is O(min(n, index)).
The Big-Oh expression O(min(n, index)) for LinkedList.insert is the most general expression we could give, since we didn't make any assumptions about the relationship between the length of the linked list and index. (We are, however, assuming that index >= 0 here!)
But now suppose we assume that index <= n, which is plausible since any larger value would raise an IndexError. In this case, the Big-Oh expression simplifies to just O(index), revealing that under this assumption the running time depends only on index, and not on the length of the linked list at all.
This is a bit subtle, so let's say this again. We have a relationship between the running time of insert and the sizes of two of its inputs. But we can simplify this expression by talking about the relationship between these two input sizes. Essentially, we say that if we treat index as small with respect to the size of the list, then the running time of the algorithm does not depend on the size of the list. The most extreme case of this is when index == 0, so we're inserting into the front of the linked list. As we discussed earlier, this takes constant time, meaning it does not depend on the length of the list.
On the other hand, suppose index is greater than the size of the list; in this case, the Big-Oh expression simplifies to O(n): even though we know this value raises an IndexError, our current implementation requires traversing the entire linked list before realizing that index is out of bounds! Can you find a way to fix this problem efficiently?
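One possible direction, sketched here as an illustration rather than the intended answer: keep track of the list's length in a new attribute (the name _length is hypothetical) as items are inserted and removed, so that an out-of-bounds index can be rejected in O(1) before any traversal.

from typing import Any


class _Node:
    """A node in a linked list (a minimal stand-in for the class in these notes)."""

    def __init__(self, item: Any) -> None:
        self.item = item
        self.next = None    # the next node, or None if this is the last node


class LinkedList:
    def __init__(self) -> None:
        self._first = None  # the first node, or None if the list is empty
        self._length = 0    # hypothetical attribute: number of items stored

    def insert(self, index: int, item: Any) -> None:
        if index > self._length:   # out of bounds: rejected in O(1), no traversal
            raise IndexError

        new_node = _Node(item)
        if index == 0:
            new_node.next = self._first
            self._first = new_node
        else:
            # Because of the check above, the node at position (index - 1) must exist.
            curr = self._first
            for _ in range(index - 1):
                curr = curr.next
            curr.next, new_node.next = new_node, curr.next

        self._length += 1          # keep the stored length up to date

Of course, any other method that changes the number of items (such as a delete method) would also need to keep _length up to date.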