6.4 Linked Lists and Running Time#
To wrap up the discussion of linked lists, we return to our original motivation to studying linked lists: improving the efficiency of some of the basic list operations.
We have already discussed the running time of three operations of array-based lists:
Looking up an element of the list by its index (e.g.,
lst[i]
) takes constant time, i.e., is independent of the length of the list, or even which index we’re looking up. In the language of Big-Oh notation, we write \(O(1)\) to represent this time.Inserting or removing an element at index \(i\) (\(0 \leq i < n\)) in a list of length \(n\) takes time proportional to \(n - i\), which is the number of list elements that need to be shifted when this operation occurs. Remember that Big-Oh notation is used to describe “proportional to” relationships, and so we write that this operation takes time \(O(n - i)\).
In particular, if we only consider inserting/removing at the front of an array-based list (so \(i = 0\)), this takes time linear in the length of the list: \(O(n)\). On the other hand, if we only consider inserting/removing at the end of such a list (\(i = n\)), this is a constant time operation: \(O(1)\).[1]
Turning to linked lists#
What about the corresponding operations for LinkedList
?
Let’s study our code for LinkedList.insert
, first looking at the special cases of inserting into the front and end of a linked list.
def insert(self, index: int, item: Any) -> None:
# Create a new node
new_node = _Node(item)
# Need to do something special if we insert into the first position.
# In this case, self._first *must* be updated.
if index == 0:
new_node.next = self._first
self._first = new_node
else:
# Get the node at position (index - 1)
curr_index = 0
curr = self._first
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index = curr_index + 1
if curr is None:
raise IndexError
else:
# At this point, curr refers to the node at position (index - 1)
curr.next, new_node.next = new_node, curr.next
We insert into the front of the linked list by calling insert
with an index argument of 0.
In this case, the if
branch executes, which takes constant time—
both assignment statements do not depend on the length of the list.
On the other hand, suppose we want to insert an item at the end of the linked list,
and there’s at least one element already in the linked list.
The else
branch executes, and the loop must iterate until it reaches the end of the list,
which takes time linear in the length of the list.[2]
In other words, linked lists have the exact opposite running times as array-based lists for these two operations! Inserting into the front of a linked list takes \(O(1)\) time, and inserting into the back of a linked list takes \(O(n)\) time, where \(n\) is the length of the list.
This may seem disappointing, because now it isn’t clear which list implementation is “better.” But in fact this is pretty typical of computer science: when creating multiple implementations of a public interface, each implementation will often be good at some operations, but worse at others. In practice, it is up to the programmer who is acting as a client of the interface to decide which implementation to use, based on how they prioritize the efficiency of certain operations over others.
Investigating the subtleties of “input size”#
Despite our discussion above, we haven’t yet finished the analysis of linked list insert
.
We’ve really only looked at two special cases: when index
is 0, and when index
is the length of the linked list.
What about all the other numbers?
The very first thing we need to look at is the running of each individual line of code.
In this case, each individual line (like curr = self._first
) takes constant time, i.e., doesn’t depend on the size of the inputs.
This means that the overall running time depends on the number of lines that execute, and this in turn depends on the number of times the loop runs.
curr_index = 0
curr = self._first
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index = curr_index + 1
So how many times does the loop run? There are two possibilities for when it stops: when curr is None
, or when curr_index == index - 1
.
The first case means that the end of the list was reached, which happens after
n
iterations, wheren
is the length of the list (each iteration, thecurr
variable advances by one node).The second case means that the loop ran
index - 1
times, sincecurr_index
starts at 0 and increases by 1 per iteration.
Since the loop stops when one of the conditions is false, the number of iterations is the minimum of these two possibilities: \(min(n, index-1)\).
Since the total number of steps is proportional to the number of loop iterations, we can conclude that the asymptotic running time of this method is \(O(min(n, index - 1))\), where \(n\) is the size of self
.
But because Big-Oh notation is to simplify our running-time expressions by dropping smaller terms, we can drop the “-1” and simply write that the Big-Oh running time is \(O(min(n, index))\).
Special cases#
The Big-Oh expression \(O(min(n, index))\) for LinkedList.insert
is the most general expression we could give, since we didn’t make any assumptions about the relationship between the length of the linked list and index
.[3]
But now suppose we assume that index <= n
, which is plausible since any larger value would raise an IndexError
error.
In this case, the Big-Oh expression simplifies to just \(O(index)\), revealing that under this assumption, the running time depends only on index
and not on the length of the linked list at all.
This is a bit subtle, so let’s say this again.
We have a relationship between the running time of insert
and the sizes of two of its inputs. But we can simplify this expression by talking about the relationship between these two input sizes.
Essentially, we say that if we treat index
as small with respect to the size of the list, then the running time of the algorithm does not depend on the size of the list.[4]
On the other hand, suppose index
is greater than the size of the list; in this case, the Big-Oh expression simplifies to \(O(n)\): even though we know this value raises an IndexError
, our current implementation requires traversing the entire linked list before realizing that index
is out of bounds![5]