6.3 Linked List Mutation#
All of the linked list methods we have looked at so far are non-mutating, meaning they did not change the linked list upon which they were called.
Here’s a reminder of the basic traversal pattern using a while
loop: understanding this is critical before moving on!
# self is a LinkedList
curr = self._first
while curr is not None:
... curr.item ...
curr = curr.next
We started with these methods because they’re generally easier to understand than their mutating cousins. Now, we’re going to look at the two major mutating operations on linked lists: inserting into and deleting items from a linked list. We’ll start with the simplest version of this: appending a new item to the end of a linked list. Before we start, let’s remind ourselves how this works for built-in lists:
>>> lst = []
>>> lst.append(1)
>>> lst.append(2)
>>> lst.append(3)
>>> lst
[1, 2, 3]
Linked list append
#
class LinkedList:
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
Recall that a LinkedList
object has only one attribute, a reference to the first node in the list.
Unfortunately, this means that we have some work to do to implement append
: before adding the item, we need to find the currently last node in the linked list, and then add the item to the end of that.
Let’s start (as recommended!!) by using our basic code template:
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
curr = self._first
while curr is not None:
... curr.item ...
curr = curr.next
This template is a good start, but now our thinking must begin.
First: what do we do with curr.item
?
The answer is “Nothing!”—we don’t need to actually use any of the existing items in the list, and instead are just going through the list to get to the last node.
Unfortunately, there’s a problem with the loop: this loop is designed to keep going until we’ve processed all of the elements of the list, and curr
becomes None
.
But this is actually going too far for our purposes: we want to stop the loop as soon as we reach the last node.[1]
We modify our loop condition to check whether the current node is the last one by using curr.next is None
instead.
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
curr = self._first
while curr.next is not None:
curr = curr.next
# After the loop, curr is the last node in the LinkedList.
# assert curr is not None and curr.next is None
At this point, the astute reader will point out a flaw in this change: we aren’t guaranteed that curr
starts off as a node—it could be None
.
But because we don’t want to get bogged down with handling that case right now, we’ll add a TODO comment in our code and keep going.
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
curr = self._first
while curr.next is not None: # TODO: what if curr starts off as None?
curr = curr.next
# After the loop, curr is the last node in the LinkedList.
# assert curr is not None and curr.next is None
So then after the loop ends, we know that curr
refers to the last node in the linked list, and we are finally in a position to add the given item to the linked list.
To do so, we need to create a new node and then connect it in.
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
curr = self._first
while curr.next is not None: # TODO: what is curr starts off as None?
curr = curr.next
# After the loop, curr is the last node in the LinkedList.
# assert curr is not None and curr.next is None
new_node = _Node(item)
curr.next = new_node
And finally, let’s handle that TODO.
We know from the documentation of our LinkedList
class that self._first
can only be None
if self
refers to an empty linked list.
But in this case, all we need to do is add the new item to be the first item in the linked list.
def append(self, item: Any) -> None:
"""Add the given item to the end of this linked list."""
curr = self._first
if curr is None:
new_node = _Node(item)
self._first = new_node
else:
while curr.next is not None:
curr = curr.next
# After the loop, curr is the last node in the LinkedList.
# assert curr is not None and curr.next is None
new_node = _Node(item)
curr.next = new_node
Example: a more general initializer#
With our append
method in place, we can now stop creating linked lists by manually fiddling with attributes, and instead modify our linked list initializer to take in a list of values, which we’ll then append one at a time:
class LinkedList:
def __init__(self, items: list) -> None:
"""Initialize a new linked list containing the given items.
The first node in the linked list contains the first item
in <items>.
"""
self._first = None
for item in items:
self.append(item)
While this code is perfectly correct, it turns out that it is rather inefficient; we’ll leave it as an exercise for now to develop a better approach.
Index-based insertion#
Now suppose we want to implement a more general form of insertion that allows the user to specify the index of the list to insert a new item into (analogous to the built-in list.insert
method):
class LinkedList:
def insert(self, index: int, item: Any) -> None:
"""Insert a new node containing item at position <index>.
Precondition: index >= 0.
Raise IndexError if index > len(self).
Note: if index == len(self), this method adds the item to the end
of the linked list, which is the same as LinkedList.append.
>>> lst = LinkedList([1, 2, 10, 200])
>>> lst.insert(2, 300)
>>> str(lst)
'[1 -> 2 -> 300 -> 10 -> 200]'
>>> lst.insert(5, -1)
>>> str(lst)
'[1 -> 2 -> 300 -> 10 -> 200 -> -1]'
"""
As with append
, our first step is to traverse the list until we reach the correct index; and if we want the node to be inserted into position index
, we need to access the node at position (index-1)
!
To write the code, we need to modify our code template to store not just the current node, but the current index of that node as well:
def insert(self, index: int, item: Any) -> None:
curr = self._first
curr_index = 0
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index += 1
This loop condition is a bit more complicated, so it’s worth spending some time to unpack.
Here, we’re saying that the loop should keep going when the current node is not None
and when the current index is less than our target index (index - 1
).
This means that when the loop is over, the current node is None
or the current index has reached the target index (or both!).
We therefore need to structure our code into two cases, and handle each one separately:
def insert(self, index: int, item: Any) -> None:
curr = self._first
curr_index = 0
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index += 1
# assert curr is None or curr_index == index - 1
if curr is None:
pass
else: # curr_index == index - 1
pass
Now, if curr
is None
then the list doesn’t have a node at position index - 1
, and so that index is out of bounds.
In this case, we should raise an IndexError
.
On the other hand, if curr
is not None
, then we’ve reached the desired index, and can insert the new node using the same strategy as append
.
def insert(self, index: int, item: Any) -> None:
curr = self._first
curr_index = 0
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index += 1
# assert curr is None or curr_index == index - 1
if curr is None:
# index - 1 is out of bounds. The item cannot be inserted.
raise IndexError
else: # curr_index == index - 1
# index - 1 is in bounds. Insert the new item.
new_node = _Node(item)
curr.next = new_node # Hmm...
Well, almost. The problem with the last else
branch is that unlike append
, curr
might have had other nodes after it!
Simply setting curr.next = new_node
loses the reference to the old node at position index
, and any subsequent nodes after that one.
So before overwriting curr.next
, we need to update new_node
so that it refers to the old node at position index
:
def insert(self, index: int, item: object) -> None:
curr = self._first
curr_index = 0
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index += 1
# assert curr is None or curr_index == index - 1
if curr is None:
# index - 1 is out of bounds. The item cannot be inserted.
raise IndexError
else: # curr_index == index - 1
# index - 1 is in bounds. Insert the new item.
new_node = _Node(item)
new_node.next = curr.next # THIS LINE IS IMPORTANT!
curr.next = new_node
Warning! Common error ahead! (and solution)#
When writing mutating methods on linked lists, we very often update the links of individual nodes to add and remove nodes in the list. We must be very careful when doing so, because the order in which we update the links really matters, and often only one order results in the correct behaviour.
For example, this order of link updates in the final else
branch doesn’t work:
curr.next = new_node
new_node.next = curr.next
On the second line, curr.next
has already been updated, and its old value lost. The second line is now equivalent to writing new_node.next = new_node
, which is certainly not what we want!
The reason this type of error is so insidious is that the code looks very similar to the correct code (only the order of lines is different), and so you can only detect it by carefully tracing through the updates of the links line-by-line.
To mitigate this problem, we’ll take advantage of a pretty nice Python feature known as multiple (or simultaneous) assignment:
a, b = 1, 2 # Assigns 1 to a and 2 to b
The beauty of this approach is that the expressions on the right side are all evaluated before any new values are assigned, meaning that you don’t need to worry about the order in which you write them. For example, these two assignment statements are equivalent:
# Version 1
curr.next, new_node.next = new_node, curr.next
# Version 2
new_node.next, curr.next = curr.next, new_node
In other words, using multiple assignment in this linked list method allows us to ignore the tricky business about the order in which the link updates happen! We strongly recommend using multiple assignment in your own code when working with complex state updating.
Tidying up: don’t forget about corner cases!#
Our insert
implementation has one problem: what if index = 0
?
In this case, it doesn’t make sense to iterate to the (index-1
)-th node!
This is again a special case which we need to handle separately, by modifying self._first
(because in this case, we’re inserting into the front of a linked list):
def insert(self, index: int, item: Any) -> None:
if index == 0:
new_node = _Node(item)
self._first, new_node.next = new_node, self._first
else:
curr = self._first
curr_index = 0
while curr is not None and curr_index < index - 1:
curr = curr.next
curr_index += 1
# assert curr is None or curr_index == index - 1
if curr is None:
# index - 1 is out of bounds. The item cannot be inserted.
raise IndexError
else: # curr_index == index - 1
# index - 1 is in bounds. Insert the new item.
new_node = _Node(item)
curr.next, new_node.next = new_node, curr.next
Exercise: Index-based deletion#
The analogue of Python’s list.append
is list.pop
, which allows the user to remove an item at a specified index in a list.
Because this is quite similar to insertion, we won’t develop the full code here,
but instead outline the basic steps in some pseudo-code:
class LinkedList:
def pop(self, index: int) -> Any:
"""Remove and return node at position <index>.
Precondition: index >= 0.
Raise IndexError if index >= len(self).
>>> lst = LinkedList([1, 2, 10, 200])
>>> lst.pop(2)
10
>>> lst.pop(0)
1
"""
# Warning: the following is pseudo-code, not valid Python code!
# 1. If the list is empty, you know for sure that index is out of bounds...
# 2. Else if index is 0, remove the first node and return its item.
# 3. Else iterate to the (index-1)-th node and update links to remove
# the node at position index. But don't forget to return the item!