8.5 Binary Search Tree Implementation and Search#
Our implementation of a BinarySearchTree
class is heavily based on Tree
,
but with a few important differences.
First, because we know there are only two subtrees, and the left/right ordering matters,
we use explicit attributes to refer to the left and right subtrees:
class BinarySearchTree:
"""Binary Search Tree class.
This class represents a binary tree satisfying the Binary Search Tree
property: for every node, its value is >= all items stored in its left
subtree, and <= all items stored in its right subtree.
Private Instance Attributes:
- _root: The item stored at the root of the tree, or None if
the tree is empty.
- _left: The left subtree, or None if the tree is empty.
- _right: The right subtree, or None if the tree is empty.
"""
_root: Any | None
_left: BinarySearchTree | None
_right: BinarySearchTree | None
Another difference between BinarySearchTree
and Tree
is in
how we distinguish between empty and non-empty trees.
In the Tree
class, an empty tree has a _root
value of None
, and an empty list []
for its list of subtrees.
In the BinarySearchTree
class, an empty tree also has a _root
value of None
, but its _left
and _right
attributes are set to None
as well.
Moreover, for BinarySearchTree
, an empty tree is the only case where any of the attributes can be None
; when we represent a non-empty tree, we do so by storing the root item (which isn’t None
) at the root, and storing BinarySearchTree
objects in the _left
and _right
attributes.
The attributes _left
and _right
might refer to empty binary search trees, but this is different from them being None
!
Any method we add to the BinarySearchTree
class (a) can rely upon these properties,
and (b) must maintain these properties, since the other methods rely upon them.
This is so important that we document them in our representation invariants,
along with the BST property itself.
class BinarySearchTree:
"""...
Representation Invariants:
- If self._root is None, then so are self._left and self._right.
This represents an empty BST.
- If self._root is not None, then self._left and self._right are BinarySearchTrees.
This represents a non-empty BST.
- (BST Property) If self is non-empty, all items in self._left are <= self._root,
and all items in self._right are >= self._root.
"""
Here are the initializer and is_empty
methods for this class, which is based on the corresponding methods for the Tree
class:
def __init__(self, root: Any | None) -> None:
"""Initialize a new BST containing only the given root value.
If <root> is None, initialize an empty BST.
"""
if root is None:
self._root = None
self._left = None
self._right = None
else:
self._root = root
self._left = BinarySearchTree(None) # self._left is an empty BST
self._right = BinarySearchTree(None) # self._right is an empty BST
def is_empty(self) -> bool:
"""Return whether this BST is empty.
"""
return self._root is None
Note that we do not allow client code to pass in left and right subtrees as parameters to the initializer. This is because binary search trees have a much stronger restriction on where values can be located in the tree, and so a separate method is used to insert new values into the tree that will ensure the BST property is always satisfied.
But before we get to the BST mutating methods (inserting and deleting items), we’ll first study the most important BST non-mutating method: searching for an item.
Searching a binary search tree#
Recall that the key insight of the binary search algorithm in a sorted list is that by comparing the target item with the middle of the list, we can immediately cut in half the remaining items to be searched. An analogous idea holds for BSTs.
For general trees, the standard search algorithm is to compare the item
against the root, and then search in each of the subtrees until either the item is found, or all the subtrees have been searched.
When item
is not in the tree, every item must be searched.
In stark contrast, for BSTs the initial comparison to the root tells you which subtree you need to check. That is, only one recursive call needs to be made, rather than two!
def __contains__(self, item: Any) -> bool:
"""Return whether <item> is in this BST.
"""
if self.is_empty():
return False
else:
if item == self._root:
return True
elif item < self._root:
return item in self._left # or, self._left.__contains__(item)
else:
return item in self._right # or, self._right.__contains__(item)
While this code structure closely matches the empty-check for the general Tree
class,
we can also combine the two levels of nested ifs to get a slightly more concise version:
def __contains__(self, item: Any) -> bool:
"""Return whether <item> is in this BST.
"""
if self.is_empty():
return False
elif item == self._root:
return True
elif item < self._root:
return item in self._left # or, self._left.__contains__(item)
else:
return item in self._right # or, self._right.__contains__(item)
Did you notice something different about this method
in comparison to all the Tree
methods we’ve written?
It uses linear, not branching recursion.
Think about why this is possible
and what it means for the efficiency of our code.
Can you imagine any BinarySearchTree
methods that cannot be written with linear recursion?