Next, we’re going to learn about a new data structure called the Binary Search Tree (or BST). Binary search trees make certain operations fast, and are the basis of advanced data structures you’ll learn about in CSC263 that are even more efficient.
Our goal this week is to take trees and use them to implement the Multiset ADT, also referred to as the Collection ADT which supports the following behaviours:
Notice that this ADT offers a bit more flexibility than the Container-based ADTs such as Stack and Queue that we have seen previously in the course, as it allows the user to choose which item to remove, rather than using a fixed order of removal.
Because removing an item requires searching the collection to make sure that the item is present, this ADT also supports __contains__
, which searches within the collection by value, rather than by position. It is this “search” behaviour that we will consider first.
To search a list, the obvious iterative algorithm (for both Python lists and linked lists) is to loop through all items in the list and stop when the item is found. When the item is not in the list, all items must be checked, making this a * linear time* operation: the time taken for an unsuccessful search grows proportionally with the length of the list.
It turns out that the Tree.__contains__
method has the same behaviour: if the item is not in the tree, every item in the tree must be checked. In the code, the recursive case must loop through every subtree and make a recursive call. So just switching from lists to trees isn’t enough to do better!
However, one of the great insights in computer science is that adding some additional structure to the input data can enable new, more efficient algorithms. You have seen a simple form of this called augmentation in previous labs, but we’ll look at more complex “structures” imposed on data here.
In the case of Python lists, if we assume that the list is sorted, then we can use the binary search algorithm to greatly improve the efficiency of search. If you need a refresher on binary search, please check out the “binary search” videos from this CSC108 playlist.
But because this is still based on built-in array-based lists, we suffer the same drawbacks for insertion and deletion we encountered previously in Section 3.4. So the question is: can we achieve efficient search, insertion, and deletion all at once? Yes we can!
To do this, we will combine the branching structure of trees with the idea of binary search to develop a notion of a “sorted tree”, which we will call a Binary Search Tree (BST).
A binary tree is a tree in which every item has at most two subtrees. An item in a binary tree satisfies the binary search tree property if its value is greater than or equal to all items in its left subtree, and less than or equal to all items in its right subtree. Note that duplicates of the root are allowed in either subtree in this version.
A binary tree is a binary search tree if every item in the tree satisfies the binary search tree property (the “every” is important: in general, it’s possible that some items satisfy this property but others don’t).
Binary search trees naturally represent sorted data. That is, even if the data doesn’t arrive for insertion in sorted order, the BST keeps track of it in a sorted fashion. This makes BSTs extremely efficient in doing operations like searching for an item; but unlike sorted Python lists, they can be much more efficient at insertion and deletion while maintaining the sortedness of the data!