In this week’s lecture, you learned about selection sort and insertion sort, two iterative sorting algorithms. Today, you’ll get some more practice with these sorting algorithms, do some running-time analysis, and develop a simple sorting visualization tool using pygame.
When you first learn a new algorithm like selection sort or insertion sort, it can be tempting to just memorize the code for that algorithm. However, mindlessly memorizing code isn’t actually a good strategy for gaining deep understanding of what that algorithm is doing, or how to implement it. So in the first part of this tutorial, we’ll use a different strategy to reinforce your learning of these two sorting algorithms: implementing variations of them.
Download tutorial9_part1.py
and open it. Inside, you’ll see our implementations of selection sort
and insertion sort from lecture, as well as a few new functions to
implement. Each of these functions can be implemented using a very
similar approach as one of our two sorting algorithms, but it’ll be up
to you to figure out what changes you need to make.
As you complete each function, we recommend comparing your implementation against the code from lecture, so that you can see both the similarities and differences in the code.
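For reference, here is a typical in-place implementation of selection sort. The versions in the starter file may differ in small details (for example, in variable names or an extra key parameter), so treat this as a sketch rather than the exact lecture code:

def selection_sort(lst: list) -> None:
    """Sort lst in place using the selection sort algorithm.

    A typical implementation; the version in tutorial9_part1.py may differ slightly.
    """
    for i in range(len(lst)):
        # Find the index of the smallest item in the unsorted part lst[i:].
        index_of_smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[index_of_smallest]:
                index_of_smallest = j

        # Swap that item into position i, extending the sorted part by one.
        lst[index_of_smallest], lst[i] = lst[i], lst[index_of_smallest]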
In the prep reading for this week, you learned about the binary search algorithm, which can search for an item in a list faster than the standard linear search algorithm, but only works on sorted lists. Here’s the implementation from Section 18.1:
def binary_search(lst: list, item: Any) -> bool:
    """Return whether item is in lst using the binary search algorithm.

    Preconditions:
        - is_sorted(lst)
    """
    b = 0
    e = len(lst)

    while b < e:
        # Loop invariants
        assert all(lst[i] < item for i in range(0, b))
        assert all(lst[i] > item for i in range(e, len(lst)))

        m = (b + e) // 2
        if item == lst[m]:
            return True
        elif item < lst[m]:
            e = m
        else:  # item > lst[m]
            b = m + 1

    # If the loop ends without finding the item, the item is not in the list.
    return False
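For example, here is how the function behaves on a small sorted list:

>>> binary_search([1, 3, 5, 7, 9], 5)
True
>>> binary_search([1, 3, 5, 7, 9], 4)
False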
The course notes briefly discussed the running time of binary search, but only at a high level. Intuitively, at each loop iteration the size of the sublist being searched decreases by a factor of 2, and so there can be at most (approximately) \(\log_2 n\) iterations in total. But how do we formalize this intuition?
We run into trouble because we don’t have a predictable formula for the values of the variables b and e after \(k\) iterations, since how the search range changes depends on the contents of lst and the item being searched for.
We can reconcile the intuition with our more formal approach by explicitly introducing and analysing the behaviour of a new variable. Specifically: we let \(r = e - b\) be a variable representing the size of the search range (\(r\) stands for “range”).
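If it helps to see this variable in action, here is one way you could instrument the loop to print \(r\) at the start of each iteration. This traced version is just for building intuition and is not part of the starter code:

def binary_search_trace(lst: list, item) -> bool:
    """Like binary_search, but print r = e - b at the start of each iteration."""
    b = 0
    e = len(lst)
    while b < e:
        print('r =', e - b)  # the current size of the search range
        m = (b + e) // 2
        if item == lst[m]:
            return True
        elif item < lst[m]:
            e = m
        else:  # item > lst[m]
            b = m + 1
    return False

For example, binary_search_trace(list(range(16)), -1) prints r = 16, 8, 4, 2, 1, showing the search range at least halving on each iteration.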
Using this definition, answer the questions below. By the end of this part, you’ll find a good upper bound on the worst-case running time of binary_search.
1. Let \(n\) represent the length of the input list lst. What is the initial value of \(r\), in terms of \(n\)?
2. For what value(s) of \(r\) will the loop terminate?
3. Prove that at each loop iteration, if item is not found, then the value of \(r\) decreases by at least a factor of 2. More precisely, let \(r_k\) and \(r_{k+1}\) be the values of \(r\) immediately before and after the \(k\)-th iteration, respectively, and prove that \(r_{k+1} \leq \frac{1}{2} r_k\).

    Hints: do a proof by cases. Use the property that for all \(x \in \R\), \(x - 1 < \floor{x} \leq x\).
4. Find an exact upper bound on the number of iterations that could occur (in terms of \(n\)), and use this to show that the worst-case running time of binary_search is \(\cO(\log n)\). (An optional empirical check of this bound is sketched after these questions.)
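As an optional sanity check on your answer to Question 4 (not something you need to hand in), you could count the number of loop iterations that a worst-case search performs and compare it to \(\log_2 n\). The sketch below assumes a worst case where the item is smaller than everything in the list, so the item < lst[m] branch is always taken; the function name count_iterations is just for illustration:

import math


def count_iterations(n: int) -> int:
    """Return the number of loop iterations binary_search performs when
    searching a sorted list of length n for an item smaller than every element.
    """
    lst = list(range(n))
    item = -1  # smaller than every element of lst, so the search always fails
    b = 0
    e = len(lst)
    iterations = 0
    while b < e:
        iterations += 1
        m = (b + e) // 2
        if item < lst[m]:  # always true here, so the search range shrinks from the right
            e = m
        else:
            b = m + 1
    return iterations


for n in [10, 100, 1000, 10 ** 6]:
    print(n, count_iterations(n), math.log2(n))

For this particular worst case, the printed iteration count exceeds \(\log_2 n\) by at most one, which is consistent with the bound you are asked to prove.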
In the final part of today’s tutorial, you’ll develop a cool way of visualizing our two different sorting algorithms using pygame.1
To visualize a sorting algorithm being performed on a list of length \(n\), the pygame window is divided into an \(n\)-by-\(n\) grid, where:

- each of the \(n\) rows shows the state of the list at one point during the sorting algorithm (reading top to bottom), and
- each square in a row is coloured according to the value of the corresponding list element (a grayscale shade in the grayscale exercises, and a full colour in the colour exercises).
To start, please download the starter file tutorial9_part3.py. We’ve included the sorting algorithms from lecture (the versions including the key parameter), and then the tutorial exercises divided into “grayscale” and “colour” parts.
Under “Tutorial exercises (grayscale version)”, implement the function draw_list, which is responsible for drawing a single row of the grid. You can test your work in our starter implementation of visualize_selection_sort, which draws just the top row of the grid. Try creating a random list of 20 numbers between 0 and 255, and calling visualize_selection_sort on this list. You should see an image similar to the one pictured above, except only the top row is displayed (and the rest of the window is white).
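If you haven’t drawn shapes with pygame before, the following self-contained sketch shows how a single row of grayscale squares could be drawn. The function name draw_row, its signature, and the CELL_SIZE constant are assumptions made for this illustration; your draw_list should follow the signature given in the starter file.

import random

import pygame

CELL_SIZE = 20  # assumed width/height of each grid cell, in pixels


def draw_row(screen: pygame.Surface, lst: list, row: int) -> None:
    """Draw lst as one row of grayscale squares on screen.

    Hypothetical helper for illustration only; the starter file's draw_list
    may take different arguments.
    """
    for i, value in enumerate(lst):
        colour = (value, value, value)  # equal R, G, B components give a grayscale shade
        rect = (i * CELL_SIZE, row * CELL_SIZE, CELL_SIZE, CELL_SIZE)
        pygame.draw.rect(screen, colour, rect)


if __name__ == '__main__':
    pygame.init()
    numbers = [random.randint(0, 255) for _ in range(20)]
    screen = pygame.display.set_mode((len(numbers) * CELL_SIZE, len(numbers) * CELL_SIZE))
    screen.fill((255, 255, 255))  # white background
    draw_row(screen, numbers, 0)  # draw only the top row of the grid
    pygame.display.flip()
    pygame.event.pump()
    pygame.time.wait(3000)  # keep the window open briefly before closing
    pygame.quit()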
Implement the visualization functions visualize_selection_sort and visualize_insertion_sort. Your implementations can copy-and-paste from our algorithms, but should also call your draw_list helper in the appropriate place in the loop body. See visualize_selection_sort’s docstring for some implementation notes that apply to both of these functions.
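As a rough sketch of the overall shape (using the hypothetical draw_row helper from the sketch above in place of your draw_list, and a plain selection sort in place of the lecture version), the pattern is to draw one row of the grid per loop iteration; follow the starter file’s docstrings for the actual signatures and set-up code.

import pygame  # assumes the draw_row sketch from earlier is also available


def visualize_selection_sort_sketch(screen: pygame.Surface, lst: list) -> None:
    """Illustrative sketch only: sort lst in place, drawing one row per iteration."""
    for i in range(len(lst)):
        # Draw the current state of the list as row i of the grid.
        draw_row(screen, lst, i)
        pygame.display.flip()

        # One iteration of selection sort: find the smallest unsorted item...
        index_of_smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[index_of_smallest]:
                index_of_smallest = j

        # ...and swap it into position i.
        lst[index_of_smallest], lst[i] = lst[i], lst[index_of_smallest]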
After you’re done, try calling each function, choosing a random list of size 10, 100, and 200. You should see each sorting algorithm visualized in a top-down manner. Make sure you can identify which part of the visualization is the “sorted” vs. “unsorted” part of each list, and how the two algorithms differ in what makes up the sorted part.
Now repeat Questions 1 and 2 for the “Tutorial exercises (colour
version)” section. For those functions, the input list being sorted is
now a list of tuples storing the RGB values of colours.
Unfortunately, just sorting these RGB tuples directly isn’t very
visually pleasing, and so we’ve provided a helper function
rgb_to_hsv
that converts each tuple to a different colour
representation known as Hue-Saturation-Value (HSV).
So in the visualization functions, you’ll need to run the sorting algorithms on this list of tuples, but also use this helper function as the “sort key”, using the generalized sorting technique we introduced in lecture.
To test your work, try generating some random lists of colours and running your visualization algorithms on them. (Remember, each random colour should be a random tuple with 3 elements in the range 0–255.) You should see a beautiful 🌈 rainbow 🌈 of colours in the “sorted part” of each algorithm!
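If you’d like to experiment with the sort-key idea outside the starter code, Python’s built-in colorsys module also converts RGB to HSV (it expects floats between 0 and 1, so it likely differs from the starter’s rgb_to_hsv helper). The sketch below uses the built-in sorted function purely to illustrate sorting colours by an HSV key; in the tutorial itself, you’ll pass the provided rgb_to_hsv helper to your own visualization functions instead.

import colorsys
import random

# Generate random RGB colours: tuples of three ints in the range 0-255.
colours = [tuple(random.randint(0, 255) for _ in range(3)) for _ in range(100)]

# Sort by each colour's HSV representation, analogous to passing a key
# function to the sorting algorithms from lecture.
sorted_colours = sorted(
    colours,
    key=lambda c: colorsys.rgb_to_hsv(c[0] / 255, c[1] / 255, c[2] / 255)
)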
This tutorial exercise is inspired by https://corte.si/posts/code/visualisingsorting/.↩︎
You might recall grayscale colours from our first tutorial all the way back in CSC110!↩︎
The reason for this is a bit subtle. Normally, we would show both the initial state of the list and the state of the list after each of the \(n\) loop iterations. But this results in \(n + 1\) rows total, which breaks the symmetry of our visualization. However, for insertion sort, the initial list state and the state after the first loop iteration are identical, and so we can skip drawing the initial state. For selection sort, the states of the list at the beginning and end of the last iteration are equal, and so we can skip drawing the very last state. ↩︎