4.1 Introduction to Abstract Data Types#

In the first few weeks of the course, we have mainly played the role of the class designer and implementer. However, you have actually spent most of your programming career in the opposite role: whenever you use one of Python’s built-in functions or data structures, you only worry about what it does, not how it works. In other words, you are a client of the built-in Python libraries and classes.

Some concepts are so general and so useful across many problems that they transcend any specific programming language. An abstract data type (or ADT) defines some kind of data and the operations that can be performed on it. It is a pure interface, with no mention of an implementation—that’s what makes it abstract.

In contrast to this, a data structure is a concrete strategy for storing some data. For example, one data structure we could use to store the grades of a class on a series of course activities is a list of lists:

grades = [['Sadia', 78, 82],
          ['Yuan', 75, 64],
          ['Elise', 80, 71]]
# To know what the "columns" are for,
# we could use another list:
items = ['A1', 'midterm']

An alternative data structure is a dictionary of dictionaries:

g2 = {'A1': {'Sadia': 78, 'Yuan': 75, 'Elise': 80},
      'midterm': {'Sadia': 82, 'Yuan': 64, 'Elise': 71}}

You can very likely think of other options, and may find it interesting to consider pros and cons of the two data structures above. But the point here is that ADTs are fundamentally concerned with the what: what data is stored, and what we can do with this data. Data structures are concerned with the how: how is that data stored, and how do we actually implement our desired methods to operate on this data? This distinction is crucial to the kind of abstract thinking you’ll develop as programmers: by separating the what from the how, you’ll gain substantial benefits, as we are about to learn.

Some famous abstract data types#

In this section, we’ll briefly describe some of the most common abstract data types in computer science. Now, while computer scientists generally agree on what the “main” abstract data types are, they often disagree on what operations each one actually supports. You’ll notice here that we’ve taken a fairly conservative approach for specifying operations, limiting ourselves to the most basic ones.

  • Set

    • Data: a collection of unique elements

    • Operations: get size, insert a value (without introducing duplicates), remove a specified value, check membership

  • Multiset

    • Data: a collection of elements (possibly with duplicates)

    • Operations: same as Set, but the insert operation allows duplicates

  • List

    • Data: an ordered sequence of elements

    • Operations: access element by index, insert a value at a given index, remove a value at a given index

  • Map

    • Data: a collection of key-value pairs, where each key is unique and associated with a single value

    • Operations: lookup a value for a given key, insert a new key-value pair, remove a key-value pair, update the value associated with a given key

  • Iterable

    • Data: a collection of values (may or may not be unique)

    • Operations: iterate through the elements of the collection one at a time.

Many of these will sound familiar, as you’ll have used them in your past programming experience, even if you haven’t heard the term “abstract data type” before! For example, you have probably used both a Python list and dict. However, there is a really important distinction between Python’s built-in classes and the ADTs we’ve listed above: list and dict are data structures, not abstract data types. That is, they’re concrete implementations, and every necessary decision about how these classes store their data and implement their methods has been made. Indeed, the designers of the Python programming language put a great deal of effort into these implementations so that list and dict operations are extremely fast.

So a dict, for instance, is not itself an ADT. But it is fair to say that a dict is a natural implementation of the Map ADT. However, there is NOT a one-to-one correspondence between ADTs and data structures, in Python or any other language.

A single ADT can be implemented by many different data structures. For example, although the Python list is a natural implementation of the List ADT, we could implement it instead with a dict in which each key is the index of its item. A list of 3 elements, hello, 42, and goodbye in positions 0, 1, and 2 respectively, would be

{0: 'hello', 1: 42, 2: 'goodbye'}

On the flip side, each data structure can be used to implement multiple ADTs. The Python list can be used to implement not just the List ADT, but each of the other above ADTs as well. For instance, think about how you would implement the Set ADT with a list, and in particular, how you would avoid duplicates.[1] A dict could also implement any of the ADTs above, and the same is true of the new data structures you will learn in this course.

The value of knowing all the standard ADTs#

We’ve said that each ADT is so general that it transcends any individual problem or program or even programming language. And in fact the ADTs given above are implemented in other programming languages (to varying degrees). While the exact data structures used to implement them vary significantly from language to language, these ADTs concepts form a common vocabulary whose understanding is necessary to being a professional computer scientist.