In the first few weeks of the course, we have mainly played the role of the class designer and implementer. However, you have actually spent most of your programming career in the opposite role: whenever you use one of Python’s built-in functions or data structures, you only worry about what it does, not how it works. In other words, you are a client of the built-in Python libraries and classes.
Some concepts are so general and so useful across many problems that they transcend any specific programming language. An abstract data type (or ADT) defines some kind of data and the operations that can be performed on it. It is a pure interface, with no mention of an implementation—that’s what makes it abstract. In contrast to this, a data structure is a concrete strategy for storing some data. For example, one data structure we could use to store the grades of a class on a series of course activities is a list of lists:
grades = [['Sadia', 78, 82],
['Yuan', 75, 64],
['Elise', 80, 71]]
# To know what the "columns" are for,
# we could use another list:
items = ['A1', 'midterm']
An alternative data structure is a dictionary of dictionaries:
g2 = {'A1': {'Sadia': 78, 'Yuan': 75, 'Elise': 80},
'midterm': {'Sadia': 82, 'Yuan': 64, 'Elise': 71}}
You can very likely think of other options, and may find it interesting to consider pros and cons of the two data structures above. But the point here is that ADTs are fundamentally concerned with the what: what data is stored, and what we can do with this data. Data structures are concerned with the how: how is that data stored, and how do we actually implement our desired methods to operate on this data? This distinction is crucial to the kind of abstract thinking you’ll develop as programmers: by separating the what from the how, you’ll gain substantial benefits, as we are about to learn.
In this section, we’ll briefly describe some of the most common abstract data types in computer science. Now, while computer scientists generally agree on what the “main” abstract data types are, they often disagree on what operations each one actually supports. You’ll notice here that we’ve taken a fairly conservative approach for specifying operations, limiting ourselves to the most basic ones.
Set
Multiset
List
Map
Iterable
Many of these will sound familiar, as you’ll have used them in your past programming experience, even if you haven’t heard the term “abstract data type” before! For example, you have probably used both a Python list
and dict
. However, there is a really important distinction between Python’s built-in classes and the ADTs we’ve listed above: list
and dict
are data structures, not abstract data types. That is, they’re concrete implementations, and every necessary decision about how these classes store their data and implement their methods has been made. Indeed, the designers of the Python programming language put a great deal of effort into these implementations so that list
and dict
operations are extremely fast.
So a dict
, for instance, is not itself an ADT. But it is fair to say that a dict
is a natural implementation of the Map ADT. However, there is NOT a one-to-one correspondence between ADTs and data structures, in Python or any other language.
A single ADT can be implemented by many different data structures. For example, although the Python list
is a natural implementation of the List ADT, we could implement it instead with a dict
in which each key is the index of its item. A list of 3 elements, hello
, 42, and goodbye
in positions 0, 1, and 2 respectively, would be
On the flip side, each data structure can be used to implement multiple ADTs. The Python list
can be used to implement not just the List ADT, but each of the other above ADTs as well. For instance, think about how you would implement the Set ADT with a list
, and in particular, how you would avoid duplicates. Beginning Python programmers often implement the Set ADT with a list
, but Python has a built-in set
class that implements the Set ADT, does all the work of duplicate-avoidance for you, and does it very efficiently. A dict
could also implement any of the ADTs above, and the same is true of the new data structures you will learn in this course.
We’ve said that each ADT is so general that it transcends any individual problem or program or even programming language. And in fact the ADTs given above are implemented in other programming languages (to varying degrees). While the exact data structures used to implement them vary significantly from language to language, these ADTs concepts form a common vocabulary whose understanding is necessary to being a professional computer scientist.