So far in this course, we’ve used the term data type to
actually mean two different things. Most of the time, we use it to mean
a data type in the Python programming language, like int
or
list
or a data class we’ve defined. When we use the term
“data type” in this way, it is synonymous with the term Python
class, which is the name the Python language gives to all of its
data types. We’ll now refer to these Python classes as concrete
data types, since they have a concrete implementation in Python
code. This is true for both built-in data types, data classes that we
define, and the more general classes we learned about in Section 10.2.
However, there’s another way we’ve used the term “data type” that
goes all the way back to Chapter 1: as abstract representations of data
that transcend any one specific programming language. For example, the
Python list
class is implemented differently than the Java
ArrayList
or JavaScript Array
, but all three
share some common expectations of what list operations they support. We
can describe these common, language-independent list operations by
defining an abstract data type (ADT), which defines an
entity that stores some kind of data and the operations that can be
performed on it. Using the terminology from Section 10.1, an abstract data type is a
pure interface that is concerned only with the what—what data
is stored, what we can do with this data—and not the how—how a
computer actually stores this data or implements these operations.
Let’s take a moment here to review some of the collection-based abstract data types we’ve seen already in this course. One caveat with this list: while computer scientists generally agree on what the “main” abstract data types are, they often disagree on what operations each one actually supports. We’ve taken a fairly conservative approach for specifying operations, limiting ourselves to the most basic ones.
Set
List
Mapping
Iterable
There are a few more foundational abstract data types in computer science that we’ll cover in this chapter, and in future courses. We have discussed many of these throughout the semester so far, and have used many in Python. But the true power of ADTs is that they are abstract enough to transcend any individual program or even programming languages. ADTs like lists, sets, and mappings form a common vocabulary that is necessary to being a professional computer scientist.
Abstract data types form a high-level interface between a computer scientist and how the computer stores program data. A concrete data type is an implementation of an abstract data type: unlike abstract data types, they are actually concerned with how the data is stored and how their operations are implemented. The creators of the Python programming language took various abstract data types and created a set of built-in concrete data types (classes), making careful decisions about how each class would store its data and implement its methods. Indeed, as Python programmers we benefit from all the work they’ve put in to create classes that not just support common ADTs, but to make their implementations extremely fast using more complex ideas in algorithm design and implementation. You’ll learn about some of these techniques in CSC263/265!
So a dict
, for instance, is not itself an abstract data
type. But the dict
data type is an obvious implementation
of the Mapping ADT. However, there is NOT a one-to-one
correspondence between abstract data types and concrete data types,
in Python or any other programming language. A single abstract data type
can be implemented by many different concrete data types. For example,
although the Python dict
is a natural implementation of the
Mapping ADT, we could implement the Mapping ADT instead with a
list
, where each element is a tuple storing a key-value
pair:
# A Mapping using a Python dict
0: 'hello', 1: 42, 2: 'goodbye'}
{
# A Mapping using a Python list
0, 'hello'), (1, 42), (2, 'goodbye')] [(
Conversely, every concrete data type can be used to implement
multiple ADTs. The Python list
can be used to implement not
just the List ADT, but each of the other above ADTs as well. For
instance, think about how you would implement the Set ADT with a
list
, and in particular, how you would avoid
duplicates. Though just because something is possible doesn’t mean
it is a good idea in practice. Beginning Python programmers often
implement use a list
when all they need is the Set ADT’s
operations. As we discussed in 9.7 Analyzing Built-In
Data Type Operations, this leads to slower programs, and so should
be avoided. A dict
could also implement any of the
ADTs above, and the same is true of the new data structures you will
learn in this course.
You might be wondering what is the point of making this
distinction—so what if list
s can implement the Mapping ADT,
we’d never use this in “real” Python code when we have a
dict
instead. And that’s true! But what this distinction
reminds us is that we always have choices when implementing an
interface. Rather than saying “it’s not possible to implement a Mapping
using list
”, we instead say “it is possible to implement a
Mapping using list
, but this choice is worse than using
dict
”.
Any idea why is a dict
better than list
at
implementing the Mapping ADT? If we ignore the fact that we’ve been
using dict
for this purpose all along, the answer is not
obvious! It comes down to efficiency: though dict
and list
can both be used to implement the Mapping ADT, the
implementation of dict
makes the Mapping operations much
faster than how we would (straightforwardly) implement the Mapping ADT
using a list
. As we’ll see a few times this chapter,
running time analysis is one of the key ways to evaluate and compare
different implementations of an ADT.