Before we dive into the CSC148 material proper, we’ll review a few fundamental concepts from CSC108. We start with one of the most important ones: how the Python programming language represents data.
All data in a Python program is stored in objects that have three components: id, type, and value. We normally think about the value when we talk about data, but the data’s type and id are also important.
The id of an object is a unique identifier, meaning that no other object has the same identifier. Often Python uses the memory address of the object as its id, but it doesn’t have to; it just has to guarantee uniqueness. We can see the id of any object by calling the id
function:
We can see the type of any object by calling the type
function:
An object’s type determines what functions can operate on it. For example, we can call the function round
on numeric types (such as int
and float
), but not on strings:
>>> round(2)
2
>>> round(3.1419)
3
>>> round('hello!')
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: type str doesn't define __round__ method
Types also determine the objects on which we can use built-in Python operators.We’ll see later in the course that most of Python’s operators are actually implemented using functions. For example, the +
operator works on two integers, and even on two strings, but is not defined for adding an integer and a string together:
>>> 3 + 4
7
>>> 'hey' + 'hello'
'heyhello'
>>> 3 + 'hello'
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 'hello' + 'goodbye'
'hellogoodbye'
Finally, you are already familiar with accessing the value of an object, which we call evaluating the object. For example, this is what happens when we type an object into the Python terminal:
All programming languages have the concept of variables. In Python, a variable is not an object, and so does not actually store data; it stores an id that refers to an object that stores data. This is the case whether the data is something very simple like an int
or more complex like a str
.
Consider this code:
>>> x = 3
>>> x
3
>>> type(x)
<class 'int'>
>>> id(x)
1635361280
>>> word = 'bonjour'
>>> type(word)
<class 'str'>
>>> id(word)
4385008808
The state of memory after the above piece of code executes is this:
We write the id and type of each object in its upper-left corner and upper-right corner, respectively. The actual object id reported by the id
function has many digits, and its true value isn’t important; we just need to know that each object has a unique identifier. So for our drawings we make up short identifiers such as id92
.
Notice that there is no 3
inside the box for variable x
. Instead, there is the id of an object whose value is 3. We say that x
refers to this object, or that x
references this object. The same holds for variable word
; it references an object whose value is 'bonjour'
.
Here are a couple of other things to notice:
str
type, we know nothing about what data members it uses to store its contents. So we just write the value 'bonjour'
inside the box. This is a perfectly fine abstraction.We can reassign a variable so that it refers to a new value. For instance:
In this example, there is only one variable named cat
. At first it contains the id of a str
object containing the string Gilbert
, and then the id in it changes to be that of a str
object containing the string Chairman Meow
.
These examples are extremely simple, but having an accurate image will be necessary in order to avoid bugs in the much more complex code that we will write this term.
We saw above that Python will report to us what type(word)
is. But it is really reporting the type of the object that word
refers to. The variable word
itself has no type. This is different from many other languages, such as Java and C, where every variable has a type. In fact, Python doesn’t mind if we make word
refer now to a different type of object, although this is almost surely a bad idea.
You’ve written code much more complex than what’s above, but may not have had to think in detail about all the small steps that Python has to undertake to execute even a simple assignment statement. These details are foundational for writing and debugging the more complex code you will work on in csc148. So let’s pause for a moment and be explicit about two things.
This is what Python does when an assignment statement is executed:
An assignment statement always has an expression on the right-hand side. Expressions can occur in other places also, for instance as arguments to a function call. When an expression is encountered, it must be evaluated. This always yields a value, which is the id of an object.
This is what Python does when an expression is evaluated:
176.4
or 'hello'
, create an object of the appropriate type to hold it. The value of the expression is the id of that object.+
or %
, evaluate its two operands, apply the operator to them, and create a new object of the appropriate type to hold the result. The value of the expression is the id of that object.There are additional rules for other types of expression, but these will do for now.
Some data types in Python (e.g., integers, strings, and booleans) are immutable, meaning that the value stored in an object of that type cannot change.
For example, suppose we have the following code:
>>> prof = 'Diane'
>>> id(prof)
4405312456
>>> prof = prof + ' Horton'
>>> prof
'Diane Horton'
>>> # The old str object couldn't change, so Python made a new
>>> # str object for the variable prof to refer to. Since it's
>>> # a new object, it has a different id.
>>> id(prof)
4405308016
We did not change the value stored in the object—we couldn’t, since strings are immutable—but rather changed what prof
refers to, as shown here:
We will use the convention of drawing a double box around objects that are immutable. Think of it as signifying that you can’t get in there and change anything.
Notice that in the example above we reassigned the variable prof
—that is, we made it refer to a new str
object— and we could do this even though strings are immutable. Regardless of the mutability of any objects, we can always reassign a variable.
More complex data structures in Python are mutable, including lists, dictionaries, and user-defined classes. Let’s see what this means with a list:
Below, we perform two mutating operations on x
, and check that its id hasn’t changed. Note that even changing the list’s size doesn’t change its id!
>>> x[0] = 1000000
>>> x
[1000000, 2, 3]
>>> id(x)
50706312
>>> x.extend([10, 20, 30])
>>> x
[1000000, 2, 3, 10, 20, 30]
>>> id(x)
50706312
Here’s what’s going on in memory:
The lines x[0] = 1000000
and x.extend([10, 20, 30])
changed the value of the list object that x
refers to. We say that these lines mutate the object that x
refers to. (They also cause the creation of four new objects of type int
.)
When two variables refer to the same object, we say that the variables are aliases of each other.My dictionary says that the word “alias” is used when a person is also known under a different name. For example, we might say “Eric Blair, alias George Orwell.” We have two names for the same thing, in this case a person.
Consider the following Python code:
x
and z
are aliases, as they both reference the same object. As a result, they have the same id. You should think of the assignment statement z = x
as saying “make z
refer to the object that x
refers to.” After doing so, they have the same id.
In contrast, x
and y
are not aliases. They each refer to a list object with [1, 2, 3]
as its value, but they are two different list objects, stored separately in your computer’s memory. This is again reflected in their different ids.
Here is the state of memory after the code executes:
Aliasing is often a source of confusion for beginners, because it allows “action at a distance”: the modification of a variable’s value without explicitly mentioning that variable. Here’s an example:
The third line mutates the value of z
. But without ever mentioning x
, it also mutates the value of x
! We call this a side effect.
Imprecise language can lead us into misunderstanding the code. We said above that “the third line mutates the value of z
”. To be more precise, the third line mutates the object that z
refers to. Of course we can also say that it mutates the object that x
refers to—they are the same object! A clear diagram like this can really help:
The key thing to notice about this example is that just by looking at the third line of code, z[0] = -999
, you can’t tell that x
has changed; you need to know that on a previous line, z
was made an alias of x
. This is why you have to be careful when aliasing occurs.
Contrast the previous code with this:
Can you predict the value of x
on the last line? Here, the third line mutates the object that y
refers to, but because it is not the same object that x
refers to, we still see [1, 2, 3]
if we evaluate x
. Here’s the state of memory after these lines execute:
Aliasing also exists for immutable data types, but in this case there is never any “action at a distance”, precisely because immutable values can never change. For example, a tuple is an ordered sequence like a list, but it is immutable. In the example below, x
and z
are aliases of a tuple object; but it is impossible to create a side effect on x
by mutating the object that z
refers to, since we can’t mutate tuples at all.
>>> x = (1, 2, 3)
>>> z = x
>>> z[0] = -999
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
What if we did this instead?
Again, we have made x
and z
refer to the same object. So when we change z
on the third line, does x
also change? This time, the answer is an emphatic no, and it is because of the kind of change we make on the third line. Instead of mutating the object that z
refers to, we make z
refer to a new object. This obviously can have no effect on the object that x
refers to (or any object). Even if we switched the example from using immutable tuples to using mutable lists, x
would be unchanged.
In general, a statement of the form my_var = _____
never mutates the object that my_var
refers to; all it can ever do is set my_var
to refer to a different object. Keep this rule in mind when you’re writing your own code, as it’s often easy to confuse mutating values with changing references.
Sometimes it makes sense to make a copy of a data structure so that changes can be made to it without any side effect on the original. Keep in mind, though, that this consumes both space and time resources, and is often unnecessary.
Let’s look one more time at this code:
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> z = x
>>> id(x)
4401298824
>>> id(y)
4404546056
>>> id(z)
4401298824
What if we wanted to see whether x
and y
, for instance, were the same? Well, we’d need to define precisely what we mean by “the same.” We can use the ==
operator to compare the values stored in the objects they reference. This is called value equality.
Or, we can use the is
operator to compare the ids of the objects they reference. With is
, we are asking whether two variables reference the exact same object. This is called identity equality.
All built-in types have an implementation for ==
so that we can check for value equality; we’ll later see how to define ==
for our own classes.
Because ints are immutable, there isn’t much point in Python creating a separate int object every time your variable needs to refer to, say, 0. They can all refer to the very same object and no harm can be done since the object can never change. This explains the following code:
>>> x = 43
>>> y = 43
>>> z = x
>>> # Of course we see that all three variables have value
>>> # equality. They all reference an int object containing
>>> # 43. Whether or not they are the same int object is
>>> # irrelevant to "==".
>>> x == y
True
>>> x == z
True
>>> # But "is" checks identity equality. We wouldn't have
>>> # expected x and y to reference the same int object.
>>> # But now we know that Python feels free to take a
>>> # short-cut and not create a second int object holding
>>> # the value 43, and in this case it did:
>>> x is y
True
>>> x is z
True
>>> # We can confirm that x and y have the same id:
>>> id(x)
4331557184
>>> id(y)
4331557184
Python can take this short-cut with any value of any immutable type. For example, here we can observe the short-cut with strings:
But in this example, Python doesn’t take the short-cut:
It turns out that when Python does and doesn’t take the short-cut is quite complex, and it could even change from one version of Python to the next. But it makes no difference to our code’s behaviour; the only reason we need to be aware of it is so that we are not surprised when we see that two variables unexpectedly have identity equality.