Now that we have covered the basics of a Python representation of graphs, we are ready to perform some interesting graph computations! First, recall this graph method from 17.3 Representing Graphs in Python:
class Graph:
def adjacent(self, item1: Any, item2: Any) -> bool:
"""Return whether item1 and item2 are adjacent vertices in this graph.
Return False if item1 or item2 do not appear as vertices in this graph.
"""
if item1 in self._vertices and item2 in self._vertices:
= self._vertices[item1]
v1 return any(v2.item == item2 for v2 in v1.neighbours)
else:
# We didn't find an existing vertex for both items.
return False
Our goal in this section will be to implement a generalization of this method that goes from checking whether two vertices are adjacent to whether two vertices are connected: Recall that two vertices are connected when there exists a path between them.
class Graph:
def connected(self, item1: Any, item2: Any) -> bool:
"""Return whether item1 and item2 are connected vertices in this graph.
Return False if item1 or item2 do not appear as vertices in this graph.
>>> g = Graph()
>>> g.add_vertex(1)
>>> g.add_vertex(2)
>>> g.add_vertex(3)
>>> g.add_vertex(4)
>>> g.add_edge(1, 2)
>>> g.add_edge(2, 3)
>>> g.connected(1, 3)
True
>>> g.connected(1, 4)
False
"""
We can start implementing this method in the same fashion as
Graph.adjacent
, by finding the vertex corresponding to
item1
:
class Graph:
def connected(self, item1: Any, item2: Any) -> bool:
"""..."""
if item1 in self._vertices and item2 in self._vertices:
= self._vertices[item1]
v1
...else:
return False
The problem, of course, is in the final else branch. We can’t simply
check whether item2
is an immediate neighbour of
item1
, since there are many other possible cases for two
vertices being connected: item2
could equal
item1
itself, or it could be a neighbour, or the neighbour
of a neighbour, or the neighbour of a neighbour of a neighbour, etc.
So you might be tempted to start writing code that tries to cover all these cases:
class Graph:
def connected(self, item1: Any, item2: Any) -> bool:
"""..."""
if item1 in self._vertices and item2 in self._vertices:
= self._vertices[item1]
v1 if v1.item == item2:
return True
elif any(u.item == item2 for u in v1.neighbours):
return True
elif any(any(u1.item == item2 for u1 in u.neighbours)
for u in v1.neighbours):
return True
...else:
return False
You can see why this gets unmanageable very quickly, and fact cannot possibly cover all possibilities! Because we need to handle an arbitrary number of “neighbour” links, we can’t write a fixed amount of code that enumerates the possibilities; instead, we either need to use a loop or recursion. We’ll develop a recursive approach in this section, although it is possible to translate this approach into one that uses loops as well.
Here is our “complete” implementation of
Graph.connected
. As we did in earlier chapters, we’re going
to move the trickiest part (filling in that else branch) into a helper
method.
class Graph:
def connected(self, item1: Any, item2: Any) -> bool:
"""..."""
if item1 in self._vertices and item2 in self._vertices:
= self._vertices[item1]
v1 return v1.check_connected(item2)
else:
return False
Note that our helper method, check_connected
is called
with vertex
to the left of the dot—that means that it’s a
method of the _Vertex
class, not Graph
. Here
is its initial specification:
class _Vertex:
def check_connected(self, target_item: Any) -> bool:
"""Return whether this vertex is connected to a vertex corresponding to the target_item.
"""
Now let’s recall our definition of connectedness: two vertices are connected when there exists a path between them. The challenge with this definition is that it only requires that a path exists, but we don’t know how long it is. But there’s another, equivalent way of defining connectedness recursively.
Definition (connectivity, recursive).
Let \(G = (V, E)\) be a graph, and let \(v_1, v_2 \in V\). We say that \(v_1\) and \(v_2\) are connected when:
This definition maps onto the recursive definition structure we’ve seen so far, with the first point being the base case and the second being the recursive part. However, this definition is different from all the previous recursive definitions in one important respect: it isn’t exactly structural, since it doesn’t break down the data type—a graph—into a smaller instance with the same structure. We’ll come back to this point further down below.
But first, well, let’s take this recursive definition and run with
it! Even though we’re still new to graphs, everything we’ve learned
about translating recursive definitions into Python still applies, and
our code actually turns out to be surprisingly
simple: This version uses a loop, but you could certainly
write a version with the built-in any
function as
well.
class _Vertex:
def check_connected(self, target_item: Any) -> bool:
"""Return whether this vertex is connected to a vertex corresponding to the target_item.
"""
if self.item == target_item:
# Our base case: the target_item is the current vertex
return True
else:
for u in self.neighbours:
if u.check_connected(target_item):
return True
return False
Pretty simple! But unfortunately, our implementation has a flaw that leads to a new kind of error:
>>> g = Graph()
>>> g.add_vertex(1)
>>> g.add_vertex(2)
>>> g.add_vertex(3)
>>> g.add_edge(1, 2)
>>> g.connected(1, 3) # Should return False!
Traceback (most recent call last):"<input>", line 1, in <module>
File "path/to/graph.py", line 163, in connected
File return vertex.check_connected(item2)
"path/to/graph.py", line 41, in check_connected
File if u.check_connected(target_item):
"path/to/graph.py", line 41, in check_connected
File if u.check_connected(target_item):
"path/to/graph.py", line 41, in check_connected
File if u.check_connected(target_item):
985 more times]
[Previous line repeated "path/to/graph.py", line 36, in check_connected
File if self.item == target_item:
RecursionError: maximum recursion depth exceeded in comparison
What’s going on?
RecursionError
and the danger of infinite recursionOur implementation for _Vertex.check_connected
is
simple, and exactly follows the recursive definition of connectedness we
gave above. So what we’re running into here is not an error in our
logic, but a limitation of how we’ve expressed that logic in our code.
To understand what’s gone wrong, let’s do a bit of manual tracing of our
example above.
First, we have a graph g
with three vertices, containing
the items 1
, 2
, and 3
. In our
initial call to g.connected(1, 3)
, we first find the vertex
corresponding to 1
, which we’ll name \(v_1\). Then our code calls
_Vertex.check_connected
on \(v_1\) and 3
. This is where the
problem begins.
self.item == 1
, which is not 3
. We enter the
else branch, where the loop (for u in self.neighbours
)
executes. The only neighbour is the vertex containing 2
,
which we’ll name \(v_2\). We make a
recursive call to _Vertex.check_connected
on \(v_2\) and 3
.self.item == 2
, which is not 3
. We enter the
else branch, where the loop (for u in self.neighbours
)
executes. The only neighbour is \(v_1\). We make a recursive call to
_Vertex.check_connected
on \(v_1\) and 3
.self.item == 1
, which is not 3
. We enter the
else branch…Hey, wait a second! We’re back to the first step, repeating
a call to _Vertex.check_connected
with arguments \(v_1\) and 3
. That’s not good,
because we know what happens in this case: we recurse on \(v_2\).
This is an instance of infinite recursion, meaning
we’ve expressed a recursive computation that does not stop by reaching a
base case. In our example, the Python interpreter alternates between
calling _Vertex.check_connected
with \(v_1\) and 3
and with \(v_2\) and 3
, and never stops.
From a technical point of view, each of these function calls adds a new
stack frame onto the function call stack, which uses up more
and more computer memory. To prevent your computer from running out of
memory, the Python interpreter enforces a limit on the total number of
stack frames that can be on the call stack at any one point in time—and
when that limit is reached, the Python interpreter raises a
RecursionError
.
_Vertex.check_connected
Okay, so now that we understand what the problem is, how do we fix it? Intuitively, we need to prevent repeated recursive calls to the same node. Before changing our code, let’s first revisit our recursive definition of connectedness.
Given two vertices \(v_1\) and \(v_2\), they are connected when:
- \(v_1 = v_2\), or
- there exists a neighbour \(u\) of \(v_1\) such that \(u\) and \(v_2\) are connected.
Our problem is in the recursive step. When we say that “\(u\) and \(v_2\) are connected”, we’re allowing for the possibility that these two vertices are connected by a path going through \(v_1\). But that’s not necessary: if \(v_1\) and \(v_2\) are connected, then we should be able to find a path between a neighbour \(u\) and \(v_2\) that doesn’t use \(v_1\), and then add \(v_1\) to the start of that path.
So we can modify this definition as follows.
Definition (connectivity, recursive, modified).
Let \(G = (V, E)\) be a graph, and let \(v_1, v_2 \in V\). We say that \(v_1\) and \(v_2\) are connected when:
One way to visualize is that in the recursive step, we should be able to remove \(v_1\) from the graph and still find a path between \(u\) and \(v_2\). And this is how we can take our original recursive definition and impose some structure on it: we take the original graph and make it smaller by “removing” the vertex \(v_1\), and then recursively checking for the connectivity of each neighbour \(u\) and \(v_2\).
visited
parameterNow, let’s take this idea and apply it to fix our code. It might be
tempting to actually mutate our Graph
object to remove the
“\(v_1\)” vertex at each recursive
call, but we don’t actually want to mutate the original graph when we
call Graph.connected
. So instead, we’ll use a common
technique when traversing graphs: keep track of the items that have been
already visited by our algorithm, so that we don’t visit the same vertex
more than once.
Here is our new _Vertex.check_connected
specification:
class _Vertex:
def check_connected(self, target_item: Any, visited: set[_Vertex]) -> bool:
"""Return whether this vertex is connected to a vertex corresponding to the target_item,
WITHOUT using any of the vertices in visited.
Preconditions:
- self not in visited
"""
And before getting to our updated implementation, let’s update how we
call this helper method in Graph.connected
. When we make
the initial call to _Vertex.check_connected
, we haven’t yet
visited any vertices:
class Graph:
def connected(self, item1: Any, item2: Any) -> bool:
"""..."""
if item1 in self._vertices and item2 in self._vertices:
= self._vertices[item1]
v1 return v1.check_connected(item2, set()) # Pass in an empty "visited" set
else:
return False
And finally, let’s modify our implementation of
_Vertex.check_connected
to use this new parameter. We need
to make two changes: first, we add self
to
visited
before making any recursive calls (to indicate that
the current _Vertex
has been visited by our algorithm);
second, when looping over self.neighbours
, we only make
recursive calls into nodes that have not yet been visited.
class _Vertex:
def check_connected(self, target_item: Any, visited: set[_Vertex]) -> bool:
"""Return whether this vertex is connected to a vertex corresponding to the target_item,
WITHOUT using any of the vertices in visited.
Preconditions:
- self not in visited
"""
if self.item == target_item:
# Our base case: the target_item is the current vertex
return True
else:
= visited.union({self}) # Add self to the set of visited vertices
new_visited for u in self.neighbours:
if u not in new_visited: # Only recurse on vertices that haven't been visited
if u.check_connected(target_item, new_visited):
return True
return False
And with this version, we’ve eliminated our infinite recursion error:
>>> g = Graph()
>>> g.add_vertex(1)
>>> g.add_vertex(2)
>>> g.add_edge(1, 2)
>>> g.connected(1, 3) # Should return False!
False
One last thing: you might have noticed that in our final
implementation of _Vertex.check_connected
, we created a new
set of visited vertices (new_visited
) rather than mutating
visited
, which would have been simpler and more efficient.
Consider this alternate version:
class _Vertex:
def check_connected(self, target_item: Any, visited: set[_Vertex]) -> bool:
"""Return whether this vertex is connected to a vertex corresponding to the target_item,
WITHOUT using any of the vertices in visited.
Preconditions:
- self not in visited
"""
if self.item == target_item:
# Our base case: the target_item is the current vertex
return True
else:
self) # Add self to the set of visited vertices
visited.add(for u in self.neighbours:
if u not in visited: # Only recurse on vertices that haven't been visited
if u.check_connected(target_item, visited):
return True
return False
This call uses our familiar set.add
method, which is
indeed constant time. However, there is an important difference with
this version: now there is only one visited
set object that
gets shared and mutated across all recursive calls, including ones that
return False
.
So for example, suppose self
has two neighbours
u0
and u1
, and we first recursive on
u0
. If u0
is not connected to the target
vertex, then the check_connected
call will return
False
, as expected. But u0
would still be in
visited
after that recursive call ends, meaning the
subsequent check_connected
call with u1
will
never visit u0
. This is somewhat surprising, but
technically still correct: once we’ve established that u0
is not connected to the target item, we shouldn’t recurse on it
ever again.
But even though this implementation is correct, and much more efficient than the previous one, it does come with an important warning we want you to remember for the future. Whenever you use recursion with a mutable argument, be very careful when choosing whether to mutate that argument or create a modified copy—if you choose to mutate the argument, know that all recursive calls will mutate it as well!