Vertices are just things: any elements of a set will do.
Edges go between vertices, so we can represent an edge as a pair of vertices.
We also speak of the "order" and "size" of a graph:
The "order" of a graph is the number of vertices (nodes).
The "size" of a graph is the number of edges.
Often people talk about "directed graphs", in which the edges have an orientation, in which case an edge is an ordered pair. In this course we'll mostly talk about "undirected graphs", in which an edge is a pair of vertices without regard to order, hence a set of two vertices.
If there is an edge between two vertices, we say that they are "adjacent".
The "degree" of a vertex is the number of vertices adjacent to it.
A "path" is any list of vertices such that every pair of vertices next to each other in the path are adjacent in the graph.
We say that a graph is "connected" if there exists some path between every pair of vertices in the graph.
A graph is "complete" if all pairs of vertices are adjacent. You fill it in completely, all nodes connected to all nodes.
Since all complete graphs of a given order are isomorphic (see below), if we're only discussing graphs up to isomorphism we can speak of the complete graph of a given order, which we write as Kp, for a given p. E.g. K2 is the graph with two vertices which are connected to each other, K3 is the graph most obviously drawn as a triangle, etc.
A "colouring" is an assignment of colours (or any other property you care to think of instead) to vertices such that every vertex is assigned a colour, and adjacent vertices always have different colours.
If a graph can be coloured with k colours, we say that it is "k-colourable". Here "k" is a number, e.g. a graph might be "3-colourable".
Obviously if it is k-colourable, it's (k+1)-colourable, (k+2)-colourable, etc. You don't have to use all the colours. (And if you did have to use all the colours, you could usually just recolour any one vertex with the new colour.)
The "chromatic number" of a graph is the minimum number of colours with which you can colour that graph. We use a lower-case chi for this, χ(G).
A graph with k vertices can be coloured with k colours. But often it can be coloured with fewer. That is to say, χ(G) <= |V(G)|.
A complete graph needs all p colours. That is, χ(Kp) = p.
A general format for a proof that the chromatic number of a given graph is some particular number k is: first, show that the graph is k-colourable, by exhibiting an actual colouring with k colours; and second, show that the graph is not (k-1)-colourable, i.e. that no colouring with fewer colours is possible.
Two graphs are equal if their sets of vertices and edges are equal (respectively). That is, the set membership is identical. This is not a very useful concept and you'll rarely see people talk about it.
The following two graphs:
Graph G:
    x ---- y
Graph H:
    a ---- b
are not equal: their vertex sets {x,y} and {a,b} are different sets.
But this seems like a silly observation. They're the same! Obviously!
To capture this concept of being the same, we use the idea of an isomorphism. Two graphs are "isomorphic" if there exists an isomorphism between them. An isomorphism is a bijective mapping between the vertices of the two graphs such that vertices p and q in graph G are adjacent if and only if vertices f(p) and f(q) in graph H are adjacent.
Most commonly we speak of graphs only "up to isomorphism"; that is, considering isomorphic graphs to be the same. Thus we can talk of "the" complete graph of order 5, even though there are any number of such graphs with the trivial difference that their vertices have different names. Up to isomorphism, there is only one complete graph of order 5.
The graphs are isomorphic if there exists at least one isomorphism between them. (There is almost always more than one isomorphism, if there are any at all.)
Note that both of these representations (pointers and the adjacency matrix) are inherently about directed graphs.
If you wish to represent an undirected graph, either you have to store every
edge twice, or when you want to check whether there is an edge between two
vertices you have to check for both directions.
Normally we choose to store every edge twice, especially in the
adjacency-matrix case where it doesn't take any more memory.
Which representation should you use?
An adjacency matrix is often easier to deal with, but wastes much storage for
sparse graphs.
It might not be feasible to use it for a graph with, say, ten
thousand nodes and only about twenty thousand edges, because the adjacency
matrix would have a hundred million entries in it (ten thousand
squared), which at one four-byte word per entry would be about 400 MB of memory.
On the other hand, using structs and pointers, those ten thousand nodes and
twenty thousand edges would be using about thirty thousand words, or about
120K, an eighth of a megabyte. These days, an eighth of a megabyte is an
amount of memory we can easily afford to expend on a significant computing
task, and 400 MB (in memory not
on disk, this is) is something which is usually simply not available.
So we tend to prefer an adjacency matrix for dense graphs, in which a large
fraction of the possible edges exist, but to prefer the pointer representation
for a sparse graph.
As noted above, either of our common graph representations is oriented towards
directed graphs; there's nothing new to say to explain how to represent a
directed graph.
An example of a directed graph is a binary search tree.
The arrows representing edges have a definite direction.
The relationship between parent node and child node is asymmetric.
Whereas we talk about the "degree" of a node in an undirected graph,
we can talk about the "in-degree" and the "out-degree" of a node in a directed graph.
In general, they are independent.
Recursive traversal is the basic way we traverse a binary search tree.
If you look at the order in which nodes are visited, you see that
we go as deeply into one subtree as we can, then come back.
We call this a "depth-first search".
Consider an alternative, called
"breadth-first search",
in which we visit all nodes at a given level, then all of their immediate
children, and so on.
Example of doing
a breadth-first search in a chess-playing program
Suppose that this node here [point] represents the end of the
game and we've won! We could just take this branch. No need to play this
[other branch] losing game to the bitter end. We're not even going to go
down there at all.
More generally, a breadth-first search is good for a certain large set of
planning tasks. Suppose I'm considering the sequence in which I can move
forward, backward, left, and right to try to get out of this room at the
end of the hour when it's crowded and everyone's milling about and I have to
rush off somewhere.
If I can consider all
paths of length 1, then all paths of length 2, and so on, then I'll find
the shortest path out of the room first. This is a breadth-first search.
How do we do a breadth-first search?
Actually, how do we do a depth-first search? The recursive tree-traversal
algorithm is specific to directed trees. Suppose we have a graph like this:
[draw K4]
Starting from here [circle a node], we want to do, say, a depth-first search
of this graph. In the tree case, we didn't have to worry about coming back
around [illustrate].
In fact, in this graph, or in most graphs which aren't trees, applying
that tree-specific DFS algorithm would be an infinite loop. We'd just keep
going back and forth between the first two nodes.
Note: if the graph is not connected, we have to run this for each
connected component.
The breadth-first search algorithm is the same, except use a queue instead
of a stack!
Remember a "path": a list of vertices such that each adjacent pair in the
list is adjacent on the graph. If the vertices of the graph are cities
and the edges are roads connecting the cities, then a path is, well, a path
between the cities.
A voyage you can take.
A circuit (or "cycle") is a path whose initial and final vertices are the
same.
A graph containing no circuits of length > 1 is "acyclic" (it has no cycles).
Example: a map of distances between cities.
Find the shortest distance between 'x' and 'y' in km.
This data is like an adjacency matrix, except there is data associated with
each edge, not just the boolean fact as to whether the edge exists.
In a "weighted graph", each edge has an associated "weight" (or "cost"),
which is a real number. A "shortest path" between two vertices is one with
minimum total weight of edges traversed.
Note that all of these algorithms are really based on directed graphs.
If you want to do them for undirected graphs, either double the number of
edges, or always check twice.
where n = |V|.
Why does this work? dist is always an upper bound on the true minimum
distance: that is easily verified at initialization, and each assignment
preserves it. The only question is whether the algorithm runs long enough
to find all the actual distances; the answer is yes, as can be proved in
graph theory.
Graph representation in a computer program
There are two basic ways to represent graphs in a computer program.
1. with pointers, like we do for trees or linked lists.
Each vertex is a data structure containing pointers (references) to other
vertices.
2. with an "adjacency matrix"
An adjacency matrix is a two-dimensional boolean array whose dimension
is the number of vertices in the graph.
Each vertex thus has to have an index, from 0 to |V(G)|-1.
The entry adjacency_matrix[i][j] indicates whether there is an edge (i,j).
Directed graphs
In a "directed graph", an edge is an ordered pair, not a set.
Whereas {a,b} = {b,a},
(a,b) != (b,a) if a!=b.
Graph traversal
readings, section 8.2
void traverse(struct subtree *p)
{
    for all i in {nodes which are children of p}
        traverse(i)
}
To traverse the whole tree, pass in the root.
[picture goes here]
Depth-first search algorithm
With a stack 's' and a boolean array 'visited'
s := empty stack
for u in vertices
    visited[u] := false
push the starting vertex onto the stack s
while (s is not empty) {
    pop one vertex into variable u
    if (not visited[u]) {
        visited[u] := true
        for i in the list of vertices adjacent to u
            if (not visited[i])
                push i onto stack s
    }
}
Path-finding
readings, section 8.3
(Some people assign slightly different meanings to the terms "circuit" and
"cycle", but it seems that each author uses the two words to draw a
different distinction.
Also, some people treat the terms as
synonymous. The readings selection makes a distinction based on
non-repetition of edges versus non-repetition of vertices.)
(Similarly, we need to add something to this definition to rule out a
"cycle" or "circuit" such as (x, y, x) for vertices x and y which
are adjacent in an undirected graph. Again, different authors do this
differently, including that some apply the term "cycle" only to directed
graphs.)
An acyclic connected graph is a tree.
Dijkstra's algorithm
for all v in V
    d[v] := ∞
d[starting vertex] := 0;
DONE := empty set
REMAINING := V
while (REMAINING is not empty) {
    x := the vertex in REMAINING such that d[x] is minimum
    REMAINING := REMAINING - {x}
    DONE := DONE ∪ {x}
    for all edges (x,y) ∈ E {
        if (y ∉ DONE
            && d[y] > d[x] + weight(x,y)) {
            d[y] := d[x] + weight(x,y);
            predecessor[y] := x;
        }
    }
}
Floyd-Warshall algorithm
The "Floyd-Warshall algorithm" computes the entire grid of shortest distances, between all pairs of vertices at once.
Actually, this turns out to be slightly easier to code than Dijkstra's algorithm.
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        if (i == j)
            dist[i][j] = 0;   /* (two-dimensional array) */
        else if ((i,j) ∈ E)
            dist[i][j] = weight(i, j);
        else
            dist[i][j] = ∞;
for (k = 0; k < n; k++)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (dist[i][k] + dist[k][j] < dist[i][j])
                /* counting infinity as greater than all numbers, of course */
                dist[i][j] = dist[i][k] + dist[k][j];
k is the new intermediate vertex whose effect on all the distances computed so far is being incorporated.