Vertices are just things: any elements of a set will do.
Edges go between vertices, so we can represent an edge as a pair of vertices.
We also speak of the "order" and "size" of a graph:
The "order" of a graph is the number of vertices (nodes).
The "size" of a graph is the number of edges.
Often people talk about "directed graphs", in which the edges have an orientation, in which case an edge is an ordered pair. In this course we'll mostly talk about "undirected graphs", in which an edge is a pair of vertices without regard to order, hence a set of two vertices.
If there is an edge between two vertices, we say that they are "adjacent".
The "degree" of a vertex is the number of vertices adjacent to it.
A "path" is any list of vertices such that every pair of vertices next to each other in the path are adjacent in the graph.
We say that a graph is "connected" if there exists some path between every pair of vertices in the graph.
A graph is "complete" if all pairs of vertices are adjacent. You fill it in completely, all nodes connected to all nodes.
Since all complete graphs of a given order are isomorphic (see below), if we're only discussing graphs up to isomorphism we can speak of the complete graph of a given order, which we write as Kp, for a given p. E.g. K2 is the graph with two vertices which are connected to each other, K3 is the graph most obviously drawn as a triangle, etc.
A "colouring" is an assignment of colours (or any other property you care to think of instead) to vertices such that every vertex is assigned a colour, and adjacent vertices always have different colours.
If a graph can be coloured with k colours, we say that it is "k-colourable". Here "k" is a number, e.g. a graph might be "3-colourable".
Obviously if it is k-colourable, it's (k+1)-colourable, (k+2)-colourable, etc. You don't have to use all the colours. (And if you did have to use all the colours, you could usually just recolour any one vertex with the new colour.)
The "chromatic number" of a graph is the minimum number of colours with which you can colour that graph. We use a lower-case chi for this, χ(G).
A graph with k vertices can be coloured with k colours. But often it can be coloured with fewer. That is to say, χ(G) <= |V(G)|.
A complete graph needs all p colours. That is, χ(Kp) = p.
A general format for a proof that the chromatic number of a given graph is some particular number k is: first, show that the graph is k-colourable, by exhibiting an actual colouring with k colours; and second, show that the graph is not (k-1)-colourable, i.e. that no colouring with fewer colours is possible.
Two graphs are equal if their sets of vertices and edges are equal (respectively). That is, the set membership is identical. This is not a very useful concept and you'll rarely see people talk about it.
The following two graphs:
Graph G:
    x ---- y
Graph H:
    a ---- b
are not equal: their vertex sets {x,y} and {a,b} are different sets.
But this seems like a silly observation. They're the same! Obviously!
To capture this concept of being the same, we use the idea of an isomorphism. Two graphs are "isomorphic" if there exists an isomorphism between them. An isomorphism is a bijective mapping between the vertices of the two graphs such that vertices p and q in graph G are adjacent if and only if vertices f(p) and f(q) in graph H are adjacent.
Most commonly we speak of graphs only "up to isomorphism"; that is, considering isomorphic graphs to be the same. Thus we can talk of "the" complete graph of order 5, even though there are any number of such graphs with the trivial difference that their vertices have different names. Up to isomorphism, there is only one complete graph of order 5.
The graphs are isomorphic if there exists at least one isomorphism between them. (There is almost always more than one isomorphism, if there are any at all.)
Note that both of these representations (pointers and the adjacency matrix) are inherently about directed graphs.
If you wish to represent an undirected graph, either you have to store every
edge twice, or when you want to check whether there is an edge between two
vertices you have to check for both directions.
Normally we choose to store every edge twice, especially in the
adjacency-matrix case where it doesn't take any more memory.
Which representation should you use?
An adjacency matrix is often easier to deal with, but wastes much storage for
sparse graphs.
It might not be feasible to use it for a graph with, say, ten
thousand nodes and only about twenty thousand edges, because the adjacency
matrix would have a hundred million entries in it (ten thousand
squared), which at one four-byte word per entry would be about 400 MB of memory.
On the other hand, using structs and pointers, those ten thousand nodes and
twenty thousand edges would be using about thirty thousand words, or about
120K, an eighth of a megabyte. These days, an eighth of a megabyte is an
amount of memory we can easily afford to expend on a significant computing
task, and 400 MB (in memory not
on disk, this is) is something which is usually simply not available.
So we tend to prefer an adjacency matrix for dense graphs, in which a large
fraction of the possible edges exist, but to prefer the pointer representation
for a sparse graph.
As noted above, either of our common graph representations is oriented towards
directed graphs; there's nothing new to say to explain how to represent a
directed graph.
An example of a directed graph is a binary search tree.
The arrows representing edges have a definite direction.
The relationship between parent node and child node is asymmetric.
Whereas we talk about the "degree" of a node in an undirected graph,
we can talk about the "in-degree" and the "out-degree" of a node in a directed graph.
In general, they are independent.
Recursive traversal is the basic way we traverse a binary search tree.
If you look at the order in which nodes are visited, you see that
we go as deeply into one subtree as we can, then come back.
We call this a "depth-first search".
Consider an alternative, called
"breadth-first search",
in which we visit all nodes at a given level, then all of their immediate
children, and so on.
Example of doing
a breadth-first search in a chess-playing program
Suppose that this node here [point] represents the end of the
game and we've won! We could just take this branch. No need to play this
[other branch] losing game to the bitter end. We're not even going to go
down there at all.
More generally, a breadth-first search is good for a certain large set of
planning tasks. Suppose I'm considering the sequence in which I can move
forward, backward, left, and right to try to get out of this room at the
end of the hour when it's crowded and everyone's milling about and I have to
rush off somewhere.
If I can consider all
paths of length 1, then all paths of length 2, and so on, then I'll find
the shortest path out of the room first. This is a breadth-first search.
How do we do a breadth-first search?
Actually, how do we do a depth-first search? The recursive tree-traversal
algorithm is specific to directed trees. Suppose we have a graph like this:
[draw K4]
Starting from here [circle a node], we want to do, say, a depth-first search
of this graph. In the tree case, we didn't have to worry about coming back
around [illustrate].
In fact, in this graph, or in most graphs which aren't trees, applying
that tree-specific DFS algorithm would be an infinite loop. We'd just keep
going back and forth between the first two nodes.
Note: if the graph is not connected, we have to run this for each
connected component.
The breadth-first search algorithm is the same, except use a queue instead
of a stack!
Remember a "path": a list of vertices such that each adjacent pair in the
list is adjacent on the graph. If the vertices of the graph are cities
and the edges are roads connecting the cities, then a path is, well, a path
between the cities.
A voyage you can take.
A circuit (or "cycle") is a path whose initial and final vertices are the
same.
A graph containing no circuits of length > 1 is "acyclic" (it has no cycles).
Example: a map of distances between cities.
Find the shortest distance between 'x' and 'y' in km.
This data is like an adjacency matrix, except there is data associated with
each edge, not just the boolean fact as to whether the edge exists.
In a "weighted graph", each edge has an associated "weight" (or "cost"),
which is a real number. A "shortest path" between two vertices is one with
minimum total weight of edges traversed.
Note that all of these algorithms are really based on directed graphs.
If you want to do them for undirected graphs, either double the number of
edges, or always check twice.
where n = |V|.
Why does this work? dist is always an upper bound on the true minimum
distance: that is easily verified at initialization, and each assignment
preserves it. The only question is whether the algorithm runs long enough
to find all the actual distances; the answer is yes, as can be proved in
graph theory.
Graph representation in a computer program
There are two basic ways to represent graphs in a computer program.
1. with pointers, like we do for trees or linked lists.
Each vertex is a data structure containing pointers (references) to other
vertices.
2. with an "adjacency matrix"
An adjacency matrix is a two-dimensional boolean array whose dimension
is the number of vertices in the graph.
Each vertex thus has to have an index, from 0 to |V(G)|-1.
The entry adjacency_matrix[i][j] indicates whether there is an edge (i,j).
Directed graphs
In a "directed graph", an edge is an ordered pair, not a set.
Whereas {a,b} = {b,a},
(a,b) != (b,a) if a!=b.
Graph traversal
readings, section 8.2
void traverse(struct subtree *p)
{
    for all i in {nodes which are children of p}
        traverse(i)
}
To traverse the whole tree, pass in the root.
[picture goes here]
Depth-first search algorithm
With a stack 's' and a boolean array 'visited'
s := empty stack
for u in vertices
    visited[u] := false
push the starting vertex onto the stack s
while (s is not empty) {
    pop one vertex into variable u
    if (not visited[u]) {
        visited[u] := true
        for i in the list of vertices adjacent to u
            if (not visited[i])
                push i onto stack s
    }
}
Path-finding
readings, section 8.3
(Some people assign slightly different meanings to the terms "circuit" and
"cycle", but it seems that each author uses the two words to draw a
different distinction.
Also, some people treat the terms as
synonymous. The readings selection makes a distinction based on
non-repetition of edges versus non-repetition of vertices.)
(Similarly, we need to add something to this definition to rule out a
"cycle" or "circuit" such as (x, y, x) for vertices x and y which
are adjacent in an undirected graph. Again, different authors do this
differently, including that some apply the term "cycle" only to directed
graphs.)
An acyclic connected graph is a tree.
Dijkstra's algorithm
for all v in V
    d[v] := ∞
d[starting vertex] := 0;
DONE := empty set
REMAINING := V
while (REMAINING is not empty) {
    x := the vertex in REMAINING such that d[x] is minimum
    REMAINING := REMAINING - {x}
    DONE := DONE ∪ {x}
    for all edges (x,y) ∈ E {
        if (y ∉ DONE
            && d[y] > d[x] + weight(x,y)) {
            d[y] := d[x] + weight(x,y);
            predecessor[y] := x;
        }
    }
}
Floyd-Warshall algorithm
The "Floyd-Warshall algorithm" computes the entire grid of shortest distances, between all pairs of vertices at once.
Actually, this turns out to be slightly easier to code than Dijkstra's algorithm.
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        if (i == j)
            dist[i][j] = 0;   /* (two-dimensional array) */
        else if ((i,j) ∈ E)
            dist[i][j] = weight(i, j);
        else
            dist[i][j] = ∞;
for (k = 0; k < n; k++)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (dist[i][k] + dist[k][j] < dist[i][j])
                /* counting infinity as greater than all numbers, of course */
                dist[i][j] = dist[i][k] + dist[k][j];
k is the new intermediate vertex whose effect on all the distances computed so far is being incorporated.