So far, we have defined database schemas and queries mathematically. We are very well prepared to learn SQL a programming language that allows us to execute our queries in a DBMS.
SQL is short for “Structured Query Language”, but it is for more than just writing queries. It has two sub-parts:
We’ll be using an open-source DBMS called PostgreSQL. You’ll find documentation for postgreSQL at the postgreSQL home page including information on how to download it for your own platform, should you choose to.
These specific pages are highly relevant:
Learning PostgreSQL will prepare you well for using other DMBSs, such as MySQL and Oracle. Unfortunately, there are differences in the SQL language across DBMSs. There is an official SQL standard that fully specifies the language, and is updated periodically. However, DBMSs typically don’t implement every feature in the standard, and often add their own features as well. This means that you may find differences around the edges of the language when you work with a new DBMS. It also creates issues for portability of code.
Before we begin, here is the university schema we have been working with (the key of each relation is in bold):
We’ll learn later how to express this in SQL using the Data Definition Language. But first, let’s start writing some queries!
The simplest kind of query is a select-from-where. Here is an example:
csc343h-prof=> SELECT name -- choose the column called "name"
csc343h-prof-> FROM Course -- from the Course table
csc343h-prof-> WHERE dept = 'CSC'; -- choose only rows that satisfy
name
---------------------------
Intro to Databases
Software Design
Intro to Comp Sci
Data Struct & Anal
Intro to Visual Computing
COBOL programming
(6 rows)
WHERE
is equivalent to σ in relational algebra. Sadly, the designers of SQL chose the keyword SELECT
for the relational algrebra Π but you will get used to that very quickly.
Below are a few additional things we can do with a select-from-where, to get us started.
We can get the Cartesian product of two or more tables by puttting them in a comma-separated list in the FROM clause:
csc343h-prof=> SELECT name, sid, grade
csc343h-prof-> FROM Course, Offering, Took
csc343h-prof-> WHERE Course.dept = 'CSC';
name | sid | grade
---------------------------+-------+-------
Intro to Databases | 99132 | 79
Intro to Databases | 99132 | 79
Intro to Databases | 99132 | 79
Intro to Databases | 99132 | 79
... etc.
Of course, this will create many nonsensical combinations. We will soon learn all about how to express natural joins and other kinds of join in SQL. In the meanwhile, we can use WHERE
to filter out those nonsensical combinations:
csc343h-prof=> SELECT name, sid, grade
csc343h-prof-> FROM Course, Offering, Took
csc343h-prof-> WHERE Course.dept = Offering.dept and -- New!
csc343h-prof-> Course.cnum = Offering.cnum and -- New!
csc343h-prof-> Offering.oid = Took.oid and -- New!
csc343h-prof-> Course.dept = 'CSC';
name | sid | grade
--------------------+-------+-------
Intro to Databases | 99132 | 79
Intro to Databases | 98000 | 82
Software Design | 98000 | 89
Software Design | 98000 | 72
... etc.
We just saw our first compound condition. We can build boolean expressions using:
=, <, >, <=, >=, !=
and <>
. The operators !=
and <>
mean “not equals” in SQL.AND
, OR
, and NOT
Similar to ρ in relational algebra, in SQL we can rename a table for the duration of a query. Here we rename Employee
to e
and Deparment
to d
:
csc343h-prof=> SELECT e.name, d.name
csc343h-prof-> FROM Employee e, Department d
csc343h-prof-> WHERE d.name = 'marketing' and e.name = 'Horton';
In this case, it makes the query a bit more concise than it would have been:
csc343h-prof=> SELECT Employee.name, Department.name
csc343h-prof-> FROM Employee, Department
csc343h-prof-> WHERE Department.name = 'marketing' and Employee.name = 'Horton';
but we also know that renaming can be convenient for other reasons, such as distinguishing two occurences of a table in a self-join:
csc343h-prof=> SELECT e1.name, e2.name
csc343h-prof-> FROM Employee e1, Employee e2
csc343h-prof-> where e1.salary < e2.salary;
If we want every column to be included in the result, rather than listing them all, we can use a wildcard:
csc343h-prof=> SELECT *
csc343h-prof-> FROM Course
csc343h-prof-> WHERE dept = 'CSC';
cnum | name | dept | breadth
------+---------------------------+------+---------
343 | Intro to Databases | CSC | f
207 | Software Design | CSC | f
148 | Intro to Comp Sci | CSC | f
263 | Data Struct & Anal | CSC | f
320 | Intro to Visual Computing | CSC | f
222 | COBOL programming | CSC | f
(6 rows)
We can add an AS-expression to choose a column name in the result of a query.
csc343h-prof=> SELECT name AS title, dept
csc343h-prof-> FROM Course
csc343h-prof-> WHERE breadth;
title | dept
-------------------+------
Intro Archaeology | ANT
Narrative | ENG
Rhetoric | ENG
The Graphic Novel | ENG
Mediaeval Society | HIS
Black Freedom | HIS
(6 rows)
In relational algebra, a relation is a set and so order doesn’t matter. But in the real world, order often does matter. We can sort the results of a query by adding an ORDER BY
clause to the end of a select-from-where query:
csc343h-prof=> SELECT sid, grade
csc343h-prof-> FROM Took
csc343h-prof-> WHERE grade > 90
csc343h-prof-> ORDER BY grade;
sid | grade
-------+-------
157 | 91
99999 | 91
98000 | 92
98000 | 93
99999 | 94
99999 | 96
98000 | 97
157 | 98
98000 | 98
99132 | 98
99132 | 99
99999 | 99
99999 | 99
157 | 99
99999 | 100
(15 rows)
The default is to use ascending order, but we can override this by saying DESC
(not DESCENDING
):
csc343h-prof=> SELECT sid, grade
csc343h-prof-> FROM Took
csc343h-prof-> WHERE grade > 90
csc343h-prof-> ORDER BY grade DESC;
sid | grade
-------+-------
99999 | 100
157 | 99
99999 | 99
99132 | 99
99999 | 99
157 | 98
98000 | 98
99132 | 98
98000 | 97
99999 | 96
99999 | 94
98000 | 93
98000 | 92
157 | 91
99999 | 91
(15 rows)
We can order according to multiple columns. Here, we see courses in order by dept
, and where there are multiple rows with the same value for dept
, they are further ordered by course name
.
csc343h-prof=> SELECT *
csc343h-prof-> FROM Course
csc343h-prof-> ORDER BY dept, name;
cnum | name | dept | breadth
------+---------------------------+------+---------
203 | Human Biol & Evol | ANT | f
200 | Intro Archaeology | ANT | t
222 | COBOL programming | CSC | f
263 | Data Struct & Anal | CSC | f
148 | Intro to Comp Sci | CSC | f
343 | Intro to Databases | CSC | f
320 | Intro to Visual Computing | CSC | f
207 | Software Design | CSC | f
263 | Compar Vert Anatomy | EEB | f
216 | Marine Mammal Bio | EEB | f
150 | Organisms in Environ | EEB | f
110 | Narrative | ENG | t
205 | Rhetoric | ENG | t
235 | The Graphic Novel | ENG | t
200 | Environmental Change | ENV | f
320 | Natl & Intl Env Policy | ENV | f
296 | Black Freedom | HIS | t
220 | Mediaeval Society | HIS | t
(18 rows)
We can also use the value of an expression to determing ordering. For example, if we had a table including columns called sales
and rentals
, we could write ORDER BY sales + rentals
.
Instead of a simple attribute name, you can use an expression in a SELECT clause. Here are two simple examples:
csc343h-prof=> SELECT sid, grade + 10 AS adjusted
csc343h-prof-> FROM Took;
sid | adjusted
-------+----------
99132 | 89
99132 | 108
99132 | 92
99132 | 109
... etc.
csc343h-prof=> SELECT dept || cnum
csc343h-prof-> FROM Course
csc343h-prof-> WHERE cnum < 200;
?column?
----------
CSC148
EEB150
ENG110
CSC100
(4 rows)
In the second query, ||
is string concatenation. (See the PostgreSQL documentation for details on the many built-in functions and operators.) Notice that we did not provide a name for the column in the result. In such cases, SQL names it `?column?’. It’s essentially saying “I have no idea what to call this!”
Rather than pull values from a table, a SELECT clause can use a constant value in a column. Here is an example:
csc343h-prof=> SELECT name, 'satisfies' as breadthRequirement
csc343h-prof-> FROM Course
csc343h-prof-> WHERE breadth;
name | breadthrequirement
-------------------+--------------------
Intro Archaeology | satisfies
Narrative | satisfies
Rhetoric | satisfies
The Graphic Novel | satisfies
Mediaeval Society | satisfies
Black Freedom | satisfies
(6 rows)
We have extracted only courses for which the value of breadth
is true (which means, in our schema, that the course satisfies the “breadth requirement” of a university degree). And we chose to show this in the result by having the value satisfies
throughout the second column, which we named breadthrequirement
.
SQL provides the LIKE
operator for comparing a string to a pattern. The pattern is a quoted string and can include these special characters:
_
matches any single character%
matches any sequence of zero or more charactersHere are two simple examples:
csc343h-prof=> SELECT *
csc343h-prof-> FROM Course
csc343h-prof-> WHERE name LIKE '%to%';
cnum | name | dept | breadth
------+---------------------------+------+---------
343 | Intro to Databases | CSC | f
148 | Intro to Comp Sci | CSC | f
320 | Intro to Visual Computing | CSC | f
263 | Compar Vert Anatomy | EEB | f
205 | Rhetoric | ENG | t
(5 rows)
csc343h-prof=> -- Notice that UTSC students are excluded. The string 'UTSC'
csc343h-prof=> -- does not match the pattern 'UT_' because the underscore only
csc343h-prof=> -- matches a single character, and the pattern is required to
csc343h-prof=> -- cover the entire string.
csc343h-prof=> SELECT *
csc343h-prof-> FROM Student
csc343h-prof-> WHERE campus LIKE 'UT_';
sid | firstname | surname | campus | email | cgpa
-----+-----------+-----------+--------+---------+------
157 | Leilani | Lakemeyer | UTM | lani@cs | 3.42
(1 row)
LIKE
offers an extremely limited form of pattern matching but has the advantage of being fast. SQL also supports regular expressions for string matching, using the ~
operator. For example:
csc343h-prof-> -- Find students whose surname begins with M or F or L, followed
csc343h-prof-> -- by 'a', and then zero or more additional characters.
csc343h-dianeh=> SELECT *
FROM Student
WHERE surname ~ '(M|F|L)a*';
sid | firstname | surname | campus | email | cgpa
-------+-----------+------------+--------+----------+------
99132 | Avery | Marchmount | StG | avery@cs | 3.13
98000 | William | Fairgrieve | StG | will@cs | 4.00
157 | Leilani | Lakemeyer | UTM | lani@cs | 3.42
(3 rows)
For full details, see the postgreSQL documentation Note this warning contained in that documentation: “Regular expressions can be contrived that take arbitrary amounts of time and memory to process. Be wary of accepting regular-expression search patterns from hostile sources.”
Let’s address the lexical level of SQL syntax: how its smallest elements (keywords, operators, and so on) are recognized.
Keywords in SQL are not case-sensitive. Neither are identifiers (names of tables or columns). In addition, line breaks and tabs are ignored by SQL. So, for example, all of these queries are syntactically valid and do the same thing:
csc343h-prof=> -- This looks nice.
csc343h-prof=> SELECT sid
csc343h-prof-> FROM took
csc343h-prof-> WHERE grade > 50;
csc343h-prof=> -- This is fine if we just want the answer and aren't saving the query.
csc343h-prof=> select sid from took where grade > 50;
csc343h-prof=> -- This is just ugly.
csc343h-prof=> SELECT SID FROM
csc343h-prof-> TOOK WHERE GRADE
csc343h-prof-> > 50
csc343h-prof-> ;
There is no standard convention, but a reasonable one (used in the first query above) is this:
Typing keywords in all capitals is a little awkward, and you may feel that the queries are shouting at you. It is acceptable in this course to not capitalize keywords. Whatever you do, do it consistently.
We say that SQL is a high-level language. An SQL query says what we want from the database, not how to get it. We never have to think about what is stored in memory vs in some level of secondary storage, or what points to what. In other words, we have a level of abstraction that makes it much easier to write queries. In addition, the DBMS can change how the data is stored with no impact on our queries. We say that we have “physical data independence”.