So far we’ve seen how the data types we introduced in Chapter 1 can be used to store a variety of different data. In our modern world, data is constantly being created, stored, sent, and received. But not all data is created equal; some data is inherently more sensitive than other data. And there are laws mandating the privacy of your data in Canada. Thanks to the explosion of data and the evolution of privacy policy, there are numerous technologies (backed by strong theoretical underpinnings) being developed to ensure data privacy and security.
After our work from the previous chapter, we now have the theoretical foundations necessary to learn about one of the coolest applications of number theory in computer science: encrypting messages so that only the sender and receiver can understand them. Check out the movie The Imitation Game, which is about some amazing code-breaking work done in World War II (and a crucial piece in the history of computing). This is only one method for ensuring data privacy, but it is pervasive—nearly every time you send or receive something on your phone or web browser, cryptography plays a role. In this section, you’ll learn about the basics of cryptography, which is the study of theoretical and practical techniques for keeping data secure.
Cryptography is the study of techniques used to keep communication secure in the face of adversaries who wish to eavesdrop on or interfere with the communication. Defining what secure communication between two parties means is complex, and involves several dimensions such as: confidentiality, data integrity, and authentication. In this chapter we will focus primarily on encryption, which involves turning coherent messages into seemingly-random nonsensical strings, and then back again.
As computers have become more powerful, cryptographic technologies have evolved to ensure that the “nonsense” strings are not easily converted back to the coherent message except by the intended recipient(s). But the growing power of computers is a double-edged sword; while cryptographic technologies have evolved, so have the technologies of malicious attackers and eavesdroppers who want to decipher the “nonsense” strings and gain access to sensitive data, such as passwords and social insurance numbers.
The simplest setup that we study in cryptography is two-party confidential communication. In this setup, we have two people, who we’ll call Alice and Bob, who wish to send messages to each other that only they can read, and a third person, Eve, who has access to all of the communications between Alice and Bob, and wants to discover what they’re saying.
Since Eve has access to the communications between Alice and Bob, they can’t just send their messages directly. So instead, Alice and Bob need to encrypt their messages using some sort of encryption algorithm, and send the encrypted versions to each other instead. The hope is that through some shared piece of information called a secret key, Alice and Bob can encrypt their messages in such a way that they will each be able to decrypt each other’s messages, but Eve won’t be able to decrypt the messages without knowing their secret key.
More formally, we define a secure symmetric-key cryptosystem as a system with the following parts:
A set \(\mathcal{P}\) of possible original messages, called plaintext messages. (E.g., a set of strings)
A set \(\mathcal{C}\) of possible encrypted messages, called ciphertext messages. (E.g., another set of strings)
A set \(\mathcal{K}\) of possible shared secret keys (known by both Alice and Bob, but no one else).
Two functions \(Encrypt : \mathcal{K} \times \mathcal{P} \to \mathcal{C}\) and \(Decrypt : \mathcal{K} \times \mathcal{C} \to \mathcal{P}\) that satisfies the following two properties:
One of the earliest examples we have of a symmetric-key cryptosystem is the Caesar cipher, named after the Roman general Julius Caesar. In this system, the plaintext and ciphertext sets are simply strings, and the secret key is some positive integer \(k\).
The idea of this cryptosystem, as well as the starting point of many others, is to associate characters with numbers, because we already have many operations and functions that work with numbers. In this example, we’ll first only consider messages that consist of uppercase letters and spaces, and associate each of these letters with a number as follows:
Character | Value | Character | Value |
---|---|---|---|
'A' |
0 |
'O' |
14 |
'B' |
1 |
'P' |
15 |
'C' |
2 |
'Q' |
16 |
'D' |
3 |
'R' |
17 |
'E' |
4 |
'S' |
18 |
'F' |
5 |
'T' |
19 |
'G' |
6 |
'U' |
20 |
'H' |
7 |
'V' |
21 |
'I' |
8 |
'W' |
22 |
'J' |
9 |
'X' |
23 |
'K' |
10 |
'Y' |
24 |
'L' |
11 |
'Z' |
25 |
'M' |
12 |
' ' |
26 |
'N' |
13 |
In Python, we can implement this conversion as follows:
= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '
LETTERS
def letter_to_num(c: str) -> int:
"""Return the number that corresponds to the given letter.
Preconditions:
- len(c) == 1 and c in LETTERS
"""
return LETTERS.index(c)
def num_to_letter(n: int) -> str:
"""Return the letter that corresponds to the given number.
Precondtions:
- 0 <= n < len(LETTERS)
"""
return LETTERS[n]
In the Caesar cipher, the secret key \(k\) is an integer from the set \(\mathcal{K} = \{1, 2, \dots, 26\}\). Before sending any messages, Alice and Bob meet and decide on a secret key from this set.
Now when Alice wants to send a string message \(m\) to Bob, she encrypts her message as follows.
LETTERS
. Note that the
space character
comes after Z
.For example, if \(k = 3\), and the
plaintext message is 'HAPPY'
, encryption happens as
follows:
Plaintext character | Corresponding Integer | Shifted Integer | Ciphertext character |
---|---|---|---|
'H' |
7 |
10 |
'K' |
'A' |
0 |
3 |
'D' |
'P' |
15 |
18 |
'S' |
'P' |
15 |
18 |
'S' |
'Y' |
24 |
0 |
'A' |
The corresponding ciphertext is 'KDSSA'
. Note that the
Y
, when shifted by 3, wraps around to become
A
.
Then when Bob receives the ciphertext 'KDSSA'
, he
decrypts the ciphertext by applying the corresponding shift in reverse
(subtracting the secret key \(k\)
instead of adding it). We can implement this in Python as
follows: Note that we’ve dropped the _so_far
suffix on these accumulator variables now that you have more experience
writing loops!
def encrypt_caesar(k: int, plaintext: str) -> str:
"""Return the encrypted message using the Caesar cipher with key k.
Preconditions:
- all({x in LETTERS for x in plaintext})
- 1 <= k <= 26
"""
= ''
ciphertext
for letter in plaintext:
= ciphertext + num_to_letter((letter_to_num(letter) + k) % len(LETTERS))
ciphertext
return ciphertext
def decrypt_caesar(k: int, ciphertext: str) -> str:
"""Return the decrypted message using the Caesar cipher with key k.
Preconditions:
- all({x in LETTERS for x in ciphertext})
- 1 <= k <= 26
"""
= ''
plaintext
for letter in ciphertext:
= plaintext + num_to_letter((letter_to_num(letter) - k) % len(LETTERS))
plaintext
return plaintext
In our example above, we restricted ourselves to only upper-case letters and spaces. But the key mathematical idea of the Caesar cipher, shifting letters based on a secret key \(k\) used as an offset, generalizes to larger sets of letters.
To see how to do this, first we recall two built-in Python functions from 2.9 Application: Representing Text:
>>> ord('A') # Convert a character into an integer
65
>>> chr(33) # Convert an integer into a character
'!'
Using these two functions, we can modify our encrypt
and
decrypt
functions in the Caesar cipher to operate on
arbitrary Python strings. For simplicity, we’ll stick only to the first
128 characters, which are known as the ASCII
characters.You might recall from Section 2.9 that ASCII
is one of the earliest standard for encoding characters as natural
numbers on a computer. Our secret key will now take on values
from the set \(\{1, 2, \dots,
127\}\).
def encrypt_ascii(k: int, plaintext: str) -> str:
"""Return the encrypted message using the Caesar cipher with key k.
Preconditions:
- all({ord(c) < 128 for c in plaintext})
- 1 <= k <= 127
>>> encrypt_ascii(4, 'Good morning!')
'Kssh$qsvrmrk%'
"""
= ''
ciphertext
for letter in plaintext:
= ciphertext + chr((ord(letter) + k) % 128)
ciphertext
return ciphertext
def decrypt_ascii(k: int, ciphertext: str) -> str:
"""Return the decrypted message using the Caesar cipher with key k.
Preconditions:
- all({ord(c) < 128 for c in ciphertext})
- 1 <= k <= 127
>>> decrypt_ascii(4, 'Kssh$qsvrmrk%')
'Good morning!'
"""
= ''
plaintext
for letter in ciphertext:
+= chr((ord(letter) - k) % 128)
plaintext
return plaintext
So far, we’ve focused on the correctness of the Caesar cipher by implementing Python functions to encrypt and decrypt data using the cipher. But remember that a secure symmetric-key cryptosystem requires two properties: correctness and security!
(security) For all \(k \in \mathcal{K}\) and \(m \in \mathcal{P}\), if an eavesdropper only knows the value of \(c = Encrypt(k, m)\) but does not know \(k\), it is computationally infeasible to find \(m\).
In practice, the Caesar cipher is definitely not secure, as it is very possible for an eavesdropper to write a Python program to try all possible secret keys to decrypt a ciphertext, and pick out the most likely message that Alice sent. In other words, with only 27 (or even 127) possible keys, it is very “computationally feasible” for an eavesdropper to decrypt the messages.
The main takeaway from this example: it is usually not too hard to create (or implement) a correct cryptosystem, but it can be very challenging to create one that’s secure! Also, don’t use the Caesar cipher to transmit secret messages in real-world applications.