Note: Any FAQs or clarifications relevant to the assignment will be posted here. This post will be continually updated (with newer updates at the bottom of the page), so make sure to check on it regularly—click the “Watch” button in the top-right corner to receive notifications about this thread. If you have a question which is not already addressed on this page, create a new thread to post your question on Ed.
… honour_code.txt file included in the starter files on MarkUs (more below), which gives an overview of common mistakes students make with regard to academic integrity.
We strongly recommend taking a few minutes to do the “First impressions” tasks that we discussed in assignment 1:
This assignment covers (roughly) Chapters 4-6 of the Course Notes, and of course builds on the material in Chapters 1-3 as well.
To obtain the starter files for this assignment:
… csc110/assignments/ folder.
… a2 folder that contains the assignment’s starter files!
You can then see these files in your PyCharm csc110 project, and also upload the a2.tex file to Overleaf to complete your written work.
This assignment contains a mixture of both written and programming questions. All of your written work should be completed in the a2.tex starter file using the LaTeX typesetting language.
You went through the process of getting started with LaTeX in the Software Installation Guide. Overleaf also provides many tutorials to help you get started with LaTeX. See this tutorial for instructions on how to upload a .tex file to Overleaf.
Your programming work should be completed in the different starter files provided (each part has its own starter file). We have provided code at the bottom of each file for running doctest examples and PythonTA on each file. We are not grading doctests on this assignment, but encourage you to add some as a way to understand and test each function we’ve asked you to complete. We are using PythonTA to grade your work, so please run that on every Python file you submit using the code we’ve provided.
Warning: one of the purposes of this assignment is to evaluate your understanding and mastery of the concepts that we have covered so far. So on this assignment, you may only use parts of the Python programming language that we have covered in Chapters 4-6 of the Course Notes. Other parts (e.g., classes, recursion) are not allowed, and parts of your submissions that use them may receive a grade as low as zero for doing so.
Note: this part is a written response, and should be completed entirely in a2.tex.
For the proofs in this part, you may not use other theorems unless they are proven in the course notes (in which case you should cite them) or you prove them yourself.
Consider the following two statements in predicate logic:
Suppose \(D = \mathbb{Z}^+\), the set of all positive integers.
Prove that Statement 1 is True.
Prove that Statement 2 is False.
Find a non-empty set of numbers \(D\) that makes Statement 1 False, and then prove that it is False.
Find a non-empty set of numbers \(D\) that makes Statement 2 True, and then prove that it is True.
Prove that for all integers \(a\) and \(b\), if both are greater than 1 then \(ab\) is not prime.
Prove that for all integers \(a\), \(a^2 + a\) is divisible by 2. For this question, you must use the Quotient-Remainder Theorem to split up your proof into two cases based on the remainder when \(a\) is divided by 2.
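Before writing the last proof, it can help to see the two remainder cases numerically. The sketch below is only a spot check on a handful of integers, not a proof, and the helper name remainder_case is hypothetical (it does not appear in the starter files).

```python
def remainder_case(a: int) -> int:
    """Return the remainder r when a is divided by 2.

    The quotient q and remainder r satisfy a == 2 * q + r and 0 <= r < 2.
    """
    q, r = divmod(a, 2)
    assert a == 2 * q + r and 0 <= r < 2
    return r


# Spot-check the claim on a few integers, including negative ones.
for a in [-7, -4, 0, 1, 6, 9]:
    print(a, remainder_case(a), (a ** 2 + a) % 2 == 0)
```

Every integer falls into exactly one of the two remainder cases, which is what the required case split relies on.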
A movie review typically includes a critic’s comments and a score, but the score doesn’t portray the sentiment and raw emotion in the critic’s comments. Paul and Mario have decided to build their own movie review platform that labels reviews as positive, negative, or neutral based on the intensity of the words used in the review.
They use the VADER Lexicon, which we represent as a mapping from words to a positive/negative intensity score. For example, here are two words and their intensities:
Word | Intensity |
---|---|
awesome | 3.1 |
awful | -2.0 |
The polarity of a review is one of {'positive', 'negative', 'neutral'}, and is calculated by finding the average intensity of the lexicon words used in the review. Words that don’t appear in the VADER Lexicon are ignored when calculating the average.
Polarity | Average intensity |
---|---|
positive | \(\geq 0.05\) |
neutral | \(-0.05\) to \(0.05\), exclusive |
negative | \(\leq -0.05\) |
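As a rough illustration of the calculation described above, here is a minimal sketch. The function names, the two-word lexicon, and the choice to return 0.0 when a review has no lexicon words are all assumptions made for this example; they are not taken from a2_part2.py.

```python
# Hypothetical two-word lexicon, using the example intensities given earlier.
LEXICON = {'awesome': 3.1, 'awful': -2.0}


def average_intensity(words: list[str], lexicon: dict[str, float]) -> float:
    """Return the average intensity of the words that appear in lexicon.

    Words that are not in the lexicon are ignored. Returning 0.0 when no
    lexicon words are present is an assumption of this sketch.
    """
    scores = [lexicon[word] for word in words if word in lexicon]
    if scores == []:
        return 0.0
    return sum(scores) / len(scores)


def polarity(average: float) -> str:
    """Classify an average intensity using the thresholds in the table."""
    if average >= 0.05:
        return 'positive'
    elif average <= -0.05:
        return 'negative'
    else:
        return 'neutral'


review = ['an', 'awesome', 'cast', 'but', 'an', 'awful', 'script']
avg = average_intensity(review, LEXICON)
print(avg, polarity(avg))
```

Only 'awesome' and 'awful' contribute to the average here, so the review is classified from the mean of 3.1 and -2.0.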
In a2_part2.py, Paul and Mario have written a small program to read review data from a csv file, extract the lexicon words from the review, and then compute the review’s average intensity and polarity. They have painstakingly gone through three reviews and manually calculated their polarity and average intensity. Unfortunately, when they compare the calculated values to the values computed by the program, Paul and Mario find that their program has some errors!
Answer the following questions about this program and pytest report. Write your responses in a2.tex, except for 2(b), which you will complete in a2_part2.py directly.
Run a2_part2.py to generate a pytest report. Note that we’ve already provided the code for running pytest in the main block at the bottom of the file.
Based on the report, state which tests, if any, passed, and which tests failed. Just state the name of the tests and whether they passed/failed, and do not give explanations.
For each failing test from the pytest report:
Explain what error is causing the test to fail.
Hint: each test refers to a data file under datasets/reviews. The last column of each csv file stores the text of the review.
Edit a2_part2.py by fixing the function code so that all tests pass. Note that the tests themselves do not contain errors.
The changes should be small and must fix errors only; the original purpose of all functions and tests must remain the same. The expected values in the tests are correct; do not change them.
For each test that passed on the original code (before your changes in Question 2), explain why that test passed even though there were errors in the Python file.
For this question, first read Chapter 6.7 of the course notes, which we didn’t cover in class.
Now that we’ve discussed mutation, it is wise to make sure that we mutate objects when we should, and that we do not mutate objects when we shouldn’t. Whether an argument is mutated should be made clear from the docstring description. For an example, compare the docstring description of squares with the description for square_all in Chapter 6.7 of the course notes.
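To make the contrast concrete, here is a minimal sketch of a non-mutating and a mutating function. The names squares and square_all come from the text above, but the bodies and docstrings below are assumptions written for illustration and may differ from the versions in the course notes.

```python
def squares(nums: list[int]) -> list[int]:
    """Return a new list containing the squares of nums.

    This function does not mutate nums.
    """
    return [n * n for n in nums]


def square_all(nums: list[int]) -> None:
    """Mutate nums by replacing each element with its square."""
    for i in range(len(nums)):
        nums[i] = nums[i] * nums[i]


original = [1, 2, 3]
result = squares(original)
print(original, result)  # original is unchanged by squares

square_all(original)
print(original)          # original has been modified in place
```

A test for the first function checks the return value and that the argument is unchanged; a test for the second checks the argument itself after the call.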
Two data classes and four functions are defined in a2_part3.py. Notice that the functions have no function body. While you may implement these functions if you wish, the file a2_part3.py is not submitted for grading. Instead, open test_a2_part3.py. Remember that testing functions should respect the preconditions of functions and the representation invariants of data classes. Complete the following steps:
… (deposit and withdraw) are mutating the expected arguments appropriately.
… _correctness to validate that the functions (summarize_transactions and last_deposit) are returning the correct values.
… _no_mutation to validate that the functions (summarize_transactions and last_deposit) are not mutating their arguments.
In these tasks, there will be situations where you are asserting that two float values are “the same”. As we know, floats are approximations of real numbers. And in our doctest examples in the past, we have used math.isclose to demonstrate functions that return float values. However, math is not imported in test_a2_part3. Luckily, the pytest module is imported, and it has a convenient function for comparing floats:
>>> 4.93 == 4.9299999
False
>>> 4.93 == pytest.approx(4.9299999)
True
>>> 4.93 == pytest.approx(4.929)
False
You should not specify any of the optional arguments in pytest.approx (e.g., rel or abs). You should use pytest.approx whenever you are comparing two float values for equality.
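The reason exact equality is unreliable for floats can be seen with a small standard-library sketch. It uses math.isclose purely for illustration; in test_a2_part3.py, pytest.approx plays this role instead.

```python
import math

total = 0.1 + 0.2

# Exact equality fails because 0.1, 0.2, and 0.3 are stored as binary
# approximations, and the rounding errors do not cancel out.
print(total == 0.3)

# A tolerance-based comparison treats the two values as the same.
print(math.isclose(total, 0.3))
```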
When we assess your test cases, we will not use the provided a2_part3.py file. Instead, we will try multiple different files that are identical to a2_part3.py except they will have function bodies. The function bodies will contain different types of bugs (or may be correct) that correspond to what the test cases in test_a2_part3.py should catch (i.e., a test case should FAIL for a buggy implementation). Your goal is for the test cases you write to catch these bugs.
A significant source of frustration for the residents of Toronto is delays in public transit. Admittedly, adding time to your commute can take a negative toll on just about anyone who commutes. Some articles and books claim that a short commute time will improve your happiness. One article goes so far as linking the misery of additional commute time to a corresponding pay cut. In this exploration, you will work with data on subway delays provided by the Toronto Transit Commission (TTC), the organization that runs Toronto public transit.
You should see the file ttc-subway-delays.csv in your a2 folder, as it was included with your starter code. This file contains a record of all TTC delays in the time period from January 1, 2014 to October 31, 2019, courtesy of the City of Toronto. After completing this assignment, you could use your code to analyze the newer data available on the City of Toronto site, for an interesting side project!
The data is stored using the comma-separated values (csv) file format, the same format we saw in class. For example, in our sample data the first four lines look like this:
Date,Time,Day,Station,Code,Min Delay,Min Gap,Bound,Line,Vehicle
01/01/2014,00:21,Wednesday,VICTORIA PARK STATION,MUPR1,55,60,W,BD,5111
01/01/2014,02:06,Wednesday,HIGH PARK STATION,SUDP,3,7,W,BD,5001
01/01/2014,02:40,Wednesday,SHEPPARD STATION,MUNCA,0,0,,YU,0
and they represent the following tabular data:
Date | Time | Day | Station | Code | Min Delay | Min Gap | Bound | Line | Vehicle |
---|---|---|---|---|---|---|---|---|---|
01/01/2014 | 0:21 | Wednesday | VICTORIA PARK STATION | MUPR1 | 55 | 60 | W | BD | 5111 |
01/01/2014 | 2:06 | Wednesday | HIGH PARK STATION | SUDP | 3 | 7 | W | BD | 5001 |
01/01/2014 | 2:40 | Wednesday | SHEPPARD STATION | MUNCA | 0 | 0 | | YU | 0 |
Here are the descriptions and expected Python data types of the columns in this data set.
Column name | Description | Python data type |
---|---|---|
Date | The date of the delay | datetime.date |
Time | The time of the delay | datetime.time |
Day | The day of the week on which the delay occurred. | str |
Station | The name of the subway station where the delay occurred. | str |
Code | The TTC delay code, which usually describes the cause of the delay. You can find a table showing the codes and descriptions in ttc-subway-delay-codes.csv, which was also included in the starter code. | str |
Min Delay | The length of the subway delay (in minutes). | int |
Min Gap | The length of time between subway trains (in minutes). | int |
Bound | The direction in which the train was travelling. This is dependent on the line the train was on. | str |
Line | The abbreviated name of the subway line where the delay occurred. | str |
Vehicle | The id number of the train on which the delay occurred. | int |
Your first task is to take the ttc-subway-delays.csv file and load its data in Python in the same way we did in lecture. Complete the read_csv_file function (along with its helper functions, which we describe below), which returns a tuple with two elements, the first representing the header, and the second representing the remaining rows of data.
While we’ve discussed tuples several times during lecture, we have not had much practice using tuples until now. In this part of the assignment, many of the functions you write will use tuples. Tuples are similar to lists, and can be indexed using [], but are an immutable data type, supporting no mutating methods. Unlike lists, tuples let us specify the type of each of their elements even for a heterogeneous collection, as you can see in the function headers in the starter code.
To write a tuple literal, use parentheses along with commas, for example (1, 'hi') for a tuple of type tuple[int, str]. Sometimes, the parentheses can be omitted, like in the given code for part 2, and PyCharm will notify you when this is the case. A tuple with one element is written with an extra comma since parentheses on their own are used for precedence. For example, to write a value of type tuple[int] we write (1,).
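The points about tuple literals above can be tried out directly. This is a small sketch with made-up variable names, not code from the starter files.

```python
pair: tuple[int, str] = (1, 'hi')
print(pair[0], pair[1])      # indexing works the same way as with lists

single: tuple[int] = (1,)    # a one-element tuple needs the trailing comma
not_a_tuple = (1)            # parentheses alone only group: this is the int 1
print(type(single), type(not_a_tuple))

point = 3, 4                 # parentheses omitted; this is still a tuple
print(point)

# Tuples are immutable, so assigning to an index raises a TypeError.
try:
    pair[0] = 2
except TypeError as error:
    print('cannot mutate a tuple:', error)
```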
Use a csv.reader object that we saw in class to read rows of data from the file. Recall that this object turns every row into a list of strings. However, in order to do useful computations on this data, we’ll need to convert many of these entries into other Python data types, like int and datetime.date. Implement the helper function process_row, and its helper functions str_to_date and str_to_time, to process a single row of data to convert the entries into their appropriate data types (specified in the table above).
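To see what csv.reader produces before any conversion, here is a small sketch that reads two of the sample rows shown earlier from an in-memory string rather than from the real file. The use of io.StringIO and next is an assumption made so the example runs on its own; it is not necessarily how the lecture code or the starter code is structured.

```python
import csv
import io

# The header and two of the sample rows shown earlier, held in memory so
# that this sketch runs without the real data file.
SAMPLE = (
    'Date,Time,Day,Station,Code,Min Delay,Min Gap,Bound,Line,Vehicle\n'
    '01/01/2014,00:21,Wednesday,VICTORIA PARK STATION,MUPR1,55,60,W,BD,5111\n'
    '01/01/2014,02:40,Wednesday,SHEPPARD STATION,MUNCA,0,0,,YU,0\n'
)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)        # the first row holds the column names
rows = [row for row in reader]

print(header)
print(rows[0])               # every entry is a str, e.g. '55' rather than 55
print(int(rows[0][5]))       # converting one entry: Min Delay as an int
print(repr(rows[1][7]))      # a missing Bound value is read as an empty string
```

Note that an empty field, such as the Bound entry in the second row, comes back as an empty string, which matters when deciding how to convert each column.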
Now that we have this csv data stored as a nested list in Python, we can do some analysis on it! Complete the functions below to answer some questions about this data.
Coding requirements:
For this question, we will start off with our “older” tools of comprehensions and lists of data. Other features (e.g., loops) are not allowed, and parts of your submissions that use them may receive a grade as low as zero for doing so. We will use different features in Part 5 when we revisit these functions.
… (longest_delay)
… (average_delay) For this question, we consider all delays in the data, even delays that have a “Min Delay” attribute of 0 minutes.
… (num_delays_by_month)
Now we will revisit our code in Part 4 and use the tools we learned about later in the course: data classes and for loops. You may want to (and are allowed to) reuse your Part 4 code here.
Your first task is to design and implement the new data class Delay, which represents a single row of the table. This is very similar to what we did for the marriage license data set in lecture.
Next, complete the read_csv_file function and its helpers, like the analogous functions in Part 4, but using the new Delay data class.
Coding requirements:
For this question, we will practice using for loops instead of comprehensions. In addition to this requirement, do not use any built-in aggregation functions (like sum or len or max). Like before, parts of your submissions that use these features may receive a grade as low as zero for doing so.
All of your loops should follow the loop accumulator pattern from lecture:
<x>_so_far = <default_value>
for element in <collection>:
    <x>_so_far = ... <x>_so_far ... element ...  # Somehow combine loop variable and accumulator
return <x>_so_far
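As one concrete instance of this pattern, here is a minimal sketch that adds up a list of numbers without using the built-in sum. The function name my_sum and the sample data are made up for this illustration.

```python
def my_sum(numbers: list[int]) -> int:
    """Return the sum of the given numbers using the accumulator pattern."""
    sum_so_far = 0                         # <x>_so_far = <default_value>
    for number in numbers:                 # for element in <collection>:
        sum_so_far = sum_so_far + number   # combine accumulator and element
    return sum_so_far


print(my_sum([10, 20, 30]))
print(my_sum([]))  # the default value is returned for an empty list
```

The default value matters: it is what the function returns when the collection is empty, so it should be chosen to make sense for that case.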
Finally, complete the functions longest_delay, average_delay, and num_delays_by_month, which are equivalent to the functions you completed for Part 4, except they now take in a list[Delay] rather than a list[list] to represent the tabular data. Note that because we have the more specific type annotation list[Delay], we no longer need the preconditions in Part 4 saying that the inner lists have the right structure!
Please proofread and test your work carefully before your final submission! As we explain in Requirements for programming components, it is essential that your submitted code not contain syntax errors. Python files that contain syntax errors will receive a grade of 0 on all automated testing components (though they may receive partial or full credit on any TA grading for assignments). You have lots of time to work on this assignment and check your work (and right-click -> “Run in Python Console”), so please make sure to do this regularly and fix syntax errors right away.
Log in to MarkUs.
Go to Assignment 2, then the “Submissions” tab.
Submit the following files: a2.tex, a2.pdf (which must be generated from your a2.tex file), a2_part2.py, test_a2_part3.py, a2_part4.py, a2_part5.py, and honour_code.txt. Please note that MarkUs is picky with filenames, and so your filenames must match these exactly, including using lowercase letters.
… a2.tex and a2.pdf files from Overleaf.
Refresh the page, and then download each file to make sure you submitted the right version.
Remember, you can submit your files multiple times before the due date. So you can aim to submit your work early, and if you find an error or a place to improve before the due date, you can still make your changes and resubmit your work.
After you’ve submitted your work, please give yourself a well-deserved pat on the back and go take a rest or do something fun or enjoy nature or look at some art!