
CSC110 Fall 2025 Assignment 2

Note: Any FAQs or clarifications relevant to the assignment will be posted here. This post will be continually updated (with newer updates at the bottom of the page), so make sure to check on it regularly—click the “Watch” button in the top-right corner to receive notifications about this thread. If you have a question which is not already addressed on this page, create a new thread to post your question on Ed.

Logistics

Advice for Assignment 2

We strongly recommend taking a few minutes to do the “First impressions” tasks that we discussed in Assignment 1:

  1. Skim the assignment handout.
  2. Download the starter files from MarkUs.
  3. Schedule time to work on the assignment.

This assignment covers (roughly) Chapters 4-6 of the Course Notes, and of course builds on the material in Chapters 1-3 as well.

Getting Started

To obtain the starter files for this assignment:

  1. Click this link to download the starter files for this assignment. This will download a zip file to your computer.
  2. Extract the contents of this zip file into your csc110/assignments/ folder.
  3. You should see a new a2 folder that contains the assignment’s starter files!

You can then see these files in your PyCharm csc110 project, and also upload the a2.tex file to Overleaf to complete your written work.

General instructions

This assignment contains a mixture of both written and programming questions. All of your written work should be completed in the a2.tex starter file using the LaTeX typesetting language. You went through the process of getting started with LaTeX in the Software Installation Guide. Overleaf also provides many tutorials to help you get started with LaTeX. See this tutorial for instructions on how to upload a .tex file to Overleaf.

Your programming work should be completed in the different starter files provided (each part has its own starter file). We have provided code at the bottom of each file for running doctest examples and PythonTA on each file. We are not grading doctests on this assignment, but encourage you to add some as a way to understand and test each function we’ve asked you to complete. We are using PythonTA to grade your work, so please run that on every Python file you submit using the code we’ve provided.

Warning: one of the purposes of this assignment is to evaluate your understanding and mastery of the concepts that we have covered so far. So on this assignment, you may only use parts of the Python programming language that we have covered in Chapters 1-6 of the Course Notes. Other parts (e.g., classes, recursion) are not allowed, and parts of your submissions that use them may receive a grade as low as zero for doing so.

Part 1: Proofs

Note: this part is a written response, and should be completed entirely in a2.tex.

For the proofs in this part, you may not use any other theorem unless it is proven in the course notes (in which case you should cite it) or you prove it yourself.

Consider the following two statements in predicate logic:

  1. Suppose \(D = \mathbb{Z}^+\), the set of all positive integers.

    1. Prove that Statement 1 is True.

    2. Prove that Statement 2 is False.

  2. Find a non-empty set of numbers \(D\) that makes Statement 1 False, and then prove that it is False.

  3. Find a non-empty set of numbers \(D\) that makes Statement 2 True, and then prove that it is True.

  4. Prove that for all integers \(a\) and \(b\), if both are greater than 1, then \(ab\) is not prime.

  5. Prove that for all integers \(a\), \(a^2 + a\) is divisible by 2. For this question, you must use the Quotient-Remainder Theorem to split up your proof into two cases based on the remainder when \(a\) is divided by 2.

Part 2: Loops and Mutation Debugging Exercise

A movie review typically includes a critic’s comments and a score, but the score doesn’t portray the sentiment and raw emotion in the critic’s comments. Paul and Mario have decided to build their own movie review platform that labels reviews as positive, negative, or neutral based on the intensity of the words used in the review.

They use the VADER Lexicon, which we represent as a mapping from words to a positive/negative intensity score. For example, here are two words and their intensities:

Word    | Intensity
awesome | 3.1
awful   | -2.0

The polarity of a review is one of {'positive', 'negative', 'neutral'}, and is calculated by finding the average intensity of the lexicon words used in the review. Words that don’t appear in the VADER Lexicon are ignored when calculating the average.

Polarity | Average intensity
positive | \(\geq 0.05\)
neutral  | between \(-0.05\) and \(0.05\), exclusive
negative | \(\leq -0.05\)
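As a rough illustration of the rule in the table above, here is a minimal sketch of how an average intensity and a polarity label could be computed. The function names average_intensity and polarity, the LEXICON dictionary (containing only the two example words from the table, not the real VADER Lexicon), and the assumption that a review has already been split into a list of lowercase words are all hypothetical; this is not the code in a2_part2.py.

```python
# Hypothetical sketch of the polarity rule described above; these are NOT
# the functions in a2_part2.py. LEXICON holds only the two example words
# from the table, not the full VADER Lexicon.
LEXICON = {'awesome': 3.1, 'awful': -2.0}


def average_intensity(words: list[str], lexicon: dict[str, float]) -> float:
    """Return the average intensity of the words in words that appear in lexicon.

    Words that are not in lexicon are ignored.

    Preconditions (assumed for this sketch):
        - at least one word in words is a key of lexicon
    """
    intensities = [lexicon[word] for word in words if word in lexicon]
    return sum(intensities) / len(intensities)


def polarity(avg: float) -> str:
    """Return the polarity label for the given average intensity."""
    if avg >= 0.05:
        return 'positive'
    elif avg <= -0.05:
        return 'negative'
    else:
        return 'neutral'  # strictly between -0.05 and 0.05


review_words = ['the', 'movie', 'was', 'awesome', 'but', 'awful']
avg = average_intensity(review_words, LEXICON)  # approximately 0.55
label = polarity(avg)                           # 'positive'
```

Note how the boundary values fall: an average of exactly 0.05 is positive and exactly -0.05 is negative, because those comparisons are inclusive while the neutral range is exclusive.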

In a2_part2.py, Paul and Mario have written a small program to read review data from a csv file, extract the lexicon words from the review, and then compute the review’s average intensity and polarity. They have painstakingly gone through three reviews and manually calculated their polarity and average intensity. Unfortunately, when they compare their manually calculated values with the values computed by the program, Paul and Mario find that their program has some errors!

Answer the following questions about this program and its pytest report. Write your responses in a2.tex, except for 2(b), which you will complete in a2_part2.py directly.

  1. Run a2_part2.py to generate a pytest report. Note that we’ve already provided the code for running pytest in the main block at the bottom of the file.

    Based on the report, state which tests, if any, passed, and which tests failed. Just state the name of the tests and whether they passed/failed, and do not give explanations.

  2. For each failing test from the pytest report:

    1. Explain what error is causing the test to fail.

      Hint: each test refers to a data file under datasets/reviews. The last column of each csv file stores the text of the review.

    2. Edit a2_part2.py by fixing the function code so that all tests pass. Note that the tests themselves do not contain errors.

      The changes should be small and must only fix errors; the original purpose of all functions and tests must remain the same. The expected values in the tests are correct; do not change them.

  3. For each test that passed on the original code (before your changes in Question 2), explain why that test passed even though there were errors in the Python file.

Part 3: Testing

For this question, first read Chapter 6.7 of the course notes, which we didn’t cover in class.

Now that we’ve discussed mutation, it is wise to make sure that we mutate objects when we should, and that we do not mutate objects when we shouldn’t. Whether an argument is mutated should be made clear from the docstring description. For example, compare the docstring description of squares with the description of square_all in Chapter 6.7 of the course notes.

Two data classes and four functions are defined in a2_part3.py. Notice that the functions have no function body. While you may implement these functions if you wish, the file a2_part3.py is not submitted for grading. Instead, open test_a2_part3.py. Remember that testing functions should respect the preconditions of functions and the representation invariants of data classes. Complete the following steps:

  1. Write the two property-based tests to validate that the functions (deposit and withdraw) are mutating the expected arguments appropriately.
  2. Write the two unit tests that end in _correctness to validate that the functions (summarize_transactions and last_deposit) are returning the correct values.
  3. Write the two unit tests that end in _no_mutation to validate that the functions (summarize_transactions and last_deposit) are not mutating their arguments.
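To illustrate the difference between testing for mutation and testing for no mutation, here is a minimal sketch using hypothetical versions of squares and square_all (the definitions in Chapter 6.7 may differ in detail). The key idea is to record the expected state of the argument before calling the function, then compare against it afterwards. The test names are illustrative only and are not the tests in test_a2_part3.py.

```python
# Hypothetical sketches of the two course-notes functions; the versions in
# Chapter 6.7 may differ in detail.

def squares(nums: list[int]) -> list[int]:
    """Return a new list containing the squares of the numbers in nums.

    nums is NOT mutated.
    """
    return [n * n for n in nums]


def square_all(nums: list[int]) -> None:
    """Mutate nums by replacing each element with its square."""
    for i in range(len(nums)):
        nums[i] = nums[i] * nums[i]


def test_squares_no_mutation() -> None:
    """Check that squares does not mutate its argument."""
    nums = [1, 2, 3]
    nums_copy = list(nums)  # an independent copy, taken BEFORE the call
    squares(nums)
    assert nums == nums_copy


def test_square_all_mutation() -> None:
    """Check that square_all mutates its argument as its docstring describes."""
    nums = [1, 2, 3]
    expected = [n * n for n in nums]  # computed from the original values
    square_all(nums)
    assert nums == expected
```

Because pytest collects functions whose names start with test_, these could be run with pytest as usual. If the argument contains mutable objects (such as data class instances), a shallow copy like list(nums) shares those objects with the original, so one option for a no-mutation test is copy.deepcopy.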

In these tasks, there will be situations where you are asserting that two float values are “the same”. As we know, floats are approximations of real numbers. And in our doctest examples in the past, we have used math.isclose to demonstrate functions that return float values. However, math is not imported in test_a2_part3.py. Luckily, pytest is imported, and it provides a convenient function for comparing floats:

>>> import pytest
>>> 4.93 == 4.9299999
False
>>> 4.93 == pytest.approx(4.9299999)
True
>>> 4.93 == pytest.approx(4.929)
False

You should not specify any of the optional arguments of pytest.approx (e.g., rel or abs). You should use pytest.approx whenever you compare two float values for equality.

When we assess your test cases, we will not use the provided a2_part3.py file. Instead, we will try multiple different files that are identical to a2_part3.py except they will have function bodies. The function bodies will contain different types of bugs (or may be correct) that correspond to what the test cases in test_a2_part3.py should catch (i.e., a test case should FAIL for a buggy implementation). Your goal is for the test cases you write to catch these bugs.

Part 4: Tabular data

Delays in public transit are a significant source of frustration for Toronto residents. Admittedly, a longer commute can take a negative toll on just about anyone who commutes. Some articles and books claim that a short commute will improve your happiness, and one article goes so far as to link the misery of additional commute time to a corresponding pay cut. In this exploration, you will work with data on subway delays provided by the Toronto Transit Commission (TTC), the organization that runs Toronto’s public transit.

0. The data set

You should see the file ttc-subway-delays.csv in your a2 folder, as it was included with your starter code. This file contains a record of all TTC delays in the time period from January 1, 2014 to October 31, 2019, courtesy of the City of Toronto. After completing this assignment, you could use your code to analyze the newer data available on the City of Toronto site, for an interesting side project!

The data is stored using the comma-separated values (csv) file format, the same format we saw in class. For example, in our sample data the first four lines look like this:

Date,Time,Day,Station,Code,Min Delay,Min Gap,Bound,Line,Vehicle
01/01/2014,00:21,Wednesday,VICTORIA PARK STATION,MUPR1,55,60,W,BD,5111
01/01/2014,02:06,Wednesday,HIGH PARK STATION,SUDP,3,7,W,BD,5001
01/01/2014,02:40,Wednesday,SHEPPARD STATION,MUNCA,0,0,,YU,0

and they represent the following tabular data:

Date | Time | Day | Station | Code | Min Delay | Min Gap | Bound | Line | Vehicle
01/01/2014 | 0:21 | Wednesday | VICTORIA PARK STATION | MUPR1 | 55 | 60 | W | BD | 5111
01/01/2014 | 2:06 | Wednesday | HIGH PARK STATION | SUDP | 3 | 7 | W | BD | 5001
01/01/2014 | 2:40 | Wednesday | SHEPPARD STATION | MUNCA | 0 | 0 |  | YU | 0

Here are a description and the expected Python data type for each column in this data set.

Column name | Description | Python data type
Date | The date of the delay | datetime.date
Time | The time of the delay | datetime.time
Day | The day of the week on which the delay occurred. | str
Station | The name of the subway station where the delay occurred. | str
Code | The TTC delay code, which usually describes the cause of the delay. You can find a table showing the codes and descriptions in ttc-subway-delay-codes.csv, which was also included in the starter code. | str
Min Delay | The length of the subway delay (in minutes). | int
Min Gap | The length of time between subway trains (in minutes). | int
Bound | The direction in which the train was travelling. This is dependent on the line the train was on. | str
Line | The abbreviated name of the subway line where the delay occurred. | str
Vehicle | The id number of the train on which the delay occurred. | int

1. Reading the file

Your first task is to load the data from ttc-subway-delays.csv into Python, in the same way we did in lecture. Complete the read_csv_file function (along with its helper functions, which we describe below), which returns a tuple with two elements: the first represents the header, and the second represents the remaining rows of data.

Using tuples

While we’ve discussed tuples several times during lecture, we have not had much practice using them until now. In this part of the assignment, many of the functions you write will use tuples. Tuples are similar to lists and can be indexed using [], but they are an immutable data type and support no mutating methods. Unlike a list type annotation, a tuple type annotation can specify the type of each element, even for a heterogeneous collection, as you can see in the function headers in the starter code.

To write a tuple literal, use parentheses along with commas, for example (1, 'hi') for a tuple of type tuple[int, str]. Sometimes the parentheses can be omitted, as in the given code for Part 2, and PyCharm will notify you when this is the case. A tuple with one element is written with an extra comma, since parentheses on their own are only used for grouping (precedence). For example, to write a value of type tuple[int] we write (1,).
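The tuple rules just described can be seen in a few lines of Python. This is a small sketch; min_and_max and tuples_are_immutable are hypothetical helpers written only for this illustration and do not appear in the starter code.

```python
pair: tuple[int, str] = (1, 'hi')  # heterogeneous elements, one type per position
single: tuple[int] = (1,)          # the trailing comma makes a one-element tuple
just_an_int = (1)                  # no comma: the parentheses only group, so this is the int 1
number, word = pair                # tuple unpacking: number == 1, word == 'hi'


def min_and_max(nums: list[int]) -> tuple[int, int]:
    """Return the smallest and largest numbers in nums (a hypothetical helper).

    Preconditions:
        - nums != []
    """
    return min(nums), max(nums)  # parentheses omitted, but this is still a tuple


def tuples_are_immutable() -> bool:
    """Return whether assigning to a tuple element raises a TypeError."""
    t = (1, 'hi')
    try:
        t[0] = 2  # not allowed: tuples support no item assignment
    except TypeError:
        return True
    return False
```

Indexing works just as it does for lists (pair[1] is 'hi'), but any attempt to change an element raises a TypeError.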

Use a csv.reader object that we saw in class to read rows of data from the file. Recall that this object turns every row into a list of strings. However, in order to do useful computations on this data, we’ll need to convert many of these entries into other Python data types, like int and datetime.date. Implement the helper function process_row—and its helper functions str_to_date and str_to_time—to process a single row of data to convert the entries into their appropriate data types (specified in the table above).
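To see what csv.reader produces before any conversion, here is a small sketch that reads the four sample lines shown earlier. It uses io.StringIO so the example runs without the data file; in your own code you would open ttc-subway-delays.csv instead. The converted values at the end are built by hand only to show the target types from the table; writing the conversion itself is the job of process_row, str_to_date, and str_to_time.

```python
import csv
import datetime
import io

# The first four lines of ttc-subway-delays.csv, as shown above. io.StringIO
# lets csv.reader read from a string instead of an opened file.
SAMPLE = """Date,Time,Day,Station,Code,Min Delay,Min Gap,Bound,Line,Vehicle
01/01/2014,00:21,Wednesday,VICTORIA PARK STATION,MUPR1,55,60,W,BD,5111
01/01/2014,02:06,Wednesday,HIGH PARK STATION,SUDP,3,7,W,BD,5001
01/01/2014,02:40,Wednesday,SHEPPARD STATION,MUNCA,0,0,,YU,0
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # the first row is the header
rows = list(reader)    # every remaining row is a list of str

# Every entry is a string, and the empty Bound field becomes ''.
print(rows[2])
# ['01/01/2014', '02:40', 'Wednesday', 'SHEPPARD STATION', 'MUNCA', '0', '0', '', 'YU', '0']

# Target values for the first data row, constructed by hand here.
first_date = datetime.date(2014, 1, 1)  # a datetime.date
first_time = datetime.time(0, 21)       # a datetime.time
min_delay = int(rows[0][5])             # '55' becomes 55
```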

2. Operating on the data

Now that we have this csv data stored as a nested list in Python, we can do some analysis on it! Complete the functions below to answer some questions about this data.

Coding requirements:

For this question, we will start off with our “older” tools of comprehensions and lists of data. Other features (e.g., loops) are not allowed, and parts of your submissions that use them may receive a grade as low as zero for doing so. We will use different features in Part 5 when we revisit these functions.

  1. What was the longest subway delay? (longest_delay)
  2. On average, how long do the subway delays last? (average_delay) For this question, we consider all delays in the data, even delays that have a “Min Delay” attribute of 0 minutes.
  3. How many subway delays were there in a specific month, like July 2018? (num_delays_by_month)
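The general pattern of combining comprehensions with built-in aggregation functions can be sketched on a small hypothetical table. The rainfall data below is made up for illustration and is not the TTC data; each inner list is [day, rainfall in mm].

```python
# Hypothetical toy table: each inner list is [day, rainfall in mm].
rainfall = [['Monday', 3], ['Tuesday', 0], ['Wednesday', 12], ['Thursday', 5]]

amounts = [row[1] for row in rainfall]                     # [3, 0, 12, 5]
wettest = max(amounts)                                     # 12
average = sum(amounts) / len(amounts)                      # 5.0
dry_days = len([row for row in rainfall if row[1] == 0])   # 1
```

Note that the day with 0 mm still counts toward the average, since it is part of the data.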

Part 5: Tabular Data Revisited

Now we will revisit our code in Part 4 and use the tools we learned about later in the course: data classes and for loops. You may want to (and are allowed to) reuse your Part 4 code here.

1. Adding a data class

Your first task is to design and implement the new data class Delay, which represents a single row of the table. This is very similar to what we did for the marriage license data set in lecture.
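Designing Delay is part of your task, so as a reminder of the syntax only, here is a sketch of a data class for a different table: the word/intensity table from Part 2. The class name LexiconEntry and its representation invariant are hypothetical and are not part of the starter code.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    """A hypothetical data class for one row of the Part 2 word/intensity table.

    Instance Attributes:
        - word: a word in the lexicon
        - intensity: the word's positive/negative intensity score

    Representation Invariants:
        - self.word != ''
    """
    word: str
    intensity: float


awesome = LexiconEntry('awesome', 3.1)
awful = LexiconEntry(word='awful', intensity=-2.0)
```

Each column of the table becomes one instance attribute with its own type annotation, and attributes are then accessed by name (awesome.intensity) rather than by index.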

2. Reading the file

Next, complete the read_csv_file function and its helpers, like the analogous functions in Part 4, but using the new Delay data class.

3. Operating on the data

Coding requirements:

For this question, we will practice using for loops instead of comprehensions. In addition to this requirement, do not use any built-in aggregation functions (like sum or len or max). Like before, parts of your submissions that use these features may receive a grade as low as zero for doing so.

All of your loops should follow the loop accumulator pattern from lecture:

<x>_so_far = <default_value>

for element in <collection>:
    <x>_so_far = ... <x>_so_far ... element ...  # Somehow combine loop variable and accumulator

return <x>_so_far
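Here is one way the accumulator template could look when filled in, using two hypothetical functions on plain lists of integers (not the TTC data) and no built-in aggregation functions. The first accumulator starts from the first element, which is why it needs a non-empty precondition.

```python
def largest(nums: list[int]) -> int:
    """Return the largest number in nums, without using max.

    Preconditions:
        - nums != []
    """
    largest_so_far = nums[0]

    for num in nums:
        if num > largest_so_far:
            largest_so_far = num

    return largest_so_far


def count_even(nums: list[int]) -> int:
    """Return how many numbers in nums are even, without using len."""
    count_so_far = 0

    for num in nums:
        if num % 2 == 0:
            count_so_far = count_so_far + 1

    return count_so_far
```

For example, largest([3, 9, 2]) returns 9 and count_even([1, 2, 4, 7]) returns 2.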

Finally, complete the functions longest_delay, average_delay, and num_delays_by_month, which are equivalent to the functions you completed for Part 4, except they now take in a list[Delay] rather than a list[list] to represent the tabular data. Note that because we have the more specific type annotation list[Delay], we no longer need the preconditions in Part 4 saying that the inner lists have the right structure!

Submission instructions

Please proofread and test your work carefully before your final submission! As we explain in Requirements for programming components, it is essential that your submitted code not contain syntax errors. Python files that contain syntax errors will receive a grade of 0 on all automated testing components (though they may receive partial or full credit on any TA grading for assignments). You have lots of time to work on this assignment and check your work (and right-click -> “Run in Python Console”), so please make sure to do this regularly and fix syntax errors right away.

  1. Login to MarkUs.

  2. Go to Assignment 2, then the “Submissions” tab.

  3. Submit the following files: a2.tex, a2.pdf (which must be generated from your a2.tex file), a2_part2.py, test_a2_part3.py, a2_part4.py, a2_part5.py, and honour_code.txt. Please note that MarkUs is picky with filenames, and so your filenames must match these exactly, including using lowercase letters.

  4. Refresh the page, and then download each file to make sure you submitted the right version.

Remember, you can submit your files multiple times before the due date. So you can aim to submit your work early, and if you find an error or a place to improve before the due date, you can still make your changes and resubmit your work.

After you’ve submitted your work, please give yourself a well-deserved pat on the back and go take a rest or do something fun or enjoy nature or look at some art!

Richard Diebenkorn. Women Outside, 1957. Art Gallery of Ontario.