Assignment 1, Looking Out for Number One?

Due: 2021 Sep 23. (By the end of the day.)
Submit early by 2021 Sep 16 for a chance to earn up to 105%.
Submit late up to 2021 Sep 27 for up to 75%.

Also see the PrairieLearn page for this assignment.

Table of Contents

The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation. So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey discussion forum hiding from you?

For this assignment, you will take a data source of integers and produce a frequency histogram of first digits of those numbers. Perhaps there will be some beautiful and surprising patterns in those first digits?

Briefly, here’s a summary of what you must design in your submission:

Here are some useful files for this project:

Lib.hs
A starting point file for you to work with. Essentially all the code below. Note: if you run the Haskell program doctest on the file, it will run the tests that start with >>> scattered through the code below. And, of course, you can (should) write new ones!
Enrollment Data
A modest-sized set of input you could use for testing. These are Ancient Enrollment Data from the Dawn of Time OK, the Dawn of the Millenium Before Most CPSC 312 Students Were Born Shortly After Steve Graduated, Darn It!
Daily Covid Vaccinations
A data set just large enough to hit our 250-data-points threshold of the number of daily doses of COVID vaccines administered in BC over time.

1 Your Data

Go find some data. It needs to be measurements (in consistent units) of something natural. “Natural” data could be the flow rate (in cubic meters per second or gallons per minute, as long as it’s consistent) of all the streams and rivers in BC, the file size of every file on your computer, or the daily number of vaccinations reported in BC. (Well, not that last because we’re providing you that one!) You must have the legal right to share the data with us! We won’t share it farther without asking you, however.

Your data must be a single text file of the following form:

Your chosen file must have at least 250 data points in it. It’s probably better if it doesn’t have more than a few tens of thousands of data points, just so we’re not getting too much data!

2 Finding the First Digit

At some point, you’ll have your hands on a single number from the data, and you’ll want to get just its frist digit. For that, you’ll design a function getFirstDigit :: Integer -> Integer that takes an integer and produces only the first digit of that integer. It should work for any integer and return 0 only for the number 0.

You should feel free to either solve the problem with a recursive function that repeatedly divides the number by 10 (for which guards may be useful) or one that uses the built-in function show to transform the Integer into a String. Trying it out both ways would be great practice!

The latter solution is slick but tricky. If you take it on, bear in mind: 1. The type of show may seem strange. For now, you can think of it as show :: Integer -> String. 2. You’ll need to convert a Char to a number. Look up fromEnum and toEnum and play around with them! 3. Sadly, the previous point will get you to an Int, not the Integer you need. Fortunately, you can convert from any Integral type (like Int) to any other numeric (Num) type using fromIntegral :: (Integral a, Num b) => a -> b. That types means it can convert from anything that fulfills the Integral type class to anything that fulfills the Num typeclass. Both Int and Num fulfill both type classes. For now, maybe think of fromIntegral as a Int -> Integer.

Finally, with either approach, be sure to add more good examples/test cases than we provide below! One set of inputs is very slightly trickier than the rest.

-- | @getFirstDigit n@ evaluates to the first digit of @n@, which is
-- in 0 for the number 0 and otherwise one of [1..9]. TODO: add the
-- critical missing type of example below and discuss it here.
--
-- >>> getFirstDigit 6
-- 6
--
-- >>> getFirstDigit 112
-- 1
getFirstDigit :: Integer -> Integer
getFirstDigit = undefined    -- TODO!!!

3 Getting all the first digits

Now that we can find the first digit of one integer, we need the first digits of all the integers from a list, which we’ll get with getFirstDigits.

The list type has two cases: an empty list [] and a list with a head element and the tail of the list (x:xs). Make a recursive implementation that also breaks down into these cases to solve the problem. You should absolutely be calling ‘getFirstDigit’!

We urge you to design this as a recursive function. Having done that, if you want to go back and use the map function in Haskell (which is very similar to the map discussed in CPSC 110), that’s a great idea!

-- | @getFirstDigits ns@ evaluates to a list of the first digits of
-- @ns@, using 'getFirstDigit', described above.
--
-- >>> getFirstDigits []
-- []
--
-- >>> getFirstDigits [15, 0, -4, 728]
-- [1,0,4,7]
getFirstDigits :: [Integer] -> [Integer]
getFirstDigits = undefined    -- TODO!!!

4 Count the first digits

Now, we want to count up how many there are of each first digit.

One straightforward, if wordy, way we can solve this problem is using a list of length 9. Each entry in the list will represent the count for a single digit (1 through 9, ignoring 0).

Design a function countDigits that takes a list of integers and returns a count of how often the digits 1 through 9 appear in the list as a list of length 9.

You must use three helper functions:

As usual, there are many approaches you can take here. For your learning, we recommend starting with a recursive implementation for addLists. (Be careful: how many base cases are there?) If you’re excited about learning more, you could then try to reimplement it using zipWith.

For digitToCount the first time around, we recommend the wordy but simple solution of having 9 concrete cases to handle the numbers 1 through 9 and a single catch-all case for everything else. (With copy-and-paste, it’s quick to create.) You could then go back and rewrite it using replicate (and the ++ list append operator).

For countDigits itself, start with a recursive solution. Then, if you’d like to learn more, try going back to use map and foldr.1

Note that we haven’t given a starting point for addLists or digitToCount below. You’re in charge of creating the signature and definition for those on your own.

-- | @countDigits ns@ returns a list of 9 numbers representing how
-- often each digit [1..9] occurs as the first digit of a number in `ns`.
countDigits :: [Integer] -> [Integer]
countDigits = undefined    -- TODO!!!

5 Generate a bar chart

At this point, we’ve planned out how to get from a list of numbers to a count of how often each digit 1 to 9 appears as the first digit. Now let’s display these as a chart. We’ll keep it simple and make a text bar chart.

In this section, design a function that takes a display character and a list of integers and generates a list of strings representing an attractive (?) bar chart with horizontal bars each of length of the corresponding integer, suitable for printing.

You’ll want some helper functions, but it’s up to you what they are this time!

-- | @barChart c ns@ produces a list of one string per number @n@ in 
-- @ns@. Each string is as long as the corresponding number @n@ and
-- made up of copies of @c@. (The string is empty if @n < 0@.)
--
-- >>> barChart '#' []
-- []
--
-- >>> barChart '#' [3, 0, 5, 1]
-- ["###","","#####","#"]
--
-- >>> barChart '-' [2]
-- ["--"]
barChart :: Char -> [Integer] -> [String]
barChart = undefined    -- TODO!!!

6 Put it all together

Now, put it all together into a function genBenChart that takes a list of Integers and produces a bar chart with 9 bars, one each for the number of first digits that are 1s, 2s, 3s, …, 9s in the input list. genBenChart should call many of your functions above.

(Why should this be called genBenChart? Who’s Ben? Let’s talk about that after the assignment due date.)

-- | @genBenChart ns@ produces a list of strings that is a bar
-- chart of '#' characters with 9 bars showing the frequency of 
-- first digits 1 to 9 in @ns@.
--
-- >>> genBenChart [22, 7, 13, 18, 3, 6, 35]
-- ["##","#","##","","","#","#","",""]
genBenChart :: [Integer] -> [String]
genBenChart = undefined    -- TODO!!!

(For your own experimentation, you may want to make a version of genBenChart that scales the frequencies to integer percentages. So, it totals the 9 frequency counts and then produces a new list of frequency counts by multiplying each frequency by 100 and dividing it by that total. That displays better on a modestly wide screen! The div function performs integer division2.)

7 Parse the input string

Design a function extractData that takes an input string containing a title on the first line, followed by zero or more lines of data and evaluates to a tuple of the title and a list of the data as Integers.

The string passed as the argument should be in exactly the format described for your file in Your Data above. (Indeed, the main program below simply reads in the file contents as a string and hands them off to your function for processing.) You may assume the input follows exactly this format. We’ll learn more about how to manage errors in the format another day!

You will certainly want helper functions, and you will likely want to use the built-in functions: lines, words, head and tail (although pattern-matching may be more convenient!), and read. read is a little strange. Try both read "12" and read "12" :: Integer at the GHCi prompt to get a sense of how strange. However, for our purposes, we can simply imagine it has the type String -> Integer. Indeed, you could even define a new function:

readInteger :: String -> Integer
readInteger = read

Here’s a starting point for extractData

-- | @extractData s@ requires that @s@ be exactly: a single line (of 
-- any text), followed by zero or more lines of data. A line of data
-- starts with an integer followed optionally by a space and any
-- additional text.
--
-- It evaluates to a tuple of the text of the first line and a list
-- of the integers from the subsequent lines.
--
-- >>> extractData "\n"
-- ("",[])
--
-- >>> extractData "Test\n12\n1 is a small number!\n345 45 5\n"
-- ("Test",[12,1,345])
--
-- >>> extractData "123\n-9\n"
-- ("123",[-9])
extractData :: String -> (String, [Integer])
extractData = undefined    -- TODO!!!

8 The main Program

You are not in charge of this section. Haskell handles input and output within a pure functional programming language by constructing a value representing a sort of computation plan that results in a particular type of value. An IO String, for example, is a computation plan that may involve input and output (or other side effects) and ends up with a String value.

An IO String is not a String and cannot be used like one. (That way the Haskell type system ensures you do not accidentally rely on input, output, or other side effects where you didn’t intend to!) However, Haskell provides principled ways to operate on values from within these IO x types as if they were plain x types (for whatever x you’re working on) while still ensuring that the result stays inside IO.

Finally, the ghc runtime itself actually executes that input/output plan in order to act in the world when it runs your main function.

The >>= operator assembles computation plans. Specifically, it takes an IO a on the left and an a -> IO b function on the right. The first is a computation plan (that, as usual for IO, may involve side effects) that produces an a. The second is a function that takes an a and produces a computation plan that produces a b. It assembles them into a single computation plan that does all the work described by the IO a and the IO b produced when the function operates on the a. In other words, >>= lets you assemble complex computation plans from individual pieces, where each piece can rely on the results of previous pieces. (>>= is actually more general than that, but this is a decent description for how it works in many cases.)

pure takes a plain old (pure) value and turns it into one of these “values in a computation”. So, e.g., pure 3 is a trivial computation plan that doesn’t do anything but produce 3. We use it just so we can intersperse regular stuff inside the >>= stuff. The . operator chains two functions together so that (f . g) x is exactly f (g x). (If you’ve taken CPSC 121, it’s the “function composition” you hopefully talked about there, if briefly!)

So, all in all, the code below assembles a computation plan that:

(There are some fun ways we could rewrite this function, but it’s also fine just as it is.)

-- | @main@ reads from standard input (and expects data in 
-- the format described for 'extractData' above), and prints
-- a bar chart showing frequencies of first digits in the data
-- to standard output.
main :: IO ()
main = 
  getContents >>= 
    pure . extractData >>=
    pure . (\(title,numLines) -> 
      unlines (title : genBenChart numLines)) >>=
    putStr

9 Bonus Point Opportunities

As always, you can earn Bonus Points for contributing to learning in the course! We’ll award bonus points for great contributions when we see them.

We’re also offering two optional opportunities for bonus points on this assignment:


  1. In fact, if you really wanted to go wild in Haskell, you could read about Monoid and newtype (as in its use with Sum and Product), and then create an alternate list Monoid such that a list of any Monoid is itself a Monoid, at which point you could use mconcat to replace foldr above. That’s deep and cool but likely way beyond where we’ll get to this term!↩︎

  2. This used to (incorrectly!) say addition. Thanks for flagging that to us!↩︎