Assignment 1, Looking Out for Number One?
Due: 2021 Sep 23. (By the end of the day.)
Submit early by 2021 Sep 16 for a chance to earn up to 105%.
Submit late up to 2021 Sep 27 for up to 75%.
Also see the PrairieLearn page for this assignment.
The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation. So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey discussion forum hiding from you?
For this assignment, you will take a data source of integers and produce a frequency histogram of first digits of those numbers. Perhaps there will be some beautiful and surprising patterns in those first digits?
Briefly, here’s a summary of what you must design in your submission:
- One data file, described below.
- One program (a
.hs
or.lhs
file) which defines some functions we’ve given (likemain
) plus:getFirstDigit :: Integer -> Integer
to get the first digit of a numbergetFirstDigits :: [Integer] -> [Integer]
to get the first digits of a list of numbersaddLists
to “add” two lists ofInteger
s and produce a single list ofInteger
s (but it’s up to you to figure out the type)digitToCount
to convert a digit into a length-9 list representing a count of frequencies of the nine digits 1 to 9countDigits :: [Integer] -> [Integer]
to return a length-9 list representing the count of frequencies of the numbers 1 to 9 as the first digits of its argumentbarChart :: Char -> [Integer] -> [String]
to generate “bars” made up of the given letter of the lengths indicated in the given listextractData :: String -> (String, [Integer])
to pull apart a data file (as a string) like the one you are providing to us and return the text from its first line (as a title) along with a list of the numbers in the data filegenBenChart :: [Integer] -> [String]
to create a bar chart of first digit frequencies given a list of numbers
Here are some useful files for this project:
- Lib.hs
- A starting point file for you to work with. Essentially all the code below. Note: if you run the Haskell program
doctest
on the file, it will run the tests that start with>>>
scattered through the code below. And, of course, you can (should) write new ones! - Enrollment Data
- A modest-sized set of input you could use for testing. These are Ancient Enrollment Data from
the Dawn of TimeOK, the Dawn of the MilleniumBefore Most CPSC 312 Students Were BornShortly After Steve Graduated, Darn It! - Daily Covid Vaccinations
- A data set just large enough to hit our 250-data-points threshold of the number of daily doses of COVID vaccines administered in BC over time.
1 Your Data
Go find some data. It needs to be measurements (in consistent units) of something natural. “Natural” data could be the flow rate (in cubic meters per second or gallons per minute, as long as it’s consistent) of all the streams and rivers in BC, the file size of every file on your computer, or the daily number of vaccinations reported in BC. (Well, not that last because we’re providing you that one!) You must have the legal right to share the data with us! We won’t share it farther without asking you, however.
Your data must be a single text file of the following form:
The first line has a title for the data set (like “Enrollments” or “Number of Times Steve Goes Off Topic Per Section of the Assignment Description”). Your title MUST give attribution for the data, which means it will probably include a URL. (It’s fine if it’s really long.)
Each subsequent line starts with an integer followed, optionally, by a space and then any text. You can use the “space and text” to include comments on each line if you want.
Your chosen file must have at least 250 data points in it. It’s probably better if it doesn’t have more than a few tens of thousands of data points, just so we’re not getting too much data!
2 Finding the First Digit
At some point, you’ll have your hands on a single number from the data, and you’ll want to get just its frist digit. For that, you’ll design a function getFirstDigit :: Integer -> Integer
that takes an integer and produces only the first digit of that integer. It should work for any integer and return 0
only for the number 0
.
You should feel free to either solve the problem with a recursive function that repeatedly divides the number by 10 (for which guards may be useful) or one that uses the built-in function show
to transform the Integer
into a String
. Trying it out both ways would be great practice!
The latter solution is slick but tricky. If you take it on, bear in mind: 1. The type of show
may seem strange. For now, you can think of it as show :: Integer -> String
. 2. You’ll need to convert a Char
to a number. Look up fromEnum
and toEnum
and play around with them! 3. Sadly, the previous point will get you to an Int
, not the Integer
you need. Fortunately, you can convert from any Integral
type (like Int
) to any other numeric (Num
) type using fromIntegral :: (Integral a, Num b) => a -> b
. That types means it can convert from anything that fulfills the Integral
type class to anything that fulfills the Num
typeclass. Both Int
and Num
fulfill both type classes. For now, maybe think of fromIntegral
as a Int -> Integer
.
Finally, with either approach, be sure to add more good examples/test cases than we provide below! One set of inputs is very slightly trickier than the rest.
-- | @getFirstDigit n@ evaluates to the first digit of @n@, which is
-- in 0 for the number 0 and otherwise one of [1..9]. TODO: add the
-- critical missing type of example below and discuss it here.
--
-- >>> getFirstDigit 6
-- 6
--
-- >>> getFirstDigit 112
-- 1
getFirstDigit :: Integer -> Integer
= undefined -- TODO!!! getFirstDigit
3 Getting all the first digits
Now that we can find the first digit of one integer, we need the first digits of all the integers from a list, which we’ll get with getFirstDigits
.
The list type has two cases: an empty list []
and a list with a head element and the tail of the list (x:xs)
. Make a recursive implementation that also breaks down into these cases to solve the problem. You should absolutely be calling ‘getFirstDigit’!
We urge you to design this as a recursive function. Having done that, if you want to go back and use the map
function in Haskell (which is very similar to the map
discussed in CPSC 110), that’s a great idea!
-- | @getFirstDigits ns@ evaluates to a list of the first digits of
-- @ns@, using 'getFirstDigit', described above.
--
-- >>> getFirstDigits []
-- []
--
-- >>> getFirstDigits [15, 0, -4, 728]
-- [1,0,4,7]
getFirstDigits :: [Integer] -> [Integer]
= undefined -- TODO!!! getFirstDigits
4 Count the first digits
Now, we want to count up how many there are of each first digit.
One straightforward, if wordy, way we can solve this problem is using a list of length 9
. Each entry in the list will represent the count for a single digit (1
through 9
, ignoring 0
).
Design a function countDigits
that takes a list of integers and returns a count of how often the digits 1 through 9 appear in the list as a list of length 9.
You must use three helper functions:
addLists
adds two lists ofInteger
s and works a bit like:addLists [x1, x2, x3, ...] [y1, y2, y3, ...] = [x1 + y1, x2 + y2, x3 + y3, ...]
. (This won’t actually work as Haskell code!) Its result should be as long as the shorter of its two arguments.digitToCount
produces a length-9 list representing the count of a singleInteger
. So, for example,digitToCount 3 = [0, 0, 1, 0, 0, 0, 0, 0, 0]
. It should produce a list of 9 zeroes for any input besides 1 through 9.- Your own
getFirstDigits
orgetFirstDigit
(your choice) from above!
As usual, there are many approaches you can take here. For your learning, we recommend starting with a recursive implementation for addLists
. (Be careful: how many base cases are there?) If you’re excited about learning more, you could then try to reimplement it using zipWith
.
For digitToCount
the first time around, we recommend the wordy but simple solution of having 9 concrete cases to handle the numbers 1 through 9 and a single catch-all case for everything else. (With copy-and-paste, it’s quick to create.) You could then go back and rewrite it using replicate
(and the ++
list append operator).
For countDigits
itself, start with a recursive solution. Then, if you’d like to learn more, try going back to use map
and foldr
.1
Note that we haven’t given a starting point for addLists
or digitToCount
below. You’re in charge of creating the signature and definition for those on your own.
-- | @countDigits ns@ returns a list of 9 numbers representing how
-- often each digit [1..9] occurs as the first digit of a number in `ns`.
countDigits :: [Integer] -> [Integer]
= undefined -- TODO!!! countDigits
5 Generate a bar chart
At this point, we’ve planned out how to get from a list of numbers to a count of how often each digit 1 to 9 appears as the first digit. Now let’s display these as a chart. We’ll keep it simple and make a text bar chart.
In this section, design a function that takes a display character and a list of integers and generates a list of strings representing an attractive (?) bar chart with horizontal bars each of length of the corresponding integer, suitable for printing.
You’ll want some helper functions, but it’s up to you what they are this time!
-- | @barChart c ns@ produces a list of one string per number @n@ in
-- @ns@. Each string is as long as the corresponding number @n@ and
-- made up of copies of @c@. (The string is empty if @n < 0@.)
--
-- >>> barChart '#' []
-- []
--
-- >>> barChart '#' [3, 0, 5, 1]
-- ["###","","#####","#"]
--
-- >>> barChart '-' [2]
-- ["--"]
barChart :: Char -> [Integer] -> [String]
= undefined -- TODO!!! barChart
6 Put it all together
Now, put it all together into a function genBenChart
that takes a list of Integer
s and produces a bar chart with 9 bars, one each for the number of first digits that are 1s, 2s, 3s, …, 9s in the input list. genBenChart
should call many of your functions above.
(Why should this be called genBenChart
? Who’s Ben
? Let’s talk about that after the assignment due date.)
-- | @genBenChart ns@ produces a list of strings that is a bar
-- chart of '#' characters with 9 bars showing the frequency of
-- first digits 1 to 9 in @ns@.
--
-- >>> genBenChart [22, 7, 13, 18, 3, 6, 35]
-- ["##","#","##","","","#","#","",""]
genBenChart :: [Integer] -> [String]
= undefined -- TODO!!! genBenChart
(For your own experimentation, you may want to make a version of genBenChart
that scales the frequencies to integer percentages. So, it totals the 9 frequency counts and then produces a new list of frequency counts by multiplying each frequency by 100 and dividing it by that total. That displays better on a modestly wide screen! The div
function performs integer division2.)
7 Parse the input string
Design a function extractData
that takes an input string containing a title on the first line, followed by zero or more lines of data and evaluates to a tuple of the title and a list of the data as Integer
s.
The string passed as the argument should be in exactly the format described for your file in Your Data above. (Indeed, the main
program below simply reads in the file contents as a string and hands them off to your function for processing.) You may assume the input follows exactly this format. We’ll learn more about how to manage errors in the format another day!
You will certainly want helper functions, and you will likely want to use the built-in functions: lines
, words
, head
and tail
(although pattern-matching may be more convenient!), and read
. read
is a little strange. Try both read "12"
and read "12" :: Integer
at the GHCi prompt to get a sense of how strange. However, for our purposes, we can simply imagine it has the type String -> Integer
. Indeed, you could even define a new function:
readInteger :: String -> Integer
= read readInteger
Here’s a starting point for extractData
-- | @extractData s@ requires that @s@ be exactly: a single line (of
-- any text), followed by zero or more lines of data. A line of data
-- starts with an integer followed optionally by a space and any
-- additional text.
--
-- It evaluates to a tuple of the text of the first line and a list
-- of the integers from the subsequent lines.
--
-- >>> extractData "\n"
-- ("",[])
--
-- >>> extractData "Test\n12\n1 is a small number!\n345 45 5\n"
-- ("Test",[12,1,345])
--
-- >>> extractData "123\n-9\n"
-- ("123",[-9])
extractData :: String -> (String, [Integer])
= undefined -- TODO!!! extractData
8 The main
Program
You are not in charge of this section. Haskell handles input and output within a pure functional programming language by constructing a value representing a sort of computation plan that results in a particular type of value. An IO String
, for example, is a computation plan that may involve input and output (or other side effects) and ends up with a String
value.
An IO String
is not a String
and cannot be used like one. (That way the Haskell type system ensures you do not accidentally rely on input, output, or other side effects where you didn’t intend to!) However, Haskell provides principled ways to operate on values from within these IO x
types as if they were plain x
types (for whatever x
you’re working on) while still ensuring that the result stays inside IO
.
Finally, the ghc
runtime itself actually executes that input/output plan in order to act in the world when it runs your main
function.
The >>=
operator assembles computation plans. Specifically, it takes an IO a
on the left and an a -> IO b
function on the right. The first is a computation plan (that, as usual for IO
, may involve side effects) that produces an a
. The second is a function that takes an a
and produces a computation plan that produces a b
. It assembles them into a single computation plan that does all the work described by the IO a
and the IO b
produced when the function operates on the a
. In other words, >>=
lets you assemble complex computation plans from individual pieces, where each piece can rely on the results of previous pieces. (>>=
is actually more general than that, but this is a decent description for how it works in many cases.)
pure
takes a plain old (pure) value and turns it into one of these “values in a computation”. So, e.g., pure 3
is a trivial computation plan that doesn’t do anything but produce 3
. We use it just so we can intersperse regular stuff inside the >>=
stuff. The .
operator chains two functions together so that (f . g) x
is exactly f (g x)
. (If you’ve taken CPSC 121, it’s the “function composition” you hopefully talked about there, if briefly!)
So, all in all, the code below assembles a computation plan that:
- gets all the contents of stdin as a string,
- feeds that string to
extractData
, - feeds the resulting tuple to a little anonymous function that
- generates the bar chart (with
genBenChart
), - prepends the title, and
- puts all those strings together using
"\n"
(withunlines
),
- generates the bar chart (with
- and then prints the result.
(There are some fun ways we could rewrite this function, but it’s also fine just as it is.)
-- | @main@ reads from standard input (and expects data in
-- the format described for 'extractData' above), and prints
-- a bar chart showing frequencies of first digits in the data
-- to standard output.
main :: IO ()
=
main getContents >>=
pure . extractData >>=
pure . (\(title,numLines) ->
unlines (title : genBenChart numLines)) >>=
putStr
9 Bonus Point Opportunities
As always, you can earn Bonus Points for contributing to learning in the course! We’ll award bonus points for great contributions when we see them.
We’re also offering two optional opportunities for bonus points on this assignment:
- Submit a separate, complete version of
genBenChart
in 250 characters or less. There’s a separate, optional problem for you to submit it to in PrairieLearn. Our version is significantly shorter than 250 characters but still quite readable. It takes advantage of things likereplicate
,read
andshow
,take
,flip
,filter
,map
, andwhere
clauses. (We usewhere
clauses to give meaningful names to our intermediate results, even though it wastes characters!) - We’ll award a bonus point to submitted data sources we think are especially cool. In that case, we’ll probably also ask you if we can share yours with the class as a whole. Feel free to use that first line of your data source to document what makes yours cool. (Bear in mind that we must know the source of the data and that it’s legal for you to share if we’re going to be granting bonus points!)
In fact, if you really wanted to go wild in Haskell, you could read about
Monoid
andnewtype
(as in its use withSum
andProduct
), and then create an alternate listMonoid
such that a list of anyMonoid
is itself aMonoid
, at which point you could usemconcat
to replacefoldr
above. That’s deep and cool but likely way beyond where we’ll get to this term!↩︎This used to (incorrectly!) say addition. Thanks for flagging that to us!↩︎