Iterative Dichotomiser (January Haskell Tests 2017)

I probably can’t really explain this much better than the source:

The January tests are on-line Haskell programming tests sat under examination conditions by first-year undergraduate students at Imperial College at the end of a six-week introductory programming course.

Dr. Tony Field has been publishing them for quite a few years now (the linked page goes back to 2009). I still remember my first-year Haskell courses to some extent, and was somewhat impressed by the rationale for choosing Haskell, even if my understanding at the time was rather clouded. I do remember a specific instance where Tony polled the class on past programming experience, asking for hands for C++ and Java (I raised mine for both), and then tossed in Haskell (a few hands went up; probably because Imperial did say to prepare to study Haskell beforehand!). Besides this, I think having to worry less about persistent state (or race conditions, though I don’t think we covered concurrency at Imperial until second year), together with being closer to the notion of mathematical functions, which students should already have been exposed to, would also have helped.

This year’s test (2017) covered decision trees, culminating in a question inviting candidates to implement the information-gain version of ID3 when building a decision tree from a dataset. It wasn’t too difficult as a whole, as Tony acknowledged on his page; despite probably not having written any Haskell since attempting the 2016 test about a year ago, I finished comfortably in 2 hours and 12 minutes (the time limit is 3 hours). The test as a whole required fairly careful work, but didn’t demand any special insight, even at the very end (as some past tests have). The elegance of my answers would probably leave a fair bit to be desired, though; I found myself building recursive traversals of the given datatypes very frequently, along the lines of the sketch below.
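To give a flavour of what those traversals looked like, here is a hypothetical sketch; the types and names are my own invention, not the paper’s actual specification.

```haskell
-- Hypothetical types in the spirit of the test; the actual definitions
-- in the paper differ.
data DecisionTree
  = Null                                  -- empty tree
  | Leaf String                           -- a classification
  | Node String [(String, DecisionTree)]  -- attribute, (value, subtree) pairs

-- A typical recursive traversal: classify a row of (attribute, value)
-- pairs by repeatedly looking up the attribute the current node splits on.
classify :: [(String, String)] -> DecisionTree -> Maybe String
classify _   Null             = Nothing
classify _   (Leaf c)         = Just c
classify row (Node attr kids) = do
  v       <- lookup attr row   -- value of the split attribute in this row
  subtree <- lookup v kids     -- branch matching that value
  classify row subtree
```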

That said, I found the first part somewhat more difficult than in the past. Typically, Part I was extremely straightforward (rather awkwardly, there used to be a question asking students to implement lookUp almost every year); not so much this time. In particular, there was a rather fiddly function to write that involved navigating some data structures and building a frequency table (sketched below); the spec featured a lot of type definitions that reminded me a bit of some experiments with Object Calisthenics (in particular, the “wrap primitives and strings in classes” rule). Even so, completing Part I alone would already have let you pass (with a 47; the pass mark is 40). I actually think the frequency table was harder than anything in Part II, which had a few fairly straightforward tree functions to write.
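For what it’s worth, the core of a frequency table is simple once the wrapping is stripped away; a minimal sketch (my own naming, not the paper’s) might be:

```haskell
import Data.List (group, sort)

-- Count occurrences of each distinct value: sort, group equal values,
-- then pair each group's representative with its length.
frequencyTable :: Ord a => [a] -> [(a, Int)]
frequencyTable xs = [(head g, length g) | g <- group (sort xs)]
```

For example, frequencyTable "abracadabra" gives [('a',5),('b',2),('c',1),('d',1),('r',2)]. The fiddliness in the test came from navigating the wrapped types around this, not from the counting itself.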

Moving on, Part III effectively introduced students to the Strategy pattern, in the form of an attribute selector for deciding which attribute to split the dataset on. Apparently, it was possible to solve partitionData with a suitable one-liner; I unfortunately didn’t come up with anything along those lines, and went with a rather “direct” approach of dividing the rows by the value of the chosen attribute and removing the relevant attributes. Part IV introduced the concepts of entropy and information gain, and thus invited students to implement ID3; given the title of the paper (ID3 stands for Iterative Dichotomiser 3), I wasn’t too surprised to see this at the end.
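As a rough illustration of Parts III and IV (again with my own names and row representation, not the paper’s): a “direct” partition much like the one I wrote, plus the entropy and information-gain calculations that ID3 greedily maximises.

```haskell
import Data.List (group, nub, sort)

-- A "direct" partition: for each distinct value in column i, collect the
-- rows carrying that value. (The paper's partitionData also removed the
-- split attribute; that step is omitted here.)
partitionOn :: Eq a => Int -> [[a]] -> [(a, [[a]])]
partitionOn i rows =
  [(v, [r | r <- rows, r !! i == v]) | v <- nub (map (!! i) rows)]

-- Shannon entropy of a list of class labels: -sum_i (p_i * log2 p_i).
entropy :: Ord a => [a] -> Double
entropy xs = negate (sum [p * logBase 2 p | p <- ps])
  where
    n  = fromIntegral (length xs)
    ps = [fromIntegral (length g) / n | g <- group (sort xs)]

-- Information gain of a split: the entropy of all the labels minus the
-- size-weighted entropy of each partition's labels.
informationGain :: Ord a => [a] -> [[a]] -> Double
informationGain labels parts =
  entropy labels - sum [weight p * entropy p | p <- parts]
  where
    weight p = fromIntegral (length p) / fromIntegral (length labels)
```

The Strategy flavour then comes from passing a selector (say, a function that picks the attribute index maximising informationGain) into the tree builder, rather than hard-coding the choice.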

I found it fairly interesting to revisit Haskell, and it was nice to see that I was still able to work with the language after not touching it for a year or so. While it might be fairly unlikely that I would work with functional languages in a professional capacity, concepts and reasoning that are more apparent in functional languages certainly apply even when I work in Java or C++, whether in the obvious sense (streams/map/filter etc.) or less so (inductive/algebraic datatypes).

