
Static Single Assignments (January Haskell Tests 2018 – Part 2)

Second in a two-part series; see Constant Propagation (January Haskell Tests 2018 Part 1) for the first part, which covered the assessed portion of the test.

In January this year, Imperial College’s first-year computing students were tasked with implementing a tool to handle constant propagation for statements in static single assignment (SSA) form. In particular, this means that each variable is only assigned to once, and there are no compound expressions. This is useful for optimising compilers to simplify the machine code that a program compiles down to.

Of course, being required to write everything in SSA form by hand would be painful; as much as I like immutable objects, mandating their use can be costly (consider a large set, for instance, that periodically receives updates). Banning compound statements can also be troublesome; hence, conversion to SSA form is typically automated.

The test concluded with a challenge tasking students with implementing support for converting a more general program into SSA form. The students had been working with a simple imperative language with statements to assign values to integer variables (possibly doing mathematical operations), conditional statements (if-then-else) and do-while loops.

This problem naturally decomposes into three sub-tasks: supporting SSA translation for basic blocks, conditionals and then loops. The first problem can also readily be split into two sub-problems: ensuring variables are immutable and then handling compound expressions.

Basic Blocks

Immutable Variables

I opted to begin with immutability here, as this would allow for easy testing. This is done by creating multiple versions of a variable. For example,

a = 10;
b = a + 2;
b = a + b + 8;
c = a + b;
a = c + 3;

could become

a1 = 10;
b1 = a1 + 2;
b2 = a1 + b1 + 8;
c1 = a1 + b2;
a2 = c1 + 3;

Essentially, one could implement this by keeping track of the latest version of each variable used, and incrementing the version whenever processing an assignment. When handling the right-hand side of an assignment, one needs to know how to reference the latest version of each variable.

-- Usage Map for handling new instances of variables
type UsageMap = [(Id, Int)]

-- Retrieves a fresh identifier for the given raw variable.
getIdentifier :: Id -> UsageMap -> (Id, Int, UsageMap)
getIdentifier "$return" map
  = ("$return", 0, map)
getIdentifier id []
  = (id ++ "0", 0, [(id, 1)])
getIdentifier id (binding:bindings)
  | id == bindingId
    = (id ++ show bindingCount, bindingCount, (id, bindingCount + 1):bindings)
  | otherwise
    = (returnedId, returnedCount, binding:returnedBindings)
  where
    (bindingId, bindingCount) = binding
    (returnedId, returnedCount, returnedBindings) = getIdentifier id bindings
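
As a quick sanity check (worked out by hand from the definition above), threading the map through a few lookups behaves as follows; note that this implementation numbers versions from zero, so the a1/b1 names in the earlier example would actually come out as a0/b0:

getIdentifier "a" []                   == ("a0", 0, [("a", 1)])
getIdentifier "b" [("a", 1)]           == ("b0", 0, [("a", 1), ("b", 1)])
getIdentifier "b" [("a", 1), ("b", 1)] == ("b1", 1, [("a", 1), ("b", 2)])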

I’m pretty sure I wouldn’t have bothered to write a specific UsageMap type when I was still in university, though working in industry seems to have made me think more carefully about typing.

Compound Statements

The version of SSA in the paper also doesn't allow compound statements (e.g. a = ((b * c) + 5) / d). However, this can be handled reasonably easily, since arithmetic expressions can be represented in tree form.

We can create temporary variables for each non-leaf node of the expression tree. For example, we define k1 = b * c, k2 = k1 + 5 and k3 = k2 / d. Some care is needed in cases where the tree is a bit more balanced (we need to know that the first argument of our plus above is k1).

Also, of course, k is only used for illustration above, as real functions could have a variable actually called k! It makes sense to keep a counter for temporaries separate from the rest of the UsageMap, though I relied on some limits on variable names that were mentioned in the specification. I'm not sure I would have worked this out in first year!
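
To make the idea concrete, here is a minimal sketch of the flattening step, assuming a hypothetical expression type; the real test supplied its own AST and naming constraints, so the names below are purely illustrative:

-- Hypothetical expression type for illustration only.
data Exp = Const Int
         | Var String
         | Apply String Exp Exp    -- e.g. Apply "*" (Var "b") (Var "c")
         deriving Show

-- Returns an atom naming the expression's value, the assignments that must be
-- emitted first, and the next free index for temporaries (k0, k1, ...).
flatten :: Exp -> Int -> (Exp, [(String, Exp)], Int)
flatten (Const n) k = (Const n, [], k)
flatten (Var v)   k = (Var v, [], k)
flatten (Apply op l r) k
  = (Var temp, leftAssigns ++ rightAssigns ++ [(temp, Apply op leftAtom rightAtom)], k'' + 1)
  where
    (leftAtom, leftAssigns, k')    = flatten l k
    (rightAtom, rightAssigns, k'') = flatten r k'
    temp                           = "k" ++ show k''

Flattening ((b * c) + 5) / d with a counter of 0 then yields k0 = b * c, k1 = k0 + 5 and k2 = k1 / d, matching the hand-worked example (modulo starting the numbering from zero).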

Conditional Expressions

One needs to consider which versions of a variable are in scope. Let’s consider the following block:

a = 7
b = a * 3
if (b > 20) {
  a = b + 3
  b = 0
} else {
  a = b - 3
  b = 1
}
return a + b

After reaching the end of the if block, we can't just pass the UsageMap we obtained from it into the else block (on an execution that goes into the else block, the if block can never have run). Similarly, after the whole conditional, we can't simply use the UsageMap from the end of the else block (we don't know whether the else branch actually ran).

It is probably easier to visualise this block of code as a tree. Each node represents a possible linear execution path that we may run.

The version of each variable that a branch reads on entry should be the version in effect before we entered the conditional. Thus, when entering the else block, we should reference the state of the UsageMap from before we entered the if block.

When returning, we somehow need to pick one of the final values for variables that were possibly modified. This is difficult to express directly, but we can approximate it with a phi function, where \phi(a, b) nondeterministically returns either a or b. We can thus simply pick a4 = \phi(a2, a3) and b4 = \phi(b2, b3), add these statements, and finally return a4 + b4. Some care needs to be taken with the implementation (for example, one must correctly handle the case where a variable is modified in only one of the branches). This should, for the above block of code, yield:

a1 = 7 
b1 = a1 * 3 
if (b1 > 20) { 
  a2 = b1 + 3
  b2 = 0
} else { 
  a3 = b1 - 3
  b3 = 1 
} 
a4 = PHI(a2, a3)
b4 = PHI(b2, b3)
return a4 + b4

During the implementation I was running out of time, which unfortunately led to this monster of a helper function. It could probably be cleaned up by introducing a type for 'return values of a traversal'. It would also probably be reasonable to use some kind of higher-order function over the mod-set of variables (which I computed elsewhere).

-- Compute values to be used for PHI for one variable at a time
plantIfPhi :: UsageMap -> UsageMap -> UsageMap -> UsageMap -> [Id] -> (Block, UsageMap)
plantIfPhi original afterIf afterElse final []
  = ([], final)
plantIfPhi original afterIf afterElse accumulator (id:ids)
  = ([Assign newId (Phi (Var varAfterIf) (Var varAfterElse))] ++ otherPhis,
     final)
  where
    oldVersion = lookupWithFailure id original
    versionAfterIf = lookupWithFailure id afterIf
    versionAfterElse = lookupWithFailure id afterElse
    realVersionAfterElse 
      = if versionAfterIf == versionAfterElse
        then oldVersion
        else versionAfterElse
    varAfterIf = id ++ show versionAfterIf
    varAfterElse = id ++ show realVersionAfterElse
    (newId, _, newMap) = getIdentifier id accumulator
    (otherPhis, final) 
      = plantIfPhi original afterIf afterElse newMap ids

Do-While Loops

Finally, we also need to support loops. We could use a control-flow-graph structure, similar to what we had for the if-then-else conditional, except that there is now a cycle.

a = a + 2
b = 0
do {
  a = a + 1
  b = 5
} while (a < 10)

At first glance, this looks tricky. We don’t know the number of times the loop will execute, though we do know it’s at least once.

Which value of a are we reading in the loop body? Well, it is either the value from the initial set-up, or the value left at the end of the previous iteration. We could thus try to introduce phi functions:

a2 = a1 + 2
b1 = 0
do {
  a3 = PHI(a2, a4)
  b2 = PHI(b1, b3)
  a4 = a3 + 1
  b3 = 5
} while (a4 < 10)

This program is impossible to execute, of course; the definitions are cyclic. However, we’re not actually trying to execute it. We’re only using this as an intermediate representation to try to statically optimise our code. Of course, if the two arguments to a phi-function are the same, we can eliminate it, mark that value as a constant if known, and then use the information about it being a constant to drive further optimisations.

Synthesis

I think this could be a fairly challenging question in and of itself for first-year computing students, though some degree of hand-holding would likely be required (in particular, the tree traversal for splitting out intermediate assignments can be quite fiddly to work with). It would also be worth introducing the concept of basic blocks early on, as reasoning in terms of them makes handling conditionals and loops easier. It was a rather solid and satisfying conclusion to the 2018 test.

Constant Propagation (January Haskell Tests 2018 – Part 1)

The latest iteration of the January Haskell Tests is out (see Dr Tony Field’s page, and my post on the 2017 test). I think the last time I wrote any Haskell was eleven months ago when taking the 2017 test for a spin; nonetheless I was able to pick things up quickly. I guess the syntax (outside of monads) is pretty intuitive.

This test in some ways felt a fair bit easier than the last, at least in terms of the assessed portion (the last time round, it took me up to just past the two-hour mark to clear that; this time round, I was done with the questions that were actually worth marks by 1:15). However, the ‘bonus’ unassessed Part 4 seemed hard; finishing it in time looked to be a pretty tall order. Part 4 itself could be decomposed into three fairly chunky parts, and could even be a reasonable basis for a separate test!

The test dealt with optimising code that was already in static single assignment (SSA) form. I’d covered this topic before in a course on software reliability (within the context of static program verification), though I don’t remember covering it within the context of (possibly optimising) compilers. SSA form is fairly restrictive:

  • Each variable can only be assigned to once.
  • Operator arguments can only comprise constants and direct variable references.

For example, the assignment x = 4 * y + 7 * z is clearly not in SSA form; but it can be rewritten as a = 4 * y; b = 7 * z; x = a + b, where each statement is. In most programming languages variables can be assigned to multiple times; as part of conversion, we tend to add suffixes to variables for versioning. A more involved example follows:

long notSsa(x, y) {
  a = x + 100;
  b = 2 * y + 3; // compound expression
  b = b + x; // can't reassign!
  return a + b;
}

long ssaVersion(x, y) {
  a0 = x + 100;
  k0 = 2 * y;
  b0 = k0 + 3;
  b1 = b0 + x;
  return a0 + b1;
}

SSA also allows for the concept of a phi function, which has two expressions, and non-deterministically returns one or the other. This is useful for performing optimisations regarding control flow. For example, consider this function:

long veracity(x) {
  z = 1;
  y = 5;
  t = 0;
  if (x > 0) {
    z = 2;
    t = z + y;
  } else {
    z = 3;
    t = (2 * y) - z;
  }
  return x + t;
}

We can rewrite this using a phi function; after an if-block, the final value of a variable could be the value at the end of either branch.

long veracity(x) {
  z0 = 1;
  y0 = 5;
  t0 = 0;
  if (x0 > 0) {
    z1 = 2;
    t1 = z1 + y0;
  } else {
    z2 = 3;
    k0 = 2 * y0;
    t2 = k0 - z2;
  }
  z3 = PHI(z1, z2);
  t3 = PHI(t1, t2);
  return x0 + t3;
}

Although one had to work with phi functions as early as Part 2 (the function written at the end of part 2 should be able to simplify the above function significantly), thinking about the purpose of phi functions certainly wasn’t important until Part 4.

Part 1 involved writing an interpreter to execute statements in a (very) simple subset of C, where code has already been parsed into tokens. I think this was intended to set a baseline for students who were struggling with the material; I knocked through the 14 marks worth (which is enough for a pass; that’s 47%) in just 17 minutes.

Things got somewhat tougher in part 2. The opening was easy, with two 3-mark questions inviting candidates to implement simple constant folding optimisations (in particular, that 0 + x = x and \phi(x, x) = x, along with evaluating operations on known constants) and then determining the value of an expression given knowledge of some variable values.
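
To give a flavour of the kind of rules involved, here is a minimal sketch assuming a hypothetical Exp type (the test supplied its own types, so the constructors below are illustrative only); since operands in SSA form are atomic, folding only needs to look at the top level of each expression:

data Exp = Const Int
         | Var String
         | Add Exp Exp
         | Phi Exp Exp
         deriving (Eq, Show)

foldExp :: Exp -> Exp
foldExp (Add (Const 0) e)         = e
foldExp (Add e (Const 0))         = e
foldExp (Add (Const a) (Const b)) = Const (a + b)
foldExp (Phi a b) | a == b        = a
foldExp e                         = e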

The tricky part then began, with a task to implement constant propagation and folding to actual code blocks in SSA form. Conceptually, this is not difficult – once the value of a variable is known, it can be substituted into all subsequent occurrences. A fix-point approach fits naturally, since knowing something about some variables can lead to knowledge about more variables. Nonetheless, I struggled for a while with implementing the suggested worklist algorithm; the aforementioned fix-point approach was suggested too, and would probably have been more elegant, if harder to implement. I finished this part at 1:01.
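
For flavour, a rough sketch of a fix-point approach (reusing the hypothetical Exp and foldExp from the sketch above; this is not the worklist algorithm, and not necessarily what the specification had in mind) might look like this:

-- Repeatedly substitute known constants into the block and fold, until an
-- iteration learns nothing new.
substConst :: String -> Int -> Exp -> Exp
substConst v n (Var w) | v == w = Const n
substConst v n (Add a b)        = Add (substConst v n a) (substConst v n b)
substConst v n (Phi a b)        = Phi (substConst v n a) (substConst v n b)
substConst _ _ e                = e

propagate :: [(String, Exp)] -> [(String, Exp)]
propagate block
  | block' == block = block
  | otherwise       = propagate block'
  where
    known  = [ (v, n) | (v, Const n) <- block ]
    block' = [ (v, foldExp (foldr (uncurry substConst) e known)) | (v, e) <- block ]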

Part 3 involved removing phi functions that could not be cleaned up as part of the SSA optimisation, by pushing the variable assignments back into the branches of the code. For some reason, the rules for doing this were stated explicitly in the specification (to be fair, as Tony notes, they are straightforward), which made for a pretty tame last assessed part. There was still some tricky thinking required here, nonetheless; I think one has to be very careful when handling multiple phis!

Finally, Part 4 tasked students with translating a function to SSA form. This itself is weighty enough that I think another test could easily be written on it. This part can be decomposed into three (maybe four) separate challenges:

  1. Translate basic blocks to SSA form correctly.
    1. Maintain correct information about versioning of variables.
    2. Decompose compound statements (like x = 3 * y + 5 * z).
  2. Add support for conditionals.
  3. Add support for loops (do-while was a part of the simple C subset used).

Squeezing all of that into the remainder of the time after finishing the first three parts would have been tough; I finished the first two full tasks (basic blocks at 2:07 and conditionals at 2:46) and squeezed out a comment on what to do for the last sub-task. The last part of task 2 was also written rather hackily as I was racing the clock. For a quick preview:

import Data.Maybe (fromMaybe)

lookupWithFailure :: Id -> UsageMap -> Int
lookupWithFailure id map
  = fromMaybe 0 (lookup id map) - 1 -- TODO hack

To be fair, I could probably have churned all three main and three bonus parts out in Java within the time-limit; it’s nonetheless useful to engage a different set of programming skills.

Last year, I wrote that I enjoyed revisiting Haskell; I enjoyed this year's task too. I think I actually enjoyed this instalment more, probably because of the outsize difficulty of Part 4! Although I haven't written professional Haskell, I often find myself writing functional-style code in Java, and I'd still give Haskell an enthusiastic thumbs-up as a choice of first language to teach at university. There are some influences from professional development coming in here, too; a year ago I probably wouldn't have batted an eye at using types like a Haskell [(String, [Int])] or a Java Map<String, List<Integer>>, whereas now I tend to give structures their own types more quickly. My tolerance for long functions (without good reason, in non-performance-critical scenarios) has also certainly decreased.

This post is Part 1… I'll take a more in-depth look at Part 4 of the exam in Part 2.

Should Exams Be Graded Out Of 137?

I reviewed Misbehaving a couple of weeks ago, and in the introduction the author (Richard Thaler) relates an interesting tale about grading exams. He mentions that setting the maximum score to be 137 instead of 100 in a US college context was useful, in that it allowed him to gain a better understanding of students’ performance without drawing too much complaint [1].

Thaler finds that exams can be made a lot more difficult, without hurting students’ confidence too much. The actual percentages students obtain are immaterial (because how raw scores translate to actual grades is done on a curve); yet, in a US college context where A grades often start at 90% or above, suddenly getting a 72 (even if it is an A+) could be unnerving. Making exams difficult allows students to show that they haven’t just learned the material but have understood the logic and thinking behind it. For a first year logic course, say, I wouldn’t be opposed to the exam including unseen extensions of the material covered in class (for example, branching into monadic second-order logic or simple temporal logics); for a final year course, I’d expect students to be able to understand papers on the subject (even if unseen) and apply knowledge from said papers. This may be related to my work and interests, but the courses I remember best at Imperial include disproportionately many with difficult exams (Software Engineering (Design), Complexity and Intelligent Data and Probabilistic Inference come to mind).

There is a risk of stressing students out if they do the math, of course. A few quick presses on a keyboard or calculator should reveal that, say, a 96/137 is just about 70 percent. However, one has to be fairly deliberate about doing such calculations, as 137 is a rather awkward number to mentally calculate percentages with. If you asked me to estimate what a mark out of 137 corresponded to on a 100-point scale, I would probably engage in some kind of iterative search algorithm where I added 13.7s, and then 1.4s. However, apparently people are generally lazy enough that they won't bother with computing the percentage of their raw score. They'd simply look at the grade itself (which largely depends on where the raw score stands in relation to others, not so much on its absolute value).

Would this system be applicable or useful in the UK? I’ll acknowledge it was indeed a very clever solution in a US context, though I’m not sure it’s necessarily as useful in the UK. This mainly stems from the fact that grades in the UK are generally numerically lower; an A (a First) typically starts at 70 percent rather than 90. There’s thus already quite a fair bit of room to separate students that have merely learned the material from those that have engaged and understood it. Thaler mentions that students scored an average of 72 out of 100 on his initial exam (which caused an uproar among his students); that’s on the high side in the UK! Typically, exams at Imperial would have a fairly difficult ‘tail end’ for the last 30 percent; I could see a case for a really hard ‘last stretch’ (10 percent) which tended to occur in some of the programming tests, but I think most exams have had enough discriminatory power. Generally, you’d want to use research projects and other forms of work over longer periods of time to sift out the top students (as those are probably more similar to both industry and postgraduate academia).

Another difference is that UK universities typically aggregate course grades by a percentage system, rather than by GPA. This does make sense; I would generally consider a student scoring 90 and 65 in two modules (77.5 average) to be far superior to one scoring a pair of 72s, while the GPA system would say otherwise. I’d be inclined to think I’d prefer the 90/65 student to one with a pair of 78s, even. This is not a problem for using a raw limit of 137, as they can always be translated (non-linearly, if need be) to percentage grades.

One benefit of this system is psychological; even if the raw score might well be just 60 percent, scoring an 82 sounds much better. However, at Imperial at least, exam scripts generally aren’t returned to students. Also, this relies on most exams being graded out of 100 points (or less); if one is accustomed to 200-point exams, then being confronted with a score in the 80s might feel terrible (even though it is percentage-wise better than a 110 in a 200-point exam!)

There is also one minor objection I have, speaking more generally. Unless the exam being set is really, really hard, some students are likely to score above 100. Thaler claims this "(generated) a reaction approaching ecstasy" – I'm not sure I'd feel the same way, as it would once again become front and centre that the raw score is not a percentage, unlike most exams! The illusion would be clearly broken. Nonetheless, these students can probably take comfort in the fact (if they need to) that they're probably well ahead of the rest of the class.

In general, I think grading out of 137 (in raw scoring terms) can be pretty useful if (1) existing mandated grade boundaries or expectations are too high, so that one can’t distinguish students’ command of material; (2) one can control how raw scores are actually translated into grades that the students receive; (3) students get to view the raw scores, and (4) most exams are graded out of 100 (or possibly less).

All four appear to be true in the US system. While (2), (4) and maybe (1) are true at Imperial at least, (3) was not – and I’d say without (3) it probably wouldn’t be worth the effort.

On Teaching and Teaching Assistants

I read an article in the Guardian a few days ago on the Finnish education system which is often cited in the western world as one example of a successful education system (in spite of some rather angry bold negative deltas in PISA 2015 – however reliable said tests are as a form of assessment). I’ve been through the table-topping Singapore education system, and while it certainly is rigorous (especially in mathematics – I recently looked at an A level Thinking Skills problem-solving question that, chillingly, wouldn’t be too out of place on the Mathematics PSLE in Singapore) there are valid concerns regarding excessive stress levels, teaching not being perceived as a high-profile job and a lack of time for students to explore things on their own. I would certainly understand a desire not to bring some of these elements into an education system.

The headline message, 'trust your teachers', is something I can appreciate to some extent, even though I was never explicitly a teacher, at least in terms of profession. I had the privilege of being an undergraduate teaching assistant during my third and fourth years at Imperial, and I like to think that the professors and lecturers who were supervising me placed a lot of trust in me; they certainly gave me a fair bit of latitude in the content I could cover (perhaps not the "unfettered flexibility" mentioned in the article, but I was supposed to teach rather specific modules – Logic, Discrete Mathematics, Reasoning about Programs, and Data Structures and Algorithms).

I was given high-ability groups in both years, and this resulted in advanced tutorials that introduced students to various slightly more advanced topics they would see soon (concurrency, temporal logics, optimisation algorithms), as well as stretching their abilities in applying the concepts and knowledge learnt (okay, you know what a loop invariant is – how can we extend it to nested loops, or to recursive functions that contain loops?). I believe these were appreciated, and I did collect feedback on them (though, of course, it's difficult to be sure how much negative feedback was swept under the rug with these kinds of questionnaires).

Unfortunately, I did also indulge in some “teaching to the test”, explaining strategies for tackling the various exams that were certainly not a part of the syllabus. Thankfully, Imperial’s examinations don’t have too much exploitability here, as far as I can recall; I think much effort was spent identifying common pitfalls and explaining how to avoid them (e.g. unnecessary quantifier movement in Logic, and telling students to clearly demonstrate their thought processes even if they couldn’t answer the question). Some of this was certainly backed by popular demand, and it did pay off in that my students did win multiple prizes. I certainly wasn’t in a position to change the assessment system at Imperial!

I did encounter minimal bureaucracy, mainly around marking the attendance of students (some of this is part of the Home Office’s “expected contacts” requirement for non-EU students). I can’t remember if a DBS check was necessary, though I already had one from year 1, in any case. Thankfully, there was nothing along the scale of what was being described in the article:

Contrast this with the UK, where schools have data managers, where some teachers are told which colour pens to use for marking, and where books are periodically checked to ensure that learning intentions are neatly stuck in place.

I'm not necessarily sure that the existence of data managers is a bad thing (after all, I do work for a company that helps others make data-driven decisions!) – but that said, drawing conclusions from data that doesn't truly reflect students' abilities (if that is what is going on) is very unlikely to be effective ("garbage in, garbage out" springs to mind).

I did do a stint as a volunteer student helper with a few schools near Imperial as part of the Pimlico Connection programme. Although I didn’t witness said book checks, I certainly did notice teachers explicitly reference curricular objectives and the level system (this was pre-September 2014 changes). Obviously, I’m not sure how representative this is in schools in general, though. I think the only time I recall encountering this in Singapore was when preparing for the Computer Science IB HL exams.

The article concludes by expanding this notion of trusting individual teachers to societal trends towards trust in general, though not much evidence or data is presented on this. I guess some connections can be drawn to a point raised earlier on relative economic homogeneity. Looking at the issue of trust in the UK specifically, there is interestingly a series of studies that attempt to collect data on this. Slide 19 on the linked page suggests that 55% of British people trust the British people to "do the right thing", whatever that entails.

In terms of trusting individual teachers, I’d probably be comfortable with that only if there was a good process for selecting teachers. That’s a difficult problem – simply going for the “best and brightest” in terms of students’ academic results certainly isn’t enough, as the Finnish process acknowledges. We did do that at Imperial, though in some sense the stakes are lower there as there is a supervisor monitoring the process and it is, still, one of the “least worst” indicators one can use. However, I think once one acquires confidence and skill such that one will not struggle with the concepts one is teaching, and one can answer students’ questions comfortably (within reason), there are many other more important traits. My knowledge of tricky automata theory or familiarity with theoretical complexity classes, or for that matter ability to knock in a 96% in the Logic and Reasoning about Programs exams (as opposed to say an 85 or 90) were generally not directly relevant to doing my job as a teaching assistant!

Iterative Dichotomiser (January Haskell Tests 2017)

I probably can’t really explain this much better than the source:

The January tests are on-line Haskell programming tests sat under examination conditions by first-year undergraduate students at Imperial College at the end of a six-week introductory programming course.

Dr. Tony Field has been publishing them for quite a few years now (the linked page goes back to 2009). I still remember my first-year Haskell courses to some extent; I was somewhat impressed by the rationale for choosing Haskell, even though my understanding at the time was rather clouded. I do remember a specific instance where Tony polled the class on past programming experience, noting hands for C++ and Java (I raised my hand for both), and then tossing in Haskell (a few people; probably because Imperial did say to prepare to study Haskell beforehand!). Besides this, I think having to worry less about persistent state (or race conditions, though I don't think we covered concurrency at Imperial until second year), and being closer to the notion of mathematical functions, which students should already have been exposed to, would also have helped.

This year’s test (2017) covered decision trees, culminating in a question inviting candidates to implement the information gain version of ID3 when building a decision tree from a dataset. It wasn’t too difficult as a whole, as Tony acknowledged on his page; despite probably last having written Haskell about a year ago when I attempted the 2016 test, I finished comfortably in 2 hours and 12 minutes (the time limit is 3 hours). I think this test as a whole required some fairly careful work, but didn’t demand anything in terms of special insight even at the very end (as some past tests have done). The elegance of my answers would probably leave a fair bit to be desired, though; I found I was building recursive traversals of the given datatypes very frequently.

That said, I found the first part somewhat more difficult than in the past. Typically Part I was extremely straightforward (rather awkwardly, there used to be a question asking students to implement lookUp almost every year); not so much this time. In particular, there was a rather fiddly function to write that involved navigating some data structures and building a frequency table; the spec featured a lot of type definitions that reminded me a bit of some experiments with Object Calisthenics (in particular, the “wrap primitives and strings in classes” rule). That said, completing Part I alone would already have let you pass (with a 47; the pass mark is 40). I think the frequency table was harder than anything in Part II, actually, which had a few, fairly straightforward tree functions to write.
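
The counting step itself is simple; as a hedged sketch (with the column already extracted as a plain list, which glosses over the fiddly navigation of the spec's types):

import Data.List (group, sort)

-- Count how often each value occurs in a column.
frequencies :: Ord a => [a] -> [(a, Int)]
frequencies xs = [ (head g, length g) | g <- group (sort xs) ]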

Moving on, part III effectively introduced students to the Strategy pattern (in terms of an attribute selector for deciding which attribute to split the dataset on). Apparently, it was possible to solve partitionData with a suitable one-liner, though I unfortunately didn’t come up with anything along those lines, and went with a rather “direct” approach of dividing the rows by the element and removing the relevant attributes. Part IV introduced the concepts of entropy and information gain, and thus invited students to implement ID3; given the title of the paper I wasn’t too surprised to see this at the end.
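
For reference, the underlying maths of Part IV is compact. A hedged sketch follows, with rows represented simply as (attribute value, class label) pairs, which is not the representation the test actually used:

import Data.Function (on)
import Data.List (group, groupBy, sort, sortOn)

-- Shannon entropy of a list of class labels.
entropy :: Ord label => [label] -> Double
entropy xs
  | null xs   = 0
  | otherwise = negate (sum [ p * logBase 2 p | p <- probabilities ])
  where
    total         = fromIntegral (length xs)
    probabilities = [ fromIntegral (length g) / total | g <- group (sort xs) ]

-- Information gain from splitting (attribute value, label) pairs on that attribute.
gain :: (Ord attr, Ord label) => [(attr, label)] -> Double
gain rows
  = entropy (map snd rows) - sum [ weight part * entropy part | part <- partitions ]
  where
    partitions  = [ map snd g | g <- groupBy ((==) `on` fst) (sortOn fst rows) ]
    weight part = fromIntegral (length part) / fromIntegral (length rows)

ID3 then simply splits on whichever attribute maximises the gain and recurses on each partition.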

I found it fairly interesting to revisit Haskell, and it was nice to see that I was still able to work with the language after not touching it for a year or so. While it might be fairly unlikely that I would work with functional languages in professional terms, concepts and reasoning that are more apparent in functional languages do certainly apply even when I work in Java or C++, whether in the obvious sense (streams/map/filter etc.) or less so (inductive/algebraic datatypes).