Browse Month

February 2018

Static Single Assignments (January Haskell Tests 2018 – Part 2)

Second in a two-part series; see Constant Propagation (January Haskell Tests 2018 Part 1) for the first part, which covered the assessed portion of the test.

In January this year, Imperial College’s first-year computing students were tasked with implementing a tool to handle constant propagation for statements in static single assignment (SSA) form. In particular, this means that each variable is only assigned to once, and there are no compound expressions. This is useful for optimising compilers to simplify the machine code that a program compiles down to.

Of course, being required to write everything in SSA form is painful; as much as I like immutable objects, mandating their use can be costly (consider a large set, for instance, that periodically receives updates). Banning compound statements can also be troublesome; hence, SSA compilation is typically automated.

The test concluded with a challenge tasking students to implement support for converting a more general program into SSA form. The students had been working with a simple imperative language with statements to assign values to integer variables (possibly doing mathematical operations), conditional statements (if-then-else) and do-while loops.

This problem naturally decomposes into three sub-tasks: supporting SSA translation for basic blocks, conditionals and then loops. The first problem can also readily be split into two sub-problems: ensuring variables are immutable and then handling compound expressions.

Basic Blocks

Immutable Variables

I opted to begin with immutability here, as this would allow for easy testing. This is done by creating multiple versions of a variable. For example,

could become

Essentially, one could implement this by keeping track of the latest version of each variable used, and incrementing the version whenever processing an assignment. Whenever handling the right-hand side of an assignment, one needs to be able to know how to reference the latest version of a variable.

I’m pretty sure I wouldn’t have bothered to write a specific UsageMap type when I was still in university, though working in industry seems to have made me think more carefully about typing.

Compound Statements

The version of SSA in the paper also doesn’t allow compound statements (e.g. a = ((b * c) + 5) / d). However, this could be handled reasonably as arithmetic expressions can be represented in tree form.

We can create temporary variables for each non-leaf node of the expression tree. For example, we define k1 = b * c, k2 = k1 + 5 and k3 = k2 / d. Some care is needed in cases where the tree is a bit more balanced (we need to know that the first argument of our plus above is k1).

Also, of course, k is only used for illustration above, as real functions could have a value actually called k! It makes sense to have a counter separate from the rest of the UsageMap, though I relied on some limits on the variable names that were mentioned in the specification. I’m not sure if I would have worked this out in first year!

Conditional Expressions

One needs to consider which versions of a variable are in scope. Let’s consider the following block:

After reaching the end of the if block, we can’t just use the UsageMap we have obtained from that to pass into the else block (since on an execution where we’re going in to the else block, the if block must never have run). Similarly, after reaching the end of else we can’t just use the UsageMap from that (we don’t know if the else block has run).

It is probably easier to visualise this block of code as a tree. Each node represents a possible linear execution path that we may run.

The versions of variables each of the blocks read upon branching should be the version before we entered that block. Thus, we should reference the state of the UsageMap from before we entered the if block when entering the else block. We actually want something like this.

When returning, we somehow need to pick one of the final values for variables that were possibly modified. This is difficult to express, but we can approximate it with a phi function, where \phi(a, b) nondeterministically returns either a or b. We can thus simply pick a4 = \phi(a2, a3) and b4 = \phi(b2, b3), and add these statements. Finally, we would want to return a4 + b4. Some care needs to be taken with the implementation (for example, one must handle the case where a variable is only possibly modified correctly). This should, for the above block of code, yield:

During the implementation I was running out of time, which unfortunately led to this monster of a helper function. It could probably be cleaned up by perhaps making a type for ‘return values of a traversal’. It’s also probably reasonable to do some kind of higher-order function on the mod-set of variables (which I computed elsewhere).

Do-While Loops

Finally we also need to support loops. We could use a call graph structure, similar to what we had for the if-then-else conditional, except that there is a cycle.

At first glance, this looks tricky. We don’t know the number of times the loop will execute, though we do know it’s at least once.

Which value of a are we reading in the loop body? Well, this is either the value from the initial set-up, or the value the previous loop left at its end. We could thus try to introduce phi-functions:

This program is impossible to execute, of course; the definitions are cyclic. However, we’re not actually trying to execute it. We’re only using this as an intermediate representation to try to statically optimise our code. Of course, if the two arguments to a phi-function are the same, we can eliminate it, mark that value as a constant if known, and then use the information about it being a constant to drive further optimisations.

Synthesis

I think this could be a fairly challenging question in and of itself for first-year computing students, though some degree of hand-holding would likely be required (in particular, the tree traversal for splitting out intermediate assignments can be quite fiddly to work with). It would also be worth introducing the concept of basic blocks early on, as reasoning about them is more useful in handling conditions and loops. It was a rather solid and satisfying conclusion to the original 2017 test.

Constant Propagation (January Haskell Tests 2018 – Part 1)

The latest iteration of the January Haskell Tests is out (see Dr Tony Field’s page, and my post on the 2017 test). I think the last time I wrote any Haskell was eleven months ago when taking the 2017 test for a spin; nonetheless I was able to pick things up quickly. I guess the syntax (outside of monads) is pretty intuitive.

This test in some ways felt a fair bit easier than the last, at least in terms of the assessed portion (the last time round, it took me up to just past the two-hour mark to clear that; this time round, I was done with the questions that were actually worth marks by 1:15). However, the ‘bonus’ unassessed Part 4 seemed hard; finishing it in time looked to be a pretty tall order. Part 4 itself could be decomposed into three fairly chunky parts, and could even be a reasonable basis for a separate test!

The test dealt with optimising code that was already in static single assignment (SSA) form. I’d covered this topic before in a course on software reliability (within the context of static program verification), though I don’t remember covering it within the context of (possibly optimising) compilers. SSA form is fairly restrictive:

  • Each variable can only be assigned to once.
  • Operator arguments can only comprise constants and direct variable references.

For example, the assignment  x = 4 * y + 7 * z is clearly not in SSA form; but it can be rewritten as  a = 4*y; b = 7*z; x = a + b where each statement is. In most programming languages variables can be assigned to multiple times; as part of conversion, we tend to add suffixes to variables for versioning. A more involved example follows:

SSA also allows for the concept of a phi function, which has two expressions, and non-deterministically returns one or the other. This is useful for performing optimisations regarding control flow. For example, consider this function:

We can rewrite this using a phi function; after an if-block, the final value of a variable could be the value at the end of either branch.

Although one had to work with phi functions as early as Part 2 (the function written at the end of part 2 should be able to simplify the above function significantly), thinking about the purpose of phi functions certainly wasn’t important until Part 4.

Part 1 involved writing an interpreter to execute statements in a (very) simple subset of C, where code has already been parsed into tokens. I think this was intended to set a baseline for students who were struggling with the material; I knocked through the 14 marks worth (which is enough for a pass; that’s 47%) in just 17 minutes.

Things got somewhat tougher in part 2. The opening was easy, with two 3-mark questions inviting candidates to implement simple constant folding optimisations (in particular, that  0 + x = x and \phi(x, x) = x , along with calculating out results) and then determining the value of an expression given knowledge of some variable values.

The tricky part then began, with a task to implement constant propagation and folding to actual code blocks in SSA form. Conceptually, this is not difficult – once the value of a variable is known, it can be substituted into all subsequent occurrences. A fix-point approach fits naturally, since knowing something about some variables can lead to knowledge about more variables. Nonetheless, I struggled for a while with implementing the suggested worklist algorithm; the aforementioned fix-point approach was suggested too, and would probably have been more elegant, if harder to implement. I finished this part at 1:01.

Part 3 involved removing phi functions if it was not possible to clean them up as part of the SSA optimisation, by pushing the variable assignments back into code. It seems a decision was made to state the rules for these explicitly in the specification, for some reason (to be fair, as Tony notes it is straightforward), which seemed to make for a pretty tame last assessed part. There was some tricky thinking required here, nonetheless; I think one has to be very careful with handling multiple phis!

Finally, Part 4 tasked students with translating a function to SSA form. This itself is weighty enough that I think another test could easily be written on it. This part can be decomposed into three (maybe four) separate challenges:

  1. Translate basic blocks to SSA form correctly.
    1. Maintain correct information about versioning of variables.
    2. Decompose compound statements (like x = 3 * y + 5 * z).
  2. Add support for conditionals.
  3. Add support for loops (do-while was a part of the simple C subset used).

Squeezing all of that into the remainder of the time after finishing the first three parts would have been tough; I finished the first two full tasks (basic blocks at 2:07 and conditionals at 2:46) and squeezed out a comment on what to do for the last sub-task. The last part of task 2 was also written rather hackily as I was racing the clock. For a quick preview:

To be fair, I could probably have churned all three main and three bonus parts out in Java within the time-limit; it’s nonetheless useful to engage a different set of programming skills.

Last year, I wrote that I enjoyed revisiting Haskell; I enjoyed this year’s task too. I think I actually enjoyed this installment more, probably because of the outsize difficulty in Part 4! Although I haven’t written professional Haskell, I often find myself writing functional-style code in Java, and I’d still give it an enthusiastic thumbs-up for a choice of first language to teach at university. There are some influences from professional development coming in here, too; a year ago I probably wouldn’t have batted an eye at using types like a Haskell [(Str, [Int])] or Java Map<String, List<Integer>>, while I’ll tend to give structures types more quickly. My tolerance for long functions (without good reason, in non performance-critical scenarios) has also certainly decreased.

This post is Part 1… I’ll take a more in depth look at Part 4 of the exam in Part 2.

Should Exams Be Graded Out Of 137?

I reviewed Misbehaving a couple of weeks ago, and in the introduction the author (Richard Thaler) relates an interesting tale about grading exams. He mentions that setting the maximum score to be 137 instead of 100 in a US college context was useful, in that it allowed him to gain a better understanding of students’ performance without drawing too much complaint [1].

Thaler finds that exams can be made a lot more difficult, without hurting students’ confidence too much. The actual percentages students obtain are immaterial (because how raw scores translate to actual grades is done on a curve); yet, in a US college context where A grades often start at 90% or above, suddenly getting a 72 (even if it is an A+) could be unnerving. Making exams difficult allows students to show that they haven’t just learned the material but have understood the logic and thinking behind it. For a first year logic course, say, I wouldn’t be opposed to the exam including unseen extensions of the material covered in class (for example, branching into monadic second-order logic or simple temporal logics); for a final year course, I’d expect students to be able to understand papers on the subject (even if unseen) and apply knowledge from said papers. This may be related to my work and interests, but the courses I remember best at Imperial include disproportionately many with difficult exams (Software Engineering (Design), Complexity and Intelligent Data and Probabilistic Inference come to mind).

There is a risk of stressing students out if they do the math, of course. A few quick presses on a keyboard or calculator should reveal that, say, a 96/137 is just about 70 percent. However, one has to be fairly deliberate about doing such calculations, as 137 is a rather awkward number to mentally calculate percentages with. If you asked me to estimate what a mark out of 137 corresponded to in a 100-point scale, I would probably engage in some kind of iterative search algorithm where I added 13.7s, and then 1.4s. However, apparently people are generally lazy enough that they won’t bother with computing the percentage of their raw score. They’d simply look at the grade itself (which largely depends on where the raw score stands in relation to others, not so much on its absolute value).

Would this system be applicable or useful in the UK? I’ll acknowledge it was indeed a very clever solution in a US context, though I’m not sure it’s necessarily as useful in the UK. This mainly stems from the fact that grades in the UK are generally numerically lower; an A (a First) typically starts at 70 percent rather than 90. There’s thus already quite a fair bit of room to separate students that have merely learned the material from those that have engaged and understood it. Thaler mentions that students scored an average of 72 out of 100 on his initial exam (which caused an uproar among his students); that’s on the high side in the UK! Typically, exams at Imperial would have a fairly difficult ‘tail end’ for the last 30 percent; I could see a case for a really hard ‘last stretch’ (10 percent) which tended to occur in some of the programming tests, but I think most exams have had enough discriminatory power. Generally, you’d want to use research projects and other forms of work over longer periods of time to sift out the top students (as those are probably more similar to both industry and postgraduate academia).

Another difference is that UK universities typically aggregate course grades by a percentage system, rather than by GPA. This does make sense; I would generally consider a student scoring 90 and 65 in two modules (77.5 average) to be far superior to one scoring a pair of 72s, while the GPA system would say otherwise. I’d be inclined to think I’d prefer the 90/65 student to one with a pair of 78s, even. This is not a problem for using a raw limit of 137, as they can always be translated (non-linearly, if need be) to percentage grades.

One benefit of this system is psychological; even if the raw score might well be just 60 percent, scoring an 82 sounds much better. However, at Imperial at least, exam scripts generally aren’t returned to students. Also, this relies on most exams being graded out of 100 points (or less); if one is accustomed to 200-point exams, then being confronted with a score in the 80s might feel terrible (even though it is percentage-wise better than a 110 in a 200-point exam!)

There is also one minor objection I have, speaking more generally. Unless the exam being set is really, really hard, it’s likely that some students are likely to score above 100. Thaler claims this “(generated) a reaction approaching ecstasy” – I’m not sure I’d feel the same way as it would once again become front and center that the raw score is not a percentage, unlike most exams! The illusion would be clearly broken. Nonetheless, these students can probably take comfort in the fact (if they need to) that they’re probably well ahead of the rest of the class.

In general, I think grading out of 137 (in raw scoring terms) can be pretty useful if (1) existing mandated grade boundaries or expectations are too high, so that one can’t distinguish students’ command of material; (2) one can control how raw scores are actually translated into grades that the students receive; (3) students get to view the raw scores, and (4) most exams are graded out of 100 (or possibly less).

All four appear to be true in the US system. While (2), (4) and maybe (1) are true at Imperial at least, (3) was not – and I’d say without (3) it probably wouldn’t be worth the effort.