On Challenges that Build

On my return flight from Singapore to London, I listened to quite a few hours of music. Two of the songs I listened to, and enjoyed at least partially for similar reasons, were It’s Gonna Be Me (by NSync) and I Can’t Be Mad (by Nathan Sykes). It’s a bit of a strange pairing, as the former seems to be an upbeat, relaxed pop song while the latter is a fairly moody piano ballad. However, the common element I latched on to was that both songs feature sections that are repeated multiple times, with the vocals developing additional complexity on each iteration (thinking about it, this is fairly common both in songs that are well received critically and in songs I like). For example, in It’s Gonna Be Me there is a line in the chorus which is sung four times over the course of the song, and its complexity develops with each pass.

The corresponding sections in I Can’t Be Mad have a couple of changed notes between iterations, but also (if trying to reproduce the original) demand different productions of the notes (falsetto or not, belts, and so on). There’s always a risk of adding too many embellishments, though I find expanding upon base melodies can be quite interesting. Singing these, and considering what would be reasonable for my voice (adding a closing run to the last syllable of that chorus line, for instance) and what would not be (adding a +1 semitone key change after the second chorus in I Can’t Be Mad – the original is already awfully hard), can be enjoyable too.

Generalising this, I quite like the idea of “increasingly complex variations on the same theme” both when learning concepts and when teaching them. This already happens for many concepts in mathematics. Over the course of an A-level student’s mathematics education, he/she might learn how to write a quadratic expression as a product of linear factors (e.g. converting 6x^2 - 19x - 7 into (2x-7)(3x+1)). This could begin with expressions where factorisation by inspection is feasible. However, students should also be presented with some examples where inspection is extremely difficult or even impossible (though probably only after gaining some confidence with the cases where inspection is plausible). For general expressions, one could use the quadratic formula together with the factor theorem to factorise something like 6x^2 - 19x - 8 into \frac{1}{24}(12x - 19 - \sqrt{553})(12x - 19 + \sqrt{553}). However, there will be some expressions like 6x^2 - 19x + 16 where the solutions to the quadratic are not real; later, with some understanding of complex numbers, these would make sense. Students will also learn about problems which may not obviously be quadratics but can be written as such (like x^4 + 2x^2 + 1); the ability to synthesise the various techniques can then be tested with something like 7x^8 - 10x^4.
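
As a quick worked illustration of the ‘disguised quadratic’ idea (my own working, not examples from any particular syllabus), substituting u = x^2 or u = x^4 turns these into ordinary quadratics:

    x^4 + 2x^2 + 1 = (x^2)^2 + 2(x^2) + 1 = (x^2 + 1)^2
    7x^8 - 10x^4 = 7(x^4)^2 - 10(x^4) = x^4(7x^4 - 10)

Over the complex numbers, x^2 + 1 factorises further into (x + i)(x - i), which ties back to the earlier point about quadratics with no real roots.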

To some extent my Masters project also had this theme – linear time logic, adding knowledge, adding dynamic modalities, generalising that to full branching time logic, and then switching out the infinite traces for finite traces. I haven’t written a course or a book on a computer science topic yet, but I can imagine that there might at least be sections that follow this kind of sequence.

This pattern also occurs a fair bit in many technical interviews I’ve seen as well, where problems start easy, but additional and progressively more challenging constraints are repeatedly introduced. The purposes here could include testing for a breaking point, seeing how candidates react to problems without an obvious solution, or whether they are able to synthesise additional information to come to a solution.

I find that I often learn best by practicing on smaller examples at first, and then (attempting to) generalise their conclusions to larger models, considering when these conclusions hold and when they fail. Having multiple variations of progressive difficulty can be useful, as they give a sense of achievement as partial progress towards an overall goal is made. Furthermore, I find that understanding how changes in the problem scenario make the base solution method applicable or inapplicable is a key part of understanding as well; there is a clear need to reason about this when considering incremental variations. Going back to It’s Gonna Be Me, for example, aiming downwards at the word ‘love’ and not conserving sufficient air or energy for it might work for the first three passes, but it’s unlikely to on the last round.

There is a risk that the method can be frustrating in that it seems like it is consistently ‘moving the goalposts’, especially if one forgets that the partial goals are partial goals (and starts to think of them as complete ends in and of themselves). The standard I’m using for understanding (ability to critically evaluate applicability in novel contexts) may be seen as a little high. I also haven’t covered how to bootstrap the method (that is, how to develop an understanding of how to attack the base problem before any variations are introduced). Nonetheless I think there are some contexts where this works well. I’ve found it to be useful in singing, mathematics and interviewing at least!

Colour Rush (Board Game: FUSE, Kane Klenko)

While I’m not sure I would go so far as to say that efficiency is the highest form of beauty, I’d certainly say that I find efficiency beautiful. In controlled environments, I enjoy testing my own efficiency (typing tests, programming and Sudoku competitions, competitive Sims). Many of these environments feature elements of uncertainty, meaning that managing probabilistic outcomes is key to being efficient (in Sudoku, bifurcation can sometimes be faster than reasoning through complex logical chains; in a programming contest, you might luck out with a buggy or sub-optimal algorithm passing all the test cases anyway). I’ve recently enjoyed a challenging game which has probability management at its core.

FUSE is a real-time dice game from designer Kane Klenko. Players take on the role of bomb defusal experts; defusing is represented by rolling dice and placing them on ‘bomb’ cards to match constraints. The team rolls some number of dice and then allocates them to the cards, incurring penalties if some of the dice cannot be allocated. The game has a hard time limit of 10 minutes, so quick thinking is rewarded: faster players get more turns (and thus more rolls of the dice).

The dice are almost standard six-sided dice; interestingly, I noticed that opposite faces of each die did not add up to seven (2 was opposite 4 and 3 opposite 5, for some reason). The dice come in five colours (red, blue, black, yellow and green), and there are five dice of each colour. Card constraints may involve numbers, colours or both, and vary widely in difficulty; easy cards have requirements like “two dice matching in either colour or number” or “A – B = 2”, while more difficult ones can involve building a six-die pyramid where each die is a specific colour. The bombs are rated in difficulty from 1 to 6, skipping 5.

Some bombs also introduce ordering constraints, requiring players to build a tower or pyramid (and players are not allowed to build unsupported structures). There is also a dexterity element here, as the penalty for knocking over a tower or pyramid is losing all progress on that card.

I’ve only played solo or in a team of two. Playing solo, a player draws and rolls three dice at the beginning of his/her turn, and then allocates them to a pool of four active cards, possibly placing multiple dice on a single card. At any time (except towards the end of the game), there are four active cards and five cards in a ‘market’ ready to become active when an active card is completed.

I’ve been able to complete the single-player “Elite” difficulty (getting through 23 cards in 10 minutes) about half of the time if playing with a random mix of 27 cards, and have done it (once in ten attempts or so) if playing with cards drawn from the most difficult sets (i.e. all the level 6s and 4s, with all remaining cards being level 3s).

I made some simple tactical observations regarding card constraints. For example, there is a level 2 card requiring three numbers that sum to 11. If I have a free choice of a number to play there, I would play as close to 4 as possible (considering that two dice most likely sum to 7).
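
As a rough sanity check on that heuristic, here is a small Haskell sketch (my own, not anything from the game) that enumerates, for each number placed first on the ‘sum to 11’ card, the probability that a single roll of two further dice completes it – a simplification, since in practice the remaining dice can also arrive over several turns:

    import Data.List (sortBy)
    import Data.Ord (Down (..), comparing)

    -- Probability that two fair six-sided dice sum to exactly s.
    probTwoDiceSum :: Int -> Double
    probTwoDiceSum s = fromIntegral favourable / 36
      where
        favourable = length [() | a <- [1 .. 6], b <- [1 .. 6], a + b == s]

    -- For each value placed first on the "three dice summing to 11" card,
    -- the chance that two further dice finish the card in one go.
    completionOdds :: [(Int, Double)]
    completionOdds =
      sortBy (comparing (Down . snd))
        [(first, probTwoDiceSum (11 - first)) | first <- [1 .. 6]]

    main :: IO ()
    main = mapM_ print completionOdds
    -- Placing a 4 leaves a required sum of 7, the most likely two-dice total (6/36).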

I think the main strategy in solo play involves ensuring that one’s cards don’t end up having conflicting demands, as far as possible. For example, consider the following set of cards.

This is an awful situation to be in, because apart from one space for a 1, all of the cards here demand high numbers to start. The ‘four decreasing numbers’ card only admits a 4 or higher (and even then, I find starting that card with a 4 very restrictive, as it forces 3-2-1); the ‘sum to 15’ card only admits a 3 or higher (and a 3 then forces 6-6; I usually play one six and then on a later turn find a pair adding to 9). Misaligned numerical constraints can be painful; misaligned colour constraints might be even worse, because colours are sampled without replacement (well, until cards are cleared or penalties are incurred). There is a level 4 card that demands four dice of the same colour, which I try not to pick up.

There are many cards that have rather specific requirements (e.g. a specific colour and number); I tend to carry at most one or two of these at a time, while trying to ensure the rest of the cards cycle regularly. The penalty for an unused die involves re-rolling it, and then removing a die that matches it in colour or number from a card in play (if possible).

Although the game only lasts ten minutes (or less, if the deck is cleared in time), I’ve found it to be highly addictive, especially on difficult runs at the Elite level. I’d expect times to vary quite widely as it’s a dice game, though I’ve noticed that most of the time I run out of time just one to three cards from the end. I’ve never counted, but it wouldn’t be unreasonable to average fifteen seconds per turn – roughly forty turns of three dice each – and to some extent things average out over those 120 selections and rolls. As mentioned above, I think a key part of solo FUSE strategy involves managing interactions between the requirements of active cards.

Minor criticisms would include some disconnect between the cards’ ratings and their actual difficulty. I find that most of the level 6s are less luck-dependent than some of the lower-level cards, like the level 3 “any 5, a yellow die, and a green 2”. More bizarrely, the ‘6-5-4 pyramid’ card in the photo above is only level 2, though it’s considerably more restrictive than the ‘sum to 15’ card also in the photo (level 3). There might be a case for a more permissive card being rated higher if the cost of verifying correctness is higher (e.g. ‘three-digit prime number’), but I don’t think adding three numbers introduces much overhead.

The ceiling for individual card difficulty also isn’t very high. The highest stacks are of height 5 (with very loose constraints), and pyramids top out at height 3 (using six dice). I’d imagine there would be scope to define some level 8 cards that use seven or more dice and/or require combinations that are stricter or more complex, like a six-die pyramid where each die not placed on the ground level must equal the sum of the two dice directly underneath it.

I’ve played this with a friend from Imperial as well. With two players, each player has two active cards; four dice are rolled and each player must take two of them. We only played up to the standard difficulty. I’d like to play this with a bigger group, though; communication would get much trickier, and with two I am still able to maintain sufficient context in my head to compute a good allocation of four dice across two sets of two cards. With five players there are 10 active cards, which would probably be unreasonable to keep track of.

In summary, I’d highly recommend this game based on my solo/pair experiences. I’ve probably had about twenty to twenty-five playthroughs in total, and have enjoyed it thoroughly even though I generally don’t enjoy real-time games. Each run is short, but can generate an adrenaline rush, and the difficulty level is highly configurable so it should be possible to find a level that works well. I might even extend the game with some custom “hard mode” cards (this can be done by getting opaque sleeves so the backs are indistinguishable). I haven’t yet had the opportunity to play with a larger group, which should bring new and interesting challenges too.

Making Heads of Tail Risks

I remember that I was fairly anxious at the beginning of my fourth year at Imperial. I was concerned about securing work after university. Looking back, this seems patently ridiculous; I had topped my class for the third time and already had a return offer in hand from Palantir. However, owing to sweeping government rhetoric about controlling post-study work visas at the time, I saw “not being able to get a work visa” as the primary risk then, even if it was remote. That assessment in and of itself was probably correct, though the time I spent monitoring and mitigating that risk (reading up on government committee reports, and considering alternatives like an H-1B1, an EU Blue Card or doing a Tier 2 ICT after a year) was excessive.

Of course, this never materialised; and even if it had, the only likely impact would have been that I’d have had to fly home to Singapore in between finishing uni and starting work (I did not, though in hindsight that might have been a good thing to do).

I’m not sure when I first became aware of the concept of probability density functions (or, for that matter, continuous random variables). These functions take on nonnegative values and integrate (across all variables) to 1. In the case of single-variable functions, one can plot them on a two-dimensional graph; one may get results looking somewhat like the picture above, in some cases.

The area of a region underneath the graph equals the probability that a value falls in that region. For example, a uniform distribution has a density function that’s just a horizontal line. The graphs for the returns of investments 1 and 2 in the example above follow what’s called a normal distribution; investment 3 follows a Student’s t distribution, which has fatter tails.

Since probabilities correspond to areas, a simple technique for generating random values from an arbitrary distribution is rejection sampling: if one draws a box around the density and throws darts uniformly at random at it, one can take the x-coordinate of the first dart that lands underneath the curve as a representative random sample.
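
A minimal Haskell sketch of that dart-throwing idea (my own illustration, not tied to the investments in the example; the triangular density used in main and its bounding box are assumptions):

    import System.Random (randomRIO)  -- from the random package

    -- Rejection sampling: throw darts uniformly into a box around the density,
    -- and keep the x-coordinate of the first dart that lands under the curve.
    rejectionSample :: (Double -> Double)  -- density f
                    -> (Double, Double)    -- support (xMin, xMax)
                    -> Double              -- an upper bound on f over the support
                    -> IO Double
    rejectionSample f (xMin, xMax) fMax = go
      where
        go = do
          x <- randomRIO (xMin, xMax)
          y <- randomRIO (0, fMax)
          if y <= f x then pure x else go  -- under the curve: accept; else retry

    main :: IO ()
    main = do
      -- Example: a triangular density on [0, 2], peaking at x = 1 with height 1.
      let tri x = 1 - abs (x - 1)
      sample <- rejectionSample tri (0, 2) 1
      print sample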

That’s a basic mathematical introduction. If we had to rank the quality of the return profiles above (remember: further right means higher returns), a lot would depend on what we were trying to do. I would personally rank investment 2 (the green curve) on top; it has a considerably higher mean return than investment 1 (blue) and adds only a small amount of variability. We can calculate what’s known as the standard deviation of a given distribution; this is a measure of how much the values vary around the mean. The blue curve has a standard deviation of 0.6, while the green curve’s is 0.7.
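
For reference, for a density f with mean \mu, the standard deviation \sigma is given by:

    \mu = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx}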

Ranking investments 1 and 3 is more difficult; the mean of 3 is higher, but it adds a lot of uncertainty. I’d probably rank them 2, 1, 3. However, there is also an argument in favour of investment 3 – if one is only interested in whether the returns exceed some sufficiently high level. It’s a similar line of argument to this: if you asked me to double a large sum of money (nominally) in 20 years, I’d pick a bond; in 10 years, a general stock index fund; and in 10 minutes, probably blackjack or aggressive forex speculation.

Whichever investment we pick, it’s possible that we may get unexpectedly awful (or excellent!) results. The standard deviation gives us some measure of what to expect, but there is still a non-zero probability that we get an extreme result. For the normal distributions (the blue and green curves), there is a 99.7% probability that a single observation will be within three standard deviations of the mean; this also means there’s a 0.3% probability that it is not, and about a 0.15% probability that it falls more than three standard deviations below the mean.

Tail risk refers to the risk of low-probability events that may have severe impact; considering them is important. Going back to the work visa situation, I think I correctly identified visa policy changes as a tail risk, though in hindsight I did a poor job of controlling the amount of time spent mitigating it – akin to spending $10 to insure against a 1% probability of a $100 loss (an expected loss of just $1), provided the $100 loss wasn’t crippling – which it wouldn’t have been.

I also spent a lot of time focusing on mitigating this specific tail risk, when perhaps a better approach would have been to develop resilience to general tail risks affecting my employment. The obvious routes at the time would have been continuing to do well academically and developing my skills, though others exist too – such as having a greater willingness to relocate, living below one’s means and building up an emergency fund. There are still further tail risks that the above wouldn’t address (e.g. a scenario where computers and automation are universally condemned, all countries practice strict closed-border policies and the global fiat money system collapses), but the costs of mitigating those seem untenably high. I haven’t read Antifragile yet (what I describe here is weaker, as it doesn’t involve benefiting from low-probability events), though it’s on my reading list.

Irreversible Changes (2018 Q1 Review)

When I was a student, the academic year was a rather friendly source of periodicity for doing reviews. The ends of the various academic terms lined up reasonably well with Q4, Q1 and Q2 reviews, and a Q3 review was typically reflection on what I got up to over the summer. I don’t think there has been a single large, directly visible change in my circumstances compared to a year ago. I can see how, under such conditions, days and weeks may all blend together into a mush of time. Of course, one can argue that there are still other distinctive events: there are obviously the seasons in weather terms, and also other cyclic patterns (e.g. college recruiting season with high interview volume, winter holidays).

In terms of development, while quantifying growth is always challenging I like to think that I’ve started to see some distinct improvements this quarter. I’m basically the product owner of a rather small but absolutely platform-critical service, and have also started driving some tasks that involve cross-team coordination. The computer-scientist part of me is also satisfied; a paper on model checking multi-agent systems against LDLK specifications on finite traces received an acceptance at AAMAS18, and there is another patent going into the pipeline. Referencing the goals I set at the beginning of the year, performance is along the lines of 0.7 for growth as a developer, 1.0 for paper submission and 1.0 for patent submission (not achieved yet, but more than on track).

I have been spending quite a lot of time on work, and I noticed that I’ve been sleeping more as well – I’m not sure why, but it’s crept up from about 7 to 8 hours pretty consistently. Some of this might be stress-related and some of it might be owing to the colder weather, though I can’t quite put my finger on it.

In terms of skill development, I’m generally slightly behind my desired pace in the various disciplines I considered at the start of the year:

  • This is the ninth post I’ve written this year (interpolation: 13).
  • I’ve only visited Singapore and Switzerland so far this year (interpolation: 3).
  • I’ve averaged 9,578 steps per day (target: 10,000).
  • I don’t have metrics for singing. I can sing B4 most of the time nowadays, but haven’t focused my efforts on developing a strong upper range to a song in particular. Interestingly my lower range seems to have weakened a bit – I used to hit E2 rather easily, while now it’s a stretch.

Financially, the markets haven’t been the smoothest.

I guess this is my first significant paper loss. I easily stayed invested, and even bought more, through both the corrections in mid-2015 and early 2016 (considering VWRD, a USD-denominated Vanguard world stock ETF, these were drops of 13.8% and 10.9% respectively). This time around, VWRD has “only” shed around 9.5%, so it’s not even technically a correction yet. I don’t explicitly hold VWRD, but it’s likely that my portfolio has fallen by a similar amount – except that that now amounts to months of income. There is a price to pay for that equity risk premium.

In terms of friendships, most things have been ticking over acceptably well; I guess stability here is a good thing, and that for the most part has been the case. Some bumps, but nothing out of the ordinary.

I’m very aware that I’ve been spending a larger than normal amount of time this quarter thinking about philosophy and planning, mainly along what direction I’d like to take myself in. Rather expectedly, I find this more mentally taxing than working through even the nastiest and most technical parts of my Masters thesis. I don’t quite have answers yet – and even if I did, my thoughts might easily shift given a day or two (or suitable events).

This quarter, I’ve been listening to “These Days” (Rudimental with Dan Caplen, Jess Glynne and Macklemore) a fair bit. While there are parts of the song that feel a bit formulaic (the progression in the first part of the chorus, in particular), I nonetheless enjoy listening to it. It’s also fun to sing along to, with a fairly broad vocal range and numerous runs, falsetto flips and tough notes to belt out (the middle of the chorus). The song discusses a failed relationship, though all speakers seem to have accepted its failure, and express hope that they may make the best of the situation going forward.

Three years of ups and downs, nothing to show for it now
And I know it ain’t pretty when the fire burns out

I find that the idea also works well for individual pursuits – having failed and/or aborted ventures is fairly normal (one could argue that a complete absence of failure correlates well with not setting sufficiently ambitious targets). Failure does hurt, especially if one has invested a lot of effort in pursuing something. Nonetheless, reaching this point implies coming to terms with one’s failure, which is a very good step.

Reading List: ‘Absolutely Nasty Sudoku, Level 2’ (Frank Longo)

I enjoy logic puzzles (not just in a formal context, to be clear). Probably one of the most well-known genres of these is sudoku, a grid-based constraint satisfaction problem made popular by the Japanese publisher Nikoli. Given a 9 by 9 grid, which is decomposed into nine 3 by 3 sub-grids (boxes), fill in numbers such that each digit from 1 to 9 appears exactly once in each row, column and box. Some numbers are already given as clues; a well-set puzzle has exactly one solution.

Solving sudoku in its generalised form (on an n^2 by n^2 grid) is NP-complete; of course, for the fixed 9 by 9 grid, solving can technically be done in constant time. ‘Obvious’ strategies involve finding the only place in a row, column or box where a given number could go (since each of these must contain each digit once), or finding the only number that could be assigned to a given cell.

The next step would then involve considering interactions between cells – for example, ‘these two cells in this row can only contain 3 and 5, so no other cells in the row can contain a 3 or a 5’, or ‘both of the cells where 7 could go in this box are in the top row, so there can’t be a 7 in the top row of the other two horizontally aligned boxes’ (called a pointing pair). Complexity only develops from there; there is a myriad of cleverer strategies catalogued online.
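
As an illustration of that ‘pairs’ reasoning (commonly called a naked pair), here is a small Haskell sketch – my own simplification, since real solvers track cell positions and handle all three unit types – that strips a pair’s digits from the other cells of a single unit:

    import Data.List (nub)
    import Data.Set (Set)
    import qualified Data.Set as Set

    type Candidates = Set Int

    -- Naked pairs within one unit (a row, column or box): if two cells share
    -- exactly the same two candidates, those two digits can be removed from
    -- every other cell in that unit.
    eliminateNakedPairs :: [Candidates] -> [Candidates]
    eliminateNakedPairs unit = map prune unit
      where
        pairs = nub [c | c <- unit, Set.size c == 2, count c > 1]
        count c = length (filter (== c) unit)
        prune cell
          | cell `elem` pairs = cell  -- the pair cells themselves keep both digits
          | otherwise = cell `Set.difference` Set.unions pairs

    -- Example: the first two cells form a {3,5} pair, so 3 and 5 get removed
    -- from the third and fourth cells.
    example :: [Candidates]
    example = map Set.fromList [[3,5], [3,5], [1,3,9], [2,5,7], [4], [6,8], [1,9], [2,7], [6,9]]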

Sudoku (sudokus?) are often featured in newspapers, and there are many compilations of these. However, as one gets familiar with the strategies, these can become less interesting. For example, consider this hard sudoku from the Guardian; a not-that-smooth solve yielded a 5m11s time, and the most complex technique I used was the ‘pairs’ reasoning I outlined above.

I recently discovered Frank Longo’s Absolutely Nasty Sudoku series, and thankfully this is one series which seems to live up to its name. Consider this grid (which is the puzzle above, converted into an easier to read format):

The grid pictured above is from a puzzle towards the tail end of the Level 2 book. Although no cells are obvious (and it doesn’t look like the pairs/triples logic yields anything), there are several ways to proceed from here:

  • In row 3, you can have an 8 only in columns 2 and 9. This is also true in row 6. This means that no other rows can have 8s in columns 2 and 9; in particular, the second cell in the bottom-most row can’t be an 8, and has to be a 4. (This strategy is called X-Wing.) This is arguably the most straightforward deduction here, though it wasn’t the one I saw.
  • Consider that if row 5 column 1 was a 5, then row 8 column 1 must be an 8; that forces row 9 column 2 to be a 4, and row 9 column 7 to be an 8. Alternatively, row 5 column 1 can be an 8. Either way, row 5 column 7 can’t be an 8, and must be a 4. (This strategy is called an XY-Chain.)
  • Suppose row 6 column 7 was a 9. Then, the rectangle of cells with vertices in rows 1 and 3 and columns 4 and 7 admits two solutions; without affecting other cells, I can have either a 3 or 7 be in the top left corner. Thus, row 6 column 7 can’t be a 9, and must be a 3. (This strategy is called a Unique Rectangle.) This was actually the next step I made when solving this puzzle!

A key criterion when evaluating a Sudoku book (or a puzzle book in general) is the difficulty and quality of the puzzles. The difficulty of Level 2 is well-placed for someone like me, if maybe a little on the easier side; I’m familiar with most if not all of the ‘basic’ techniques, and am aware of, though not confident with, trickier techniques like the aforementioned Unique Rectangle. I might have preferred Level 3, actually; there were several puzzles at the beginning of Level 2 that seemed solvable with techniques I’d consider basic. I also haven’t found any “broken” puzzles which couldn’t be solved (presumably, the author checked these with a solver before publication)!

I also have other criteria for evaluating Sudoku books, though. I tend to write down cell candidates as well as other annotations when solving. These help me more easily remember observations like pointing pairs; I’d prefer the grids to be sufficiently large that I can comfortably write these out. The grids in this book were certainly of adequate size. While my handwriting tends to be quite small, I don’t think grid size would be problematic for most people.

The choice of ring binding seems to make sense, too; the book needs to remain open when one is solving. Generally I found the paper to be very smooth to write on. I don’t have many complaints here.

The book has a very brief foreword, and then gets straight into the puzzles (with solutions at the back). The target audience is clearly people who are well-accustomed to Sudoku. As mentioned earlier, this is the second in a five-part series, and based on online reviews I determined it to be a good place to start. I wouldn’t recommend it for someone just starting out with Sudoku, though.

A Quiet Winter Night

I made my way down the staircase at the end of the bridge. It was late on Thursday, somewhere between 10 and 11 pm. The snow was still falling (though quite lightly). What struck me the most, however, was the silence. For a few brief moments, it seemed like it could have been some kind of post-apocalyptic world, or the Rapture; I was alone, at least as far as I could tell.

The UK has witnessed an unexpected patch of cold weather over the past week or so. I have pretty good cold resistance and thus don’t normally detect weather changes that much, though this was of course painfully obvious. It’s possible that more snow fell in London in the last week than in the past five years – at least for the times I’ve been around.

I enjoyed the snow on the first day, though that’s about as long as its welcome lasted. For some reason, I generally associate snow with bitterness and harshness, rather than the fascination that people in the UK are said to have. The reason often cited is that snow reminds people of youth or Christmas, which is fair enough – though I certainly don’t associate the former with snow, and although there is much media about snowy Christmases, I don’t think I have ever experienced one. I think more of Dead of Winter (a board game about surviving winter with zombies) than Jingle Bells. It was nonetheless interesting to experience a different kind of weather, but the practical pains of dealing with it came to the forefront very quickly.

For me at least, the most pertinent issue was travel disruption. My commute is typically a 20-minute walk, though because of the snow and ice I had to be substantially more careful with my footing, and easily took 30-plus minutes to traverse the same route. Thankfully I don’t have to drive or take the train in to work, but it certainly affected my colleagues (and it did mean the office was quieter, which in turn affected me). Expectedly, quite a few flights were cancelled as well; I wasn’t travelling, but this would certainly have been an annoyance if I had been.

Snow can also cause problems in supply chains. I don’t think I was affected this time round, but there was a similar incident in New York early last year while I was waiting on a package to be delivered. Thankfully it arrived on the morning of the day I was flying back to London, though I did get somewhat anxious about whether it would make it in time. In general, of course, this could be a much larger logistics problem.

Low temperatures were another factor, though much less significant for me. To paraphrase Elsa, at least most of the time the cold never bothered me anyway – though there were two instances where I decided it made sense to switch on the heating. Apart from that, I’m not sure I did much to deal with the temperatures on their own. I did dress marginally more warmly than normal (which, in low single-digit temperatures, means a light jacket for me), but that was about it.

It’s also suggested, on the NHS’s pages for instance, that there are correlations between cold weather and various types of illness. Of course, I recognise that the impact on people who spend more time exposed to the cold (e.g. people who work outdoors, rough sleepers) would be substantially greater.

The recent weather also seems to have had a sobering effect on my thoughts. Some of this might be driven by confounding factors (shorter days, more darkness, fewer social gatherings, etc.). This isn’t bad per se, though I find myself easily engrossed in thoughts that may at times be counterproductive. I also read an article in the Guardian questioning why the UK was unprepared for the snow; while I’m not sure I agree with the central thesis (I don’t have the data, but this may be viewed as an actuarial decision; spending $2 to prevent a 1-in-10 risk of losing $10 – an expected loss of $1 – may not be worth it), there was a point about extreme weather making societal inequalities starkly obvious which I can follow.

The weather is forecasted to return to more normal levels in the week ahead. If that holds, I’ll appreciate the easier travel and that more people are in the office. I’ll count myself fortunate that it hasn’t impacted my routines and plans that much, at least for now.

Static Single Assignments (January Haskell Tests 2018 – Part 2)

Second in a two-part series; see Constant Propagation (January Haskell Tests 2018 – Part 1) for the first part, which covered the assessed portion of the test.

In January this year, Imperial College’s first-year computing students were tasked with implementing a tool to handle constant propagation for statements in static single assignment (SSA) form. In this form, each variable is only assigned to once, and there are no compound expressions. SSA is useful in optimising compilers, as it helps simplify the machine code that a program compiles down to.

Of course, being required to write everything in SSA form is painful; as much as I like immutable objects, mandating their use can be costly (consider a large set, for instance, that periodically receives updates). Banning compound expressions can also be troublesome; hence, conversion to SSA form is typically automated.

The test concluded with a challenge tasking students to implement support for converting a more general program into SSA form. The students had been working with a simple imperative language with statements to assign values to integer variables (possibly doing mathematical operations), conditional statements (if-then-else) and do-while loops.

This problem naturally decomposes into three sub-tasks: supporting SSA translation for basic blocks, conditionals and then loops. The first problem can also readily be split into two sub-problems: ensuring variables are immutable and then handling compound expressions.

Basic Blocks

Immutable Variables

I opted to begin with immutability here, as this would allow for easy testing. The idea is to create multiple versions of a variable: each assignment targets a fresh, numbered version (a becomes a0, then a1, and so on), and reads refer to the most recent version. Essentially, one could implement this by keeping track of the latest version of each variable used, incrementing the version whenever processing an assignment; whenever handling the right-hand side of an assignment, one needs to know how to reference the latest version of each variable it mentions.
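
A minimal sketch of how this bookkeeping might look in Haskell (my own reconstruction – the names and types here are assumptions rather than the ones actually used in the test):

    import Data.Map (Map)
    import qualified Data.Map as Map

    -- Latest version number of each variable seen so far.
    type UsageMap = Map String Int

    -- How to reference the latest version of a variable on a right-hand side
    -- (unversioned if it has never been assigned in this block).
    latest :: UsageMap -> String -> String
    latest usage v = maybe v (\n -> v ++ show n) (Map.lookup v usage)

    -- Record a fresh assignment to v, returning its new SSA name.
    assign :: String -> UsageMap -> (String, UsageMap)
    assign v usage = (v ++ show next, Map.insert v next usage)
      where
        next = maybe 0 (+ 1) (Map.lookup v usage)

    -- Processing "a = 1" and then "a = a + 1" would rename each right-hand side
    -- with `latest` first (a -> a0) and then call `assign` (giving a1),
    -- yielding a0 = 1 and a1 = a0 + 1.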

I’m pretty sure I wouldn’t have bothered to write a specific UsageMap type when I was still in university, though working in industry seems to have made me think more carefully about typing.

Compound Statements

The version of SSA in the paper also doesn’t allow compound expressions (e.g. a = ((b * c) + 5) / d). However, this can be handled reasonably easily, as arithmetic expressions can be represented in tree form.

We can create temporary variables for each non-leaf node of the expression tree. For example, we define k1 = b * c, k2 = k1 + 5 and k3 = k2 / d. Some care is needed in cases where the tree is a bit more balanced (we need to know that the first argument of our plus above is k1).

Also, of course, k is only used for illustration above, as real programs could have a variable actually called k! It makes sense to have a counter separate from the rest of the UsageMap, though I relied on some limits on variable names that were mentioned in the specification. I’m not sure I would have worked this out in first year!
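
A hedged sketch of the splitting step (again with my own toy types rather than the test’s), threading a counter to generate fresh k-names:

    -- A toy expression type: constants, variable references, and binary
    -- applications of an operator (e.g. "+", "*", "/").
    data Expr = Const Int | Var String | Apply String Expr Expr

    -- Flatten an expression into (simple expression, intermediate assignments,
    -- next counter value), creating a temporary for every non-leaf node.
    flatten :: Int -> Expr -> (Expr, [(String, Expr)], Int)
    flatten n e@(Const _) = (e, [], n)
    flatten n e@(Var _)   = (e, [], n)
    flatten n (Apply op l r) =
      (Var temp, lAssigns ++ rAssigns ++ [(temp, Apply op l' r')], n2 + 1)
      where
        (l', lAssigns, n1) = flatten n l
        (r', rAssigns, n2) = flatten n1 r
        temp = "k" ++ show (n2 + 1)

    -- flatten 0 on ((b * c) + 5) / d produces the assignments
    -- k1 = b * c, k2 = k1 + 5, k3 = k2 / d, with k3 standing in for the whole
    -- expression.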

Conditional Expressions

One needs to consider which versions of a variable are in scope. Consider a block containing an if-then-else where both branches assign to the same variables, and where the code after the conditional reads those variables.

After reaching the end of the if branch, we can’t just pass the UsageMap we obtained from it into the else branch (on an execution that goes into the else branch, the if branch never ran). Similarly, after the conditional we can’t just carry on with the UsageMap from the else branch alone (we don’t know whether the else branch actually ran).

It is probably easier to visualise this block of code as a tree. Each node represents a possible linear execution path that we may run.

The versions of variables each branch reads on entry should be the versions from just before we entered the conditional. Thus, when entering the else branch we should reference the state of the UsageMap from before we entered the if branch, not the state after it.

When the branches rejoin, we somehow need to pick one of the final values for each variable that was possibly modified. This is difficult to express directly, but we can approximate it with a phi function, where \phi(a, b) nondeterministically returns either a or b. We can thus simply pick a4 = \phi(a2, a3) and b4 = \phi(b2, b3), and add these statements after the conditional; finally, we would want to return a4 + b4. Some care needs to be taken with the implementation (for example, one must correctly handle the case where a variable is only possibly modified). For the block described above, this should yield both branches in SSA form, followed by the two phi assignments and the final return.
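
Continuing the hypothetical UsageMap sketch from earlier (so the representation is still an assumption, reusing the latest and assign helpers from that sketch), the merge at the end of the conditional might look something like this – one phi assignment for every variable whose final version differs between the branches:

    -- Given the UsageMaps at the end of the if and else branches (both of which
    -- started from the same pre-conditional map), bump the version of each
    -- variable whose versions disagree and record fresh = phi(thenName, elseName).
    mergeBranches :: UsageMap -> UsageMap -> (UsageMap, [(String, (String, String))])
    mergeBranches thenMap elseMap = foldr step (start, []) conflicting
      where
        start = Map.unionWith max thenMap elseMap
        conflicting =
          [ v | v <- Map.keys (Map.union thenMap elseMap)
              , Map.lookup v thenMap /= Map.lookup v elseMap ]
        step v (usage, phis) =
          let (fresh, usage') = assign v usage
          in (usage', (fresh, (latest thenMap v, latest elseMap v)) : phis)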

During the implementation I was running out of time, which unfortunately led to a monster of a helper function. It could probably be cleaned up by making a type for ‘return values of a traversal’. It would probably also be reasonable to use some kind of higher-order function over the mod-set of variables (which I computed elsewhere).

Do-While Loops

Finally, we also need to support loops. We could use a graph structure similar to what we had for the if-then-else conditional, except that there is now a cycle.

At first glance, this looks tricky. We don’t know the number of times the loop will execute, though we do know it’s at least once.

Which value of a are we reading in the loop body? It is either the value from the initial set-up, or the value the previous iteration left at its end. We could thus try to introduce phi-functions at the top of the loop body, choosing nondeterministically between the version from before the loop and the version from the end of the body (say, a1 = \phi(a0, a2), where a2 is only assigned later in the body).

Such a program is impossible to execute, of course; the definitions are cyclic. However, we’re not actually trying to execute it – we’re only using this as an intermediate representation to statically optimise our code. If the two arguments to a phi-function turn out to be the same, we can eliminate it, mark that value as a constant if known, and then use the information about it being a constant to drive further optimisations.

Synthesis

I think this could be a fairly challenging question in and of itself for first-year computing students, though some degree of hand-holding would likely be required (in particular, the tree traversal for splitting out intermediate assignments can be quite fiddly to get right). It would also be worth introducing the concept of basic blocks early on, as reasoning in terms of them makes handling conditionals and loops easier. It was a rather solid and satisfying conclusion to this year’s test.

Constant Propagation (January Haskell Tests 2018 – Part 1)

The latest iteration of the January Haskell Tests is out (see Dr Tony Field’s page, and my post on the 2017 test). I think the last time I wrote any Haskell was eleven months ago when taking the 2017 test for a spin; nonetheless I was able to pick things up quickly. I guess the syntax (outside of monads) is pretty intuitive.

This test in some ways felt a fair bit easier than the last, at least in terms of the assessed portion (the last time round, it took me up to just past the two-hour mark to clear that; this time round, I was done with the questions that were actually worth marks by 1:15). However, the ‘bonus’ unassessed Part 4 seemed hard; finishing it in time looked to be a pretty tall order. Part 4 itself could be decomposed into three fairly chunky parts, and could even be a reasonable basis for a separate test!

The test dealt with optimising code that was already in static single assignment (SSA) form. I’d covered this topic before in a course on software reliability (within the context of static program verification), though I don’t remember covering it within the context of (possibly optimising) compilers. SSA form is fairly restrictive:

  • Each variable can only be assigned to once.
  • Operator arguments can only comprise constants and direct variable references.

For example, the assignment x = 4 * y + 7 * z is clearly not in SSA form, but it can be rewritten as a = 4 * y; b = 7 * z; x = a + b, where each statement is. In most programming languages variables can be assigned to multiple times; as part of the conversion, we tend to add numeric suffixes to variables for versioning, so that a variable x that is reassigned becomes successive versions x0, x1 and so on.

SSA also allows for the concept of a phi function, which takes two expressions and non-deterministically returns one or the other. This is useful for performing optimisations around control flow. For example, consider a function containing an if-block where a variable is assigned in both branches.

We can rewrite such a function using a phi function: after the if-block, the final value of the variable could be the value at the end of either branch.

Although one had to work with phi functions as early as Part 2 (the function written at the end of Part 2 should be able to simplify such a function significantly), thinking about the purpose of phi functions certainly wasn’t important until Part 4.

Part 1 involved writing an interpreter to execute statements in a (very) simple subset of C, where code has already been parsed into tokens. I think this was intended to set a baseline for students who were struggling with the material; I knocked through the 14 marks worth (which is enough for a pass; that’s 47%) in just 17 minutes.

Things got somewhat tougher in Part 2. The opening was easy, with two 3-mark questions inviting candidates to implement simple constant-folding optimisations (in particular, that 0 + x = x and \phi(x, x) = x, along with calculating out constant results) and then determining the value of an expression given knowledge of some variable values.
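
A hedged sketch of the kind of folding involved (with my own toy expression type, not the one from the test):

    -- Fold constant arithmetic, drop additive identities, and collapse phi
    -- functions whose two (folded) arguments are identical.
    data Exp = Num Int | Ref String | Add Exp Exp | Phi Exp Exp
      deriving (Eq, Show)

    fold :: Exp -> Exp
    fold (Add l r) = case (fold l, fold r) of
      (Num a, Num b) -> Num (a + b)  -- calculate out constant results
      (Num 0, r')    -> r'           -- 0 + x = x
      (l', Num 0)    -> l'           -- x + 0 = x
      (l', r')       -> Add l' r'
    fold (Phi l r)
      | l' == r'  = l'               -- phi(x, x) = x
      | otherwise = Phi l' r'
      where
        l' = fold l
        r' = fold r
    fold e = e

    -- fold (Phi (Add (Num 0) (Ref "x")) (Ref "x")) evaluates to Ref "x".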

The tricky part then began, with a task to implement constant propagation and folding to actual code blocks in SSA form. Conceptually, this is not difficult – once the value of a variable is known, it can be substituted into all subsequent occurrences. A fix-point approach fits naturally, since knowing something about some variables can lead to knowledge about more variables. Nonetheless, I struggled for a while with implementing the suggested worklist algorithm; the aforementioned fix-point approach was suggested too, and would probably have been more elegant, if harder to implement. I finished this part at 1:01.
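
And, for flavour, a naive fix-point sketch of the propagation step – explicitly not the suggested worklist algorithm, and reusing the toy Exp type and fold function from the sketch above; assignments are assumed to be (name, expression) pairs in SSA order:

    import qualified Data.Map as Map

    -- Repeatedly substitute known constants into right-hand sides, fold, record
    -- any newly constant variables, and stop once a full pass learns nothing new.
    propagate :: [(String, Exp)] -> Map.Map String Int
    propagate block = go Map.empty
      where
        go known
          | known' == known = known
          | otherwise       = go known'
          where
            known' = foldl step known block
            step acc (v, rhs) = case fold (subst acc rhs) of
              Num n -> Map.insert v n acc
              _     -> acc

        subst acc (Ref v)   = maybe (Ref v) Num (Map.lookup v acc)
        subst acc (Add l r) = Add (subst acc l) (subst acc r)
        subst acc (Phi l r) = Phi (subst acc l) (subst acc r)
        subst _   e         = e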

Part 3 involved removing phi functions that could not be cleaned up as part of the SSA optimisation, by pushing the variable assignments back into the code. It seems a decision was made to state the rules for this explicitly in the specification, for some reason (to be fair, as Tony notes, they are straightforward), which made for a pretty tame last assessed part. There was still some tricky thinking required here, nonetheless; I think one has to be very careful with handling multiple phis!

Finally, Part 4 tasked students with translating a function to SSA form. This itself is weighty enough that I think another test could easily be written on it. This part can be decomposed into three (maybe four) separate challenges:

  1. Translate basic blocks to SSA form correctly.
    1. Maintain correct information about versioning of variables.
    2. Decompose compound statements (like x = 3 * y + 5 * z).
  2. Add support for conditionals.
  3. Add support for loops (do-while was a part of the simple C subset used).

Squeezing all of that into the remainder of the time after finishing the first three parts would have been tough; I finished the first two full tasks (basic blocks at 2:07 and conditionals at 2:46) and squeezed out a comment on what to do for the last sub-task. The last part of task 2 was also written rather hackily, as I was racing the clock.

To be fair, I could probably have churned all three main and three bonus parts out in Java within the time-limit; it’s nonetheless useful to engage a different set of programming skills.

Last year, I wrote that I enjoyed revisiting Haskell; I enjoyed this year’s task too, and I think I actually enjoyed this installment more, probably because of the outsize difficulty of Part 4! Although I haven’t written professional Haskell, I often find myself writing functional-style code in Java, and I’d still give Haskell an enthusiastic thumbs-up as a choice of first language to teach at university. There are some influences from professional development coming in here, too; a year ago I probably wouldn’t have batted an eye at using types like Haskell’s [(String, [Int])] or Java’s Map<String, List<Integer>>, while now I tend to give structures their own types more quickly. My tolerance for long functions (without good reason, in non-performance-critical scenarios) has also certainly decreased.

This post is Part 1… I’ll take a more in-depth look at Part 4 of the exam in Part 2.

Should Exams Be Graded Out Of 137?

I reviewed Misbehaving a couple of weeks ago, and in the introduction the author (Richard Thaler) relates an interesting tale about grading exams. He mentions that setting the maximum score to be 137 instead of 100 in a US college context was useful, in that it allowed him to gain a better understanding of students’ performance without drawing too much complaint [1].

Thaler finds that exams can be made a lot more difficult without hurting students’ confidence too much. The actual percentages students obtain are immaterial (because the translation from raw scores to actual grades is done on a curve); yet, in a US college context where A grades often start at 90% or above, suddenly getting a 72 (even if it is an A+) could be unnerving. Making exams difficult allows students to show that they haven’t just learned the material but have understood the logic and thinking behind it. For a first-year logic course, say, I wouldn’t be opposed to the exam including unseen extensions of the material covered in class (for example, branching into monadic second-order logic or simple temporal logics); for a final-year course, I’d expect students to be able to understand papers on the subject (even if unseen) and apply knowledge from said papers. This may be related to my work and interests, but the courses I remember best at Imperial include disproportionately many with difficult exams (Software Engineering (Design), Complexity, and Intelligent Data and Probabilistic Inference come to mind).

There is a risk of stressing students out if they do the math, of course. A few quick presses on a keyboard or calculator should reveal that, say, a 96/137 is just about 70 percent. However, one has to be fairly deliberate about doing such calculations, as 137 is a rather awkward number to mentally calculate percentages with. If you asked me to estimate what a mark out of 137 corresponded to on a 100-point scale, I would probably engage in some kind of iterative search, adding 13.7s and then 1.4s. Apparently, though, people are generally lazy enough that they won’t bother computing the percentage of their raw score; they simply look at the grade itself (which largely depends on where the raw score stands in relation to others’, not so much on its absolute value).

Would this system be applicable or useful in the UK? I’ll acknowledge it was a very clever solution in a US context, though I’m not sure it’s as useful in the UK. This mainly stems from the fact that grades in the UK are generally numerically lower; an A (a First) typically starts at 70 percent rather than 90. There’s thus already quite a bit of room to separate students who have merely learned the material from those who have engaged with and understood it. Thaler mentions that students scored an average of 72 out of 100 on his initial exam (which caused an uproar among them); that’s on the high side in the UK! Typically, exams at Imperial would have a fairly difficult ‘tail end’ for the last 30 percent; I could see a case for a really hard ‘last stretch’ (10 percent), which some of the programming tests did have, but I think most exams have had enough discriminatory power. Generally, you’d want to use research projects and other forms of work over longer periods of time to sift out the top students (as those are probably more similar to both industry and postgraduate academia).

Another difference is that UK universities typically aggregate course grades by a percentage system, rather than by GPA. This does make sense; I would generally consider a student scoring 90 and 65 in two modules (77.5 average) to be far superior to one scoring a pair of 72s, while the GPA system would say otherwise. I’d be inclined to think I’d prefer the 90/65 student to one with a pair of 78s, even. This is not a problem for using a raw limit of 137, as they can always be translated (non-linearly, if need be) to percentage grades.

One benefit of this system is psychological; even if the raw score might well be just 60 percent, scoring an 82 sounds much better. However, at Imperial at least, exam scripts generally aren’t returned to students. Also, this relies on most exams being graded out of 100 points (or less); if one is accustomed to 200-point exams, then being confronted with a score in the 80s might feel terrible (even though it is percentage-wise better than a 110 in a 200-point exam!)

There is also one minor objection I have, speaking more generally. Unless the exam being set is really, really hard, it’s likely that some students will score above 100. Thaler claims this “(generated) a reaction approaching ecstasy” – I’m not sure I’d feel the same way, as it would once again become front and centre that the raw score is not a percentage, unlike in most exams! The illusion would be clearly broken. Nonetheless, these students can probably take comfort in the fact (if they need to) that they’re probably well ahead of the rest of the class.

In general, I think grading out of 137 (in raw scoring terms) can be pretty useful if (1) existing mandated grade boundaries or expectations are too high, so that one can’t distinguish students’ command of material; (2) one can control how raw scores are actually translated into grades that the students receive; (3) students get to view the raw scores, and (4) most exams are graded out of 100 (or possibly less).

All four appear to be true in the US system. While (2), (4) and maybe (1) are true at Imperial at least, (3) was not – and I’d say without (3) it probably wouldn’t be worth the effort.

Reading List: ‘Misbehaving’ (Richard Thaler)

I spent a good amount of time thinking about whether I should register for the UK Registered Traveller Scheme (which costs £70). In total, I easily spent more than seven hours thinking through this, modelling a few scenarios, and trying to estimate the value of my time relative to data about airport immigration queue times. I earn more than £10 an hour.

In hindsight, I tried to make a good decision above, but the decision to try (at least as hard as I did) was probably a bad one. I’ve generally always been interested in making better decisions, and these decisions often involve managing resources that are finite. I’m not sure exactly when I first came across the field of behavioural economics, though. The notion of an Econ (a fully rational agent that maximises self-interest) is certainly something I’d come across in high school economics classes. I remember also familiarising myself with quite a number of these cognitive biases when attending talks on personal finance during various past internships – and certainly when reading up on personal finance online.

Misbehaving describes a wide variety of episodes in the history of behavioural economics research, covering the field’s growth from infancy to relatively wide acceptance in the mainstream. While some rudimentary knowledge of mathematics is needed to follow the text, I wouldn’t say much background is required. I was fairly familiar with quite a large number of the experiments described, so I would imagine that there might not be that much new content for readers already interested in the field. Nonetheless, there were quite a few new things I came across; I hadn’t heard of the mug experiment [1] used to demonstrate the endowment effect. The analyses of closed-end mutual funds [2] and NFL draft picks were also interesting to read – it was nice to be able to relate knowledge to concrete applications.

The book begins with an introduction to how the field came about, describing the author’s observations of human behaviour in practice that contradicted how a rational Econ would behave. These include treating supposedly irrelevant factors (SIFs), like one’s current feelings, as relevant; valuing things one owns (one’s endowment) more highly than things one could own; and weighing losses more heavily than equivalent gains. There was also a more anecdotal list of observations the author made. I’ve made my own list of things I’ve done in the past, and clearly suffer from a number of these inconsistencies as well:

  • The introductory example. (What’s the value of an hour?)
  • The returns from my mutual funds come from both capital gains and dividends. So far at least, I’ve always been reinvesting all dividends. Yet, I feel happy when a dividend comes in. (Dividends are taxed more harshly than capital gains.)
  • I’m much happier receiving a gift that someone has picked out for me than if the person gave me the cash and suggested that I might be interested in buying the gift with the cash. (I can replicate the first scenario in the second, and I also have more choices.)

From there, a variety of topics are presented in a largely topical and somewhat chronological order. I think my complaint that I found a substantial portion of the material to already be familiar applies largely to sections II (“Mental Accounting”) and III (“Self-Control”), perhaps because these are the relatively common examples people with some interest in the field are likely to come across first. The later sections get more interesting. Section VI (“Finance”) played well with my own interest in personal finance; Section VII (“Helping Out”) discusses practical applications of using insights from behavioural economics to ‘nudge’ people into making better decisions regarding their own finances… or complying with tax law.

The book ends with the author outlining hopes for how behavioural economics could continue to grow in relevance, discussing fledgling work in education. There is also an interesting section where the author dispenses some advice for life in general, based on his work in behavioural economics (specifically: make observations, collect data, and be willing to speak up).

While models should indeed account for important variables, I’m not sure I necessarily agree that these factors should be consistent over time, mainly because I think individuals can learn and, within certain bounds, optimise their own behaviour – though not to the point of being a hyper-rational Econ. Although chapter 6 (“The Gauntlet”) claims that learning how to make many high-stakes decisions is difficult because we don’t get many attempts at them, I think some decision-making skills are transferable. I might not get to make many decisions about buying houses, and there certainly are aspects of buying a house that are peculiar to the task itself – yet, learning to learn about the relevant considerations for a domain (as I already do with consumer electronics), planning how one’s cash flow is going to work (as I already do with regular investment into mutual funds) and being more aware of possible cognitive biases (relevant in almost all decisions) all seem to be helpful.

I enjoyed the book as a whole; it was a pretty engrossing read on my flight from Singapore to London. It held my attention for at least a good three or four continuous hours on that plane, and I was eager to finish reading it when I landed in London. I’d recommend it as a good read, though as it is written for a fairly general audience, readers who already have substantial knowledge of the field might not find that much fresh content.