The Time Value of Money

There is an apocryphal interview question I’ve come across several times:

Would you prefer to have a dollar today or a dollar one year from now?

I do technical interviews for software engineers, so this isn’t a question I would typically ask candidates (even if at times I wish it were – it could be interesting to see how they react!). Although it would fit naturally in a financial context, it seems too easy to be used as a serious question there.

Anyway, the answer I’d go for is “today”, because I could take the dollar and put it in the bank to earn interest. In practice, I’d invest the dollar. Furthermore, inflation is more likely to be positive than not, and this eats away at the value of the dollar. The idea that getting the dollar now is better is known as the time value of money. That said, I can also see legitimate cases why one might argue for “one year from now” – mainly centering around the idea that custody of the dollar is taken care of (assuming we allow this to be assumed!).

On the other hand, if you asked me a slightly different question:

Would you prefer to have a dollar today or $100 one year from now?

I would probably go for the hundred dollars, because my investments are very unlikely to increase a hundredfold in a year (barring hyperinflation). As before, there are legitimate reasons why one might go against the grain of financial theory – cash-flow issues, in particular.

If the amount is reduced a fair bit, say to $1.09, then (for me at least) the decision gets more difficult. By some kind of intermediate value theorem argument, there should be some value of r for which I’m indifferent to this question:

Would you prefer to have a dollar today or $(1 + r) of today’s dollars one year from now?

The conventional theory here is that if I could take the dollar today, invest it for a year, and end up with (1 + r) of today’s dollars, then I should be indifferent. I’m not sure I agree in practice, mainly because of the aforementioned cash-flow issues. If 6 months on I find that I need the dollar, I can take it out and still keep the partial returns; you would need to give me an illiquidity premium to compensate. (Of course, I’ve assumed here that I invest in liquid securities.)

There is another shortcoming of this question: the size of the sum relative to my net worth would also affect my answer. Rather interestingly, I think I would take the money early for small or large amounts, but consider waiting for medium-sized ones.

For small amounts, I would need to remember that a capital inflow is coming in a year’s time. The cost of tracking this could exceed the “premium” I derive from waiting. Conversely, for massive amounts, we start delving into the realm of diminishing marginal utility – if I could pick between $1 trillion today and $1.1 trillion this time next year, I’m pretty sure I’d pick the former.

Up to this point, we’ve also avoided what’s known as counterparty risk. The person offering the money might become insolvent within the year. This would bias people towards taking the money now, and is reminiscent of a well-known proverb (“a bird in the hand is worth two in the bush”).

Nonetheless, this practice of time discounting is useful when trying to assess the value of investments or securities, such as annuities or structured products. It is also frequently used in discounted cash-flow analyses, which are useful for determining if business ventures are likely to be profitable. I have not had to put this skill into practice yet, though (well, apart from the Computational Finance exam I did at Imperial).

In theory, these concepts should be applicable to other resources or assets which (1) appreciate over time, and (2) can accumulate in value without substantial effort. That said, I’ve struggled to think of assets outside of the standard “investment” universe (stocks, bonds, real-estate, commodities, private equity, collectibles?) that satisfy both criteria.

I did think of social capital (i.e. friendships, reputations) and human capital (for me, software development and other skills). They don’t seem to satisfy (2), though it could be argued that (2) is too strict. For example, by going about my daily routine, I already (hopefully) absorb more and better dev practices. Similarly, one needs to (well, should) do one’s homework regarding asset allocation and understanding one’s investments; in practice, maintaining one’s asset allocation also requires rebalancing the portfolio.

Road to Somewhere (2017 Q3 Review)

Q3 felt very much like a quarter of consolidation, as opposed to development or novelty. The two main highlights of the quarter for me were IJCAI-17 and a rather convoluted hack week project at Palantir that reminded me of the applicability of raw CS theory to daily work (of course, getting an award for technical achievement was great as well).

On the coding front, there was a goal last quarter to merge 30 pull requests on AtlasDB. I hadn’t been tracking this, but it turns out that this target was actually hit on the nose. I’m somewhat surprised, as I was out of the office for two weeks, and a fair bit of my time was spent looking at other projects as well. It is very tempting to set the next goal at 31, and I think I’ll bite – though given a likely winter break and some other leave that I do want to take, I’m somewhat doubtful that I’ll hit 31 in Q4. I also had a good conversation with my lead on possible next steps for growth; I can’t say too much other than that there is consensus, and I’m excited.

Turning towards an academic focus, the elephant in the room is IJCAI-17. I was initially skeptical that I would enjoy it, imagining that I would be unable to follow the presenters as they eagerly pushed the boundaries of our AI knowledge. Furthermore, the amount of money I spent on it in total was a very solid chunk of the quarter’s expenditure. However, it turns out these concerns were unfounded. There were some talks that I wasn’t able to follow, likely owing to a lack of familiarity with the relevant field, but many of them were very understandable. In general I really enjoyed the experience; I was impressed by the intellectual depth and complexity of many of the presentations.

The LDLK talk itself also went better than expected; it was scheduled at 8.30 am on a Friday morning, but nonetheless many people showed up! I was happy to present in person; this is one of the few things from my time at Imperial that I can truly say I’m proud of. In general, I also enjoyed the social events, meeting Alessio and subsequently exploring a bit of Melbourne (granted, this does not have an academic focus).

Looking forward, MCMAS-Dynamic still has one more paper that can be wrung from it, this time on some of the later-stage results. The paper is well in the pipeline now, though writing it is proving more difficult than either of the previous papers. When I was writing the thesis, I focused more on getting the algorithm right and implementing it correctly than giving a full formal proof! Thus, the core algorithm remained unchanged, but I certainly waved my hands a fair bit regarding some of the more intricate details.

Financially, my portfolio pretty much went sideways over the period, though a late rally pushed it up a bit, leaving it at plus 0.9 percent (total return). In spite of surges in international stock markets, returns were muted because the pound recovered a little as well, especially towards the end, on news of a possible Bank of England rate rise. I also sold off a bunch of pounds when GBPUSD broke 1.35. We can see how this compares against a bunch of other investment options:

(N.B. I have positions in the Fidelity World Index and iShares Global Property Securities Tracker. Also I do have holdings in active funds, although I’ve been comparing with trackers for benchmarking purposes.)

Not too unreasonable. UK stocks look like they outperformed and REITs were pretty battered, hence I did worse than the Vanguard fund. Also, we can see the impact of the pound’s (minimal) recovery in some of the delta between the Fidelity and Vanguard total world funds.

Spending this quarter shot up significantly. The obvious rises were in travel and individual meals, which was to be expected given IJCAI. Clothing expenses fell significantly, though I think there might have been some self-control failures last quarter as the amount spent in Q2 looks absurd in hindsight. I’ve been keeping up the routine of walking to and from work, and this shows in transport expenditures as well. Last year’s mean transport expenditure was 81.82; this quarter’s was 18.80 (and the average for the year-to-date is 25.02). That said, I noticed that I’ve been wearing through shoes faster, but I haven’t weighed that cost in yet!

On the comms front, things have generally been going well too, though there have been some minor lapses. I think a fair bit of this is not so much that I’m not seeing my friends, but that the side activities (reading groups) we used to do are happening less actively and/or diligently. I guess being aware is the first step here. Reconnecting with some friends from high school and from my UTA group has also been great.

I think the period was somewhat gentle overall, but in no way calm. I have a tendency to expect higher standards from myself if things seem easy-going, and this can be stressful (I know, it’s a bit of a perennial struggle for me). I’d toss my standard “A League of Their Own” quotes at it if I’m feeling energetic – good things are often meant to be hard – but for some reason I haven’t always had the energy this time around. Some of this might be due to mild illness (had a cold in Melbourne, now having a cough).

I’m not sure why, but Shawn Mendes’s “Mercy” has been a song I’ve been listening to a fair bit this quarter. The obvious interpretation is one where the speaker is singing to an unreciprocated love interest. However, Mendes has suggested an alternative (about mercy from the stress of his career). I independently came up with another variant; I see the speaker singing it to an idealized, but critical version of himself, which is a voice I have in my head at times. We have from the chorus:

Please have mercy on me, take it easy on my heart
Even though you don’t mean to hurt me, you keep tearing me apart
Would you please have mercy on me, I’m a puppet on your string
And even though you’ve got good intentions, I need you to set me free –

It’s kind of a spirit willing / flesh weak type of problem; there are many things I know I want to do that I think would be beneficial, but I find that my bandwidth is already pretty heavily taxed. “Why aren’t you finishing that feature yet?” “Why aren’t you getting your 8 hours of sleep?” “When are you going to get round to writing that paper?” It’s painful, and at times I need respite, but I still do appreciate this voice. It does often serve me well, even if it has to cry out multiple times with the same request before corrective action is taken.

Q4 will have the usual engineering, finance and friendships targets. I have relatively fewer plans for the period (going into Q3 I knew that IJCAI was a thing), so I might need to decide whether there is anything additional I should pick up.

Readability and Performance

I find myself faced with tradeoffs between readability and performance fairly often. This happens both in programming (where, apart from some specific situations, favouring the former usually isn’t too costly) and in English.

In programming, code that is readable is usually desirable. This often makes it easier to maintain, especially for other developers who subsequently need to understand it. For example, let’s suppose we have a Java hash map, and we need to return an Optional indicating the value whose key matches a given Predicate, if one exists. However, if there are multiple keys matching the Predicate, we want to throw an exception (even if the values are the same).

I would probably write something like this:
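A sketch of what such a stream-based approach might look like (the method name, generics and exception type here are illustrative, not taken from any particular codebase):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class UniqueValueFinder {
    // Returns the value whose key matches the predicate, if exactly one key does;
    // throws if more than one key matches (even if the mapped values are equal).
    static <K, V> Optional<V> findUniqueValue(Map<K, V> map, Predicate<K> keyPredicate) {
        List<V> matches = map.entrySet()
                .stream()
                .filter(entry -> keyPredicate.test(entry.getKey()))
                .map(Map.Entry::getValue)
                .limit(2) // we never need to look at more than two matches
                .collect(Collectors.toList());
        if (matches.size() > 1) {
            throw new IllegalArgumentException("Multiple keys match the predicate");
        }
        return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
    }
}
```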

However, in terms of performance we could clearly improve. Using limit does allow us to short-circuit (so we don’t look at more elements than we need to), but creating stream objects could still add significant overhead, especially if the predicate is simple.

The code below should accomplish the same function, but more quickly.
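Something along these lines, perhaps (again, the names are just for illustration, this would sit alongside the stream version above, and it assumes non-null values):

```java
// Loop-based variant: one pass over the entries, tracking whether a match has been seen.
static <K, V> Optional<V> findUniqueValueIteratively(Map<K, V> map, Predicate<K> keyPredicate) {
    V match = null;
    boolean found = false;
    for (Map.Entry<K, V> entry : map.entrySet()) {
        if (keyPredicate.test(entry.getKey())) {
            if (found) {
                // Second matching key: fail fast without scanning the rest of the map.
                throw new IllegalArgumentException("Multiple keys match the predicate");
            }
            match = entry.getValue();
            found = true;
        }
    }
    return found ? Optional.of(match) : Optional.empty();
}
```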

This is still fairly readable, in my opinion, though there is a bit more state to keep track of. However, if the predicate is costly, we could do even better by taking our first approach and parallelising it, essentially framing the problem as a map-reduction. (The end result is likely to be slightly less readable; it might be substantially less so for people unfamiliar with concurrency.)
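A sketch of that parallelised variant, under the same illustrative assumptions as before; whether this is actually faster depends on the cost of the predicate and the size of the map:

```java
// Parallel variant: the (costly) predicate is evaluated across entries in parallel,
// and the matches are then reduced into a list of at most two elements.
static <K, V> Optional<V> findUniqueValueInParallel(Map<K, V> map, Predicate<K> keyPredicate) {
    List<V> matches = map.entrySet()
            .parallelStream()
            .filter(entry -> keyPredicate.test(entry.getKey()))
            .map(Map.Entry::getValue)
            .limit(2) // we only care whether there are zero, one, or more matches
            .collect(Collectors.toList());
    if (matches.size() > 1) {
        throw new IllegalArgumentException("Multiple keys match the predicate");
    }
    return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
}
```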

Of course, it is not always the case that readability and performance are negatively correlated. Optimising compilers implement many techniques that transform a program with relatively readable source code into one that is efficient for a machine. The tradeoff does often bite, though, when an entirely different (and usually more complex) algorithm and/or implementation is selected for performance reasons.

It goes without saying that readability is also subjective (tabs versus spaces is a starting point). Readability also does not always imply maintainability. For example, deeply nested structures where each layer focuses on one specific functionality may be readable, but may prove less maintainable given that they may produce longer, more convoluted stack traces when exceptions occur.

Defining “performance” in natural languages is tricky. We could consider performance to be efficiency of communication, assuming a perfect ability to comprehend. Well-written compilers implement the source-language specification faithfully, and should thus be able to understand valid source even if it’s not in a human-readable form. This happens in practice; processes like JavaScript minification are commonplace.

Of course, English does not quite have a specification. Nonetheless, I think efficiency under perfect comprehension is probably a reasonably close analogue to software performance. For example, if I were asked to explain the rule of modus tollens in logic, I would probably say something like:

Suppose we have if A, then B. Suppose not B. Then, we have not A (since if we had A, then we would have B).

That explanation would probably make sense to readers vaguely familiar with formal logic and/or philosophy. I could conceivably increase “efficiency” further by removing the bracketed part, or even boil things down to:

If A, then B. Not B. Thus not A.

Of course, if the person I was talking to was unfamiliar with logic and was, say, a friend, I would proceed very differently. This would, in a sense, be prioritising readability:

It’s a way of making an argument in logic. The idea is that if we know that if A is true, then B is true, and we know B is false, then we can say that A must also be false. Let’s use a concrete example. Let’s say that if it rains, I’ll wear a hoodie, and I’m not wearing a hoodie. Deducing that it must not be raining would be an example of modus tollens.

That said, there is also plenty of subjectivity in terms of readability for natural languages. There have been many attempts to compute estimates or measures of readability. These often use formulae that accept word length and syllable distributions, and/or lookup tables. However, many of these fail to account for, among other things, the level of abstraction in a text. Going back to our three explanations of modus tollens, they have approximate Flesch-Kincaid grade levels of 0.4, -2.6 and 5.5 respectively. Yet I’m fairly confident that most three- or four-year-olds (grade -2.6) would not quite understand the concept of modus tollens after reading “if A then B, not B, thus not A”. Even if 11- to 12-year-olds (grade 5.5) were split into groups and given the three explanations, I suspect that the third would still be the most likely to help the students understand the concept.

I actually find the second explanation the most readable, owing to domain knowledge. In just nine words, it gives me a precise and accurate description of what the logical rule is. It might be appropriate for a general audience. It probably would not be for an audience familiar with neither logic nor philosophy.

The first is good in that it explains why not A is reasonable. I could see myself appreciating it if the subject matter was somewhat more complex (e.g. results in modal and temporal logic).

The third features a concrete example, which is useful when learning. It also explicitly sets out some assumptions common in logic, such as “suppose X” meaning “suppose X is true”. However, I find it to be less readable, as I need to read a longer text to extract the main conclusion of the paragraph.

Earlier, I highlighted that readability and maintainability of software and source code are not one and the same. What would maintainability mean for natural language? For spoken language (especially extemporaneous speech), I’m not sure the notion applies at all. For written work, it could perhaps refer to the difficulty of proofreading the text, or of returning to it when trying to revise it.

In the software case, there are times when my hand has been forced and I’ve thus opted for a complicated, less readable implementation. Outside of word games like Taboo and exams or application processes with strict word limits, I don’t think I’ve had to do so for English. Thus, in general I consider my target audience and aim to present material in what I expect to be the most efficient way possible, while still retaining the required readability.

What Does “Likely” Mean? (Estimative Probability)

I typically take some time out on Saturday mornings to do some reading. This often begins with developments in personal finance or computer science, though there’s a tendency to branch out from that; I generally don’t restrict myself from going off on tangents.

For some reason, this week I came across a Master’s thesis discussing communication of probabilities, specifically in the intelligence domain. It seems that I found it via A Wealth of Common Sense, a blog concerning personal finance that I read occasionally; there was a link there to a Library of Economics blogpost discussing issues with mapping qualitative descriptions to quantitative probabilities. For example, if I say that something is highly likely to happen, I would be implying that I believe it would happen with probability at least N; however, the numerical value of N is uncertain, and differs from person to person.

For me at least, N would be around 80 percent (and, incidentally, the author of that post agrees); that said, I can certainly envisage people assigning values of N as low as 0.6 or as high as 0.95. Note that the problem can also be two-tailed (e.g. about even, might, possible, not inconceivable). The LoE author’s son proposes a reasonable scheme, which is to have authors outline their own mapping in a glossary. This is probably (well, I mean P >= 0.7) a good first step, though there are implementation challenges in terms of length as well as completeness.

It turns out that the concept of words of estimative probability is treated very seriously in the intelligence domain. It is understandably important, as briefs are typically prepared in natural language, and often need to be communicated to audiences that may not be entirely comfortable with mathematical notation. To quote a CIA officer:

Most consumers of intelligence aren’t particularly sophisticated when it comes to probabilistic analysis. They like words and pictures, too. My experience is that [they] prefer briefings that don’t center on numerical calculation. That’s not to say we can’t do it, but there’s really not that much demand for it.

Furthermore, deriving highly precise (though possibly not highly accurate) estimates for probabilities is almost certainly (*cough* I mean P >= 0.95) pointless, and is likely (P >= 0.7) damaging in that it tends to (P >= 0.6) give a false sense of security and accuracy when that does not actually exist.

The original proposal maps probabilities onto a seven-point scale (arguably five, as the ends are meant to reflect absolute certainties): certain, almost certain, probable, chances about even, probably not, almost certainly not, impossible. I think most people would agree that the above ordering is in decreasing order of probability. Of course, strictly adhering to the above labels would impart a degree of awkwardness to writing, and a group of variants for each level (such as highly doubtful for almost certainly not) soon developed.

Interestingly, the author also gives possible a fairly specific meaning; he defines it to mean “greater than zero and less than one” (which makes sense; of course, something that always happens is certainly possible – but it seems pointless not to use the more precise word), but also adds the restriction that “no numerical odds (can) be assigned”. This seems like a useful construct, especially early on, when little is known and uncertainty tends to be high; the other descriptive terms were conceived with uncertainty ranges of about 10% on each side.

The author of the Master’s thesis also considers how words of estimative probability are used in other fields. I found the comparison to weather forecasting particularly interesting, as the author rightly points out that that is one field in which the general public is given numeric estimates (of the probability of precipitation). Weather forecasters typically add their own prose when reporting as well, which allowed some comparison. That said, a major difference is that in forecasting, these estimates can be derived with fair precision (and, as it turns out, accuracy) as they can (and, in the UK, do) originate from predictive models of atmospheric conditions. There seem to be standardised terms as far as the US National Weather Service is concerned, though I wasn’t able to find comparable guidance from the UK Met Office.

The clarity required of words of estimative probability also depends on the consequences of miscommunication. This is of course important in intelligence, with some claiming that there was miscommunication regarding warnings related to the September 11 terrorist attacks. Incorrectly reporting a weather forecast is almost certainly (ugh, P >= 0.95) less consequential, though people may make bad decisions concerning taking umbrellas when going out or hanging clothes out to dry. I can imagine contexts where this would also be very important (such as experimental trials in medicine or financial planning), though it seems for the most part some degree of ambiguity, or even disagreement that goes unnoticed, is probably okay.

A Tour of Two Islands (IJCAI-17 Reflections)

Over the past two weeks, I took some time out from work and visited two islands, one substantially larger in land area than the other (and, incidentally, I am returning to another island). I’ve been to both at some point in the past; the larger one for vacation, and the smaller for study, work, visiting family and friends and for vacation as well. The latter is not too difficult to guess – it’s Singapore; the former is Australia.

Academic Program

I was in Melbourne for the twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17). Of course, the main highlight for me was presenting “Model Checking Multi-Agent Systems against LDLK Specifications”, a paper which built on parts of my Master’s thesis on MCMAS-Dynamic. This was my first time presenting at an academic conference, though it’s not my first publication (that’s in AAMAS 17 on CTL*K); it was a little nerve-wracking, though I practiced the presentation a number of times beforehand and I think that helped a lot. Prof. Michael Wooldridge (the author of a commonly used textbook on multi-agent systems) even gave me a compliment that it was confidently presented. I’m not so sure about that, but I guess I have some private local state, and in any case deontic logic doesn’t typically have box p implies p as an axiom.

I also enjoyed the rest of the academic program as well. Apart from Alessio, I had the opportunity to meet up with several other academics whose names, at least, were very familiar. The keynote was delivered by Prof. Stuart Russell (one of the co-authors of the well-known Introduction to AI textbook), and discussed the concept of provably beneficial AI. Essentially, humans tend to be poor at specifying things (e.g. King Midas); many naive specifications of goals or objective functions naturally lead to agents enforcing self-preservation; more informally, “you can’t fetch the coffee if you’re dead”. The talk then discussed a way of formulating a problem such that it would always be a dominant strategy for an AI agent to allow itself to be switched off (as compared to not allowing itself to be switched off); see [Hadfield-Menell et al. 2016] (The Off-Switch Game) for more detail. The other invited speakers also had fairly interesting topics, discussing applications of AI in teaching, commerce and Texas hold-em poker.

I made it a point to attend as many paper presentation sessions as feasible. As expected, at a top conference like IJCAI there were quite a number of talks, especially in areas further removed from my work, that proved largely inscrutable. I think my presentation would also have been tricky to follow if one was not reasonably acquainted with temporal logic and/or multi-agent systems. Nonetheless, I gave my best effort in trying to follow the presentations, and there were certainly many interesting talks.

There was a session devoted rather specifically to the discussion of AI applied in the context of computer games. I found [Dann et al. 2017] (Real-Time Navigation in Classical Platform Games via Skill Reuse) to be particularly interesting; the key idea was having the agent acquire skills for small local movements, such as making a single jump onto a block three cells above the ground. The agent learns success probabilities for these skills, and uses a variant of Dijkstra’s algorithm over log-likelihoods to find the “most likely” way to reach the goal safely (though there are some complications with re-planning if a step fails). The authors also demonstrated that the model performs better than the state-of-the-art algorithm, which is based on A*, for maze-like levels. Other talks in the session also discussed level generation and physics engine inference (for platformers), RTS combat planning, narrative control in an educational RPG… and strategy derivation in Hex (this felt a bit out of place, but I guess they must have had exactly five papers that fit the theme well; sessions had either four or six presentations).

In terms of tutorials, I was unfortunately rather sleepy when I landed in Melbourne on Sunday. I did attend the tutorials on Monday, though; there was a very gently paced introduction to argumentation theory (the course Argumentation and Multi-Agent Systems which Prof. Francesca Toni teaches at Imperial would likely have covered this material, though she was on sabbatical when I was in year 4), and a less gently paced overview of heterogeneous learning (machine learning where some differences in the data sources can be exploited to yield better results).

Non-Academic Program

Stepping back from the academic side of IJCAI, I also enjoyed the social events at the conference, and I think they were run fairly well; that said, I acknowledge I don’t have a benchmark at all. The reception at the Melbourne Cricket Ground was very generous, especially in terms of dessert (chocolate tarts!), and the conference banquet in the Docklands was certainly done to a high standard. It also introduced me to the “alternating drop” protocol, which is apparently common in (but peculiar to) Australia. Essentially, given two choices of main course X and Y, instead of having guests indicate their preference, they are served XYXYXY… around the table (of course, those with dietary restrictions are excluded). I can see how this is logistically advantageous (you know beforehand that it will suffice to prepare just over N/2 of both options, given N guests), but I’m not sure I like the probability of not getting my preference…

I stayed at the “recommended” Crown Metropol which was certainly luxurious. The hotel is located south of the Yarra river in Melbourne, and is part of the Crown Entertainment Complex – among other things, this features a fairly large casino, several restaurants and luxury shopping (way out of my budget range). Nonetheless, it was a good base from which I could easily explore the city, especially on the few days I stayed on after IJCAI.

Post-Conference

I made it a point to stay on for a few days after the conference itself was over to explore the city, as I knew the conference days themselves were going to be fairly packed (8 am – 6.30 pm usually, running till 11 or so on the night of the Reception and Banquet). As with many work trips, the “official business” part was fairly draining – I perhaps should have seen this coming. I did manage to have a look around, though; I took a river cruise, walked through a few parts of the city and indulged in a bit of outlet shopping. I also sought out a local fast-food chain (Red Rooster, specialising in grilled chicken).

Given that I’d flown this far across the globe when going from London to Melbourne (it’s roughly 20 hours in the sky) I also decided to spend a few days in Singapore on the way back. For the most part I focused on spending time with family, though I also managed to find some time to flip through some of the IJCAI proceedings.

General Thoughts


It’s surprisingly difficult to leave work behind…

This is certainly the first time I’ve had a two-week break since I started full-time at Palantir. (I had some things I decided I would clear out last year around the Christmas and New Year period, so no long break for me then.) That said – rather unsurprisingly, in hindsight – attending and presenting at IJCAI was easily just as intellectually stimulating and demanding as my regular work, so it felt more like a one-week break. Still, it was quite a dramatic slowdown.

I think the break worked well. It came at a time just after I launched some fairly hefty deliverables; knowing that I was leaving for two weeks, I had been pushing somewhat hard to get things out before I left. I hadn’t been back home for 11 months. I was initially a little apprehensive about going off for two consecutive weeks, though the logistics of course made sense, and I don’t think I have major regrets, at least not yet!

Elevation (Hotseat: STEP I 2015)

Background

STEP (short for Sixth Term Examination Paper) is a somewhat difficult Maths exam that is used by several UK universities (including Cambridge, Warwick and Imperial) as part of a conditional offer for courses in Mathematics and Computer Science. A part of my conditional offer for Imperial included a 2 in any STEP paper of my choice, though it was paired with a relatively awkward 35 points for IB – perhaps because the rest of my portfolio was pretty strong.

There are three papers – I, II and III; III includes A Level Further Mathematics content, while I and II remain within the A Level Mathematics syllabus. I is also somewhat easier than II; that said, I think both papers exist because Cambridge does sometimes want students who didn’t take Further Mathematics to get a pair of grades in these exams. Nonetheless, STEP I itself is certainly no pushover. Students are graded on a scale of S, 1, 2, 3, U; the 2015 STEP I paper had 73.1 percent of students scoring at least ‘3’ (the lowest pass grade), and just 42.6 percent scoring at least ‘2’ (the lowest grade many universities would consider). This may be compared with A Level mathematics in 2015, where the analogous metrics of A*-E and A*-C respectively are 98.7 and 80.8 percent; and this is even before we factor in selection bias.

Each paper consists of 13 questions, but candidates only need to attempt six of them; their six highest-scoring questions are used to determine their final score. Questions have equal weight (and each is marked out of 20, in integers, which seems suspiciously similar to how I’ve seen this done at many universities!). Eight of the 13 questions are classified as “pure mathematics” and include tasks testing concepts like calculus, trigonometry, algebraic manipulation, series and even number theory. Three are classified as “mechanics”, typically requiring calculations on Newtonian mechanics, and two as “probability and statistics”. I almost always do 4/0/2 or 3/1/2 across these categories. Note that it is possible to attempt seven or even more questions as a form of “insurance”, though given the strict time constraints this is likely to be difficult.

Performance

I had a fairly decent run, picking up 114 out of 120 points. The marks I lost mainly came from minor slips or cases where an important statement was not explicitly asserted, plus a good chunk on question 7, where I did not clearly handle a case that was expected to be shown to bear no fruit (I thought it was rather obvious that it was not needed, and dismissed it in one line).

The last row indicates the order in which I attempted the problems; it seems this was generally consistent with how long I actually took on them (problems 2 and 13 were fast; 1 and 8 were somewhat in-between, and I struggled for a bit with 12 and messed up 7 while using up a fair bit of the time). Note that the “break-even” time if one wants to answer all questions would be 30 minutes per question.

Selected Problems in Depth

Problem 8: Series Division

First, prove that 1 + 2 + \ldots + n = \frac{n(n+1)}{2} and that, for positive odd integers k, (N-m)^k + m^k is divisible by N. Then, consider

S = 1^k + 2^k + 3^k + \ldots + n^k

Show that if n is a positive odd integer, then S is divisible by n, and that if n is even then S is divisible by n/2. Show further that S is divisible by 1 + 2 + 3 + \ldots + n.

The two lead-ins were simple. Induction does work, but in both cases there were much better methods that could be used (write the series twice and pair terms up, and pair terms in the binomial expansion, respectively). Later parts involved padding S with a zero term, but the general theme of “pairing terms” persisted throughout the question. I think the toughest part of this one was, knowing that one had to show divisibility by \frac{n(n+1)}{2} at the very end, figuring out that it was safe to split this into two coprime factors and show divisibility by each separately. This works because the highest common factor of n and n + 1 is 1. My number theory was a bit rusty so I wasn’t sure if that was correct, and proved it during the paper.
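As a quick sanity check of these results (not a proof), here is a small brute-force sketch; note that the pairing arguments above require k to be a positive odd integer, which the code assumes:

```java
import java.math.BigInteger;

// Brute-force check that 1^k + 2^k + ... + n^k is divisible by n(n+1)/2
// for small n and odd k; purely illustrative.
public class SeriesDivisionCheck {
    public static void main(String[] args) {
        for (int k = 1; k <= 9; k += 2) {          // odd exponents only
            for (int n = 1; n <= 50; n++) {
                BigInteger s = BigInteger.ZERO;
                for (int i = 1; i <= n; i++) {
                    s = s.add(BigInteger.valueOf(i).pow(k));
                }
                BigInteger triangular = BigInteger.valueOf((long) n * (n + 1) / 2);
                if (s.mod(triangular).signum() != 0) {
                    System.out.println("Unexpected counterexample: n = " + n + ", k = " + k);
                }
            }
        }
        System.out.println("Checked all n up to 50 and odd k up to 9.");
    }
}
```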

Problem 12: On Fish Distributions

The number X of casualties arriving at a hospital each day follows a Poisson distribution with mean 8. Each casualty requires surgery with probability 1/4. The number of casualties arriving on each day is independent of the number arriving on any other day, as are the casualties’ requirements for surgery. (After some initial work) Prove that the number requiring surgery each day also follows a Poisson distribution and state its mean. Given that in a particular randomly chosen week 12 casualties in total require surgery on Monday and Tuesday, find the probability that 8 casualties require surgery on Monday (as a fraction, in its lowest terms).

This one really wasn’t too bad, though it involved a lot of algebraic manipulation and it seems I took quite a long time on it when doing the paper. Essentially, the independence condition should hint that, conditional on there being X = t casualties, the number S needing surgery is binomially distributed. X itself is a random variable, but that’s fine; the law of total probability gives us

P(S = s) = \displaystyle \sum_{t = s}^{\infty} P(S = s | X = t) P (X = t)

and a suitable substitution yields this:

P(S = s) = \displaystyle \sum_{t = s}^{\infty} \left( \frac{t!}{s! (t - s)!} \times \left( \frac{1}{4} \right)^s \times \left( \frac{3}{4} \right)^{t-s} \times \frac{e^{-8} 8^t}{t!}\right)

After some fairly involved algebraic manipulation, one can indeed recover a Poisson form for the pmf of S. Using this, the last part is actually relatively simple; we want P(S_1 = 8 | S_1 + S_2 = 12), and we can rely on the fact that a sum of independent Poisson random variables is itself Poisson (so, with the means of S_1 and S_2 being 2 each, S_1 + S_2 is Poisson with mean 4).
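A small numerical sketch of both steps (the names and the truncation point are just for illustration): it sums the series above for a few values of s to check the Poisson(2) form, then computes the conditional probability directly, which should come out to 495/4096.

```java
// Checks the thinned-Poisson claim numerically and evaluates the conditional
// probability at the end of the question; illustrative only.
public class PoissonThinningCheck {
    // P(X = t) for X ~ Poisson(lambda)
    static double poisson(double lambda, int t) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= t; i++) {
            p *= lambda / i;
        }
        return p;
    }

    // Binomial coefficient C(t, s) as a double
    static double choose(int t, int s) {
        double c = 1.0;
        for (int i = 1; i <= s; i++) {
            c *= (t - s + i) / (double) i;
        }
        return c;
    }

    public static void main(String[] args) {
        // P(S = s) = sum over t of C(t, s) (1/4)^s (3/4)^(t - s) P(X = t), truncated at t = 200
        for (int s = 0; s <= 5; s++) {
            double series = 0.0;
            for (int t = s; t <= 200; t++) {
                series += choose(t, s) * Math.pow(0.25, s) * Math.pow(0.75, t - s) * poisson(8.0, t);
            }
            System.out.printf("P(S = %d): series %.6f vs Poisson(2) %.6f%n", s, series, poisson(2.0, s));
        }

        // P(S1 = 8 | S1 + S2 = 12) with S1, S2 independent Poisson(2)
        double conditional = poisson(2.0, 8) * poisson(2.0, 4) / poisson(4.0, 12);
        System.out.printf("P(S1 = 8 | S1 + S2 = 12) = %.6f (495/4096 = %.6f)%n",
                conditional, 495.0 / 4096.0);
    }
}
```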

Problem 13: Probability and State Machines

Already covered in a previous post. I dispatched this one very quickly, though I was already familiar with the Markov process model that I used here.

Meta-Analysis

The main data source we have available here is an Examiner’s Report that discusses to some extent what happened (though we don’t have full details). The grade boundary for an S is 96, so a 114 is comfortably in that range; 3.5 percent of candidates achieved that grade. Question-level data isn’t published beyond the comments in the Examiner’s Report.

Focusing on the questions I attempted: my opener Q1, which was a test of curve sketching, was also the highest-scoring question on the paper, with students scoring an average mark of over 13.5 (with the caveat that some students who were worried about curve sketching avoided it). This was the second “easiest” question as far as I was concerned.

The other pure maths questions I attempted (2, 7 and 8) were also popular, and attempted with varying degrees of success (questions 2 and 7 were the second and third highest-scoring questions). Probability and statistics for some reason always seems to cause issues for students attempting these papers, with mean scores in the 2-3 range, though having done Olympiads in the past (and specialised in combinatorics and probability) I understandably found these easier.

Generally, STEP I for me tends to be fairly tough but certainly comfortable, II is solidly difficult, and doing III often makes me feel like a dog behind a chemistry set (“I have no idea what I’m doing”), especially since I didn’t take Further Maths in high school.

On Teaching and Teaching Assistants

I read an article in the Guardian a few days ago on the Finnish education system, which is often cited in the western world as an example of a successful one (in spite of some rather angry bold negative deltas in PISA 2015 – however reliable said tests are as a form of assessment). I’ve been through the table-topping Singapore education system, and while it certainly is rigorous (especially in mathematics – I recently looked at an A level Thinking Skills problem-solving question that, chillingly, wouldn’t be too out of place on the Mathematics PSLE in Singapore), there are valid concerns regarding excessive stress levels, teaching not being perceived as a high-profile job, and a lack of time for students to explore things on their own. I would certainly understand a desire not to bring some of these elements into an education system.

The headline message – trust your teachers – is something I can appreciate to some extent, even though I was never a teacher by profession. I had the privilege of being an undergraduate teaching assistant during my third and fourth years at Imperial, and I like to think that the professors and lecturers who were supervising me placed a lot of trust in me; they certainly gave me a fair bit of latitude in the content I could cover (perhaps not the “unfettered flexibility” mentioned in the article, but then I was supposed to teach rather specific modules – Logic, Discrete Mathematics, Reasoning about Programs, and Data Structures and Algorithms).

I was given high-ability groups in both years, and this resulted in tutorials that introduced students to various slightly more advanced topics they would see soon (concurrency, temporal logics, optimisation algorithms), as well as stretching their abilities in applying the concepts and knowledge learnt (okay, you know what a loop invariant is – how can we extend it to nested loops? Functions containing loops that could be recursive?). I believe these were appreciated, and I did collect feedback on them (though, of course, it’s difficult to be sure how much negative feedback was swept under the rug with these kinds of questionnaires).

Unfortunately, I did also indulge in some “teaching to the test”, explaining exam-tackling strategies that were certainly not a part of the syllabus. Thankfully, Imperial’s examinations don’t have too much exploitability here, as far as I can recall; I think much of that effort was spent identifying common pitfalls and explaining how to avoid them (e.g. unnecessary quantifier movement in Logic, and telling students to clearly demonstrate their thought processes even if they couldn’t fully answer the question). Some of this was certainly backed by popular demand, and it did pay off in that my students won multiple prizes. I certainly wasn’t in a position to change the assessment system at Imperial!

I did encounter minimal bureaucracy, mainly around marking the attendance of students (some of this is part of the Home Office’s “expected contacts” requirement for non-EU students). I can’t remember if a DBS check was necessary, though I already had one from year 1 in any case. Thankfully, there was nothing on the scale of what was being described in the article:

Contrast this with the UK, where schools have data managers, where some teachers are told which colour pens to use for marking, and where books are periodically checked to ensure that learning intentions are neatly stuck in place.

I’m not necessarily sure that the existence of data managers is a bad thing (after all, I do work for a company that helps others make data-driven decisions!) – but that said, drawing conclusions from data that doesn’t truly reflect students’ abilities (if that is what is going on) is very unlikely to be effective (“garbage in, garbage out” springs to mind).

I did do a stint as a volunteer student helper with a few schools near Imperial as part of the Pimlico Connection programme. Although I didn’t witness said book checks, I certainly did notice teachers explicitly referencing curricular objectives and the level system (this was before the September 2014 changes). Obviously, I’m not sure how representative this is of schools in general, though. I think the only time I recall encountering this in Singapore was when preparing for the Computer Science IB HL exams.

The article concludes by expanding this notion of trusting individual teachers into societal trends towards trust in general, though not much evidence or data is presented on this. I guess some connections can be drawn to a point raised earlier on relative economic homogeneity. Looking specifically at trust in the UK, there is interestingly a series of studies that attempt to collect data on this. Slide 19 on the linked page suggests that 55% of British people trust the British people to “do the right thing”, whatever that entails.

In terms of trusting individual teachers, I’d probably be comfortable with that only if there was a good process for selecting teachers. That’s a difficult problem – simply going for the “best and brightest” in terms of candidates’ own academic results certainly isn’t enough, as the Finnish process acknowledges. We did do that at Imperial, though in some sense the stakes are lower there, as there is a supervisor monitoring the process and it is, still, one of the “least worst” indicators one can use. However, I think once one acquires confidence and skill such that one will not struggle with the concepts one is teaching, and can answer students’ questions comfortably (within reason), there are many other more important traits. My knowledge of tricky automata theory or familiarity with theoretical complexity classes, or for that matter the ability to knock in a 96% in the Logic and Reasoning about Programs exams (as opposed to, say, an 85 or 90), were generally not directly relevant to doing my job as a teaching assistant!

Running the Gauntlet (Hotseat: OCR Mathematics C1-C4)

Background

The GCE A Level is a school-leaving qualification that students in the UK take at the end of high school. Students usually take exams for 3-4 subjects. The exams are graded on a scale from A* to U (though not with every letter in between); typically an A* is awarded to roughly the top 8-9 percent of students.

This is a rather different type of challenge – previous installments of this series have featured especially difficult exams (or rather, competitions; only the MAT is realistically speaking an exam there). I’ve usually struggled to finish in the time limit (I didn’t finish the AIME and barely finished the MAT; I had some spare time on the BMO R1, but still not that much). I could of course do this in the same way as the other tests, though the score distribution would likely be close to the ceiling, with random variation simply down to careless mistakes.

Interestingly, the UK has multiple exam boards; for this discussion we’ll be looking at OCR, which here stands not for Optical Character Recognition, but for Oxford, Cambridge and RSA (the Royal Society of Arts). The A level Maths curriculum is split into five strands: core (C), further pure (FP), mechanics (M), statistics (S) and decision (D); each strand features between two and four modules, which generally form a linear dependency chain – apart from FP, where FP3 is not dependent on FP2 (though it still is dependent on FP1). For the Mathematics A level, students need to take the four modules from the core strand, and two additional “applied” modules; Further Mathematics involves two of the FP strand modules plus any four additional modules (but these cannot overlap with the Mathematics A level ones). Thus, a student pursuing a Further Mathematics A level will take 12 distinct modules, including C1-C4 and at least two FP modules, for example C1-4, FP{1,3}, S1-4, D1 and M1.

(In high school I took the IB diploma programme instead, which did offer Further Mathematics (FM), though I didn’t take it as I picked Computer Science. That was before Computer Science became a group 4 subject; even then, I think I would still have wanted to do Physics, and thus would not have taken FM in any case.)

Setup

I attempted the June 2015 series of exams (C1 – C4). Each of these papers is set for 90 minutes, and is a problem set that features between about seven and ten multi-part questions. The overall maximum mark is 72 (a bit of a strange number; perhaps to give 1 minute and 15 seconds per mark?). To make things a little more interesting, we define a performance metric

P = \dfrac{M^2}{T}

where M is the proportion of marks scored, and T is the proportion of time used. For example, scoring 100 percent in half of the time allowed results in a metric of 2; scoring 50 percent of the marks using up all of the time yields a metric of 0.25. The penalty is deliberately harsher than proportional, to limit the benefit of gaming the system (i.e. finding the easiest marks and only attempting those questions).

Most of the errors were the result of arithmetical or algebraic slips (there weren’t any questions which I didn’t know how to answer, though I did make a rather egregious error on C3, and stumbled a little on C4 by trying a complicated substitution for an integral rather than preprocessing the integrand). There are a few things I noted:

  • The scores for the AS-level modules (C1, C2) were considerably higher than those for the A-level modules (C3, C4). This is fairly expected, given that students only taking AS Mathematics would still need to do C1 and C2. Furthermore, from reading examiners’ reports, the expectation in these exams is that students should have enough time to answer all of the questions.
  • The score for C1 was much higher than that for C2. I think there are two reasons for this – firstly, C1 is meant to be an introductory module; and secondly, no calculators are allowed in C1, meaning that examiners have to allocate time for students to perform calculations (which as far as I’m aware is something I’m relatively quick at).
  • The score for C4 was actually slightly higher than that for C3 (contrary to a possibly expected consistent decrease). While there is meant to be a linear progression, I certainly found the C3 paper notably tougher than the C4 one as well. That said, this may be the perspective of someone aiming to secure all of the marks, as opposed to the number required for a pass or an A.

We also see the penalty effect of the metric kicking in; it might be down to mental anchoring, but observe that perfect performances on C1 and C2 in the same amount of time would have yielded performance numbers just above 5 and 3, respectively.

Selected Problems in Depth

C3, Question 9

Given f(\theta) = \sin(\theta + 30^{\circ}) + \cos(\theta + 60^{\circ}), show that f(\theta) = \cos(\theta) and that f(4\theta) + 4f(2\theta) \equiv 8\cos^4\theta - 3. Then determine the greatest and least values of \frac{1}{f(4\theta) + 4f(2\theta) + 7} as \theta varies, and solve the equation, for 0^{\circ} \leq \alpha \leq 60^{\circ},

\sin(12\alpha + 30^{\circ}) + \cos(12\alpha + 60^{\circ}) + 4\sin(6\alpha + 30^{\circ}) + 4\cos(6\alpha + 60^{\circ}) = 1

This might have appeared a little intimidating, though it isn’t too bad if worked through carefully. The first expression is derived fairly quickly by using the addition formulas for sine and cosine. I then wasted a bit of time on the second part by trying to be cheeky and applying De Moivre’s theorem (so, for instance, \cos(4\theta) is the real part of e^{i \cdot 4\theta} = (\cos \theta + i \sin \theta)^4, which can be expanded binomially), subsequently using \sin^2 x = 1 - \cos^2 x where needed. This of course worked, but yielded a rather unpleasant algebra bash that could have been avoided by simply applying the double angle formulas multiple times.

The “range” part involved substitution and then reasoning on the range of \cos^4\theta (to be between 0 and 1). The final equation looked like a mouthful; using the result we had at the beginning yields

f (12 \alpha) + 4 f (6 \alpha) = 1

and then using a substitution like \beta = 3 \alpha, we can reduce the equation to 8 \cos^4 \beta - 3 = 1. We then get \cos \beta = \pm \left( \frac{1}{2} \right)^{(1/4)} and we can finish by dividing the values of \beta by 3 to recover \alpha.
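For peace of mind, a quick numerical sketch that checks the two identities and locates the roots of the final equation in the allowed range (purely illustrative; the class and method names are made up):

```java
// Numerically verifies f(theta) = cos(theta) and f(4 theta) + 4 f(2 theta) = 8 cos^4(theta) - 3,
// then scans for roots of f(12 alpha) + 4 f(6 alpha) = 1 with alpha in [0, 60] degrees.
public class C3Question9Check {
    static double f(double thetaDegrees) {
        double t = Math.toRadians(thetaDegrees);
        return Math.sin(t + Math.toRadians(30)) + Math.cos(t + Math.toRadians(60));
    }

    public static void main(String[] args) {
        for (double theta = 0; theta <= 360; theta += 7.5) {
            double cos = Math.cos(Math.toRadians(theta));
            if (Math.abs(f(theta) - cos) > 1e-9
                    || Math.abs(f(4 * theta) + 4 * f(2 * theta) - (8 * Math.pow(cos, 4) - 3)) > 1e-9) {
                System.out.println("Identity check failed at theta = " + theta);
            }
        }

        // Coarse scan for sign changes of f(12 alpha) + 4 f(6 alpha) - 1.
        double prev = f(0) + 4 * f(0) - 1;
        for (double alpha = 0.01; alpha <= 60; alpha += 0.01) {
            double curr = f(12 * alpha) + 4 * f(6 * alpha) - 1;
            if (prev * curr < 0) {
                System.out.printf("Root near alpha = %.2f degrees%n", alpha);
            }
            prev = curr;
        }
    }
}
```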

C4, Question 6

Using the quotient rule, show that the derivative of \frac{\cos x}{\sin x} is \frac{-1}{\sin^2x}. Then show that

\displaystyle \int_{\frac{1}{6}\pi}^{\frac{1}{4}\pi} \dfrac{\sqrt{1 + \cos 2x}}{\sin x \sin 2x} \, dx = \dfrac{1}{2}\left(\sqrt{6} - \sqrt{2}\right)

The first part is easy (you’re given the answer, and even told how to do it). The second was more interesting; my first instinct was to attempt to substitute t = \sqrt{1 + \cos 2x} which removed the square root, but it was extremely difficult to rewrite the resulting expression in terms of t as opposed to x. I then noticed that there was a nice way to eliminate the square root with \cos 2x = 2 \cos^2 x - 1. The integrand then simplifies down into a constant multiple of \frac{-1}{\sin^2x}; using the first result and simplifying the resultant expression should yield the result. That said, I wasted a fair bit of time here with the initial substitution attempt.
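And a corresponding numerical check of the definite integral, using Simpson’s rule (again just a sketch):

```java
// Compares a Simpson's-rule estimate of the integral with (sqrt(6) - sqrt(2)) / 2.
public class C4Question6Check {
    static double integrand(double x) {
        return Math.sqrt(1 + Math.cos(2 * x)) / (Math.sin(x) * Math.sin(2 * x));
    }

    public static void main(String[] args) {
        double a = Math.PI / 6;
        double b = Math.PI / 4;
        int n = 1000;                        // number of subintervals (must be even)
        double h = (b - a) / n;
        double sum = integrand(a) + integrand(b);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 0 ? 2 : 4) * integrand(a + i * h);
        }
        double numeric = sum * h / 3;
        double exact = (Math.sqrt(6) - Math.sqrt(2)) / 2;
        System.out.printf("Simpson's rule: %.8f, claimed value: %.8f%n", numeric, exact);
    }
}
```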

Meta-Analysis

To some extent this is difficult, because students don’t generally do A levels in this way (for very good reasons), and I’m sure that there must be students out there who could similarly blast through the modules in less than half the time given, or better (but there is no data about this). Nonetheless, the A level boards usually publish Examiners’ Reports, which can be fairly interesting to read through, though they are generally lacking in data. The C3 report was fairly rich in detail, though, and my 68/72 score there was actually not too great (notice that “8% of candidates scored 70 or higher”). Indeed the aforementioned question 9 caused difficulties, though the preceding question 8, on logarithms, was hardest in terms of having the lowest proportion of candidates recording full marks.

Probability and State Machines

Suppose you have a fair six-sided die, and repeatedly toss it. What is the probability that you throw exactly one 4 before your first 6, or you throw exactly one 5 before your first 6 (inclusive or)?
(STEP I Q13, 2015)

There are several ways to approach this problem. There’s a “conventional” method that involves careful use of binomial series (and that was the approach taken in the mark scheme). However, the approach I used for this was somewhat different; some of it may be attributable to my computer science (or perhaps even logic/verification) background.

We can model this event using a finite state machine, where each state has a transition distribution that indicates the probability of transitioning to each other state in the state machine. For example, we can consider a simpler problem: what is the probability that you throw exactly one 4 before your first 6?

Well, we can be in multiple “states”:

  • We could not have rolled anything of significance yet. In this case, we have a 4 in 6 probability of staying in this state (1, 2, 3 or 5); a 1 in 6 probability of immediate failure (rolling a 6), and a 1 in 6 probability of registering a 4.
  • We could have already registered a 4. In this case, we again have a 4 in 6 probability of staying in this state (1, 2, 3 or 5), a 1 in 6 probability of success (rolling a 6), and a 1 in 6 probability of failure (rolling a 4).
  • We could already have succeeded or failed. Additional rolls won’t change anything in these cases.

This can be modelled as the following state machine:

We can then compute a success probability for each state – that is, “given I’m in this state, what’s the probability I succeed”?

  • The success probability PS of the state Pass is 1, and that of Fail is 0; we write this as PS(Pass) = 1 and PS(Fail) = 0.
  • PS(4) is a bit more interesting; it is \frac{4}{6}PS(4) + \frac{1}{6}PS(Pass) + \frac{1}{6}PS(Fail) and solving this yields half. This is intuitively expected; at this point, we’re counting on rolling a 6 before rolling a 4, and on a fair die the probability of that is half.
  • We can now determine PS(Start); this is \frac{4}{6}PS(Start) + \frac{1}{6}PS(4) + \frac{1}{6}PS(Fail). Solving the equation yields the answer of \frac{1}{4}.
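A small sketch of this calculation in code – rather than solving the equations by hand as above, it simply iterates the transition equations to a fixed point (the state names mirror the bullets):

```java
// Computes PS(4) and PS(Start) for the "exactly one 4 before the first 6" machine
// by repeatedly applying the transition equations until the values stabilise.
public class ExactlyOneFourBeforeSix {
    public static void main(String[] args) {
        final double psPass = 1.0;
        final double psFail = 0.0;
        double psFour = 0.0;   // PS(4): exactly one 4 registered so far
        double psStart = 0.0;  // PS(Start)

        for (int i = 0; i < 10_000; i++) {
            psFour = 4.0 / 6 * psFour + 1.0 / 6 * psPass + 1.0 / 6 * psFail;
            psStart = 4.0 / 6 * psStart + 1.0 / 6 * psFour + 1.0 / 6 * psFail;
        }
        System.out.println("PS(4)     = " + psFour);   // converges to 0.5
        System.out.println("PS(Start) = " + psStart);  // converges to 0.25
    }
}
```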

Note that we could only use this approach because, with the way we’ve defined the states, the process is memoryless; that is, the transition probabilities depend only on the current state. We could try to directly construct a state machine for the original question (either exactly one 4 or exactly one 5 before the first 6, or both), though it seems that the requirement of memorylessness makes things somewhat complex; we would need states tracking whether we experienced zero, one, or two-plus of each number.

We can use a well-known probability identity here, though: P(A \cup B) = P(A) + P(B) - P(A \cap B), defining rolling exactly one 4 before the first 6 as event A, and correspondingly with a 5 for event B. Furthermore, our initial exploration already yielded a value of a quarter for P(A), and P(B) would also be a quarter by symmetry. We thus, instead, construct a state machine for the intersection, where both need to be satisfied.

  • Throughout, a 1, 2 or 3 does nothing and simply puts us back in the state we were in.
  • As we start, we can also roll a 4 or 5, which means we’ve got the one 4 or 5 we want; or we can roll a 6, which is an immediate failure.
  • Once we have a 4 registered, we want to roll a 5 which gets us into the (4, 5) state. 6 remains an immediate failure, but 4 also now becomes one (remember that we want to have exactly one 4 and exactly one 5 before our 6).
  • The above logic also holds for 5s, except with 4 and 5 swapped.
  • Finally, in the (4, 5) state, rolling a 6 yields success, but either a 4 or a 5 would become immediate failures.

Here’s the result:

We could use the same method of success probabilities used earlier; I’ll take a little shortcut.

  • PS(4,5) must be \frac{1}{3} as we’re counting on a 6 being rolled before either a 4 or a 5, and with a fair die each of these has equal probability of being the first of the set to be rolled.
  • PS(5) is \frac{1}{6}PS(4,5) + \frac{2}{6}PS(Fail) + \frac{3}{6}PS(5), which gives us PS(5) = \frac{1}{9}. By symmetry, we can be sure that PS(4) = \frac{1}{9} too.
  • PS(Start) is \frac{3}{6}PS(Start) + \frac{1}{6}PS(4) + \frac{1}{6}PS(5) + \frac{1}{6}PS(Fail). Solving that yields PS(Start) = \frac{2}{27}.

We’re not done yet, though, as this isn’t what we wanted. We wanted P(A \cup B); but now we can apply the identity and get

P(A \cup B) = P(A) + P(B) - P(A \cap B) = \dfrac{1}{4} + \dfrac{1}{4} - \dfrac{2}{27} = \dfrac{23}{54}

which is the same answer obtained by the STEP problem-setters, though they used binomial series instead.
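For those who prefer an empirical check, a quick Monte Carlo sketch of the original event (the number of trials is an arbitrary choice):

```java
import java.util.Random;

// Estimates P(exactly one 4 before the first 6, or exactly one 5 before the first 6)
// by simulation, and compares it with 23/54.
public class ExactlyOneBeforeSixSimulation {
    public static void main(String[] args) {
        Random random = new Random();
        int trials = 5_000_000;
        int successes = 0;
        for (int i = 0; i < trials; i++) {
            int fours = 0;
            int fives = 0;
            while (true) {
                int roll = random.nextInt(6) + 1;
                if (roll == 6) {
                    break;           // the first 6 ends the sequence of relevant rolls
                }
                if (roll == 4) fours++;
                if (roll == 5) fives++;
            }
            if (fours == 1 || fives == 1) {
                successes++;
            }
        }
        System.out.printf("Estimate: %.5f, exact: %.5f%n",
                (double) successes / trials, 23.0 / 54.0);
    }
}
```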

The above approach can of course be generalised to handle expected values (on the basis that there can be multiple distinct terminal states, with each assigned a value). For example, suppose a player wins a consolation prize of $21 if he ‘busts out’ (rolls a 6 before rolling either a 4 or a 5), and a full $60 if he wins, and we want to determine the value of the game; we could draw the graph with three terminals instead, perhaps something like this:

We would then perform our computations on the expected value (EV) of each state, with terminal states taking exactly their assigned values.

Also, though in our examples the only cycles in the graphs were self-loops (so the equations could be solved one state at a time), in general there could be larger cycles. Of course, these can be handled by solving the relevant PS or EV equations simultaneously.

A Metric for Speedrunning Exams

In gaming, a speedrun is a playthrough of a game where one attempts to complete it as quickly as possible, perhaps subject to some constraints (such as 100% completion of all sidequests). I wouldn’t say I’m skilled or fast enough to focus aggressively on speed in many of the tests I’ve covered; I didn’t finish the AIME, and while I did pretty much “complete” the BMO and MAT within their time limits, it wasn’t with much time to spare at all.

Typically, when taking exams where one expects to finish comfortably (and even in cases where one does not), I find it’s rarely a good idea to go as fast as possible; mistakes are likely to be made, and one typically wants to secure the marks for questions one is able to answer. I’d also almost always use up nearly all of the time; there’s usually no reason not to go back and check one’s past work, and/or improve the quality of one’s existing answers. Generally, up to and including the IB exams I took at the end of high school, for most subjects there was time left over at the end; once at Imperial, most exams tended to be very fast-paced – I would either finish them with typically fewer than ten minutes on the clock, or not finish them at all. (There were exceptions, such as first-year Logic, second-year Concurrency and fourth-year Modal Logic.)

A speedrun of an exam would involve completing the questions as quickly as possible, perhaps using much less time than is given. In constructing a metric for this, it becomes apparent that incorrect answers need to be penalised in some way (it would otherwise be feasible to leave all questions blank and immediately stop the clock). However, ideally the penalties would not be too harsh; for example, invalidating performances with any answers incorrect would address our first concern, but would not be enjoyable at all. It also seems awkward to invalidate an entire run on the basis of not knowing, say, a single fact or definition.

There are two obvious raw metrics for performance of a speedrun of an exam:

  • the proportion of total marks scored, M \in [0, 1] where higher is better;
  • the proportion of time taken, T \in [0, 1] where lower is better.

Combining these, I think the following metric for overall performance P is fairly intuitive:

P_0 = M/T

In a sense we’re measuring efficiency with respect to time, against a benchmark student who uses all of the time and scores 100% of the marks. However, I don’t think this is a good metric, because it can readily be abused; a strategy that quickly determines the easiest mark(s) to score on the paper and then attempts only those marks will do very well (note: marks, not necessarily whole questions; for example, “find the roots of f(x) = 36x^4 - 23x^3 - 31x^2 - 102x” is a rather nasty question, but there’s one obvious answer, x = 0, that drops out in a second or two).

Of course, a way around this could be that scripts must be checked at the end, to ensure that there’s a reasonable bona fide attempt at each and every problem. However, that requires manual verification. An alternative option could be to introduce a minimum mark threshold m; attempts that have M < m are invalidated.

P_1 = \begin{cases} 0 & M < m \\ M/T & M \geq m \end{cases}

This metric is a decent improvement, though it still has some issues:

  • A strategy that seeks to identify the easiest marks adding up to the proportion m and then attempts only those would still perform well. This can be mitigated if m is fairly high; for example, for A level mathematics papers I would set something like m = 0.93 or so.
  • Also, if m is set too high (e.g. m = 1 is the aforementioned “invalidate all non-perfect runs” strategy), too many runs, including runs that fail only because of unknown facts/definitions, may be invalidated.

We can also consider continuous penalty functions of M that fall off more steeply than M itself as M decreases from 1. For example,

P_2 = \max \left\lbrace 0, \dfrac{1 - 2(1 - M)}{T} \right\rbrace

Thus, a run with M = 0.7 for a given time has its score reduced to 40 percent of the score of another run with the same time but M = 1. The max with 0 could be left out if one wished to give negative scores to runs with M < 0.5, though I think that’s fairly harsh.

There’s also no reason we should restrict ourselves to linear functions. Consider

P_3 = M^\alpha / T, \alpha > 1

Higher values of \alpha will penalise flaws more heavily; consider two runs with the same time, one having M = 0.9 and the other M = 1. With \alpha = 2 the imperfect run scores 81 percent of the score of the perfect run, but with \alpha = 10 the imperfect run scores a mere 34.9 percent of the perfect run’s score! Also, observe that as \alpha \rightarrow \infty we approach P_1 with a 100 percent threshold.

Of course, we also have the exponential option:

P_4 = e^{-\lambda (1 - M)} / T, \lambda > 0

In approaches 3 and 4, each additional mistake is penalised more. I think this makes sense for papers where candidates who understand the concepts can be expected to lose most or all of the points they lose to careless errors.

One or two slips should, in my opinion, result in a notable performance hit relative to perfect runs, but the run should still be salvageable; more than a few would seem like grounds for invalidating the run. It could be possible to blend the threshold idea (approach 1) with either approach 3 or 4, though we could argue that the heavy penalties involved would already destroy runs with “more than a few” errors.