July 2017

Probability and State Machines

Suppose you have a fair six-sided die, and repeatedly toss it. What is the probability that you throw exactly one 4 before your first 6, or you throw exactly one 5 before your first 6 (inclusive or)?
(STEP I Q13, 2015)

There are several ways to approach this problem. There’s a “conventional” method that involves careful use of binomial series (and that was the approach taken in the mark scheme). However, the approach I used for this was somewhat different; some of it may be attributable to my computer science (or perhaps even logic/verification) background.

We can model this event using a finite state machine, where each state has a transition distribution that indicates the probability of transitioning to each other state in the state machine. For example, we can consider a simpler problem: what is the probability that you throw exactly one 4 before your first 6?

Well, we can be in multiple “states”:

  • We might not have rolled anything of significance yet. In this case, we have a 4 in 6 probability of staying in this state (1, 2, 3 or 5), a 1 in 6 probability of immediate failure (rolling a 6), and a 1 in 6 probability of registering a 4.
  • We could have already registered a 4. In this case, we again have a 4 in 6 probability of staying in this state (1, 2, 3 or 5), a 1 in 6 probability of success (rolling a 6), and a 1 in 6 probability of failure (rolling a 4).
  • We could already have succeeded or failed. Additional rolls won’t change anything in these cases.

This can be modelled as the following state machine:

We can then compute a success probability for each state – that is, “given I’m in this state, what’s the probability that I succeed?”

  • The success probability PS of the state Pass is 1, and that of Fail is 0; we write this as PS(Pass) = 1 and PS(Fail) = 0.
  • PS(4) is a bit more interesting; it is \frac{4}{6}PS(4) + \frac{1}{6}PS(Pass) + \frac{1}{6}PS(Fail) and solving this yields half. This is intuitively expected; at this point, we’re counting on rolling a 6 before rolling a 4, and on a fair die the probability of that is half.
  • We can now determine PS(Start); this is \frac{4}{6}PS(Start) + \frac{1}{6}PS(4) + \frac{1}{6}PS(Fail). Solving the equation yields the answer of \frac{1}{4}.
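
Writing out the algebra for these two equations explicitly:

PS(4) = \frac{4}{6}PS(4) + \frac{1}{6}(1) + \frac{1}{6}(0), so \frac{2}{6}PS(4) = \frac{1}{6} and PS(4) = \frac{1}{2}

PS(Start) = \frac{4}{6}PS(Start) + \frac{1}{6} \cdot \frac{1}{2} + \frac{1}{6}(0), so \frac{2}{6}PS(Start) = \frac{1}{12} and PS(Start) = \frac{1}{4}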

Note that we could only use this approach because, with the way we’ve defined the states, the process is memoryless; that is, the transition probabilities depend only on the current state. We could try to directly construct a state machine for the original question (either exactly one 4 or exactly one 5 before the first 6, or both), though it seems that the requirement of memorylessness makes things somewhat complex; we would need states tracking whether we experienced zero, one, or two-plus of each number.

We can use a well-known probability identity here, though: P(A \cup B) = P(A) + P(B) - P(A \cap B). Define rolling exactly one 4 before the first 6 as event A, and correspondingly with a 5 as event B. Our initial exploration already yielded P(A) = \frac{1}{4}, and P(B) = \frac{1}{4} by symmetry. We thus, instead, construct a state machine for the intersection, where both conditions need to be satisfied.

  • Throughout, a 1, 2 or 3 does nothing and simply puts us back in the state we were in.
  • As we start, we can also roll a 4 or a 5, either of which gives us the single 4 or 5 we want; or we can roll a 6, which is an immediate failure.
  • Once we have a 4 registered, we want to roll a 5, which gets us into the (4, 5) state. A 6 remains an immediate failure, but a 4 now becomes one too (remember that we want to have exactly one 4 and exactly one 5 before our 6).
  • The above logic also holds for 5s, except with 4 and 5 swapped.
  • Finally, in the (4, 5) state, rolling a 6 yields success, but either a 4 or a 5 would become immediate failures.

Here’s the result:

We could use the same method of success probabilities used earlier; I’ll take a little shortcut.

  • PS(4,5) must be \frac{1}{3} as we’re counting on a 6 being rolled before either a 4 or a 5, and with a fair die each of these has equal probability of being the first of the set to be rolled.
  • PS(5) is \frac{1}{6}PS(4,5) + \frac{2}{6}PS(Fail) + \frac{3}{6}PS(5), which gives us PS(5) = \frac{1}{9}. By symmetry, we can be sure that PS(4) = \frac{1}{9} too.
  • PS(Start) is \frac{3}{6}PS(Start) + \frac{1}{6}PS(4) + \frac{1}{6}PS(5) + \frac{1}{6}PS(Fail). Solving that yields PS(Start) = \frac{2}{27}.
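
Again, writing out the algebra:

PS(5) = \frac{1}{6} \cdot \frac{1}{3} + \frac{2}{6}(0) + \frac{3}{6}PS(5), so \frac{3}{6}PS(5) = \frac{1}{18} and PS(5) = \frac{1}{9}

PS(Start) = \frac{3}{6}PS(Start) + \frac{1}{6} \cdot \frac{1}{9} + \frac{1}{6} \cdot \frac{1}{9} + \frac{1}{6}(0), so \frac{3}{6}PS(Start) = \frac{1}{27} and PS(Start) = \frac{2}{27}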

We’re not done yet, though, as this isn’t what we wanted. We wanted P(A \cup B); but now we can apply the identity and get

P(A \cup B) = P(A) + P(B) - P(A \cap B) = \dfrac{1}{4} + \dfrac{1}{4} - \dfrac{2}{27} = \dfrac{23}{54}

which is the same answer obtained by the STEP problem-setters, though they used binomial series instead.
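
As a quick sanity check (my own standalone sketch, not part of the original solution or the mark scheme), a short simulation of the process agrees with this figure:

import java.util.Random;

// Rolls a fair die until the first 6, counting the 4s and 5s seen beforehand;
// the run is a success if exactly one 4 or exactly one 5 appeared (inclusive or).
public class StepSimulation {
  public static void main(String[] args) {
    Random random = new Random();
    int trials = 10_000_000;
    int successes = 0;
    for (int i = 0; i < trials; i++) {
      int fours = 0;
      int fives = 0;
      int roll;
      while ((roll = random.nextInt(6) + 1) != 6) {
        if (roll == 4) fours++;
        if (roll == 5) fives++;
      }
      if (fours == 1 || fives == 1) successes++;
    }
    System.out.println((double) successes / trials); // expect roughly 23/54 ≈ 0.4259
  }
}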

The above approach can of course be generalised to handle expected values (on the basis that there can be multiple distinct terminal states, each assigned a value). For example, if a player won a consolation prize of $21 for ‘busting out’ (rolling a 6 before rolling either a 4 or a 5), and a full $60 if he won, and we wanted to determine the value of the game, we could draw the graph with three terminals instead, perhaps something like this:

We would then perform our computations on the expected value (EV) of each state, with terminal states taking exactly their assigned values.
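
Under one reading of that example (winning means reaching the Pass state of the intersection machine above; busting out means the first roll among 4, 5 and 6 is a 6, which happens with probability \frac{1}{3}; any other outcome pays nothing), the value of the game would be

EV(Start) = 60 \cdot \dfrac{2}{27} + 21 \cdot \dfrac{1}{3} + 0 \cdot \dfrac{16}{27} = \dfrac{40}{9} + 7 = \dfrac{103}{9} \approx 11.44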

Also, though our examples only featured self-loops (a state transitioning back to itself), in general such a graph could contain longer cycles passing through several states. Of course, these can be handled by solving the multiple PS or EV equations simultaneously.

A Metric for Speedrunning Exams

In gaming, a speedrun is a playthrough of a game where one attempts to complete it as quickly as possible, perhaps subject to some constraints (such as 100% completion of all sidequests). I wouldn’t say I’m skilled or fast enough to focus aggressively on speed in many of the tests I’ve covered; I didn’t finish the AIME, and while I did pretty much “complete” the BMO and MAT within their time limits, it wasn’t with much time to spare at all.

Typically when taking exams where one expects to finish comfortably (and even in cases when one does not) I find it’s rarely a good idea to go as fast as possible; mistakes are likely to be made and one typically wants to secure the marks for questions one is able to answer. I’d almost always use up almost all of the time as well; there’s typically no reason not to go back and check one’s past work, and/or improve the quality of one’s existing answers. Generally up to and including the IB exams I took at the end of high school, for most subjects there was time left over at the end; once at Imperial most exams tended to be very fast-paced – I would either finish them with typically fewer than ten minutes on the clock, or not finish them at all. (There were exceptions, such as first-year Logic, second-year Concurrency and fourth-year Modal Logic.)

A speedrun of an exam would involve completing the questions as quickly as possible; perhaps using much less time than is given. In constructing a metric for this, it becomes apparent that incorrect answers need to be penalised in some way (it would otherwise be feasible to leave all questions blank and immediately stop the clock). However, ideally the penalties would not be too harsh; for example, invalidating performances with any answers incorrect would address our first concern, but would not be enjoyable at all. It also seems awkward to invalidate an entire run on the basis of not knowing, say, a single fact or definition.

There are two obvious raw metrics for performance of a speedrun of an exam:

  • the proportion of total marks scored, M \in [0, 1] where higher is better;
  • the proportion of time taken, T \in [0, 1] where lower is better.

Combining those metrics, I think the following metric for an overall performance P is fairly intuitive.

P_0 = M/T

In a sense we’re measuring efficiency with respect to time, against a benchmark student who uses all of the time and scores 100% of the marks. However, I don’t think this is a good metric, because it can readily be abused: a strategy that quickly identifies the easiest mark(s) to score on the paper and then attempts only those marks will do very well. (Note: marks, not necessarily whole questions; for example, “find the roots of f(x) = 36x^4 - 23x^3 - 31x^2 - 102x” is a rather nasty question, but one obvious answer, x = 0, drops out in a second or two since there is no constant term.)

Of course, a way around this could be to check scripts at the end, to ensure that there’s a reasonable bona fide attempt at each and every problem. However, that requires manual verification. An alternative option could be to introduce a minimum mark threshold m; attempts with M < m are invalidated.

P_1 = \begin{cases} 0 & M < m \\ M/T & M \geq m \end{cases}

This metric is a decent improvement, though it still has some issues:

  • A strategy that seeks to identify the m easiest marks to obtain and then attempts only those would perform well. This can be mitigated if m is fairly high; for example, for A level mathematics papers I would set something like m = 0.93 or so.
  • Also, if m is set too high (e.g. m = 1 is the aforementioned “invalidate all non-perfect runs” strategy), too many runs may be invalidated, including runs that fail only because of a single unknown fact or definition.

We can also consider continuous penalties based on M, where the score falls off more steeply than M itself as M decreases from 1. For example,

P_2 = \max \left\lbrace 0, \dfrac{1 - 2(1 - M)}{T} \right\rbrace

Thus, a run with M = 0.7 has its score reduced to 40 percent of that of another run with the same time but M = 1. The max with 0 could be left out if one wished to give negative scores to runs with M < 0.5, though I think that’s fairly harsh.

There’s also no reason we should restrict ourselves to linear functions. Consider

P_3 = M^\alpha / T, \alpha > 1

Higher values of \alpha will penalise flaws more heavily; consider two runs with the same time, but one having M = 0.9 and the other M = 1; with \alpha = 2 the imperfect run scores 81 percent the score of the perfect run, but with \alpha = 10 the imperfect run scores a mere 34.9 percent the score of the perfect run! Also, observe that as \alpha \rightarrow \infty we approach P_1 with a 100 percent threshold.

Of course, we also have the exponential option:

P_4 = e^{-\lambda (1 - M)} / T, \lambda > 0

In approaches 3 and 4, each additional mistake is penalised more heavily, at least relative to the simple M/T baseline. I think this makes sense for papers where candidates who understand the concepts can be expected to lose points mainly to careless errors.
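
To make the comparison concrete, here is a small sketch (the class and method names, and the \alpha = 10, \lambda = 5 and m = 0.93 values, are mine and purely illustrative) that evaluates each candidate metric for a perfect run and a 90% run, both completed in half the allotted time:

// Illustrative only: evaluates each candidate metric for sample (M, T) pairs.
public class SpeedrunMetrics {
  static double p1(double m, double t, double threshold) {
    return m < threshold ? 0 : m / t;
  }

  static double p2(double m, double t) {
    return Math.max(0, (1 - 2 * (1 - m)) / t);
  }

  static double p3(double m, double t, double alpha) {
    return Math.pow(m, alpha) / t;
  }

  static double p4(double m, double t, double lambda) {
    return Math.exp(-lambda * (1 - m)) / t;
  }

  public static void main(String[] args) {
    double t = 0.5; // both runs use half the allotted time
    for (double m : new double[] {1.0, 0.9}) {
      System.out.printf("M=%.1f: P1=%.2f P2=%.2f P3(a=10)=%.2f P4(l=5)=%.2f%n",
          m, p1(m, t, 0.93), p2(m, t), p3(m, t, 10), p4(m, t, 5));
    }
  }
}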

One or two slips should, in my opinion, result in a notable performance hit relative to perfect runs, but the run should still be salvageable; more than a few would seem like grounds for invalidating the run. It could be possible to blend the threshold idea (approach 1) with either approach 3 or 4, though we could argue that the heavy penalties involved would already destroy runs with “more than a few” errors.

 

Portmanteau (Spliterator)

For the most part, classes in the standard Java libraries consist of fairly normal words (e.g. a ScheduledExecutorService is an ExecutorService – a service that manages the execution of tasks – which allows for scheduling). There are exceptions (and I don’t mean the Java ones), of course, such as the OMGVMCID.

Java 8, specifically JSR-335, brought about a bunch of new classes, including some of my Guava favourites (the function package, with Supplier and friends, and Optional) and Streams. Most of these, again, have fairly normal-sounding names; there are a few that might sound a bit hacked together, like BiConsumer, but I think the meaning is generally intuitive (a consumer of two arguments). There is one that was initially fairly puzzling, though – “Spliterator” (rendered as one word). The obvious reading of that is as a portmanteau of “split” and “iterator”, but I wasn’t sure at first how useful it would be. Of course I have written code to partition collections or other iterables, but this was usually done on an ad-hoc basis.

The API seems designed around easily scheduling parallel computation. The key method here is trySplit(). To quote the Java API docs:

If this spliterator can be partitioned, returns a Spliterator covering elements, that will, upon return from this method, not be covered by this Spliterator.

I’ll go over this in a bit more detail.

  • If the spliterator can currently be split, then trySplit() returns a new spliterator, and this “takes over” some of the elements in our existing spliterator. Our existing spliterator, if traversed directly after this method returns (e.g. with tryAdvance()), will not have these elements.
  • Otherwise, trySplit() returns null. Our existing spliterator retains its elements.

For example, this is one way of doing a parallel file deletion, which starts from one spliterator and splits it as necessary.

// Assume there exists an ExecutorService, EXECUTOR.

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class DeleteTask implements Runnable {
  private final Spliterator<File> filesToDelete;

  DeleteTask(Spliterator<File> filesToDelete) {
    this.filesToDelete = filesToDelete;
  }

  @Override
  public void run() {
    // Keep splitting off work for other threads until the spliterator declines to split.
    List<Future<?>> delegateFutures = new ArrayList<>();
    Spliterator<File> filesToDelegate = filesToDelete.trySplit();
    while (filesToDelegate != null) {
      delegateFutures.add(EXECUTOR.submit(new DeleteTask(filesToDelegate)));
      filesToDelegate = filesToDelete.trySplit();
    }
    // Delete this task's share of the files.
    filesToDelete.forEachRemaining(File::delete);
    // Wait for the delegated tasks; we can use a fancier way of managing completions if needed.
    for (Future<?> delegateFuture : delegateFutures) {
      try {
        delegateFuture.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
  }
}

Viewing this in relation to parallel programming, there are obvious similarities to fork-join based workflows, especially in cases where the work involves a map or filter, or doing some reduction that’s commutative and associative. A traditional way of writing the above wouldn’t be too different; you might write some logic that splits the files into batches (which now lives in the spliterator), and then similarly dispatch deletion jobs to an ExecutorService.

From the above, it seems that one notable difference is where the responsibility for deciding how the stream should be split lives: the Spliterator itself has some say in whether it should split. To some extent this makes sense; in cases where we want batches to have rather specific sizes (e.g. if performance is highly sensitive to elements being processed in specific batch sizes), we can guard trySplit() with suitable calls to the estimateSize() or getExactSizeIfKnown() methods, as in the sketch below. This can also be useful for avoiding fiddly edge cases (where our size bounds pass, but we are left with fewer than a full batch of elements, for instance).
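
For instance, a sketch of such a guard, written as a variant of the run() method above (BATCH_SIZE is a hypothetical tuning constant of mine, and waiting on the delegated tasks is omitted for brevity):

  // Hypothetical: only hand work off while at least two batches of elements remain,
  // so that neither this task nor the delegate ends up with a tiny sliver of work.
  private static final long BATCH_SIZE = 1_000;

  @Override
  public void run() {
    while (filesToDelete.estimateSize() >= 2 * BATCH_SIZE) {
      Spliterator<File> filesToDelegate = filesToDelete.trySplit();
      if (filesToDelegate == null) {
        break; // the spliterator declined to split despite its size estimate
      }
      EXECUTOR.submit(new DeleteTask(filesToDelegate));
    }
    filesToDelete.forEachRemaining(File::delete);
  }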

Spliterators are also useful because they carry characteristics that describe, among other things, how the elements are to be traversed. For example (though this doesn’t quite apply to the file deletion example), we may be able to use a certain optimisation if we can assume that we will see the data in sorted order; spliterators carry this information, whereas with a standard Iterator we might not be able to do this easily without some additional bookkeeping. The same might apply where we can do better if we know elements are unique. The collections API also generates spliterators with the appropriate characteristics (e.g. a NavigableSet would give you a spliterator that already reports the SORTED and DISTINCT characteristics).
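
For instance (a quick, self-contained illustration of my own, not from the Java documentation):

import java.util.Arrays;
import java.util.Spliterator;
import java.util.TreeSet;

public class CharacteristicsDemo {
  public static void main(String[] args) {
    // A TreeSet is a NavigableSet; its spliterator reports, among others,
    // the SORTED and DISTINCT characteristics.
    Spliterator<String> spliterator =
        new TreeSet<>(Arrays.asList("b", "a", "a", "c")).spliterator();
    System.out.println(spliterator.hasCharacteristics(Spliterator.SORTED));   // true
    System.out.println(spliterator.hasCharacteristics(Spliterator.DISTINCT)); // true
  }
}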

I’m still not sure why this name was decided upon, as opposed to, say, SplittableIterator. Perhaps this was done in the interest of terseness and because the decomposition is fairly obvious, though the name still really doesn’t look like a proper word to me. It does seem that spliterators have use cases beyond standard iterators as well, even when parallel computation is out of the picture – the characteristics and ability to dynamically switch on them seems potentially useful. That said, I still have not used a Spliterator directly in my professional work, and have no intention of doing so unless it fits the task I’m doing!

Volatile Variables (2017 Q2 Review)

It feels like Q2 went by very quickly, perhaps because I’m settling in somewhat into my team at Palantir, and many things have been relatively stable this quarter.

Looking at GitHub statistics, I merged in 29 pull requests over the past three months (one up from Q1). I think one thing that certainly has improved is the size of my pull requests, however; I’ve made a conscious effort to split up features into several reasonably-sized chunks; this includes chains of four or five PRs at times (and some of these can run into the mid to high hundreds of lines each – suggesting that we would easily be looking at a 3000-line monster again if we didn’t do this split). I didn’t check the data, but it’s quite likely I committed less code in Q2 than in Q1; some of this stems from being more involved in reviewing others’ pull requests, as well as dicey debugging (where days can be spent to figure out a very small tweak to make things work). That’s certainly not a problem though – one of the PRs I’m happiest with involved a net reduction in the size of the codebase.

Other professional goals have been going acceptably well. I’ve continued to keep on keeping on as far as interviewing is concerned. Generally technical development has chugged on, though at a bit of a slower pace; my paper reading groups have slowed down a fair bit, for various reasons (friends were travelling or otherwise unavailable), but hopefully that should pick back up in Q3. The academic side of things is a bit of a brighter spot for me as well; a paper on symbolic model checking LDLK was accepted at IJCAI 2017, and I also cleaned up version 1.3.0 that includes support for CTL*K.

I think in Q3 I would aim to merge at least thirty pull requests (I know, it’s a blunt metric). The paper reading groups should continue. I also want to continue on my “Imperial Computing Comprehensives” project; that is, an extra-difficult exam paper that crosses disciplines between all compulsory first- and second-year modules at Imperial. (I think my average module mark at Imperial was probably just over 90, but I’m trying to set the paper so that I would score a low first, around 75.)

Financially, my portfolio increased by 0.3 percent from Q1 to Q2 (it would actually have decreased if not for reinvested dividends). I think I’ve actually been decently successful in not worrying excessively about the vicissitudes of the market; in early April, as well as at the very end of June, my portfolio took beatings of more than a thousand pounds, but I wasn’t too concerned about that. I’m somewhat skeptical that the bull market will keep going, but I’m not complaining.

Focusing more on things within my sphere of influence here, notable changes in the Q2 budget involved a drop-off in transport expenditure (as I started really taking walking to work more seriously) and… not much else. There was one big luxury purchase in Q1 (laptop), as well as in Q2 (IJCAI travels). Generally, things were fairly tight though not as tight as in Q1, especially since I didn’t do any specific budget challenges (these usually take place in February). I’m planning on loosening this a little in Q3, for various reasons, including anxiety about valuations in the market, possible travel and a desire to have some better food as well. I can cost the ingredients of my lunch today at 39p!

I’ve started taking up competitive puzzle-solving in my spare time as well. This involves solving logic puzzles against the clock (both well-known ones like Sudoku, Kakuro or Slitherlink, and some written specifically for a competition – like the Odd Sums Sudoku I discussed a while back). I’m still pretty bad at these, but we’ll see how things go. Generally I find that my logical skills are decent, but my spatial reasoning is not great; I struggle a lot with loop puzzles, or toroidal sudoku (basically like normal sudoku, except regions are irregularly shaped and can wrap around). I’ve continued to practice my singing as well – those A4s are still stable; B4s and C5s are still a bit flaky.

I travelled a couple of times in Q2 – Aarhus, Palo Alto, and then Zagreb (wow, literally from A to Z – I can think of Zurich which is lexicographically later than Zagreb, though can’t think of anywhere before Aarhus). I enjoyed each trip for various reasons, though both Aarhus and Zagreb were whirlwind trips in a sense (flew in in the morning, and flew out the next morning). I’d be happy to revisit all of these places, and I think I will with quite high probability actually. All of these trips were for work though; I didn’t travel anywhere for personal recreation.

In terms of music I’ve been listening to, this quarter seemed to take on a relatively darker mood (though my Q1 review featured James Arthur’s Back from the Edge, which isn’t exactly very bright either). There are parts of Lukas Graham’s 7 Years that I appreciate, including fair sections of the lyrical content. However, I’ve always found the tone of the lead vocalist a little strange, and the production awkward (there’s a part where the band is cheered, which doesn’t really make sense to me; I see the song more as one of reflection and acknowledgement of uncertainty, rather than of triumph). That said, there was an acoustic piano cover of this that I enjoyed; Schuller’s voice is clean and pleasant, and I find the delivery more appropriate.

There is of course this part of the song that I identify with to some extent, though… it’s a little complicated.

I only see my goals, I don’t believe in failure

I like the ambition and drive, but it can be dangerous if said goals are poorly set. Resilience to failure is of course very important and something I’m still trying to get better at, but it’s important to respect failed attempts for what they are – and learn from them, where appropriate. I think the main lesson here is that it’s important to set good goals, especially if one intends to double down on them. (The second part is less contentious, as long as not believing is interpreted in the right way.)

Let’s push on in Q3.