Browse Month

July 2018

Algorithmic Problem-Solving: On Pigs and Poison

A teammate asked me a computer science / algorithm question over lunch. It took me somewhere between 30 to 40 minutes of thought and discussion with teammates to solve the problem, inclusive of writing about 20 lines of Java; I can see it being a kind of ‘brain teaser’ interview question (and thus generally best avoided). That said, genuinely not having heard this problem before, I think it was interesting to see the thought process I went through to get to the final answer.

You have 1,000 buckets of water, 1 of which is poisoned. It is known that the poison if ingested will kill a pig within 15 minutes. You have one hour to determine which bucket contains the poison. What is the minimum number of pigs that you need, and how would you do it? (Eventually, of course, you want to generalise your answer.)

After some initial “lateral thinking”-style questions (does the poison only kill pigs? is the death timing exact?), I determined that it was intended to be a genuine algorithms problem. As a computer scientist, I then thought of binary encodings. This gives a quick solution to the case where you have just 15 minutes to determine which bucket is poisoned – if you convert each number to a 10-bit binary number (so the first bucket is 0000000000, the second is 0000000001, and so on) and give the first bucket to no pigs, the second to pig number 10, the third to pig number 9, the fourth to pig numbers 9 and 10 and so on, the final state of the pigs 15 minutes later allows you to determine the poisoned bucket. If we used this method and only pigs 5 and 7 died, then the poisoned bucket has binary representation 0000101000, which corresponds to bucket number 40. In any case, we know the answer is bounded above by 10.

I then considered ways of extending this solution given that we can run four trials instead of one. I did consider partial experiments, but didn’t see a way this would be helpful. One quick possibility we came up with was to linearly split the buckets among a number of pigs to cut down the search space – though this risks losing one pig at each iteration. Still, we determined that 6 pigs would work:

  • On trial 1, you have 1000 buckets and 6 pigs – divide into 7 groups of at most 143.
  • On trial 2, you have 143 buckets and possibly 5 pigs – divide into 6 groups of at most 24.
  • On trial 3, you have 24 buckets and possibly 4 pigs – divide into 5 groups of at most 5.
  • On trial 4, you have 5 buckets and possibly 3 pigs – the binary method can distinguish 8 buckets.

Yet, this same method wouldn’t work with 5 pigs. We were ready to settle on 6 as the answer, but the question asker revealed that it was possible to do better than 6. The idea of imbalanced groups then came up – the ‘no risk’ group of buckets should be larger, because in that case we could guarantee that we would still have all our pigs if we needed to investigate that group.

I guess I haven’t done algorithm interviews for a good while, and it thus took me to this point before I thought of setting up a recurrence relation. Define W(n, p) as the maximum number of buckets distinguishable with n trials and p pigs remaining. I then set up this relation:

W(n, p) = \begin{cases} 2^p & n = 1 \\ 1 & p = 0 \\ W(n-1, p) + pW(n-1,p-1) & \text{otherwise} \end{cases}

The last case comes about because we can safely allocate a group of W(n-1, p) buckets to be given to no pigs, since with one fewer trial we can distinguish between them. For each of the pigs present, we need to prepare to continue the investigation with one fewer trial and one fewer pig, hence pW(n-1,p-1).

This is more efficient, but doesn’t quite get us there yet. I noted that p(4, 5) was somewhere in the 800s, which established that strategies based on risking one pig in every round apart from the last would be insufficient. Thinking a little more about it, I realised that in the extreme case, you could always have a special bucket each round that you gave to all pigs – if they all died that round, we would know that bucket was poisoned. However, you couldn’t have more than one of these per round – if they all died, you wouldn’t know which bucket was the cause.

More generally, you could split the groups of buckets into groups that are given to zero, one, two… up to p pigs, where p is the number of pigs still left. The groups given to smaller numbers of pigs could likely be larger, to account for the additional resources available in later rounds. The following recurrence relation then drops out:

W(n, p) = \begin{cases} 2^p & n = 1 \\ 1 & p = 0 \\ \sum_{i=0}^p \binom{p}{i} W(n-1, p-i) & \text{otherwise} \end{cases}

This actually has a closed form solution. I didn’t notice it when looking at the recurrence, though it became apparent when I coded up a program to calculate the required W(4, 5) = 3125. I was surprised to see how much additional discriminatory power five pigs had, though four wouldn’t be sufficient. Interestingly, five pigs and three periods were sufficient. The general closed form solution is simply W(n, p) = (n + 1)^p and our argument provides a method for actually identifying the relevant bucket.

Finally, an interesting challenge here was to establish that it wasn’t possible to achieve this with four pigs (so that 5 would be the minimum). One way to do this was to reason in terms of information theory – a pig either survives all the way or dies after a given round, and then it stays dead. This is n + 1 many possibilities, and there are p pigs, so it seems that we can’t do better than (n+1)^p. Clearly 5^4 = 625 < 1000 so four pigs isn’t enough; also, this expression matches the value obtained by our recurrence-based method, suggesting that that is indeed optimal.

Tactics and Strategy (2018 Q2 Review)

I generally interpret strategy to refer to thinking about broader goals and general approaches for achieving these goals; conversely I think of tactics as finer-grained methods for effectively completing smaller tasks (that hopefully contribute to fulfilling one’s strategic goals). As a software engineer who works with databases, strategic concerns could be “the database should be able to serve requests quickly”, or “you shouldn’t be able to shoot yourself in the foot too easily”; tactics could involve using B-trees or clever data structures, or being careful about what APIs one exposes respectively.

For individual endeavours and also at work, I often need to play both a strategist and a tactician. I’m responsible both for determining what to seek at a high level and for implementing the required steps. There’s some tension in devoting resources towards getting better at either of these; I’d say the skills are certainly distinct (though not independent; good strategic thinking requires knowledge of what may be tactically plausible, and one should implement suitable tactics to optimise for outcomes that one strategically favours).

Strategically things have been a bit foggy this quarter. To some end there has been progress in terms of general quality as a developer, at least by some metrics. Getting better technically is good, though it seems to have been more of a focus than I remember prioritising it to be.

I have to some extent been struggling to find satisfaction in things, as well (there was a very relevant series of sermons on contentment in church – but practical application is often much harder than theory). I think some of this was a change from working with a preponderance of ‘shiny’ features leading to patent applications or other forms of explicit external recognition. I even forgot about some features that were clearly valued by other teams or people until they reminded me that I wrote them when discussing what I’d been working on in general!

Outside of work, also, I need to find strategic direction (there hasn’t been very much). With regard to computer science, I need to think where I go from here. AAMAS’18 in Stockholm is coming up and I’ll be presenting there about model checking LDLK on finite traces. This paper was very challenging to write; differently from the first two, I actually had to develop a substantial amount of novel content. I found myself guilty of excessive handwaving in the original proof presented in the thesis – there was a detail which I claimed to be trivial that turned out to require a full column of argument! I also have one more paper to write. I’ve also not been doing as much independent study of both computer science and software engineering as I would like.

In terms of finance, one of the good things about index investing is that there isn’t that much to do once the system is set up. There’s even less if you’ve automated some of the workflows (as I have)! After that, much return is subject to the random walk of the markets – with the (I’d say justified) belief, of course, that that walk trends upwards. The main ‘action’ required then is to stay the course, which requires patience and persistence.

Markets have been a bit of a mixed bag. Equities have gone up a lot apart from emerging markets, but a good chunk of this is likely to be currency effects as the pound fell hard. The usual table follows:

(Disclaimer: I hold the Fidelity World Index Acc, Vanguard LS80, iShares EM index, iShares property index, BTC, GBP and USD in various quantities, among other things.)

Spending has exploded (well, relatively) this quarter. Some of this is due to periodic expenses with a period greater than three months coming in in this quarter – renewing the servers and domain name for this website and brokerage fees. There is also some (I’d say permissible) lifestyle inflation too – frugality or cheapness to the detriment of daily happiness is usually not something worth doing unless one is forced into that position.

My budget also has a category called ‘Learning and Development’, which includes books for reading and fees for various educational activities (such as conferences, lessons, exams and the like). This swelled in Q2. I haven’t been reading substantially more. Most of this was the fees for AAMAS’18, FLoC’18 (more logic!) and music lessons.

I’ve made a conscious decision to spend more time outside of work on various recreational pursuits (even if the precise details of these aren’t clearly directed). I’ve gotten faster and better at solving logic puzzles and Sudoku; it’s difficult to come up with clear performance metrics, but I’m usually able to score in the 70th percentile or so in online contests, and can occasionally squeak into the 80s in broader contests (where the average skill level is slightly lower). I know that I was in the 20s or 30s when I first started trying these. I ranked 42 of 157 in the UK Sudoku Championship 2018, and less impressively 50 of 141 in the UK Puzzle Championship 2018.

I’ve continued writing here – a bit less frequently than I would like, admittedly. I also had my first vocal lesson in about three years or so – it’s good to be back. To some extent I strive for quality in what I do, even in recreation (I told my teacher “I enjoy sounding good” – I didn’t go as far as “I only enjoy it if I’m sounding good”, though that’s probably more true than I like to admit…)

I traditionally feature a song I’ve listened to a lot over the quarter in these reviews; I’ve done this for about five years now, and it can be interesting to see what I was listening to several years ago. In many cases I liked most of the songs then and I still do now; I tend to select quite heavily for a meaningful message or idea, and I like to think I’m quite resistant to faddish qualities in songs.

I first heard this song on a plane (like many others!) when flying from London to Singapore for a one-week vacation in May. I have listened to a couple of songs by the British boyband The Wanted before. One of their heavier pieces, Warzone (about leaving an abusive relationship) featured in my 2014 Q2 review, where I decided the 90- and 100-hour weeks I worked then were a bit too much of a cost. I also quite like Chasing the Sun, and a couple of their album-only tracks as well.

Unfortunately, the band dissolved a few years ago. They weren’t quite as successful as One Direction, so as far as I know only one member, Nathan Sykes, has continued with a solo career. I didn’t actually know this until I came across his new album Unfinished Business (yes, quite apt). The opening track is titled Good Things Come To Those Who Wait. It reminded me a fair bit of James Arthur’s Back From The Edge both aurally and thematically.

The second verse is relevant to some issues I’ve been thinking about this quarter:

Some people run and fall
They don’t even walk at all
There’s nothing to prove, just to feel you exist

It’s important to balance aggressive implementation (“running”) with careful evaluation of whether what one’s implementing is relevant to what one wants to do – especially if the path ahead is unclear. In some cases, it can be better to move more slowly and carefully.

The last line sounds weird, but in context is a reason why the speaker takes his time with things. The converse of feeling a need to prove oneself to others is understandable. I do fall into that trap sometimes, though I know it can be a dangerous thing to do.

More generally, the idea that good things come to those who wait can be contentious. I think it depends on what kind of waiting is involved – waiting to strike when opportunity arises is great (which I think is the point of the song – “I prefer to stick when the others would twist”), but idle, muddled waiting might not work so well.

As far as the album is concerned, I also found I Can’t Be Mad to be excellent. There were also a number of other good songs; I think it was a reasonably strong offering.