Balloon Battle (Hotseat: UKIEPC 2017, Part 1)

Overview

The United Kingdom and Ireland Programming Contest (UKIEPC) 2017 took place a few weeks ago. I used to participate semi-actively and had a rather annoyingly consistent habit of finishing in third place at Imperial. I haven’t done contest programming for quite a while, though some of my experience from interviewing candidates at Palantir might have helped.

Teams are given five hours to solve somewhere from eleven to thirteen problems. These problems involve submitting a program (the source will do) to an online judge, which typically runs said program against multiple test cases and evaluates the output of the user’s program in some way. This can entail simply checking strings, or alternatively checking if the given solution achieves some task (consider problem D in this set).

This time round I managed to figure out 9 problems, rather coincidentally with a total “penalty time” of 999 minutes. When teams are ranked, the number of problems solved is compared first; to break ties, teams are given penalty time, where solving a problem T minutes after the start of the contest, with F failed attempts, scores T + 20F penalty time. The penalty for failed attempts does not accrue until the relevant problem is actually solved. It’s thus still in one’s interest to finish solving a problem late in the contest – even though it may add an arbitrary amount of penalty time, it will not hurt your ranking. This score would have placed second at Imperial (maybe because some of the members of the usual team number 2 have graduated?) and, at time of writing, 51st out of 522 overall.

Problems

This half of my write-up (I intend to discuss this in two parts) covers what I think are the easier seven problems in the contest, in the order I tackled them. Looking at the metadata on Codeforces, there is a pretty clear drop-off in solve counts after this. Interestingly, F was actually the hardest of the problems I’ve listed here by that measure (fewest teams solved it), but for me it only placed around third in difficulty; I even struggled more with D than with some of the later problems, though mainly for implementation reasons.

C: Cued Up

Wrong answer at 9 minutes; correct answer at 18 minutes (10 minutes spent total).

Given a description of the number of snooker balls still on a table, determine the maximum number of possible points remaining to be scored. Initially any ball can be hit (it isn’t known whether the red or coloured balls are “on”), but thereafter players can’t pot 2 consecutive reds, and if a non-red is potted and reds remain on the table, the next ball potted must be red. Assume that only one ball is potted per shot.

This is mainly a straightforward implementation – the main snag is that if you have only red balls, then regardless of how many there are the right answer is 1. I think I messed up the first time because I fixed this edge case (which is the last of the sample tests), but accidentally ended up printing “1” before the correct answer on all of the other tests…
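
A sketch of that logic, following the rules exactly as I’ve paraphrased them above (the method and variable names are just for illustration): while reds remain, play must alternate red–colour, so at most one more red than the number of colours can ever be potted; once the reds are gone, every remaining colour can be taken.

```java
// Sketch: with r reds and a multiset of colour values, alternation means at most
// (number of colours + 1) reds can be potted; all colours can always be potted.
static int maxPoints(int reds, int[] colourValues) {
    int colourSum = 0;
    for (int value : colourValues) {
        colourSum += value;
    }
    if (reds == 0) {
        return colourSum;
    }
    int redsPottable = Math.min(reds, colourValues.length + 1);
    return colourSum + redsPottable;  // each red is worth 1 point
}
```

In particular, with only red balls on the table this returns 1, matching the edge case above.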

I: I Work All Day

Correct answer at 13 minutes (4 minutes spent).

This was basically decoding the problem statement and figuring out that “what’s left over after repeated subtraction” refers to the modulo operator. I probably could have got this one out faster if I were still comfortable with C++.

J: Just a Minim

Correct answer at 17 minutes (4 minutes spent).

Figure out the length of a tune given the values of its notes. Again, raw implementation. I’m surprised this even took 4 minutes, though that might well have been because I went through all the sample tests, and also took the “hacky” approach of hardcoding the strings. There was a slightly faster-to-write solution that involved special-casing 0 and writing 1/x for the others. I also might have wasted some time because I was concerned about floating-point rounding errors and so calculated things in terms of sixteenths. (That’s not needed, because these numbers have exact binary representations and the problem limits are very unlikely to result in a loss of precision.)
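
From memory, the faster approach looks something like the sketch below; I don’t remember the exact note encoding, so treat the “0 is worth two measures” assumption (and the method name) as illustrative rather than gospel.

```java
// Sketch, counting in sixteenths to stay in integers: a value of 0 is assumed to be
// worth two measures, and any other value v (a power of two up to 16) is worth 1/v.
static double tuneLength(int[] noteValues) {
    long sixteenths = 0;
    for (int v : noteValues) {
        sixteenths += (v == 0) ? 32 : 16 / v;
    }
    return sixteenths / 16.0;
}
```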

F: Flipping Coins

Correct answer at 26 minutes (8 minutes spent).

You have n coins, all initially tails up, and k coin flips which you are obliged to exercise – but you can choose which coin to flip. At the end of the game you keep all coins that are heads up. What’s the expected number of coins you can keep (with optimal play)?

This one is about figuring out the right strategy (pretty simple – always flip something that’s tails-side up if you can) and then writing a recurrence relation. There are overlapping subproblems, so a quick bit of memoisation is needed, but it is not hard to implement. I’m surprised how much contestants appeared to struggle with this problem. I guess it requires a little more familiarity with thinking in terms of probabilities, which has tended to be one of my stronger points at contests like these.
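
A sketch of the recurrence, written bottom-up rather than memoised but otherwise the same idea (names are mine; it assumes fair flips and n ≥ 1): let E[h][f] be the expected number of heads kept, given that h of the n coins currently show heads and f compulsory flips remain.

```java
// E[h][f]: expected final number of heads with h heads showing and f flips left.
// With a tails coin available we flip it; with all heads we are forced to risk one.
static double expectedHeads(int n, int k) {
    double[][] e = new double[n + 1][k + 1];
    for (int h = 0; h <= n; h++) {
        e[h][0] = h;  // no flips left: keep whatever is currently heads
    }
    for (int f = 1; f <= k; f++) {
        for (int h = 0; h <= n; h++) {
            if (h < n) {
                // flip a tails coin: it comes up heads with probability 1/2
                e[h][f] = 0.5 * e[h + 1][f - 1] + 0.5 * e[h][f - 1];
            } else {
                // all coins are heads: flipping one may turn it back to tails
                e[h][f] = 0.5 * e[n][f - 1] + 0.5 * e[n - 1][f - 1];
            }
        }
    }
    return e[0][k];  // we start with every coin tails-up
}
```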

D: Deranging Hat

Wrong answers at 50, 53, 61 minutes. Correct answer at 89 minutes (63 minutes total spent).

A sorting network consists of comparators; given two indices i and j (importantly, in that order), if the letter at index i is greater than the letter at index j, then the letters at those indices are swapped. Given a string, build a sorting network that would “sort” the conventionally sorted version of that string back into the original string. For example, given the string “dude”, you would need to transform “ddeu” into “dude”, perhaps by comparing positions 4 and 2 (which swaps their letters), and then positions 3 and 4 (which swaps theirs). The sorting network can’t have more than about ten times as many comparators as the string has characters.

It took me quite a while to figure out what the problem statement meant. I came up with a reasonable idea (swap each letter into its right place, iterating through the string once), but was needlessly worried about performance and started doing a lot of fiddly book-keeping with Maps of SortedSets; there was a difficult-to-trace bug in there. I then gave up some time after 61 minutes and reimplemented things with a much simpler temporary char-array solution, which worked.
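
Roughly what that char-array version looks like (method and variable names are mine; comparator indices are 1-based as in the example above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Walk left to right over the target string; whenever the working (initially sorted)
// array disagrees, find the needed letter further right and emit a comparator ordered
// so that it actually fires, then perform the swap locally.
static List<int[]> buildNetwork(String target) {
    char[] work = target.toCharArray();
    Arrays.sort(work);  // this sorted string is what the network will be applied to
    List<int[]> ops = new ArrayList<>();
    for (int p = 0; p < target.length(); p++) {
        if (work[p] == target.charAt(p)) {
            continue;
        }
        int q = p + 1;
        while (work[q] != target.charAt(p)) {
            q++;  // the needed letter must still be somewhere to the right
        }
        // A comparator (i, j) swaps when letter(i) > letter(j), leaving the smaller at i.
        ops.add(work[p] > work[q] ? new int[] {p + 1, q + 1} : new int[] {q + 1, p + 1});
        char tmp = work[p];
        work[p] = work[q];
        work[q] = tmp;
    }
    return ops;
}
```

On “dude” this emits (4, 2) and then (3, 4), matching the example above, and it never needs more than one comparator per character.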

A: Alien Sunset

Correct answer at 104 minutes (15 minutes spent).

Given day lengths on multiple planets and times of sunrise/sunset, find the first time within a given time window where it is dark everywhere.

I avoided this at first because it looked like an interval-tree kind of problem. However, seeing that many teams had solved it, I took a closer look, and something in O(planets * allowed time window) looked like it would work: simply start from the beginning of the window, test each possible time to see if it is indeed dark everywhere, and return the first time for which it is. There was a bit of a snag with some planets having daylight hours crossing zero, but nothing too difficult to manage.
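
A sketch of that brute force (I’m hazy on the exact boundary conventions the problem used, so the daylight test below is illustrative):

```java
// Try each hour in the allowed window and check every planet. Daylight on planet i is
// assumed to span [sunrise_i, sunset_i) hours into its day, wrapping around zero when
// sunrise > sunset; returns -1 if no hour in the window is dark everywhere.
static int firstDarkHour(int limit, int[] dayLength, int[] sunrise, int[] sunset) {
    for (int hour = 0; hour <= limit; hour++) {
        boolean darkEverywhere = true;
        for (int i = 0; i < dayLength.length; i++) {
            int x = hour % dayLength[i];
            boolean light = sunrise[i] <= sunset[i]
                    ? (x >= sunrise[i] && x < sunset[i])
                    : (x >= sunrise[i] || x < sunset[i]);  // daylight crosses zero
            if (light) {
                darkEverywhere = false;
                break;
            }
        }
        if (darkEverywhere) {
            return hour;
        }
    }
    return -1;
}
```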

E: Education

Correct answer at 140 minutes (36 minutes spent).

A school has N departments, each with some number of students S_i; they need to move into a new set of buildings, each of which has a capacity T_i and a cost C_i (departments cannot share buildings). Minimise the total cost of the buildings used while fitting all of the students in, or determine that this is impossible.

My first thought was to sort the departments in descending order of size and then attempt some kind of dynamic-programming solution. I then realised I could do better: if departments are considered in descending order of size, a greedy solution which picks the cheapest building that is still large enough actually works. This led rather quickly to one possible solution: sort the departments in descending order of size, and maintain a priority queue (highest priority = lowest cost) of permissible buildings. As progressively smaller departments are housed, smaller buildings join the permissible pool, while large but cheap buildings naturally still get used first. If the queue ever runs dry, the assignment is impossible.
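
A sketch of that greedy (the types, names and the “return -1 for impossible” convention are mine):

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// House departments from largest to smallest; a building becomes permissible once its
// capacity covers the current department, and we always take the cheapest permissible one.
static long minimumCost(int[] deptSizes, int[] capacities, int[] costs) {
    Integer[] deptOrder = sortedIndicesDescending(deptSizes);
    Integer[] buildingOrder = sortedIndicesDescending(capacities);
    PriorityQueue<Integer> cheapest =
            new PriorityQueue<>((a, b) -> Integer.compare(costs[a], costs[b]));
    long total = 0;
    int next = 0;
    for (int dept : deptOrder) {
        while (next < buildingOrder.length
                && capacities[buildingOrder[next]] >= deptSizes[dept]) {
            cheapest.add(buildingOrder[next++]);  // fits this department, and all smaller ones
        }
        if (cheapest.isEmpty()) {
            return -1;  // no building can house this department: assignment impossible
        }
        total += costs[cheapest.poll()];
    }
    return total;
}

private static Integer[] sortedIndicesDescending(int[] values) {
    Integer[] indices = new Integer[values.length];
    for (int i = 0; i < values.length; i++) {
        indices[i] = i;
    }
    Arrays.sort(indices, (a, b) -> Integer.compare(values[b], values[a]));
    return indices;
}
```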

Five problems remain that are somewhat more difficult (at least in my opinion).

Thoughts

There are certain skills that I’m still reasonably comfortable with, like the more computational kinds of mathematics. There are others, like computational geometry, that I don’t remember too well (as we’ll see in part 2). The first part of these contests is typically a speedy rush to get credit for as many problems as possible; usually somewhere around the 5 to 7 problem mark things slow down a bit. The second part tends to be more interesting for me, as the focus changes from speed to actually thinking hard about algorithms – and, very often, fretting over implementation as well.

Unwinding Logical Threads (Hotseat: WPF Puzzle GP 2017, Round 1C)

Background

The World Puzzle Federation is an international organisation which, among other things, organises annual puzzle contests. These involve solving puzzles that rely on logical deduction and observation. That said, maybe because I’m pretty new to these, sometimes backtracking algorithms are the order of the day!

The Puzzle GP is a series of distributed puzzle competitions run throughout the year. Each round is split into three groups: C, B and A (in increasing order of difficulty). Puzzles in group C are likely to be largely accessible to general audiences (and that’s what I’m covering today); group B puzzles are typically drawn from a selection of “well-known” puzzle types, and group A puzzles often feature extensions of group B puzzles in unusual ways.

Contestants are given 60 minutes for each set of puzzles. I have very little experience with these puzzles, so I think I’ll start with group C.

(The puzzle set can be found here.)

Performance

I started off with the Darts puzzles, for which I had a strategy in mind. I then struggled with the Matches puzzles, only solving C7, before moving on to the Arithmetic Squares, both of which were solved quickly. I then spent probably a good ten minutes digging through the Word Search. I finished the easy Scrabble task and almost finished the harder one, though I failed to see a simple move near the end of the puzzle. I then started on Products, which I probably should have prioritised more highly as I think I figured out the mathematical logic behind it. I solved the 120-pointer C18 after the contest, and it was actually not too hard. With three minutes left to go after finishing C17, I took a stab at the Fill in the Blank series and managed to solve two of them.

I solved the remaining puzzles after the time ran out, apart from C3. Elastic Bands was actually very easy. I have a bit of an aversion to spatial reasoning puzzles, but recognised the problem as graph isomorphism and had some techniques for it based on that. Old Maid was also easy; I didn’t get round to it in time. The last Fill in the Blank and both instances of Count the Shapes gave me headaches.

Selected Problems in Depth

C5-C6: Darts

Given a dartboard with several regions, each worth some number of points, and m darts, find a way to throw the darts at the board so that their scores add up to a target value. Each dart must score, and each region can hold at most one dart. For example, with regions [4, 21, 8, 11, 13], two darts and a target of 19, the correct answer is [8, 11].

To solve this for two darts on a computer, a good approach is a single pass with a hash set: when we look at a number, if it’s in our “target set” we’re done – otherwise we add the number we would need to make up the target to the target set. It’s a simple O(n) algorithm.

Implementing a hash set on paper would be too troublesome for me. There’s an alternative, which I used: sort the array in O(n \log n) and then use a pair of pointers, one starting at the beginning and one at the end of the array. Adding the two numbers up, if we’re over the target we bring the pointer at the end back, and if we’re under the target we push the pointer at the start forward.

The actual puzzles had three darts; I simply picked the largest number to try first and solved the appropriately reduced target with the two-dart algorithm above, backtracking to smaller choices of the “largest” number where needed. Binary insertion sort is probably pretty reasonable to do by hand.
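
In code, the pencil-and-paper procedure would look something like this (a sketch; the names are mine):

```java
import java.util.Arrays;

// For three darts: try each choice of "largest" region from biggest downwards, and
// solve the remaining two-dart problem with the two-pointer scan over the sorted values.
static int[] threeDarts(int[] regions, int target) {
    int[] sorted = regions.clone();
    Arrays.sort(sorted);
    for (int k = sorted.length - 1; k >= 0; k--) {
        int remaining = target - sorted[k];
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo < hi) {
            if (lo == k) { lo++; continue; }  // each region holds at most one dart
            if (hi == k) { hi--; continue; }
            int sum = sorted[lo] + sorted[hi];
            if (sum == remaining) {
                return new int[] {sorted[lo], sorted[hi], sorted[k]};
            } else if (sum < remaining) {
                lo++;  // too small: advance the lower pointer
            } else {
                hi--;  // too large: pull the upper pointer back
            }
        }
    }
    return null;  // no combination hits the target
}
```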

C10: Elastic Bands

Given a network whose nodes are marked with letters, and a second, unlabelled network, label the nodes of the second so that the two networks are identical. This is basically finding what’s called a graph isomorphism in discrete mathematics.

I didn’t solve this in the contest, but probably should have. I split the nodes by degree, and then tried to identify relationships between these. For example, there was a single degree 4 node in the puzzle that wasn’t connected to any of the other degree 4 nodes, so I was able to uniquely identify it quickly.
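
The sort of bookkeeping involved, if one were doing it programmatically (a sketch; the adjacency-list representation is an assumption):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// First refinement step: bucket a graph's nodes by degree. Matching buckets between
// the two networks (and then looking at how buckets connect to one another) narrows
// down the candidate labellings considerably.
static Map<Integer, List<Integer>> nodesByDegree(List<List<Integer>> adjacency) {
    Map<Integer, List<Integer>> buckets = new HashMap<>();
    for (int node = 0; node < adjacency.size(); node++) {
        int degree = adjacency.get(node).size();
        buckets.computeIfAbsent(degree, d -> new ArrayList<>()).add(node);
    }
    return buckets;
}
```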

C16-C18: Products

Given an N \times N grid, place the numbers from 1, 2, \ldots, 2N in the grid once each. Each row and column must contain two numbers, and numbers cannot share a cell. Clue numbers are given by the side of the grid and sometimes on diagonals. Where given, the product of numbers in the relevant row, column or diagonal must equal the clue.

The first two puzzles had N=10; the third had N=15. I think my main strategy here involved searching for primes that would uniquely identify numbers along the rows and columns – a prime above N appears in exactly one of the allowed numbers, namely itself. For N=10 these are [11, 13, 17, 19]; for N=15 they included [17, 19, 23, 29]. I also usually started by looking at very large clues (because they might have only one valid factorisation in the allowed range) or small ones (e.g. in the first puzzle, there was only one column and one row with a product of at most 20, implying that the 1 had to be in that column and row).
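
Mechanically, the check I kept doing by hand for rows and columns amounts to enumerating the ways a clue can split into two distinct numbers in range; something like the following sketch (names mine):

```java
import java.util.ArrayList;
import java.util.List;

// List the ways a clue can be written as a product of two distinct numbers from 1..2N.
// A clue with only one such pair pins both of its numbers down immediately.
static List<int[]> factorPairs(long clue, int n) {
    List<int[]> pairs = new ArrayList<>();
    for (int a = 1; a <= 2 * n; a++) {
        if (clue % a == 0) {
            long b = clue / a;
            if (b > a && b <= 2 * n) {  // b > a keeps each pair once and the numbers distinct
                pairs.add(new int[] {a, (int) b});
            }
        }
    }
    return pairs;
}
```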

I also considered the places where a given number might fit, and in the later puzzles noticed when a group of rows or columns collectively had to use up certain numbers. For example, if 5 was already used in an N=10 puzzle and there were two rows with a clue of 20, those rows had to use {1, 20} and {2, 10} between them, so I could eliminate 1, 20, 2 and 10 from consideration for the other rows.

This wasn’t too difficult, though I can imagine that puzzles with only partial hints might be substantially more challenging!

Meta-Analysis

A score of 262 would have placed 12th out of 215 contestants (2nd out of 139 for contestants not in higher divisions). To be fair, this is in part because many top solvers don’t participate in the division C contests at all. Some of this might also be because there would have been less pressure as I was doing this on my own, not as part of a real contest.

Looking at the top end of the scoreboard, the Darts, Arithmetic Square, first instance of Matches and first instance of Products were solved by most of the leaders. Elastic Bands and Old Maid (I didn’t solve either of these two), as well as the Word Search were also fairly common. Thereafter, contestants varied quite a bit in what they solved. As might be expected, the contestants that solved the final instance of Products generally did very well (it was worth 120 points by itself).

Inflation and the Substitution Effect

I sometimes pick up my groceries at Iceland (the supermarket, not the country). When I was there today, I noticed a bunch of frozen pizzas which were typically on sale at £1 now on sale at 2 for £2.50. I thus bought zero of them, opting for a substitute (that turned out to be pretty good!). This is one example of a shortcoming of the generally accepted way of measuring inflation (via a consumer price index).

Inflation refers to an increase in general prices of goods and services (in numerical terms). With the devaluation of the pound after the Brexit vote, I’ve experienced imported inflation (because a pound buys fewer dollars/yen etc., and overseas suppliers of goods want to be paid in their currency).

Of course, one problem is how one measures “general prices of goods and services”. Typically, inflation is measured via changes in a consumer price index. This is an aggregate of the prices of a supposedly representative basket of goods and services. The basket is reviewed periodically, as some goods become more (or less) frequently consumed. For example, in the UK, this is managed by the Office for National Statistics. The 2016 basket is listed here, and changes included the addition of video game downloads and removal of nightclub entry.

Of course, individuals’ spending patterns can be very different from the “average”. One can imagine a personal inflation rate that could be very different from CPI. For example, I don’t drive, so a significant increase in the price of cars might not affect me very much. (There could be some knock-on effects, e.g. if the cost of minicab rides increases.) Similarly, an increase in the cost of recreational boats (which are part of the basket) would have little impact on me. Conversely, I consume a fair amount of potatoes, so a cost increase there could have an outsize impact on me…

It probably won’t, actually. If potatoes become expensive, I will eat fewer potatoes and more rice or noodles.

Something similar actually happened after the Brexit vote. I used to enjoy an occasional instant ramen treat from the Japan Centre in London. However, with the pound falling almost 20 percent against the yen in the wake of the vote (from about 160 to about 130), prices were increased significantly; in many cases the rises exceeded the roughly 25% increase in import costs that such a fall implies, perhaps because the shop wanted headroom in case the pound fell further. GBPJPY is now about 149, but the GBP prices haven’t moved, possibly for the same reason. In any case, I’ve switched to eating more of other forms of carbohydrates.

Note that the above doesn’t always work as goods actually need to have a reasonable substitute. Medical services, (possibly imputed) rent and education come to mind. There are opportunities for geographic arbitrage here (e.g. travelling to Brazil as a medical tourist, living in an RV or attending university in continental Europe respectively), but these tend to have higher barriers to entry.

This is the key intuition behind chained CPI; it aims to account for people consuming different goods and services as prices change. This is currently being considered in the US as part of Trump’s tax reform (and would, in the future, save the government money by reducing tax bracket adjustments – chained CPI would be lower than CPI).
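
As a rough sketch of the chaining idea (this is a chained Laspeyres formulation for illustration; the statistical agencies use more sophisticated superlative formulas, as I understand it): the index is built up as a product of period-to-period links, each link weighted by a recently observed basket q rather than a fixed base-year one,

C_T = \prod_{t=1}^{T} \frac{\sum_i p_{i,t} \, q_{i,t-1}}{\sum_i p_{i,t-1} \, q_{i,t-1}}

so if consumers shift away from goods whose prices rise, the weights follow them and measured inflation comes out lower.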

Of course, computing chained CPI is a hard problem. For example, there was a parasite problem in salmon farms in early 2017, causing supply to fall. (Again, this was another instance where my consumption habits changed.) Is trout an acceptable substitute? Cod? Chicken? Tofu? Any kind of food? I’d answer yes to the first and kind-of to the second. For the last three, that would probably only be the case in emergency circumstances. There were similar problems with an iceberg lettuce shortage in early 2017, and yet people continued to buy them in spite of prices almost tripling.

Furthermore, one typically only gets to witness substitution effects after the fact. If the price of lettuce triples, it’s difficult to predict how many people would switch or partially switch to substitutes. In practice, statistical offices usually come up with a preliminary estimate and then revise it when they have more data available.

As an individual who has responded to the substitution effect quite a few times, especially since I moved to the UK, I think chained CPI makes sense. For me at least, the substitution effect is real and powerful. It hasn’t received the friendliest of receptions in the US, at least in part for political reasons (it hits the poor and elderly hard, for various reasons). It does make sense to have a different metric to use for benefits or the state pension, perhaps more tailored to the relevant populations (should video game downloads, generally speaking, really go into an elderly CPI?). However, in general the chained CPI seems more accurate and I see no reason not to use it.

Quantitative Challenges

I spent about two to three hours studying and then working through mathematical problems on quantitative finance today. Specifically, these were questions from Blyth’s An Introduction to Quantitative Finance and dealt with interest rate swaps. These are contracts where one party typically pays fixed payments (e.g. 5%) and receives floating payments, which are dependent on market rates (e.g. LIBOR + 1.5%), though float-float swaps exist too (e.g. across currencies). Swaps can be used to mitigate interest rate risk (or gain exposure!); they can also be mutually beneficial depending on companies’ borrowing characteristics.

If I had to classify the mathematics involved in the problems I did, I’m not sure how else I could label it beyond “applied”. For many of these problems, the first step was figuring out how to model the financial products involved mathematically. There then followed some elementary algebra, along with proofs that required some intuition (for lack of a better word) to pick the right approach. The exercises I did this time around relied less on probability than usual; much of this probably stemmed from a result that forward interest rates (i.e. the rate you’d lock in today for lending from future time T_1 to later future time T_2) can be determined by no-arbitrage from current bond prices, independently of the distribution of future rates.
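
The result in question, roughly (writing Z_t(T) for the time-t price of a zero-coupon bond paying 1 at T; my notation may differ slightly from Blyth’s): by no-arbitrage, the simply-compounded forward rate for the period [T_1, T_2] is pinned down by today’s bond prices alone,

F_t[T_1, T_2] = \frac{1}{T_2 - T_1} \left( \frac{Z_t(T_1)}{Z_t(T_2)} - 1 \right)

and once each floating payment can be replaced by its forward rate, the value of receiving fixed K and paying floating on dates T_1, \ldots, T_n with accrual factor \alpha reduces to a sum of discounted known quantities:

V_t = \sum_{i=1}^{n} \alpha \, Z_t(T_i) \left( K - F_t[T_{i-1}, T_i] \right)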

I’ve struggled a fair bit with the book, both with the reading material and with the exercises. Today’s chapter was relatively easy going, though that might have been because I was reading through it for the second time. It was my first time doing many of the exercises, but they seemed to go relatively smoothly today.

Some of this might be because I work through the chapters at a very relaxed clip of about one per month. Like many other mathematical domains, this one has many dependencies on previous topics. The earlier result I mentioned on forward interest rates, for example, was from the previous chapter; yet it was instrumental in computing the valuation of a swap. There are certain fundamental ideas that I learned way back in 422 (Computational Finance at Imperial). I also think my mathematical knowledge and logical intuition have (hopefully) mostly stayed with me. Furthermore, I like to think that I remember the concepts at a high level. However, many proofs require recognizing that expressions are in certain forms and can thus be rewritten; I’ve yet to develop that level of familiarity – or shall I say intimacy – with the content.

This might also be partially self-created, especially where the reading material is concerned. When I see theorems, I tend to try my hand at proving them on my own first. These often prove to be rather tricky endeavors; the aforementioned lack of keen familiarity with the material certainly doesn’t help. Typically, I can understand the proofs fairly easily when reading them. However, I usually expect myself to figure out the intuition behind the proof (including reproducing it, at least at a high level), which isn’t always so forthcoming. That actually reminds me of what I used to do at Imperial for certain modules, especially the (in my opinion) two hardest of the course: 438 Complexity and 493 Intelligent Data and Probabilistic Inference. I would make the effort to understand why many of the proofs in those courses worked. I’d also try to figure out how the author might have come up with the proof, or at least what the core intuitions might have been. This included relatively nasty ones (e.g. SAT being NP-complete via direct argument, or ELBO results in variational inference), and I think it paid off in terms of understanding.

It could be argued that finding the material difficult is expected, because the subject matter is itself complex. I tried to obtain a popular estimate of the complexity of the material covered by the book, but didn’t find much data; there were only a handful of reviews on Amazon, which offered a wide spectrum of views (from “[o]ne of the best introductory treatise (sic)” to “I would hardly call it an “Introduction” to quantitative finance”). I’m not sure how to start more simply, though; there is a fair bit of assumed mathematical knowledge, but this is at least partly spelled out in the introduction.

While the book has proved challenging at times, I’ve not found it too hard to follow. Discussing the problems with a friend has also helped a fair bit, especially since the book doesn’t have solutions!

503 Service Unavailable

I had a somewhat nasty chest infection over the last week. It seemed to reach its peak last Sunday. I think “incapacitation” describes what happened fairly well; I felt like I was consistently running headfirst into an imaginary wall. I slept for about 4 hours on Sunday afternoon and another 12 at night, and probably at least 11 if not 12 on Monday evening as well.

I think I recovered somewhat through the week, though not yet completely. I was well enough to host a programming contest on Friday, and to sing (though not convincingly) on Friday evening as well. It has been quite a while since I was last unwell to this extent; I don’t think I’ve had anything on this scale since I started studying at Imperial 5 years ago.

It served as a good reminder of the importance of health, something which I tend to not really think about very much. The cough did get in the way of my sleep as well, so I spent a good amount of time thinking. These thoughts ranged over quite a number of issues, including goals, planning, work and logic. I’m sure some of these must have been influenced by said illness, but there were some actionable ideas and thoughts. I did think about two aspects of what I call the “dark side of compound interest”: (i) the possibility that one may not be around to enjoy the results, and (ii) the inclination to discount present purchases by a lot by thinking about their value in the far future. I don’t think my chest infection was that serious at all, but (i) certainly looks influenced by it.

The post for this week is much shorter than normal, as I haven’t been able to gather the mental resources to write much more. To be clear, this also isn’t one of those “if I had more time, I would have written a shorter letter” scenarios! I’ve been making an effort to produce somewhat simpler, more readable writing. I think it’s a sufficient status update.

The Time Value of Money

There is an apocryphal interview question I’ve come across several times:

Would you prefer to have a dollar today or a dollar one year from now?

I do technical interviews for software engineers. Hence, this isn’t a question I would typically ask candidates (even if at times I wish it was – it could be interesting to see how candidates react!). Although it naturally seems like it would fit in a financial context, it seems too easy to be used as a serious question.

Anyway, the answer I’d go for is “today”, because I could take the dollar and put it in the bank to earn interest. In practice, I’d invest the dollar. Furthermore, inflation is more likely to be positive than not, and this eats away at the value of the dollar. The idea that getting the dollar now is better is known as the time value of money. That said, I can also see legitimate cases why one might argue for “one year from now” – mainly centering around the idea that custody of the dollar is taken care of (assuming we allow this to be assumed!).

Conversely, if you asked me a slightly different question:

Would you prefer to have a dollar today or $100 one year from now?

I would probably go for the hundred dollars, because my investments are very unlikely to increase hundred-fold (unless we have hyperinflation) in a year. As before, there are legitimate cases why one might go against the grain of financial theory – cash flow issues, in particular.

If the amount is reduced a fair bit, such as to $1.09 (for me at least), then the decision gets more difficult. Using some kind of intermediate value theorem, there should be some value of r for which I’m indifferent to this question:

Would you prefer to have a dollar today or $(1 + r) of today’s dollars one year from now?

The conventional theory here is that if investing the dollar for a year would leave me with (1 + r) of today’s dollars, then I should be indifferent. I’m not sure I agree in practice, mainly because of the aforementioned cash-flow issues: if, 6 months on, I find that I need the dollar, I can take it out of my investments and still keep the partial returns, whereas the deferred payment offers no such flexibility. You would need to pay me an illiquidity premium. (Of course, I’ve assumed here that I invest in liquid securities.)

There is another shortcoming of this question: the size of the sum relative to my net worth would also affect my answer. Rather interestingly, I think I would take the money early for small or large amounts, but consider waiting for medium-sized ones.

For small amounts, I would need to remember that a capital inflow is coming in a year’s time. The cost of tracking this could exceed the “premium” I derive from waiting. Conversely, for massive amounts, we start delving into the realm of diminishing marginal utility – if I could pick between $1 trillion today and $1.1 trillion this time next year, I’m pretty sure I’d pick the former.

Up to this point, we’ve also avoided what’s known as counterparty risk. The person offering the money might become insolvent within the year. This would bias people towards taking the money now, and is reminiscent of a well-known proverb (“a bird in the hand is worth two in the bush”).

Nonetheless, this practice of time discounting is useful when trying to assess the value of investments or securities, such as annuities or structured products. It is also frequently used in discounted cash-flow analyses, which are useful for determining if business ventures are likely to be profitable. I have not had to put this skill into practice yet, though (well, apart from the Computational Finance exam I did at Imperial).
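
For reference, the standard machinery is just this: a cash flow C_t arriving t years from now is worth C_t / (1+r)^t today, so a venture with cash flows C_0, C_1, \ldots, C_n has net present value

\mathrm{NPV} = \sum_{t=0}^{n} \frac{C_t}{(1+r)^t}

and the earlier indifference question is simply asking for the personal discount rate r at which $(1+r) a year from now is worth exactly $1 today.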

In theory, these concepts should be applicable to other resources or assets which (1) appreciate over time, and (2) can accumulate in value without substantial effort. That said, I’ve struggled to think of assets outside of the standard “investment” universe (stocks, bonds, real-estate, commodities, private equity, collectibles?) that satisfy both criteria.

I did think of social capital (i.e. friendships, reputations) and human capital (for me, software development and other skills). They don’t seem to satisfy (2), though it could be argued that (2) is too strict. For example, by going about my daily routine, I already (hopefully) absorb more and better dev practices. Similarly, one needs to (well, should) do one’s homework regarding asset allocation and understanding one’s investments. Also, in practice to maintain one’s asset allocation one needs to rebalance a portfolio.

Road to Somewhere (2017 Q3 Review)

Q3 felt very much like a quarter of consolidation, as opposed to development or novelty. The two main highlights of the quarter for me were IJCAI17 and a rather convoluted hack week project at Palantir that reminded me of the applicability of raw CS theory to daily work (of course, getting an award for technical achievement was great as well).

On the coding front, there was a goal last quarter to merge 30 pull requests on AtlasDB. I hadn’t been tracking this, but it turns out that this target was actually hit on the nose. I’m somewhat surprised, as I was out of the office for two weeks, and a fair bit of my time was spent looking at other projects as well. It is very tempting to set the next goal at 31, and I think I’ll bite – though given a likely winter break and some other leave that I do want to take, I’m somewhat doubtful that I’ll hit 31 in Q4. I also had a good conversation with my lead on possible next steps for growth; I can’t say too much other than that there is consensus, and I’m excited.

Turning towards an academic focus, the elephant in the room is IJCAI 17. I was initially skeptical that I would enjoy it: I imagined that I would be unable to follow the presenters as they eagerly pushed the boundaries of our AI knowledge, and the amount of money I spent on it in total was a very solid chunk of the quarter’s expenditure. However, it turns out these concerns were largely unfounded. There were some talks that I wasn’t able to follow, likely owing to a lack of familiarity with the relevant field, but many of them were very understandable. In general I really enjoyed the experience; I was impressed by the intellectual depth and complexity of many of the presentations.

The LDLK talk itself also went better than expected; it was scheduled at 8.30 am on a Friday morning, but nonetheless many people showed up! I was happy to present in person; this is one of the few things from my time at Imperial that I can truly say I’m proud of. In general, I also enjoyed the social events, meeting Alessio and subsequently exploring a bit of Melbourne (granted, this does not have an academic focus).

Looking forward, MCMAS-Dynamic still has one more paper that can be wrung from it, this time on some of the later-stage results. The paper is well in the pipeline now, though writing it is proving more difficult than either of the previous papers. When I was writing the thesis, I focused more on getting the algorithm right and implementing it correctly than giving a full formal proof! Thus, the core algorithm remained unchanged, but I certainly waved my hands a fair bit regarding some of the more intricate details.

Financially, my portfolio pretty much went sideways over the period, though a late rally pushed it up a bit, leaving it at plus 0.9 percent (total return). This was in spite of surges in international stock markets; the pound recovered a little as well, especially towards the end, on news of a possible Bank of England rate rise, which dented the sterling value of my overseas holdings. I also sold off a bunch of pounds when GBPUSD broke 1.35. We can see how this compares against a bunch of other investment options:

(N.B. I have positions in the Fidelity World Index and iShares Global Property Securities Tracker. Also I do have holdings in active funds, although I’ve been comparing with trackers for benchmarking purposes.)

Not too unreasonable. UK stocks look like they outperformed and REITs were pretty battered, hence I did worse than the Vanguard fund. Also, we can see the impact of the pound’s (minimal) recovery in some of the delta between the Fidelity and Vanguard total world funds.

Spending this quarter shot up significantly. The obvious rises were in travel and individual meals, which was to be expected given IJCAI. Clothing expenses fell significantly, though I think there might have been some self-control failures last quarter, as the amount spent in Q2 looks absurd in hindsight. I’ve been keeping up the routine of walking to and from work, and this shows in transport expenditure as well: last year’s quarterly mean was £81.82, while this quarter’s was £18.80 (and the average for the year to date is £25.02). That said, I’ve noticed that I’ve been wearing through shoes faster, but I haven’t weighed that cost in yet!

On the comms front, things have generally been going well too, though there have been some minor lapses. A fair bit of this is not so much that I’m not seeing my friends, but that the side activities (reading groups) we used to pursue more actively and/or diligently have slipped. I guess being aware of it is the first step here. Reconnecting with some friends from high school and from my UTA group has also been great.

I think the period was somewhat gentle overall, but in no way calm. I have a tendency to expect higher standards from myself if things seem easy-going, and this can be stressful (I know, it’s a bit of a perennial struggle for me). I’d toss my standard “A League of Their Own” quotes at it if I’m feeling energetic – good things are often meant to be hard – but for some reason I haven’t always had the energy this time around. Some of this might be due to mild illness (had a cold in Melbourne, now having a cough).

I’m not sure why, but Shawn Mendes’s “Mercy” has been a song I’ve been listening to a fair bit this quarter. The obvious interpretation is one where the speaker is singing to an unreciprocated love interest. However, Mendes has suggested an alternative (about mercy from the stress of his career). I independently came up with another variant; I see the speaker singing it to an idealized, but critical version of himself, which is a voice I have in my head at times. We have from the chorus:

Please have mercy on me, take it easy on my heart
Even though you don’t mean to hurt me, you keep tearing me apart
Would you please have mercy on me, I’m a puppet on your string
And even though you’ve got good intentions, I need you to set me free –

It’s kind of a spirit willing / flesh weak type of problem; there are many things I know I want to do that I think would be beneficial, but I find that my bandwidth is already pretty heavily taxed. “Why aren’t you finishing that feature yet?” “Why aren’t you getting your 8 hours of sleep?” “When are you going to get round to writing that paper?” It’s painful, and at times I need respite, but I still do appreciate this voice. It does often serve me well, even if it has to cry out multiple times with the same request before corrective action is taken.

Q4 will have the usual engineering, finance and friendships targets. I have relatively fewer plans for the period (going into Q3 I knew that IJCAI was a thing). I might need to consider and decide if there is anything additional I should pick up.

Readability and Performance

I find myself faced with tradeoffs between readability and performance fairly often. This happens both in programming (where, apart from some specific situations, favouring readability usually doesn’t cost too much) and in English.

In programming, code that is readable is usually desirable. This often makes it easier to maintain, especially for other developers who subsequently need to understand it. For example, let’s suppose we have a Java hash map, and we need to return an Optional indicating the value with a key that matches a given Predicate, if it exists. However, if there are multiple keys matching the Predicate, we want to throw an exception (even if the values are the same).

I would probably write something like this:
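
(A sketch of what I have in mind; the method name, signature and choice of exception are illustrative.)

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stream over the entries, keep those whose key matches, and stop looking after two
// matches have been seen (limit(2) lets the stream short-circuit).
static <K, V> Optional<V> findUniqueValue(Map<K, V> map, Predicate<K> keyPredicate) {
    List<V> matches = map.entrySet().stream()
            .filter(entry -> keyPredicate.test(entry.getKey()))
            .limit(2)
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
    if (matches.size() > 1) {
        throw new IllegalArgumentException("Multiple keys match the predicate");
    }
    return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
}
```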

However, in terms of performance we could clearly improve. Using limit does allow us to short-circuit (so we don’t look at more elements than we need to), but creating stream objects could still add significant overhead, especially if the predicate is cheap to evaluate.

The code below should accomplish the same function, but more quickly.
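
(Again a sketch, with the same assumed signature.)

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// A single pass with a couple of local variables: throw as soon as a second matching
// key is seen, otherwise remember the first match.
static <K, V> Optional<V> findUniqueValue(Map<K, V> map, Predicate<K> keyPredicate) {
    V match = null;
    boolean found = false;
    for (Map.Entry<K, V> entry : map.entrySet()) {
        if (keyPredicate.test(entry.getKey())) {
            if (found) {
                throw new IllegalArgumentException("Multiple keys match the predicate");
            }
            match = entry.getValue();
            found = true;
        }
    }
    return found ? Optional.of(match) : Optional.empty();
}
```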

This is still fairly readable, in my opinion, though there is a bit more state to keep track of. However, if the predicate is costly, we could do even better by taking our first approach and parallelising it, essentially framing the problem as a map-reduction. (The end result is likely to be slightly less readable, and might be substantially less so for people unfamiliar with concurrency.)

Of course, it is not always the case that readability and performance are negatively correlated. Optimising compilers implement many techniques that transform a program with relatively readable source code into one that runs efficiently on a machine. The tradeoff does tend to bite, though, when an entirely different (and usually more complex) algorithm and/or implementation has to be selected for performance reasons.

It goes without saying that readability is also subjective (tabs versus spaces is a starting point). Readability does not always imply maintainability, either. For example, deeply nested structures where each layer focuses on one specific piece of functionality may be readable, but may prove less maintainable, given that they can produce longer, more convoluted stack traces when exceptions occur.

Defining “performance” in natural languages is tricky. We could consider performance to be efficiency of communication, assuming a perfect ability to comprehend. Well-written compilers implement the source language specification faithfully. They should thus be able to understand valid source, even if it’s not in a human readable form. This happens in practice; processes like JavaScript minification are commonplace.

Of course, English does not quite have a specification. Nonetheless, I think efficiency under perfect comprehension is probably a reasonably close analogue of software performance. For example, if I were asked to explain the rule of inference known as modus tollens, I would probably say something like:

Suppose we have if A, then B. Suppose not B. Then, we have not A (since if we had A, then we would have B).

That explanation would probably make sense to readers vaguely familiar with formal logic and/or philosophy. Again, I could conceivably further increase “efficiency”, by removing the bracketed part. I could even boil things down further:

If A, then B. Not B. Thus not A.

Of course, if the person I was talking to was unfamiliar with logic and, let’s say, a friend, I would proceed very differently. This would, in a sense, be prioritising readability:

It’s a way of making an argument in logic. The idea is that if we know that if A is true, then B is true, and we know B is false, then we can say that A must also be false. Let’s use a concrete example. Let’s say that if it rains, I’ll wear a hoodie, and I’m not wearing a hoodie. Deducing that it must not be raining would be an example of modus tollens.

That said, there is also plenty of subjectivity in terms of readability for natural languages. There have been many attempts to compute estimates or measures of readability. These often use formulae that accept word length and syllable distributions and/or lookup tables. However, many of these fail to account for, among other things, the level of abstraction in text. Going back to our three explanations of modus tollens, they have approximate Flesch-Kincaid grade levels of 0.4, -2.6 and 5.5 respectively. However, I’m fairly confident that most three or four year olds (grade -2.6) would not quite understand the concept of modus tollens after reading “if A then B, not B, thus not A”. Even if 11 to 12 year olds (grade 5.5) were split into groups and given the three explanations, I suspect that the third would still be the most likely to help the students understand the concept.

I actually find the second explanation the most readable, owing to domain knowledge. In just nine words, it gives me a precise and accurate description of what the logical rule is. It might still be appropriate for a general audience, but probably not for one familiar with neither logic nor philosophy.

The first is good in that it explains why not A is reasonable. I could see myself appreciating it if the subject matter was somewhat more complex (e.g. results in modal and temporal logic).

The third features a concrete example, which is useful when learning. It also explicitly sets out some assumptions common in logic, such as “suppose X” meaning “suppose X is true“. However, I find it to be less readable, as I need to read a longer text to extract the main conclusion of the paragraph.

Earlier, I highlighted that readability and maintainability of software and source code are not one and the same. What might maintainability mean for English? For speech (especially extemporaneous speech), I’m not sure the notion applies at all. For writing, it could perhaps refer to the difficulty of proofreading the work, or of returning to it when trying to revise it.

In the software case, there are times when my hand has been forced and I’ve thus opted for a complicated, less readable implementation. Outside of word games like Taboo, and exams or application processes with strict word limits, I don’t think I’ve ever had to do this for English. Thus, in general, I consider my target audience and aim to present the material in what I expect to be the most efficient way possible, while still retaining the required readability.

What Does “Likely” Mean? (Estimative Probability)

I typically take some time out on Saturday mornings to do some reading. This often begins with developments in personal finance or computer science, though there’s a tendency to branch out from that; I generally don’t restrict myself from going off on tangents.

For some reason, this week I came across a Masters thesis discussing communication of probabilities, specifically in the intelligence domain. It seems that I found it via A Wealth of Common Sense, a blog concerning personal finance that I read occasionally; there was a link there to a Library of Economics blogpost discussing issues with mapping qualitative descriptions to quantitative probabilities. For example, if I say that something is highly likely to happen, I would be implying that I believe it would happen with probability at least N; however, the numerical value of N is uncertain, and differs among different people.

For me at least, N would be around 80 percent (and, incidentally, the author of that post agrees); that said, I can certainly envisage people assigning values of N as low as 0.6 or as high as 0.95. Note that the problem can also be two-tailed (e.g. for terms like about even, might, possible, or not inconceivable). The LoE author’s son proposes a reasonable scheme, which is to have authors outline their own mapping in a glossary. This is probably (well, I mean P >= 0.7) a good first step, though there are implementation challenges in terms of length as well as completeness.

It turns out that the concept of words of estimative probability is treated very seriously in the intelligence domain. It is understandably important, as briefs are typically prepared in natural language, and often need to be communicated to audiences that may not be entirely comfortable with mathematical notation. To quote a CIA officer:

Most consumers of intelligence aren’t particularly sophisticated when it comes to probabilistic analysis. They like words and pictures, too. My experience is that [they] prefer briefings that don’t center on numerical calculation. That’s not to say we can’t do it, but there’s really not that much demand for it.

Furthermore, deriving highly precise (though possibly not highly accurate) estimates for probabilities is almost certainly (*cough* I mean P <= 0.03) pointless, and is likely (P >= 0.7) damaging in that it tends to (P >= 0.6) give a false sense of security and accuracy when that does not actually exist.

The original proposal divides probabilities into a seven-point scale (arguably five, as the two ends are meant to reflect absolute certainties): certain, almost certain, probable, chances about even, probably not, almost certainly not, impossible. I think most people would agree that the above is in decreasing order of probability. Of course, strictly adhering to these labels would impart a degree of awkwardness to writing, and a group of variants for each level (such as highly doubtful for almost certainly not) soon developed.

Interestingly, the author also gives possible a fairly specific meaning; he defines it to mean “greater than zero and less than one” (which makes sense; of course, something always happening is certainly possible – but it seems pointless to not use the more precise word), but also adds the restriction that “no numerical odds (can) be assigned”. This seems like a useful construct, especially in the early stages of knowing things when uncertainty tends to be high; the other descriptive terms were conceived with uncertainty ranges of about 10% on each side.

The author of the Masters thesis also considers how words of estimative probability are used in other fields. I found the comparison to weather forecasting particularly interesting, as the author rightly points out that that is one field in which the general public is given numeric estimates (of the probability of precipitation). Weather forecasters typically add their own prose when reporting as well, which allowed some comparison. That said, a major difference is that in forecasting, these estimates can be derived with fair precision (and, as it turns out, accuracy) as they can (and, in the UK, do) originate from predictive models of atmospheric conditions. There seem to be standardised terms as far as the US National Weather Service is concerned, though I wasn’t able to find comparable guidance from the UK Met Office.

The clarity required from words of estimative probability depends on the consequences of miscommunication, as well. This is of course important in intelligence, with some claiming that there was miscommunication regarding warnings related to the September 11 terrorist attacks. Incorrectly reporting a weather forecast is almost certainly (ugh, P >= 0.95) less consequential, though people may make bad decisions concerning taking umbrellas when going out or hanging clothes out to dry. I can imagine contexts where this would also be very important (such as experimental trials in medicine or financial planning), though it seems for the most part some degree of ambiguity or even unknown disagreement is probably okay.

A Tour of Two Islands (IJCAI-17 Reflections)

Over the past two weeks, I took some time out from work and visited two islands, one substantially larger in land area than the other (and, incidentally, I am returning to another island). I’ve been to both at some point in the past; the larger one for vacation, and the smaller for study, work, visiting family and friends and for vacation as well. The latter is not too difficult to guess – it’s Singapore; the former is Australia.

Academic Program

I was in Melbourne for the twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17). Of course, the main highlight for me was presenting “Model Checking Multi-Agent Systems against LDLK Specifications”, a paper which built on parts of my Master’s thesis on MCMAS-Dynamic. This was my first time presenting at an academic conference, though it’s not my first publication (that was in AAMAS 17, on CTL*K); it was a little nerve-wracking, though I practiced the presentation a number of times beforehand and I think that helped a lot. Prof. Michael Wooldridge (the author of a commonly used textbook on multi-agent systems) even complimented me on presenting it confidently. I’m not so sure about that, but I guess I have some private local state, and in any case deontic logic doesn’t typically have box p implies p as an axiom.

I also enjoyed the rest of the academic program as well. Apart from Alessio, I had the opportunity to meet up with several other academics whose names, at least, were very familiar. The keynote was delivered by Prof. Stuart Russell (one of the co-authors of the well-known Introduction to AI textbook), and discussed the concept of provably beneficial AI. Essentially, humans tend to be poor at specifying things (e.g. King Midas); many naive specifications of goals or objective functions naturally lead to agents enforcing self-preservation; more informally, “you can’t fetch the coffee if you’re dead”. The talk then discussed a way of formulating a problem such that it would always be a dominant strategy for an AI agent to allow itself to be switched off (as compared to not allowing itself to be switched off); see [Hadfield-Menell et al. 2016] (The Off-Switch Game) for more detail. The other invited speakers also had fairly interesting topics, discussing applications of AI in teaching, commerce and Texas hold-em poker.

I made it a point to attend as many paper presentation sessions as feasible. Expectedly, at a top conference like IJCAI there were quite a number of talks, especially in areas further removed from my work, that proved largely inscrutable. I think my presentation would also have been tricky to follow if one was not reasonably acquainted with temporal logic and/or multi-agent systems. Nonetheless, I gave my best effort in trying to follow the presentations, and there were certainly many interesting talks.

There was a session devoted rather specifically to AI applied in the context of computer games. I found [Dann et al. 2017] (Real-Time Navigation in Classical Platform Games via Skill Reuse) particularly interesting; the key idea was having the agent acquire skills for small local movements, such as a single jump onto a block three cells above the ground. The agent learns success probabilities for these skills, and uses a variant of Dijkstra’s algorithm over log-likelihoods to find the “most likely” way to reach the goal safely (though there are some complications with re-planning if a step fails). The authors also demonstrated that the model performs better than the state-of-the-art algorithm, which is based on A*, on maze-like levels. Other talks in the session discussed level generation and physics-engine inference (for platformers), RTS combat planning, narrative control in an educational RPG… and strategy derivation in Hex (this last felt a bit out of place, but I guess they must have had exactly five papers that fit the theme well; sessions had either four or six presentations).
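
The log-likelihood trick is quite general: to find a most probable path, give each step a weight of -\log p (where p is its success probability) and run ordinary Dijkstra, since minimising a sum of -\log p maximises the product of the probabilities. A sketch (the graph representation here is my own, not the paper’s):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Edge weights are -ln(success probability), so a shortest path under summed weights
// is exactly a path whose success probabilities multiply to the largest value.
record Edge(int to, double successProbability) {}

static double[] bestArrivalProbabilities(List<List<Edge>> graph, int start) {
    double[] dist = new double[graph.size()];  // summed -log probabilities
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[start] = 0.0;
    PriorityQueue<double[]> queue =
            new PriorityQueue<>(Comparator.comparingDouble((double[] entry) -> entry[0]));
    queue.add(new double[] {0.0, start});
    while (!queue.isEmpty()) {
        double[] current = queue.poll();
        int node = (int) current[1];
        if (current[0] > dist[node]) {
            continue;  // stale entry left over from an earlier relaxation
        }
        for (Edge edge : graph.get(node)) {
            double candidate = dist[node] - Math.log(edge.successProbability());
            if (candidate < dist[edge.to()]) {
                dist[edge.to()] = candidate;
                queue.add(new double[] {candidate, edge.to()});
            }
        }
    }
    for (int i = 0; i < dist.length; i++) {
        dist[i] = Math.exp(-dist[i]);  // convert back to a probability (0 if unreachable)
    }
    return dist;
}
```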

In terms of tutorials, I was unfortunately rather sleepy when I landed in Melbourne on Sunday. I did attend the tutorials on Monday, though; there was a very gently paced introduction to argumentation theory (the course Argumentation and Multi-Agent Systems which Prof. Francesca Toni teaches at Imperial would likely have covered this material, though she was on sabbatical when I was in year 4), and a less gently paced overview of heterogeneous learning (machine learning where some differences in the data sources can be exploited to yield better results).

Non-Academic Program

Stepping back from the academic side of IJCAI, I also enjoyed the social events at the conference, and I think they were run fairly well; that said, I acknowledge I don’t have a benchmark at all. The reception at the Melbourne Cricket Ground was very generous, especially in terms of dessert (chocolate tarts!), and the banquet in the Docklands was certainly done to a high standard. The latter also introduced me to the “alternating drop” protocol, which is apparently common in (and peculiar to) Australia. Essentially, given two choices of main course X and Y, instead of having guests indicate their preference, they are served XYXYXY… around the table (of course, those with dietary restrictions are excluded). I can see how this is logistically advantageous (you know beforehand that it will suffice to prepare just over N/2 of each option, given N guests), but I’m not sure I like the probability of not getting my preference…

I stayed at the “recommended” Crown Metropol which was certainly luxurious. The hotel is located south of the Yarra river in Melbourne, and is part of the Crown Entertainment Complex – among other things, this features a fairly large casino, several restaurants and luxury shopping (way out of my budget range). Nonetheless, it was a good base from which I could easily explore the city, especially on the few days I stayed on after IJCAI.

Post-Conference

I made it a point to leave a few days after the conference itself was over to explore the city, as I knew the conference days themselves were going to be fairly packed (8 am – 6.30 pm usually, running till 11 or so on the night of the Reception and Banquet). Like many work trips, the “official business” part of the trip was fairly draining – I perhaps should have seen this coming. I did manage to have a look around, though; I took a river cruise, walked through a few parts of the city and indulged in a bit of outlet shopping. I also sought out a local fast-food chain (Red Rooster, specialising in grilled chicken).

Given that I’d flown this far across the globe when going from London to Melbourne (it’s roughly 20 hours in the sky) I also decided to spend a few days in Singapore on the way back. For the most part I focused on spending time with family, though I also managed to find some time to flip through some of the IJCAI proceedings.

General Thoughts


It’s surprisingly difficult to leave work behind…

This is certainly the first time I’ve had a two-week break since I started full-time at Palantir. (I had some things I decided I would clear out last year around the Christmas and New Year period, so no long break for me then.) That said, rather unsurprisingly in hindsight, attending and presenting at IJCAI was easily just as intellectually stimulating and demanding as my regular work, so it felt more like a one-week break. Still, it was quite a dramatic slowdown.

I think the break worked well. It came at a time just after I launched some fairly hefty deliverables; knowing that I was leaving for two weeks, I had been pushing somewhat hard to get things out before I left. I hadn’t been back home for 11 months. I was initially a little apprehensive about going off for two consecutive weeks, though the logistics of course made sense, and I don’t think I have major regrets, at least not yet!