
Reactions to the Red Box (UK Budget 2018)

I’ve noticed that this has ended up being a string of primarily financial posts. This was not intentional; there just happened to be lots of interesting material related to finance coming up over the past few weeks. Also, a disclaimer: I am not a financial advisor and am not offering professional financial advice. Conduct your own due diligence or seek an advisor if necessary.

The UK government announced its Budget for 2018/2019, though with the caveat that it might be subject to change in the event of a no-deal Brexit (which looks a bit more of a risk each day). I’m a UK taxpayer, and thus changes in relation to tax and relevant allowances interest me; tax is my largest expense every year. As an expatriate I don’t have recourse to public funds, so I’m less aware of the changes pertaining to benefits and universal credit. The list below is far from exhaustive; readers should consult a more detailed guide like MoneySavingExpert or even the Budget itself if they want more details, or this Which? article if they’re considering optimising their tax affairs.

Income and Take-Home Pay

  • Personal allowance has increased from £11,850 to £12,500 and the higher rate (40%) band has increased from £46,350 to £50,000.
  • National insurance thresholds have risen; the start of the 12% band has increased from £8,424 to £8,632 (by £208), while its end has increased from £46,834 to £50,024 (by £3,190).

I was aware that the Conservative manifesto had pledged to bump the personal allowance and higher rate thresholds to these levels by 2020, so this is one year early. These increases seemed fairly generous – although I will be paying more in NI (notice that £2,982 is now subject to an extra 10 percentage points of NI and £208 to 2 percentage points less – in other words I’m paying about £294 more), the income tax saving looks pretty good, especially in nominal terms. (Whether Brexit wrecks the value of the pound further is a separate concern.)
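
As a sanity check of that £294 figure, here is a minimal sketch using the annual thresholds quoted above and a simplified 12%/2% employee NI calculation (the £60,000 salary is just an arbitrary example of an income above the new upper limit; real NI is assessed per pay period, so treat this as illustrative only):

```python
# Simplified employee National Insurance on an annual salary: 12% between the
# lower and upper thresholds, 2% above the upper threshold.
def employee_ni(salary, lower, upper):
    main_rate = max(0.0, min(salary, upper) - lower) * 0.12
    additional_rate = max(0.0, salary - upper) * 0.02
    return main_rate + additional_rate

old_ni = employee_ni(60_000, 8_424, 46_834)
new_ni = employee_ni(60_000, 8_632, 50_024)
print(round(new_ni - old_ni, 2))  # 294.04 more NI per year
```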

A separate consequence of this is that the 62% marginal band has extended to £125,000 as well, as there’s more personal allowance to lose (the allowance is withdrawn at £1 for every £2 of income above £100,000, so each pound in that range effectively attracts 40% higher-rate tax, a further 20% from the shrinking allowance and 2% NI). Regardless, the changes are certainly positive.

In general, with inflation being positive, the value of a pound decreases over time. Thus, thresholds should be increased nominally at least in line with inflation if the goal is to implement a ‘neutral’ policy. I haven’t been keeping track of the history of these levels, but the bumps in the thresholds (roughly 5.5% for the personal allowance and 7.9% for the higher rate threshold) are certainly above inflation (and hopefully above two years of inflation – since the thresholds will be frozen next year).

Other Income Sources

  • The Personal Savings Allowance remains at £1,000 for basic rate taxpayers, £500 for higher rate and £0 for additional rate.
  • The Dividend Allowance remains at £2,000 – note that dividends in a pension or ISA do not count against this limit. Dividend tax rates are unchanged.
  • The Trading and Property Allowances remain at £1,000.
  • The Rent-a-Room scheme threshold remains at £7,500.

Most things held pretty steady in nominal terms (which means they’ve gone down slightly in real terms). That said, if interest rates continue to go up the savings allowance thresholds might quickly become relevant to me (obligatory hello to Marcus). I’m considerably further from the dividend allowance threshold, though again that’s something to watch out for if the markets decide to be exuberant. I haven’t been using the other allowances, though if an opportunity comes up that could be a possibility.

Securities and Assets

  • The ISA (tax-advantaged savings account) limit remains at £20,000.
  • The Capital Gains Allowance was increased slightly, from £11,700 to £12,000. Tax on capital gains is unchanged.
  • The Lifetime Allowance for pensions was increased in line with inflation, from £1.03M to £1.055M.

Not too much to say here. The markets haven’t been so friendly, so there’s certainly no extravaganza of crystallising significant capital gains this time around. To be fair, much of my gains in the 2016/17 tax year centred around massive pound weakness. The ISA allowance being held where it is is a bit of a damp squib as I max it, though it is still very generous.

Consumption Taxes

  • VAT standard rate remains at 20%.
  • Air Passenger Duty (APD) remains flat for short-haul flights, but rises by RPI for long-haul flights (roughly £2 for Y, £4 for premium cabins).
  • Fuel duty is frozen.
  • Alcohol duty is frozen for beer, “most cider” and spirits. Duty on wine and higher-strength cider increases by RPI.
  • Tobacco duty increases by RPI + 2%. I don’t smoke, so don’t know too much.

Owing to how it’s calculated, RPI tends to be higher than CPI – and it’s a little frustrating / naughty that most payments coming out of the Treasury (e.g. planned bumps in income tax/pension allowances) seem to be CPI indexed while those going in tend to be RPI indexed. The main reason suggested for the double standard is legacy technical debt, but that doesn’t seem to explain why the inconsistencies consistently resolve in favour of the exchequer.

The APD change garners revenue for the Treasury, though it’s a little sad as APD is already incredibly high compared to aviation duties in other countries. That might actually increase the attractiveness of routing through Zurich or one of the German hubs when I fly the London-Singapore route.

For comparison, long-haul premium cabin APD ex-UK will be £172. Singapore’s duty is about S$47 (about £27); Zurich weighs in at CHF 35 (also about £27) and Munich at EUR 42 (about £37).

I guess I’m mildly affected by the alcohol duty change as I do drink wine, but I don’t drink a lot so the impact is very minimal. I’m not currently affected by the fuel or tobacco duty changes.

50p Coin

  • The Royal Mint will produce a 50p coin to mark Brexit.

I’m somewhat of a Brexit bear and remain one (pun not intended). Taken at face value, the inscription (“peace, prosperity and friendship with all nations”) is at least one way in which I could see it possibly having a shot at working out – and, assuming Brexit does go ahead, it is what I hope the government will pursue.

However, the quote is adapted from Thomas Jefferson’s inauguration speech as US President. The continuation is “entangling alliances with none”, which is in some ways apt if the UK is concerned with the EU’s ever closer union – though I’d think the US and UK in a modern-day context certainly are allies!

Conclusion

In terms of personal impact, the Budget felt very much like a constructive or at least benign continuation of previous Budgets. That said, I realise I say that from a point of privilege in that I view the status quo as manageable. I’m aware that there have been changes to benefits (notably, universal credit) which have made some worse off.

The Problem of the Golden Basket

Somewhat related to the previous post on paydays, I had lunch and then coffee with another friend, and for some reason our discussion turned to personal finance as well. Unlike last time, though, I was the one starting from the view that deviates from conventional theory.

Suppose you have a choice between two savings accounts. One account pays 2% AER and the other pays 1% AER – so £10,000 invested for a year would earn £200 in the first account but only £100 in the second. Should you allocate 100% of your savings to the account that pays 2% AER?

In general, if all other things were held equal, there probably isn’t much of a reason not to allocate everything to the 2% account. However, the standard for all other things has to be high. In practice, I can see quite a few scenarios where there may be legitimate reasons to allocate some money to the 1% account.

One possible reason could be terms of access. Clearly, if the 2% account is a fixed rate bond only allowing access to the money after some amount of time while the 1% is an easy-access account, that provides a clear reason. If one wishes to maintain an emergency fund, for example, the fixed rate bond is probably best avoided even if it pays higher interest. Some savings accounts, while not having a fixed term, place restrictions on withdrawals in terms of frequency or advance notice in exchange for higher rates – again, be careful about using these accounts to park an emergency fund.

Another reason could involve how the accounts compound. The annual equivalent rate (AER) refers to how much money one will have after a year. However, if one wants the funds together with some interest before one full year, then precisely how the accounts pay interest becomes significant. If the 1% account compounds monthly while the 2% account compounds annually only, then between one month and one year after the start date the 1% account has more withdrawable interest. This is a variant of the access problem, though this focuses on access to interest as opposed to the principal. This may seem a little short-term minded, but could be interesting if one is engaging in stoozing or has other pressing financial commitments.
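
To make the compounding point concrete, here is a small sketch. It assumes the 1% account credits interest monthly at a rate consistent with its AER, while the 2% account credits everything only at the twelve-month mark; real accounts will differ.

```python
# Interest actually credited (and hence withdrawable) on £10,000 after m months.
PRINCIPAL = 10_000

def credited_monthly(aer, months):
    monthly_rate = (1 + aer) ** (1 / 12) - 1  # monthly rate consistent with the AER
    return PRINCIPAL * ((1 + monthly_rate) ** months - 1)

def credited_annually(aer, months):
    return PRINCIPAL * aer if months >= 12 else 0.0

for m in (1, 6, 11, 12):
    print(m, round(credited_monthly(0.01, m), 2), round(credited_annually(0.02, m), 2))
# From month 1 to month 11 the 1% account has more accessible interest;
# only at month 12 does the 2% account overtake it (about £200 versus £100).
```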

The amount of money one has is also relevant. Financial institutions can fail; in this case, the UK’s Financial Services Compensation Scheme (FSCS) guarantees up to £85,000 per depositor per authorised bank/building society. There is thus certainly a case for keeping not more than that amount with each bank; if one was fortunate enough for one’s savings to exceed £170,000, finding a third bank seems reasonable. I’ve never seen cash as doing the heavy lifting as far as growing my portfolio was concerned. I’d collect interest as available, but would prioritise safety. If the accounts were held with the same provider, of course, then this argument falls down – even if one has multiple accounts, the FSCS limit is on a per-bank basis. In fact, one has to be careful as some bank brands do share authorisations – meaning that an individual will only get the protection once even if several of these banks fail.

In general, the inconvenience that might be caused by failures is something worth considering as well. The FSCS compensation only applies if the bank is suitably authorised; even if one’s balance is fully covered by the FSCS, claims can take one to four weeks to process. I think I’d be much more comfortable having at least a nontrivial amount in a separate account (two to three months’ expenses, ideally) if possible.

Customer service is another factor. I’d probably prioritise that more significantly for more complex products – however, for a savings account it would still be useful.

Furthermore, there are other principles which individuals might find important. MoneySavingExpert on its savings account best buy tables has a section for highly-rated ‘ethical savings accounts’. The criteria Ethical Consumer (which MSE works with) uses include concerns like tax avoidance and funding of climate change, though I don’t necessarily agree with all of them (“excessive director’s remuneration”, in particular – if someone is that valuable it seems unethical to me to artificially depress their salary). Similarly, Islamic banking is necessary for adherents, since interest is forbidden in Islam.

To conclude, there are quite a number of reasons why one might not actually want to put 100% in the higher interest bearing account. Of course it makes sense ceteris paribus (if all other things were held equal), but that seems unlikely in practice. The standard for all other things being held equal here is high – the access conditions, compounding conventions and bank account provider all need to match (and that isn’t even an exhaustive list).

Moving Cash Flows

I met a friend for a meal on the weekend, and among other things my friend mentioned that at his company, there was an internal debate over whether payday should be moved forward. This wasn’t a debate on the financial ability or willingness of the company to do this, but was instead focused on individual workers’ preferences.

My initial reaction was a bit of surprise. I wondered why this was even a debate, as I believed the answer should almost always be yes. This reminded me of the standard time-value-of-money question that I wrote about just over a year ago; being paid the same amount of money slightly earlier, mathematically, seems like an outright win. The UK hasn’t had negative interest rates yet – and even in a place like Switzerland where bank rate is negative, this isn’t typically passed on to most depositors.

Cash flow might be a problem if one considers the dollar-today-or-more-tomorrow question; however, this shouldn’t be an issue in this set-up. Valid cash-flow scenarios remain valid even if payday is brought forward. In a sense, an early payday creates additional options; it shouldn’t invalidate any existing ones.

With a bit more discussion and thought, though, we found that there were indeed valid reasons as to why one might not want payday to be shifted forward.

First, although the mathematical argument makes sense, there are some edge cases around tax liability. If one’s salary is close to a marginal rate change and a payment is pushed across a tax year boundary, the amount of tax one pays might change (and can increase). 

Also, although we speak of interest as an upside, how much benefit an individual can actually realise may be significantly limited. Some bank accounts pay interest based on the lowest balance on any day in the month, meaning that being paid a few days early yields no benefit at all. Even if interest is based on the average daily balance, the upside is in most cases small. For a concrete example, 2017 median UK post-tax earnings would be about £1,884.60 per month. If one was getting paid three days early and storing that in a high-interest current account yielding 5% APR, the additional interest would still come to less than a pound.
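
As a quick check of that figure (a sketch that assumes interest accrues daily on the full balance at a flat 5% per annum, ignoring tax):

```python
# Extra interest from holding one month's median take-home pay for three
# additional days at 5% per annum, accruing daily on the whole balance.
balance = 1_884.60
extra_interest = balance * 0.05 * 3 / 365
print(round(extra_interest, 2))  # about £0.77
```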

Moving away from purely numerical considerations, there are many other plausible reasons too. Clearly, departing from an existing routine may affect one’s own financial tracking. I find this alone to be a little flimsy (surely one’s tracker should be adaptable to variations arising from December and/or weekends?). That said, if one is unlikely to derive much benefit from the money coming early (and it seems like in most cases there indeed wouldn’t be much benefit), the change would likely seem unnecessary.

Another scenario could be if one has many bills or other payments paid by direct debit, and cannot or does not want to pay all of them. In that case, deciding precisely where the money goes could be significant – for example, if one is faced with a decision to lose fuel or premium TV in winter. This is probably not the right system to handle a situation like that, but if one wishes to only make some payments then an unexpected early payday could mess the schedule up. Somewhat related might be joint accounts in households where there are financial disputes.

Taking advantage of an early payday also requires self-control. Consider that if one is living paycheck-to-paycheck, while an early payday might ease financial pressure, it also means that the time to the next paycheck is longer than normal (unless that is also shifted forward). This needs to be dealt with accordingly.

If you asked me whether I’d like payday to be shifted forward, I’d almost certainly say yes. Our discussion went on to a further hypothetical – would you take a 1% pay cut to have your entire salary for the year paid on January 1st?

From a mathematical point of view, you would be comparing a lump sum of 0.99N dollars paid now against (for simplicity) twelve payments of N/12 dollars paid 1/12, 2/12, \ldots, 12/12=1 years from now. Assuming that you can earn interest at a monthly rate of r with monthly compounding, after one year we have

Value_{\text{LumpSum}} = 0.99N (1+r)^{12}

Value_{\text{Normal}} = \sum_{i=1}^{12} \left( \frac{N}{12} (1+r)^{12 - i} \right)

If we set these two to be equal and solve for r, we get a break-even point of r = 0.00155. This works out to an annual rate of about 1.87\%. This is higher than best-buy easy access accounts at the time of writing (MoneySavingExpert identifies Marcus at 1.5%). You can beat this with fixed-rate deposits, and probably beat it through P2P loans, REITs and equities – though more risk is involved.
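
A quick numerical check of that break-even rate (a minimal sketch: N is normalised to 1, and we simply bisect on the monthly rate r using the two valuation formulas above):

```python
# Find the monthly rate r at which 0.99 * (1+r)^12 equals the value of twelve
# end-of-month payments of 1/12, i.e. the break-even point for the 1% pay cut.
def lump_minus_normal(r):
    lump = 0.99 * (1 + r) ** 12
    normal = sum((1 / 12) * (1 + r) ** (12 - i) for i in range(1, 13))
    return lump - normal

lo, hi = 1e-6, 0.05  # the difference is negative at lo and positive at hi
for _ in range(100):
    mid = (lo + hi) / 2
    if lump_minus_normal(mid) < 0:
        lo = mid
    else:
        hi = mid

r = (lo + hi) / 2
print(f"monthly rate ~ {r:.5f}, annual equivalent ~ {(1 + r) ** 12 - 1:.4%}")
# roughly 0.00155 per month, i.e. a little under 1.9% over the year
```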

I think I could see that being a yes for me, though I’m not entirely sure I’d have the self control required to manage it properly!

Chips on the First Floor

Every so often, I spend some time filling out a crossword puzzle. There are quick crosswords where only definitions of terms are given, and cryptic crosswords which include both a definition and some wordplay. Especially for quick crosswords, these clues can be ambiguous – for instance, Fruit (5) could be APPLE, MELON, LEMON, GUAVA or GRAPE; OLIVE if one wants to be technical, and YIELD if one wishes to be indirect.

To resolve this ambiguity, about half of the letters in a quick crossword are checked. This means that their cells are at the intersection of two words, and the corresponding letters must match.

With a recent puzzle I was attempting, I had a clue with a definition for ‘show impatience (5, 2, 3, 3)’. I didn’t get this immediately, but with a few crossing letters in the middle I quickly wrote down CHOMP AT THE BIT. This was fine until I had a down clue with definition ‘problem (7)’ which was D_L_M_O. This should clearly be DILEMMA. It was a cryptic crossword, so I was able to check CHAMP AT THE BIT with the wordplay part, and it made sense. (The clue was “show impatience in talk about politician with silly hat, I bet” – which is CHAT around (an MP and then an anagram of HAT I BET).) The “original” expression is actually CHAMP, though I’ve only heard of the CHOMP version before.

I sometimes have difficulty with crosswords in the UK (and sometimes with crosswords from the US as well) owing to regional variations in English. Singaporean English follows the UK in terms of spelling. However, in terms of definitions, things vary. For example:

  • Common with UK usage:
    • Tuition refers to additional small-group classes (like in the UK), not the fees one might pay at university (US).
    • biscuit is a baked good that’s usually sweet (like in the UK) and probably shouldn’t be eaten with gravy; an American biscuit is a bit more scone-like.
  • Common with US usage:
    • Chips are thin fried slices of potato (same as US). The word refers to fried strips of potato in the UK (which themselves are fries in both Singapore and the US); the thin slices are called crisps in the UK.
    • The first floor of a building is the ground floor (same as US); in the UK that’s the first floor above ground (which is the second floor in Singapore and the US).

Without venturing into Singlish (which incorporates terms from Chinese, Hokkien, Malay and several other languages), there are also terms that aren’t in common with either American or British English. Some of these pertain to local entities. Economy rice is a type of food served in food courts, and the MRT is Singapore’s subway network – though I’ve heard several uses of it as a generic term, much like Xerox for copying.

Others seem a little more random. Sports shoes refer to trainers specifically, and don’t refer to water shoes or hiking boots which are used for sport. The five Cs refer to cash, cars, credit cards, country club memberships and condominiums – five things starting with the letter C that materialistic Singaporeans often chase.

I’ve been resident in the UK for around six years now. This is obviously fewer than the number I’ve spent in Singapore (about 21), though the years in the UK are more recent. I’ve gotten used to the British expressions, especially for life in the UK (I generally like chunky chips more than crisps, and correctly distinguishing the first and ground floors is important for getting around). I don’t think I’ve had too many issues with remembering the correct versions of terms to use when in Singapore or in the US – having had to deal with these inconsistencies has helped here.

That One Thing (2018 Q3 Review)

Monomania refers to a mental disorder where one pathologically focuses on a single thing (naturally, at the expense of others, as one’s attention is limited). To some extent, I may have experienced this in my second year at Imperial, where for some reason I became fixated on my academic performance. I remember studying almost 100 hours per week when preparing for my exams then (typically achieved by a very strict 9am-11pm schedule every day for about five to six weeks). I’m not sure how I managed at the time. In terms of grades, I think one should aim at 60 + epsilon, 70 + epsilon or 100, and I chose the last of these.

Things have changed since; I now evaluate myself across a variety of facets. Even professionally, the metrics are a bit more multi-dimensional now. I’m not sure if it was the Palantir internship or something else, but in fourth year I made some decisions that clearly weren’t optimal as far as academic performance was concerned (working 15-20 hours a week). At that time, my main goals were to get an average of 90% and to submit twenty pull requests as a part-time software engineer. After I graduated the metrics became writing papers and pull requests; the latter quickly changed to patents as, all other things held equal, I find developing two or three innovative systems far more interesting and probably more meaningful than fifty rudimentary changes.

In terms of development, I’ve certainly found my recent work to be challenging (which is a good thing!). I’ve been continuing my focus on performance work, though now that many easy wins have been taken the features are getting more complex. I’m also starting to have more clarity on the kind of work that motivates me (and discussing this with my lead and others); having such discussions feels relevant and good.

I presented the LDLK on finite traces paper at AAMAS’18 (and it seems I now have an h-index of 2). It’s unlikely there’ll be another conference paper for a while though, as I’m currently diverting my computer-science energies towards writing a journal paper which summarises and extends the three conference papers to some extent.

A highlight of the quarter was receiving the VCLA award for an outstanding Masters’ thesis as well (Imperial article). In terms of raw computer science difficulty, MCMAS-Dynamic is probably the most complicated project I’ve delivered (there are a couple I’ve done since that I’m also proud of, but I’m pretty sure MCMAS still wins). Some recognition is always nice; too much ego-stroking can be unhealthy, but the odd bit is enjoyable.

In terms of finance, I didn’t really pay much attention to things this quarter. I was aware of the Turkish and Argentinian currency troubles as well as the US-China trade spat, but things on my end are still largely business-as-usual. In fact I thought this quarter was not going to be good, but looking at things it’s largely an effect of recency bias; while performance in September was indeed weak, July and August were actually very good.

Disclaimer: I hold my own portfolio, which includes the Fidelity index, VLS 80, iShares emerging markets fund, the REIT, the inflation-linked gilts fund, BTC and USD.

The high expenses of Q2 have actually snowballed further in Q3, though most of this is explainable by six-month periodic expenses and booking travel for the end of the year. Discretionary expenses have actually fallen a fair amount, which is unsurprising as Q2 expenses included Stockholm travel as well as renewing my web server. Savings rate is down, though still on track to be in a good place this year.

Work on dealing with logical puzzles has continued, and I think I’ve managed to find a source of metrics; Logic Masters India has a good selection of past contests along with a rating system. Based on a few of these, my Puzzle skills are currently around the low to middle 600s, while my Sudoku skills are in the high 700s. Individual contests may vary in difficulty, so normalisation is used. Interestingly, this is done on two axes, comparing both one’s score relative to the top scorer and one’s ranking relative to all participants.

I’ve also started to take an interest in cryptic crosswords, though I’m certainly nowhere near as competitive at these; my general knowledge is not sufficiently broad. “Logical puzzles” are designed with the intention of being culture-neutral; crosswords, by contrast, often demand a wide vocabulary (which I think I have) and broad general knowledge (less so). It helps that there is a group at Palantir that attempts these recreationally; I’m not aware of a similar group for general logic puzzles.

When one is investigating a search problem, one typically needs to balance exploration (searching prospectively for additional solutions) and exploitation (locally refining good known solutions). Historically when managing my own decisions I’ve largely been pushing exploitation for quite some time. I first discovered an interest in mathematics when I was around six; while this isn’t quite computer science or software engineering, it was nearby. I started programming at 15, and when choosing my degree looked at CS which I felt balanced my interests, skills and prospects well. When I was at university, I had the aforementioned monomaniac episode; since then I’ve branched out a little.

I’ve been behind the times (the song was released in 2017, apparently), but I’ve been listening to Meant to Be (Bebe Rexha and Florida Georgia Line) a fair bit this quarter. I don’t particularly enjoy the vocals in the original, but it’s been covered well; this one is better polished, while this one has the approach I go for when I try to sing it!

It’s enjoyable for me to (try to) sing along to, including the high vocal line in the chorus; it quickly becomes a question of hitting the tenor Bb4 repeatedly. Listening to the song certainly does make me think of search space exploration problems, specifically genetic/evolutionary algorithms. The lyrics in the chorus sound very much like evaluating a fitness function on a candidate solution and then deciding whether to accept it:

So won’t you ride with me, ride with me?
See where this thing goes?
If it’s meant to be, it’ll be, it’ll be; baby if it’s meant to be.

The first verse has elements of not being too pressured with exploitation as well, and thematically echoes bits of Good Things Come To Those Who Wait which I looked at last quarter. The second verse reflects a degree of caution after being burned by poor approaches in the past. As I plan for what I might want to try to achieve (God willing) on a timescale of months or years, it’s important to not index too heavily on past experiences or initial reads of how a given path may turn out.

Algorithmic Modelling – Alphabear 2

Alphabear 2 is a word game where players are given a two-dimensional grid of tiles. Some of these tiles show letters, along with timers; others are concealed and must first be exposed by using an adjacent letter in a word. Players use letter tiles (anywhere on the grid) to form a word. When a player makes a word, he/she scores points equivalent to the sum of the timers on the tiles that are used. The timers on all tiles that aren’t used then count down by 1; tiles that reach zero become unusable stone blocks.

Usually, this means that you want to form long words that incorporate tiles that have shorter timers; for example, there is a ten letter word on the above board (PEDESTALED), but it doesn’t use the F. After playing this word, the timers on the remaining letters will count down by 1; the P will now have a timer of 2, the first E 10, and so on.

However, the game does have various mechanics that mean this isn’t always the best strategy. For example, one needs to ensure that there are actually enough letter tiles to work with, and that the distribution of letter tiles allows reasonable words to be formed. The game also has players choose bears for each mission which have special abilities such as awarding additional points for certain words. Actually, in hindsight FEASTED might have been a better choice than DEADLIFTS, as I was using a bear that gives a point bonus for words ending in ED.

In addition to scoring points by making words, players also receive bonus points for the largest fully cleared rectangular area. The game identifies this for you. I’m not sure how this is actually implemented, though there are large enough boards (15 by 15?) that it’s probably not the naive algorithm of generating and testing all rectangles; where N is the width of the board that would be O(N^6). There is a DP-based approach where one can keep track of the number of stone blocks that slashes this to O(N^4) which would probably be enough. With more clever approaches it’s possible to knock the time complexity down to O(N^2) – and that’s probably optimal as any less would involve not actually reading some parts of the grid.
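
For reference, here is a sketch of the faster approach: the classic ‘largest rectangle in a histogram’ technique applied row by row, which is O(N^2) overall for an N by N board. It assumes the board is available as a grid of booleans where True means the cell has been cleared; this is an illustration, not necessarily how the game itself computes it.

```python
# Area of the largest fully cleared rectangle in a grid, where cleared[r][c] is
# True if that cell is clear. Each row is treated as the base of a histogram of
# consecutive cleared cells above it; the largest rectangle in each histogram is
# found with a monotonic stack. Overall cost is O(rows * cols).
def largest_cleared_rectangle(cleared):
    if not cleared:
        return 0
    cols = len(cleared[0])
    heights = [0] * cols
    best = 0
    for row in cleared:
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        best = max(best, largest_in_histogram(heights))
    return best

def largest_in_histogram(heights):
    stack = []  # indices of bars, with heights kept in increasing order
    best = 0
    for i, h in enumerate(heights + [0]):  # the trailing 0 flushes the stack
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best

# e.g. largest_cleared_rectangle([[True, True, False],
#                                 [True, True, True]]) == 4
```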

Most of the time, this isn’t something to worry about too much. I’m able to clear most levels with no stones at all. However, if we need to take a stone, ideally it should be as far away from the center of the board as possible (note that only the largest rectangular area counts).

Defining optimal play in general is difficult, owing to the spatial reasoning required by the largest clear area bonus, uncertainty of tiles that are yet to be revealed and sheer breadth of options when faced with many open tiles.

Before starting a round, the player has to select bears to use. Optimal decisions here largely depend on the specific challenge being played. A bear that scores bonus points for six-letter words isn’t particularly useful on a tight board where one has only a few open tiles at a time; some bears add time to timed boards, which isn’t useful if the board has no time limit. Furthermore, players select up to three bears each round; these bears can have useful synergies. In particular, I like a combination of “summon three Zs” and “score triple points for words including JQXZ (not stacking, unfortunately)”; “summon ING” or “summon ED” goes quite well with these two as well. I’ve used these to complete a mission deemed “impossible” given the strength of my bears at the time.

We can perhaps begin by focusing more narrowly on optimal play of an Alphabear endgame (where all tiles have been revealed), ignoring special abilities bears may have for now and assuming that the board has a solution. Here is an example:

From this point on, there is no more uncertainty as to what letters we have to use. Interestingly, this board shows that a greedy algorithm where one tries to play the longest possible words doesn’t work. One could begin WORKSHOPPED, leaving RAII, but this is actually an unwinnable state. There are several two-turn solutions, like ROADWORKS + HIPPIE, or WORSHIPPED + KORAI. (Obviously, this is very high-level play; I think I played SKIPPER with this board. That leaves RODIAHWO, for which I’d probably have played something like RADIO + HOW, though HAIRDO + WO is better.)

Given a multiset of pairs of letters and timers L = (l,t)^+, we want to return a suitably ordered list of words S = \left[ l^+ \right] that satisfies the following properties:

  • legitimacy: every word s \in S is included in the game’s dictionary.
  • resourcing: words do not use tiles that aren’t in L.
  • timing: for each pair (l, t) \in L, l is used in a word before or on position t.

The above set of properties is not correctly specified without a refinement to deal with duplicate letters. There are several ways to deal with this; assuming low cardinality, one approach could be to treat duplicate instances of a letter as separate letters, and suitably expand the dictionary to allow this. For a small example with tiles A (timer 3), B, B and E (each with timer 1), our list would be L = \left[ (A,3), (B_1,1), (B_2,1), (E, 1) \right], and the dictionary would recognise both B_1 E and B_2 E.

I don’t have a good way of finding an optimal solution; I suspect the problem is actually NP-complete; it feels like SAT or 3SAT may be reducible to this problem, though I haven’t worked through the details. Nonetheless, there are several search heuristics we can try. Clearly, on a given turn we must use everything which now has a timer of 1. Also, although this board is an example of why you don’t want to be greedy, in general it probably makes sense to do some kind of ordered backtracking, where one expands the most promising nodes first. I can imagine an A* search could actually work well, though finding an admissible heuristic could be hard.
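
Here is a rough sketch of such an ordered backtracking search, under several simplifying assumptions: WORDS is a toy stand-in for the game’s dictionary, tiles are (letter, timer) pairs, a tile expiring is treated as outright failure, we always spend the most urgent copy of a letter, and ‘longest playable word first’ is the crude ‘most promising first’ ordering.

```python
from collections import Counter

# A toy dictionary standing in for the game's real word list.
WORDS = {"WORSHIPPED", "KORAI", "ROADWORKS", "HIPPIE", "WORKSHOPPED", "RAII",
         "SKIPPER", "RADIO", "HAIRDO", "HOW", "WO"}

def playable(word, tiles):
    need = Counter(word)
    have = Counter(letter for letter, _ in tiles)
    return all(have[letter] >= count for letter, count in need.items())

def play(tiles, word):
    """Spend the lowest-timer copy of each letter in `word`, then tick the rest.
    Returns the new tile list, or None if an unused tile's timer expires."""
    remaining = sorted(tiles, key=lambda tile: tile[1])  # most urgent first
    for letter in word:
        for i, (tile_letter, _) in enumerate(remaining):
            if tile_letter == letter:
                del remaining[i]
                break
    ticked = [(letter, timer - 1) for letter, timer in remaining]
    return None if any(timer <= 0 for _, timer in ticked) else ticked

def solve(tiles):
    """Ordered backtracking: try longer (more promising) playable words first."""
    if not tiles:
        return []
    for word in sorted((w for w in WORDS if playable(w, tiles)), key=len, reverse=True):
        after = play(tiles, word)
        if after is None:  # something timed out: prune this branch
            continue
        rest = solve(after)
        if rest is not None:
            return [word] + rest
    return None

# e.g. solve([("K", 2), ("O", 3), ("R", 2), ("A", 3), ("I", 4)]) == ["KORAI"]
```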

There’s a fourth property we want as well:

  • optimality: the score produced by the words is maximal.

Going back to our sample board, WORSHIPPED and KORAI will score one more point than ROADWORKS and HIPPIE, because there is one fewer timer tick. In general, we want to minimise the number of timer ticks by playing longer words upfront where possible, as this reduces the number of timers that are still ticking. Once one has devised a feasible solution, it may make sense to perform some iterative optimisation by seeing if it’s possible to bring letters forward. Of course, that approach may get us stuck in a local optimum. I could see some kind of iterative hill-climbing algorithm from the top-n admissible solutions (as found by our ordered backtracking) yielding decent results.

Unfortunately my vocabulary and anagram skills are far from good enough to handle the required complexity, especially under time pressure. Most of the time, successfully playing every letter on the board is enough to complete the mission’s objectives. Interestingly, it looks like the game wasn’t really programmed to deal properly with successfully using every letter but failing the mission. The game claims that there’ll be an additional reward on the reward screen when identifying the board as clear; yet, there is no reward screen as the mission was failed.

Rebooting Ourselves

It was a fairly rough week. Thus, over this weekend I thought about re-examining and resetting various procedures and things I do, as opposed to actively filling my time with activities. This reminded me of a Humans of New York post I came across several years ago:

“I’m rebooting my life entirely, again. It’s time for Andrew 5.0.”

In computer science, semantic versioning is a system for identifying different versions of software products in a standardised way. Under this system, a product’s version is an ordered triple of nonnegative integers written major.minor.patch. This is typically used for software, though the definition does not seem to require it. The system discusses changes in terms of a public application programming interface (API), which specifies what functionality the product offers.

In terms of software, a SQL database’s API could include the types of queries that may be processed. For MCMAS-Dynamic, the public API would include the details of the modelling language it is able to verify properties for. A non-software example could include a simple kettle; the public API could include how one adds or removes liquid, how one turns it on or off, and possibly alarms or other feedback mechanisms for when the liquid has boiled.

When a new version of a product is released, the version number is increased (in lexicographic terms). How this increase is done depends on the types of changes since the previous version (a small code sketch of these rules follows the list):

  • If the public API is ‘broken’, meaning that previously valid ways of using the API are no longer valid or accomplish different things, then the change requires a major version bump. To do this, the major version is incremented, and the minor and patch versions are reset to 0 (e.g. 7.5.1 \leadsto 8.0.0). For example, if the kettle used to play an alarm when the liquid had boiled and this was a part of the public API, then the major version should be bumped if this functionality is removed. (In particular, if the API did not specify that the kettle had to play an alarm, this change might not warrant a major version bump.)
  • If new features are added without ‘breaking’ the API or there are non-trivial internal improvements, the change leads to a minor version bump. The minor version is incremented and the patch version is reset to 0 (e.g. 7.5.1 \leadsto 7.6.0). For example, if the new version of the kettle is substantially more energy-efficient, then that could be a minor version bump.
  • If something was broken and has been fixed (without changing the public API), then the patch version should be incremented (e.g. 7.5.1 \leadsto 7.5.2). For example, if the kettle previously rang an alarm twice when the liquid was boiled even though the kettle’s API specifies it should only ring once, then a change that makes the alarm only ring once could be part of a patch version bump.
  • Multiple changes should be evaluated in aggregate. In most cases, the largest magnitude of all constituent changes applies, though this is not universally true (consider one bugfix plus two changes, one of which breaks the API and another which reverts that break – that is a patch bump, not a major bump).
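
As a small illustration of those rules (a sketch only: the classification of the aggregate change into ‘breaking’, ‘feature’ or ‘fix’ is assumed to have been done already, which as noted above is the genuinely hard part):

```python
# Bump a (major, minor, patch) version according to the kind of change.
def bump(version, change):
    major, minor, patch = version
    if change == "breaking":
        return (major + 1, 0, 0)
    if change == "feature":
        return (major, minor + 1, 0)
    if change == "fix":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change type: {change}")

assert bump((7, 5, 1), "breaking") == (8, 0, 0)
assert bump((7, 5, 1), "feature") == (7, 6, 0)
assert bump((7, 5, 1), "fix") == (7, 5, 2)
```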

Generally, making a more aggressive version bump than would be required for one’s change is acceptable, though it can confuse users. In particular, I tend to expect backward-incompatible changes when facing a major version bump; not finding any can be surprising and confusing.

The sentiment of the quote sounded like it was a major version bump. Defining an API for one’s life is obviously very difficult; even if one tries to use a lot of abstraction, I find that there are just too many facets. Rather loosely, our API might be split into a bunch of micro-services. We can treat physical needs and bodily functions like breathing and digestion as infrastructural. These services might then focus on the range of activities we involve ourselves in, or range of activities we could involve ourselves in. For me personally, this could include software engineering, getting along with other people, finance and budgeting, computer science, writing, puzzle solving and so on.

Hopefully, I would imagine that we chew through a lot of patch versions as we continue to improve skills. Today’s release notes could include “Jeremy knows a little bit more about thread pools” (I read a chapter of Java Performance Tuning today over a post-lunch coffee). Minor versions would also be relatively common; this wouldn’t be from today specifically, but “Jeremy can vaguely attempt a Balance Loop puzzle” is probably pretty recent, extending the sudoku and other puzzle-solving features.

Depending on how we define the API, major version bumps could be very common. It is typically important to be relatively disciplined with breaking changes in an API in a software engineering context, as clients may often depend on one’s product in non-obvious ways. While others’ dependencies on us can indeed be non-obvious, I think one factor that changes things is that our systems seem to be ephemeral whilst program code is not. A codebase left as it is over years or centuries retains its capabilities (admittedly, finding suitable infrastructure to run the product might be an issue).

On the other hand, there is some evidence that we lose skills that are underutilised with time. I used to play Dance Dance Revolution quite a lot and could probably pass an arbitrary level 15 song as well as some 17s; I doubt I can do that today as I haven’t played in a few years. The ways we interact with others or manage our finances may change as our personal environments change as well; for example, if I moved away from the UK, I would not be able to allocate my investments the way I do now, because I would lose the ability to use ISAs (and probably most other forms of UK-specific tax-free savings). This may even happen without action (for example, if the UK government changes how ISAs or tax-free savings work) – though you could argue that declaring the use of specific vehicles in one’s API might be too specific and implementation-dependent (“I will use tax-advantaged accounts that are valid in my location appropriately” is maybe better).

In light of the above, I would be a bit laxer with what constituted a ‘breaking change’, which pulls things back toward subjectivity which I think semantic versioning was trying to avoid. I might regard myself as having major version 2 right now; I could consider everything up to and including my second year at Imperial as version 0, which is typically used in development to refer to a pre-release period of rapid iteration. Although National Service and/or moving to the UK for studies did bring about nontrivial changes, I really didn’t know what I wanted to do at that time (not that I know now, but there is at least a vague direction).

The Google internship was probably the turning point for version 1; that also coincided with several major changes with regard to finance, investment, philosophy and priorities. I’d call the second major change to be when graduating from Imperial and starting at Palantir; even then, I’d regard the first set of changes to be more fundamental. The re-examination I did over the weekend is actually probably a patch release (or maybe a minor that improves several non-functional characteristics); it certainly doesn’t warrant a major version bump.

Algorithmic Modelling – Codenames

Codenames is a board game designed by Vlaada Chvatil which tests communication. The game is played with two teams; each team splits itself into clue-givers and guessers.

There is a board of 25 cards, each with one (or more) words on them. Of these cards, 9 are owned by the first team, 8 owned by the second team, 7 are neutral and one is the ‘assassin’.

The clue-givers attempt to communicate to the guessers which cards are owned by their team. This is done by giving a one-word clue followed by the number of cards N corresponding to the clue. The rules establish some bounds on allowed associations (for example, ‘sounds like’ clues are not allowed).

I don’t know how much time went into selecting the words to appear on individual cards, but there are certainly many words in the deck that can be interpreted in many ways, which makes the game fun. For example, I can think of quite a number of ways to clue BOND (COVALENT for unambiguity; DURATION, ASSET or DEBT for the debt; JAMES, AGENT or SEVEN or even at a stretch something like CASINO for the character; FRIEND or FRIENDSHIP; STREET or TUBE; CONTRACT, PROMISE or WORD for the idea of a promise; ADHERE or CONNECT, among others). Which one I might pick would depend on the other words on the board.

Guessers then identify up to N+1 cards they think their team owns. This can be based on the entire history of clues given, not just the previous clue. These cards are guessed one at a time; a team is only allowed to make further guesses if it correctly guesses one of its own cards. Teams may refrain from making all of their guesses.

The game ends either when all cards owned by a team are revealed (that team wins), or if a team reveals the assassin (that team loses).

In practice, clue-givers will need to consider all cards, not just the ones their team owns. The penalty for ‘failure’ varies; the assassin is an instant loss, and revealing a card owned by the other team is also bad. For example, with the cards APPLE, BANANA and GRAPE it would be very tempting to declare (FRUIT, 3) as a clue; yet, if KIWI is owned by the other team (or worse, is the assassin) it might be dangerous.

However, if the other team has already revealed that KIWI is theirs (e.g. with another clue, maybe SOUTH or BIRD) then the FRUIT clue becomes safe. Thus, pre-computing a strategy at the beginning of the game (e.g. by partitioning the eight or nine cards owned into several logical groups while avoiding other words) may not be optimal.

I tend to consider making an effort to win to be a key part of games, and thus what moves may be considered good also often depends on the state of the game. For example, if the opposing team has just one card left, I will give a broader clue that may have more tenuous links in a ‘do or die’ effort to finish on this turn. A less extreme example would be attempting weaker links if behind, or perhaps playing slightly more conservatively if ahead.

Mathematically modelling Codenames can be tough. We can try modelling the game state as a tuple (O, E, N, x, h, f_O, f_E) where O, E, N are sets of the remaining board words our own team owns, words the enemy team owns, and the neutral words, x is the assassin, h = (t,w,n)^+ is the history of the game, and f_O and f_E are preference functions of the form h, w, n, O, E, N, x \rightarrow P, returning an ordered list of O \cup E \cup N \cup \lbrace x \rbrace. A history entry (t, w, n) records that team t gave the clue word w and hinted that n cards were relevant. This already abstracts two difficult parts away – ordering the clues, and determining how many of the top preferences to pick.

I think the preference functions need to take into account previous clues from both teams; if a previous clue could clearly have corresponded to two words and I picked the wrong one, it might be fairly high on the list even if unrelated. Similarly if this scenario happens to the opposing team, I would probably avoid the word that would blatantly be theirs.

The notion of degree of confidence also isn’t captured as well in our ordering; going back to the fruit example, having a clear ordering would imply that clues for less than 4 items could reliably result in correct guesses (if we knew that APPLE was by a small margin the best pick, we could safely and confidently give (FRUIT, 1) to clue it, which seems wrong). One could imagine modelling these with selection probabilities, though things would be even more complex.

The above representation still seems computationally difficult to work with. The evolution of the preference function as one moves from one state to another is unclear (so in that representation it is fully general), making lookahead difficult. It doesn’t seem like a greedy choice is always best; for example, given eight clues that are reasonably divisible into four pairs, a clue for 4 words taking one element from each pair might be a bad idea if the other words can’t easily be linked.

We can proceed by simplifying the preference functions; a simple approach could be that for each w, n each team has a persistent preference function that returns an ordered list of O_0 \cup E_0 \cup N_0 \cup \lbrace x \rbrace. The preference functions during the game then return the subsequence of that list that contains strictly the words still left in the game. This of course doesn’t take into account past information or clues from the other team.

With this, we can attempt to solve the game using standard techniques; assuming that the vocabulary of clues is bounded (let’s say it must be from the Linux dictionary), a game state is winning for the current team if there exists some word for which the preference function returns everything in O as a prefix. A state is losing if all moves in that state produce a state which is winning for the opposing team.
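
A rough sketch of that check, under the simplified model (heavily hedged: the persistent preference function is just a fixed ordered list per clue word, the clue vocabulary is some bounded set, the guessers are assumed to follow the restricted preference order exactly, and the example words are purely illustrative):

```python
# preferences[clue] is the persistent ordered list for a clue word; during the
# game we restrict it to the words still on the board. A state is immediately
# winning for the team to move if some clue's restricted list starts with all
# of that team's remaining words: the guessers would then sweep their own cards
# before touching anything else.
def restricted(preferences, clue, remaining):
    return [word for word in preferences[clue] if word in remaining]

def immediately_winning(preferences, clues, own, remaining):
    for clue in clues:
        prefix = restricted(preferences, clue, remaining)[: len(own)]
        if set(prefix) == own:
            return True
    return False

# Toy example with an illustrative preference list:
prefs = {"FRUIT": ["APPLE", "BANANA", "GRAPE", "KIWI", "TANK"]}
board = {"APPLE", "BANANA", "GRAPE", "KIWI", "TANK"}
print(immediately_winning(prefs, ["FRUIT"], {"APPLE", "BANANA", "GRAPE"}, board))  # True
```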

We can generalise this to a simple probabilistic model as well; the preference ‘functions’ instead return a discrete random variable that indicates either guessing some word or passing. A simplified model could then, at the start of the game, assign weights to each candidate indicating the relative probability that candidate would be selected. These can be normalized to account for a pass probability. As words are removed from the board, the probability of the remaining words being selected scales (we can imagine a simple rejection-sampling where we discard words that aren’t actually on the board any more).
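
And a minimal sketch of that probabilistic guessing model, where per-word weights plus a pass weight are renormalised over whatever is still on the board (dropping removed words has the same effect as rejection sampling):

```python
import random

# weights: relative selection weight of each candidate word for a given clue;
# pass_weight: relative weight of stopping. Returns a word, or None for a pass.
def sample_guess(weights, pass_weight, remaining, rng=random):
    candidates = [(word, weight) for word, weight in weights.items() if word in remaining]
    total = pass_weight + sum(weight for _, weight in candidates)
    pick = rng.random() * total
    for word, weight in candidates:
        if pick < weight:
            return word
        pick -= weight
    return None

# e.g. sample_guess({"APPLE": 5.0, "KIWI": 1.0}, 0.5, {"APPLE", "KIWI"})
```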

The algorithm for the probability that we get a win from a given state is then slightly more complex (though I think still reasonably covered by standard techniques).

Likelihood Estimation

One of my team-mates introduced another interesting question over lunch. Working through it reminded me of some of the statistics problems I struggled with at Imperial, specifically during Intelligent Data and Probabilistic Inference. It reinforced that in spite of scoring 90 for that course I’m still not confident I knew what I was doing then (or now).

Suppose you have a symmetric triangular distribution of unknown width and mean. Given that the distribution has yielded three independent samples of 2, 4 and 5, what is the expectation of the mean?

The triangular distribution can be used as an estimate for something which is known to be bounded both above and below, that also takes into account a value known to be the most probable. Some argue that it is the simplest distribution satisfying these (though one could argue that a cosine or some kind of truncated normal might be more applicable).

The instinctive answer I have is simply the mean of the samples, or \frac{11}{3}, though I was suspicious as probability and statistics often yield non-intuitive results.

The distribution is named as such because it has a triangular probability density function; because of the laws of probability (the area under the function must be 1), specifying the minimum, maximum and mode is enough to uniquely identify it. Supposing we have a minimum a, a maximum b and a mode c (which coincides with the mean in the symmetric case we care about), this yields a pdf of:

f(x) =\begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \leq x \leq c \\ \dfrac{2(b-x)}{(b-a)(b-c)} & c \leq x \leq b \\ 0 & \text{otherwise} \end{cases}

We are only dealing with a symmetric case, so we can set c = \frac{a+b}{2} which cleans things up a little:

f(x) =\begin{cases} \dfrac{4(x-a)}{(b-a)^2} & a \leq x \leq c \\ \dfrac{4(b-x)}{(b-a)^2} & c \leq x \leq b \\ 0 & \text{otherwise} \end{cases}

Based on our observations that we have three samples of 2, 4 and 5, we can construct the likelihood that a given triangular distribution gave rise to this result. While the probability that samples from a continuous distribution land on precisely these values is infinitesimally small, we can still compare distributions in relative terms using their density functions. We can write this as

P(2,4,5;a,b) = f(2)f(4)f(5)

Expanding this term will depend on where exactly 2, 4 and 5 fall in our triangle. Let’s work out the most promising case (where 2 falls on the left of c while 4 and 5 fall on its right); the rest are left as an exercise to the reader. In this case, we have

P(2,4,5;a,b) = \dfrac{4(2-a)}{(b-a)^2} \times \dfrac{4(b-4)}{(b-a)^2} \times \dfrac{4(b-5)}{(b-a)^2} = \dfrac{64(2-a)(b-4)(b-5)}{(b-a)^6}

At this point, we notice the original question needs a bit more specification. We aren’t given what the distribution of possible values of a and b is. One way of getting around this is just to pick a uniform distribution; however, that isn’t quite defined over the real line. We can for now simply find the maximum likelihood estimate for a and b.

Alternatively, if we give prior probability distributions for a and b, we could also use the samples as information to get a posterior distribution. Usually we would pick a conjugate prior distribution that wouldn’t fundamentally change even when accounting for the sample; I didn’t find one for the triangular distribution, though.

If we want to find the most likely distribution, we seek to find an extreme point; this can be done by taking partial derivatives (and this expression actually lines up quite well with the quotient rule). There is a fairly standard ‘trick’ for handling these, though; since the logarithm is a strictly increasing function, we compute the log-likelihood instead. The maximum of the logarithm will also be the maximum of the original function. Using the laws of logarithms, we get something a lot more tractable:

\log P(2,4,5;a,b) = \log 64 - 6 \log(b-a) + \log(2-a) + \log(b-4) + \log(b-5)

Computing the partial derivatives is then straightforward. We then set them to zero, and solve the resulting equations simultaneously; this yields

a, b = \left( \dfrac{1}{3} \left( 4 + \sqrt{\frac{2}{5}} \right), \dfrac{1}{3} \left( 16 - \sqrt{10} \right) \right), \left( \dfrac{1}{3} \left( 4 - \sqrt{\frac{2}{5}} \right), \dfrac{1}{3} \left( 16 + \sqrt{10} \right) \right)

We’ll want the second pair of solutions; the first actually has b \approx 4.279 which is no good (we need b > 5 ). Interestingly, the mean of that triangular distribution is then \dfrac{2}{15} \left(25 + \sqrt{10} \right) \approx 3.75497 which is not quite 11/3.

The log-likelihood we get with these values of a and b is about -4.74053. By comparison, if we look at the family of distributions symmetric about the sample mean, with  a = \frac{11}{3} - \alpha and  b = \frac{11}{3} + \alpha , the best we can achieve is about -4.7473.
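
Numerically, the same answer drops out of a crude grid search over (a, b), using only the symmetric-triangle density defined earlier (a sketch; a step of 0.005 is enough to confirm the closed-form answer to a couple of decimal places):

```python
import math

SAMPLES = (2, 4, 5)

def log_likelihood(a, b):
    """Log-density of the samples under the symmetric triangular pdf on [a, b]."""
    c = (a + b) / 2
    total = 0.0
    for x in SAMPLES:
        if a < x <= c:
            density = 4 * (x - a) / (b - a) ** 2
        elif c < x < b:
            density = 4 * (b - x) / (b - a) ** 2
        else:
            return -math.inf  # a sample outside (a, b) has zero likelihood
        total += math.log(density)
    return total

grid = ((log_likelihood(0.005 * i, 0.005 * j), 0.005 * i, 0.005 * j)
        for i in range(0, 400) for j in range(1001, 1800))
score, a, b = max(grid, key=lambda t: t[0])
print(score, a, b, (a + b) / 2)  # about -4.74, a ~ 1.12, b ~ 6.39, mean ~ 3.75
```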

On Deckbuilding Games and Slay the Spire

Deckbuilding is a genre of games where players start with a (usually) standard deck of cards and then improve it over the course of the game. This can involve acquiring new cards that are usually more powerful than the initial set of cards and/or synergise in better ways, upgrading existing cards, or even removing cards that are weaker or less relevant so that one is more likely to draw the (better) remaining cards.

These games typically offer some replayability because the sequence of decisions one may have on how one can improve one’s deck differs from game to game; what it means for a deck to be successful can also vary depending on what the other players are doing.

Slay the Spire is a player-versus-environment RPG where combat is driven by a deckbuilding mechanic; players start with simple offensive (Strike) and defensive (Defend) moves plus possibly a few class-specific cards. The combat is turn-based. On a player’s turn, the player draws 5 cards. Cards have an energy cost, and players initially have three energy per turn to play as many cards from their hand as they want. Of course, there are cards and other mechanics that can be leveraged to obtain additional card draws or energy. Players are also shown their opponents’ next moves, which allows for some degree of planning; a simple example of this could be deciding to eliminate a specific monster in a group, because it is about to attack.

In terms of gameplay, the start of the game does feel relatively rudimentary as the player always has the same or a similar deck. However, each time a player defeats a group of monsters, he is given the option (but not the obligation) to add one of three cards to his deck.

The options that come up in the first few rounds will typically shape how the deck develops. For example, while most cards can be upgraded once only, there is a specific card called Searing Blow that may be upgraded multiple times; eventually, it can develop into a card that costs 2 energy yet deals far more damage than Bludgeon, a 3 cost card that deals pure single-target damage. It is a viable strategy to build a deck around this, eliminating other cards from the deck so that one draws it on every turn; there are cards like Double Tap (which, for 1 cost, doubles another attack) that can further enhance this type of deck.

Conversely, I frequently play in a style which relies on combining cards that give a constant strength bonus to all attacks along with cards that attack multiple times for small amounts of energy. That said, some degree of flexibility is important, as the choices given to the player will be different on different plays; being wedded to a single strategy can be dangerous, as specific key cards may not be available.

Players can also acquire relics which further influence game mechanics; these can simply provide boosts (such as “every 10th attack does double damage”), make certain styles of gameplay more attractive (“the third Attack card you play each turn boosts your Dexterity by 1 for the rest of the combat”) or even significantly change game mechanics (“retain your hand rather than discarding it” or “if you empty your hand, draw 1 card”). These are random (and apart from at the shop / rewards from winning boss battles, players aren’t really given options here) so, as before, planning on specific relics becoming available is also typically not a good solution.

Winning the game unlocks a new-game-plus mode called ascension; players start at Ascension 1, which increases the rate at which elite enemies spawn. Winning the game on Ascension 1 unlocks Ascension 2, which applies the restrictions of Ascension 1 and makes normal enemies do more damage. This increases incrementally up to Ascension 20. I’ve got up to 16 on one of the characters. At high levels there’s not very much room for error; I have had a 10 win-streak without Ascensions, but struggle with high Ascensions quite a bit. I think I tend to force my deck to fit a certain mold perhaps too early on; this can get scuppered if I subsequently am unlucky.

Another challenge a player can take on, as with many other games, is speed-running, or attempting to win quickly. Playing fast (as shown in the screenshot above) limits one’s time for decision making. For example, I normally plan my entire route through the map for each level at the beginning of the level, but that of course takes a bit of time. A normal, more relaxed game can easily take 30-40 minutes, especially if I’m playing with a relatively less aggressive build.

I’ve played the game for about thirty hours or so. It has been enjoyable, and the Defect character (an automaton that manipulates a queue of orbs) is particularly fun, if difficult to use. In terms of educational value I think the game demands a decent intuition of probability, as well as understanding how cards may work together. There are some synergies that are explicit or clear, but others are often less obvious and require some thought/experimentation.