Taking Risks on Rotations

I haven’t owned a gaming console since the PlayStation 2, mainly because I’ve found PCs to be sufficient for my own purposes and I don’t do very much gaming in the first place. I already have a pretty good PC for some of my own work (such as programming and some image/video editing I’ve done in the past). Nonetheless, I do enjoy watching a few YouTube channels which feature playthroughs of games with commentary, and I’ve certainly seen a lot of news regarding Nintendo’s latest console, the Switch.

One of the launch titles for the console, 1-2-Switch, is a minigame collection which largely serves to demonstrate the capabilities of the Switch’s controllers (or “Joy-Cons”). The controllers have buttons, motion-sensing capabilities, an infrared camera (“Eating Contest”) and some kind of precise vibration (“Ball Count”). Most of the minigames seem simple and somewhat fun, as well as relatively balanced; in fact, most don’t feature a way for players to interfere with one another, and in several the players’ roles are largely symmetric over the course of the entire minigame. A few seem unbalanced (“Samurai Training” – assuming equal probability of success, the first player has a greater probability of winning), but one is clearly unbalanced in favour of the second player: “Joy-Con Rotation”.

The game invites players to rotate their controller about a vertical axis as much as possible, without otherwise disturbing it too much (e.g. by tilting it). Scores are measured in degrees of rotation, with “unsuccessful” turns (e.g. too much extraneous movement) scoring 0. Players alternate turns, with three rotations each, and the scores from a player’s three turns are added up. This gives the second player an advantage: on his final turn, he knows exactly how much he needs to rotate the controller to win, and should aim for precisely that. The problem we’re analysing today is then: how do we decide how much of a rotation to attempt if we’re the first player, and how much of an advantage does the game give the second player (at least as far as the final turn is concerned)?

Let’s make a couple of simplifying assumptions:

  • Let X_i be a continuous random variable over [0, \infty) modelling player i’s rotation ability. The X_i are assumed independent, with continuous cumulative distribution functions F_{X_i}(x). For a given value of x representing the number of degrees player i attempts to rotate the controller, the attempt is successful iff X_i \leq x.
  • We assume scores are continuous (in the actual game, they’re only precise down to the degree).

Now, assuming we aim for a given rotation x, the probability of a win is then approximately F_{X_1}(x) \times (1 - F_{X_2}(x)), which we seek to maximise. Thus, for a simple example, if the X_i are exponentially distributed with parameters \lambda_1 and \lambda_2, we then seek to maximise the function

g(x) = (1 - e^{-\lambda_1x}) \times e^{-\lambda_2x}
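
A quick derivation: differentiating and setting g'(x) = 0 gives

\lambda_1 e^{-\lambda_1 x} e^{-\lambda_2 x} = \lambda_2 (1 - e^{-\lambda_1 x}) e^{-\lambda_2 x}, \text{ i.e. } e^{-\lambda_1 x} = \dfrac{\lambda_2}{\lambda_1 + \lambda_2}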

Solving yields x^* = \frac{1}{\lambda_1} \log \frac{\lambda_1 + \lambda_2}{\lambda_2}; this gives the first player a winning probability of

\left( \dfrac{\lambda_1 + \lambda_2}{\lambda_2} \right)^{-\frac{\lambda_2}{\lambda_1}} \left( \dfrac{\lambda_1}{\lambda_1 + \lambda_2} \right)

So if both players are equally skilled in terms of rotation ability (that is, \lambda_1 = \lambda_2), then the first player wins with probability just 0.25. Furthermore, we can find out how much better player 1 needs to be for an even game (that is, the value of k for which \lambda_1 = k\lambda_2); we formulate

(k+1)^{-\frac{1}{k}} \left(\dfrac{k}{k+1}\right) = 0.5

which works out to about k = 3.4035. If this player went second instead, he would win with probability over 0.905!
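
There’s no closed form for k, but it’s easy to approximate numerically; here is a minimal bisection sketch in Java (the bracketing interval [1, 10] is my choice; the left-hand side is increasing on it):

public class EvenGame {
  // h(k) = (k+1)^{-1/k} * k/(k+1) - 0.5; we want the k at which this is zero.
  static double h(double k) {
    return Math.pow(k + 1, -1.0 / k) * k / (k + 1) - 0.5;
  }

  public static void main(String[] args) {
    double lo = 1, hi = 10;  // h(1) < 0 < h(10)
    for (int i = 0; i < 60; i++) {
      double mid = (lo + hi) / 2;
      if (h(mid) < 0) { lo = mid; } else { hi = mid; }
    }
    System.out.println(lo);  // ~3.4035
  }
}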

The multiple trial case is interesting, especially because it seems the problem does not demonstrate optimal substructure; that is, maximising the probability of winning a given round isn’t necessarily the correct way to play over multiple rounds. For a simple (if highly contrived) example, let’s suppose we play two rounds. Player 1 can (roughly) make rotations up to 50 degrees with probability 1, 200 degrees with probability 0.3, and more than that with probability 0. Player 2 can make rotations up to 49 degrees with probability 1, 52 degrees with probability 0.6, and more than that with probability zero. To win one round, you should spin 50 degrees (win probability: 0.4). But to win two: if you spin 50 degrees twice, your opponent can spin 52 once and 49 once (leaving you a win probability of 0.4), whereas going for 200s and hoping for at least one success gives you a win probability of 1 - 0.7^2 = 0.51.

Iterative Dichotomiser (January Haskell Tests 2017)

I probably can’t really explain this much better than the source:

The January tests are on-line Haskell programming tests sat under examination conditions by first-year undergraduate students at Imperial College at the end of a six-week introductory programming course.

Dr. Tony Field has been publishing them for quite a few years now (the linked page goes back to 2009). I still remember my first-year Haskell courses to some extent; I was somewhat impressed by the rationale for choosing Haskell, even though my understanding at the time was rather clouded. I do remember a specific instance where Tony polled the class on past programming experience, noting hands for C++ and Java (I raised my hand for both), and then tossing in Haskell (a few people; probably because Imperial did say to prepare to study Haskell beforehand!). Besides this, I think having to worry less about persistent state (or race conditions, though I don’t think we covered concurrency at Imperial until second year), and being closer to the notion of mathematical functions, which students should already have been exposed to, would also have helped.

This year’s test (2017) covered decision trees, culminating in a question inviting candidates to implement the information gain version of ID3 when building a decision tree from a dataset. It wasn’t too difficult as a whole, as Tony acknowledged on his page; despite probably last having written Haskell about a year ago when I attempted the 2016 test, I finished comfortably in 2 hours and 12 minutes (the time limit is 3 hours). I think this test as a whole required some fairly careful work, but didn’t demand anything in terms of special insight even at the very end (as some past tests have done). The elegance of my answers would probably leave a fair bit to be desired, though; I found I was building recursive traversals of the given datatypes very frequently.

That said, I found the first part somewhat more difficult than in the past. Typically Part I was extremely straightforward (rather awkwardly, there used to be a question asking students to implement lookUp almost every year); not so much this time. In particular, there was a rather fiddly function to write that involved navigating some data structures and building a frequency table; the spec featured a lot of type definitions that reminded me a bit of some experiments with Object Calisthenics (in particular, the “wrap primitives and strings in classes” rule). That said, completing Part I alone would already have let you pass (with a 47; the pass mark is 40). I think the frequency table was actually harder than anything in Part II, which had a few fairly straightforward tree functions to write.

Moving on, part III effectively introduced students to the Strategy pattern (in terms of an attribute selector for deciding which attribute to split the dataset on). Apparently, it was possible to solve partitionData with a suitable one-liner, though I unfortunately didn’t come up with anything along those lines, and went with a rather “direct” approach of dividing the rows by the element and removing the relevant attributes. Part IV introduced the concepts of entropy and information gain, and thus invited students to implement ID3; given the title of the paper I wasn’t too surprised to see this at the end.
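
For reference, the quantities involved (standard definitions: p_c is the proportion of examples in dataset S with class c, and S_v is the subset of S on which attribute A takes value v):

H(S) = -\sum_c p_c \log_2 p_c

\text{Gain}(S, A) = H(S) - \sum_v \dfrac{|S_v|}{|S|} H(S_v)

ID3 then greedily splits on the attribute with the greatest information gain, recursing on each partition of the dataset.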

I found it fairly interesting to revisit Haskell, and it was nice to see that I was still able to work with the language after not touching it for a year or so. While it might be fairly unlikely that I would work with functional languages in professional terms, concepts and reasoning that are more apparent in functional languages do certainly apply even when I work in Java or C++, whether in the obvious sense (streams/map/filter etc.) or less so (inductive/algebraic datatypes).

Diagnosing a Faulty Disk

Recently, I had a lot of difficulty booting into Windows on my home laptop. More specifically, I could boot, but the OS would very quickly become unresponsive. However, I had no such issues booting into Ubuntu (and I’m typing this post from Ubuntu). I tried booting into Safe Mode and disabling auxiliary services and devices, but that was to no avail.

Nonetheless, I could use Ubuntu to perform some analysis. I noticed rather unpleasant-looking logs in the Linux message buffers (a.k.a. dmesg).

[   20.588310] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   20.588311] ata6.00: irq_stat 0x40000001
[   20.588312] ata6.00: failed command: READ DMA
[   20.588315] ata6.00: cmd c8/00:08:40:28:1c/00:00:00:00:00/e0 tag 22 dma 4096 in
[   20.588315]          res 51/40:08:40:28:1c/00:00:00:00:00/e0 Emask 0x9 (media error)
[   20.588316] ata6.00: status: { DRDY ERR }
[   20.588316] ata6.00: error: { UNC }
[   20.619619] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
[   20.647929] ata6.00: configured for UDMA/133
[   20.647935] sd 5:0:0:0: [sdb]  
[   20.647936] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   20.647937] sd 5:0:0:0: [sdb]  
[   20.647938] Sense Key : Medium Error [current] [descriptor]
[   20.647939] Descriptor sense data with sense descriptors (in hex):
[   20.647942]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[   20.647943]         00 1c 28 40 
[   20.647944] sd 5:0:0:0: [sdb]  
[   20.647945] Add. Sense: Unrecovered read error - auto reallocate failed
[   20.647946] sd 5:0:0:0: [sdb] CDB: 
[   20.647948] Read(10): 28 00 00 1c 28 40 00 00 08 00
[   20.647949] end_request: I/O error, dev sdb, sector 1845312
[   20.647955] ata6: EH complete

Hmm, UNC stands for uncorrectable; not great. These errors were easily reproduced. Upon finding that I could nonetheless seem to view most of the contents of the file-system when mounting the SSD in Linux, I immediately took a backup of the important data to an external hard drive, though this was probably objectively unnecessary (really important data is backed up in the cloud).

My first instinct was then to carry out a self-test of the drive, using smartctl and the drive’s SMART (Self-Monitoring, Analysis and Reporting Technology) data. I first performed an extended self-test of the drive, which seemed to yield suspiciously positive results.
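
For reference, the invocations look roughly like this (assuming the SSD is /dev/sdb, as in the dmesg output above):

sudo smartctl -t long /dev/sdb   # kick off an extended self-test in the background
sudo smartctl -a /dev/sdb        # afterwards, dump the attributes and self-test log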

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4874         -

I decided to probe further. The tool’s output includes a table of various attributes which can be metrics of a drive’s health:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       2664
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       2577
170 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
171 Unknown_Attribute       0x0003   100   100   000    Pre-fail  Always       -       0
172 Unknown_Attribute       0x0003   100   100   000    Pre-fail  Always       -       0
173 Unknown_Attribute       0x0003   100   100   000    Pre-fail  Always       -       102815
174 Unknown_Attribute       0x0003   100   100   000    Pre-fail  Always       -       59
175 Program_Fail_Count_Chip 0x0003   000   000   000    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0003   100   100   000    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       102815
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       1
179 Used_Rsvd_Blk_Cnt_Tot   0x0003   000   000   000    Pre-fail  Always       -       2
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       784
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   001   001   000    Pre-fail  Always       -       26018192
187 Reported_Uncorrect      0x0003   100   100   000    Pre-fail  Always       -       2
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0003   100   100   000    Pre-fail  Always       -       59
195 Hardware_ECC_Recovered  0x0003   100   100   000    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0003   100   100   000    Pre-fail  Always       -       2
198 Offline_Uncorrectable   0x0003   100   100   000    Pre-fail  Always       -       1
199 UDMA_CRC_Error_Count    0x0003   100   100   000    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0003   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0003   100   100   000    Pre-fail  Always       -       6425
241 Total_LBAs_Written      0x0003   100   100   000    Pre-fail  Always       -       214612
242 Total_LBAs_Read         0x0003   100   100   000    Pre-fail  Always       -       214201

This is a rather large and nasty table, and it didn’t seem that Plextor implemented any meaningful values for the thresholds. Nonetheless, raw counts were available. The first alarm bell was attribute 184, End-to-End Error (which sounds terrible); that had apparently occurred over 26 million times. Some sources suggest this attribute is critical while others do not; I don’t have historical data regarding the progression of this figure, so it would be difficult to draw conclusions as to how this happened – or, if the sources saying it is critical are correct, how the disk limped to this point in the first place.

Nonetheless, there were other negative indicators as well: attributes 187, 188 and 198, which have been associated with disk failures, were all nonzero. There were several other ATA errors appearing in the smartctl output as well.

For comparison, I ran the diagnostic tools on my HDD:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   118   118   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   098   098   000    Old_age   Always       -       4642
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   084   084   000    Old_age   Always       -       7425
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       2574
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0012   072   072   000    Old_age   Always       -       283355
194 Temperature_Celsius     0x0002   146   146   000    Old_age   Always       -       41 (Min/Max 10/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

Much better. I looked up 199 just in case, but it seemed fine (just one occurrence; and in any case the super important data has been backed up). Notice 197, 198 and 5 (bad sector reallocations) are all zero.

The current situation is fine (the slower startup times are a little unpleasant, though they would probably have been worse with Windows). I might investigate replacing the disk and/or getting a new machine (this laptop is reaching 3 years, so an upgrade would be nice) when my budget allows for it. That said, my usage patterns don’t suggest a higher-end machine is necessary at all, so I might stick with it (Sims 4 isn’t that demanding; the most demanding thing I’ve played is probably Fallout 4, and while I could use a GPU upgrade, that’s clearly a want, not a need). I haven’t quite decided yet.

On the Practical Complexity of Efficient Testing

This is a part 2 to the previous post on the theoretical complexity of efficient testing. Recall that we modelled tests as being used to verify that code satisfied some requirements, and then modelled the problem of efficient verification as finding the smallest set of tests that covered all of the requirements.

Although the decision problem is NP-complete, we can still put forth a decent attempt at solving it. We can rewrite the set-covering problem as an integer linear programming problem: define an indicator variable for each test indicating whether it is included in the test set or not, and a constraint for each requirement indicating that at least one of the tests covering it is selected; we then minimise the sum of the indicator variables. There are many solvers, such as GLPK or CBC, that can solve even fairly large instances of these problems. Similarly, we can also reformulate set cover as boolean satisfiability; there are many solvers that can handle large formulae with many variables as well.
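
Concretely, if x_t \in \{0, 1\} indicates whether test t is included, and T_r is the set of tests covering requirement r, the problem is

\text{minimise } \sum_t x_t \text{ subject to } \sum_{t \in T_r} x_t \geq 1 \text{ for each requirement } r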

That said, although we can minimise the number of tests being used, it’s not entirely certain that we should, for several reasons. For example, suppose we wanted to test a function that returns all characters occurring exactly twice in a string. Below is one possible implementation – and I’d be fairly confident in saying that you can’t really do better than linear time (you can’t avoid examining parts of the string in general, though there are some cases where you can short-circuit, e.g. if you have examined a portion of the string where every allowable character has already appeared at least three times).

public static Set<Character> findPairs(String input) {
  Preconditions.checkNotNull(input, "findPairs called on null input");
  Map<Character, Integer> charCount = Maps.newHashMap();
  for (char ch : input.toCharArray()) {
    charCount.putIfAbsent(ch, 0);
    charCount.put(ch, charCount.get(ch) + 1);
  }
  return charCount.entrySet()
                  .stream()
                  .filter(entry -> entry.getValue() == 2)
                  .map(Map.Entry::getKey)
                  .collect(Collectors.toSet());
}

The first problem would obviously be whether the number of tests is even a good metric. I’ve written a few tests for the method above:

@Test
public void returnsCharacterAppearingTwice() {
  assertThat(findPairs("aa")).containsExactly("a");
}

@Test
public void doesNotReturnCharacterAppearingThrice() {
  assertThat(findPairs("aaa").isEmpty()).isTrue();
}

@Test
public void throwsOnNullString() {
  assertThatThrownBy(() -> findPairs(null))
          .isInstanceOf(NullPointerException.class);
}

@Test
public void canFindPairs() {
  assertThat(findPairs("aa")).containsExactly("a");
  assertThat(findPairs("aaa").isEmpty()).isTrue();
  assertThatThrownBy(() -> findPairs(null))
          .isInstanceOf(NullPointerException.class);
}

I’d certainly prefer having the three tests which each test something specific, as opposed to the single canFindPairs() test (in fact, if I came across the latter in a code review I would push back on it). The main problem here is that one way of reducing the number of tests is simply to merge existing tests or run large integration tests only, which is generally not great. Incidentally, this could lead to an extended heuristic, where we weight test methods by number of assertions.

But let’s suppose tests have disjoint assertions, and we don’t attempt to game the system in the way described above. The next issue is then how we define requirements. One possibility is to give methods well-defined postconditions and check that tests verify these, but this is unlikely to scale to large systems.

A common method, then, is to use code coverage as a proxy (this can be measured automatically via tracing of test calls). Line coverage, adjusted for conditionals, could be a good starting point. However, this isn’t really a good metric either – the three tests introduced above, or the single canFindPairs() test, actually achieve 100 percent coverage by most definitions:

  • We have an input that violates the precondition, and two that pass it (line 2).
  • We do exercise the body of the for loop with the “aa” and “aaa” tests (lines 5-6).
  • We have both true and false outputs in the filter construction (line 10). This might not even be considered to be a requirement for the line to be covered.

Yet, if someone submitted the above tests only for findPairs() and I did a code review, I would ask them to add more testing. I’d probably expect at least the following:

  • the empty string ("")
  • a single character ("a")
  • inputs returning multiple characters ("aabbbccdde")
  • non-consecutive pairs ("abab")
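
In the same style as the tests above, these might look as follows (the test names are my own):

@Test
public void returnsNothingForEmptyString() {
  assertThat(findPairs("")).isEmpty();
}

@Test
public void returnsNothingForSingleCharacter() {
  assertThat(findPairs("a")).isEmpty();
}

@Test
public void returnsAllCharactersAppearingExactlyTwice() {
  assertThat(findPairs("aabbbccdde")).containsExactlyInAnyOrder('a', 'c', 'd');
}

@Test
public void countsNonConsecutiveOccurrences() {
  assertThat(findPairs("abab")).containsExactlyInAnyOrder('a', 'b');
}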

Furthermore, the above method is not actually correct for characters outside the Basic Multilingual Plane (which are represented as surrogate pairs in Java’s UTF-16 strings, and so get split by toCharArray()), so if (but only if) such input would be likely given the context of the application involved, I would ask for a test featuring that as well.

Thus, by optimising for code coverage and eliminating tests based on that, we run the risk of weakening our tests to the point where they can no longer catch legitimate faults. For example, a test using characters outside the BMP as described above would be unlikely to improve coverage at all, and thus might be pruned (allowing our implementation to pass even though it wouldn’t work). Approaches for evaluating whether pruning is worthwhile include having developers plant faults in code and seeing if the pruned test suites still catch them, or automatically mutating implementations (e.g. interchanging operations, changing the order of lines of code – this is mutation testing) and seeing if test suites behave differently before and after pruning.

Coverage is still probably one of the least bad metrics available, in my opinion – I can’t really think of a good way of improving on it cheaply and scalably. Furthermore, studies have shown that in spite of line coverage being a blunt instrument, it can in several practical cases achieve decent reductions in test suite sizes without harming fault detection too much; that said, the most aggressive solutions (such as using integer linear programming) seem to overfit to some extent, losing more fault-detection ability than the size reduction alone would suggest.

On the Theoretical Complexity of Efficient Testing

Complexity was one of the courses I took at Imperial that I found to be especially intellectually stimulating. I would say it was probably the most challenging course of the degree for me at least (excluding the individual project; the second half of Intelligent Data and Probabilistic Inference probably comes close, also bearing in mind the bar being set at 90). I would say the course has certainly helped me reason about algorithms and problems in more interesting and divergent ways, and has also been useful in figuring out when to stop trying to devise excessively clever algorithms (by understanding when that is impossible).

The course starts by looking at decision problems (that is, problems with a yes/no answer), and whether they fit into the class called P (decidable in polynomial time). Examples of problems in P include finding the maximum element in an array or list, and finding the shortest path between a pair of nodes in a graph (breadth-first search for unweighted graphs, and Dijkstra’s algorithm for weighted ones, both of which run in polynomial time).

We then move on to NP (nondeterministically decidable in polynomial time) – effectively, this means that it is possible to verify a “proof” of the problem being decided as a yes in polynomial time. Well-known problems in NP include the Hamiltonian path problem (given a graph, is there a path passing through each node exactly once?) – this one is in NP because a “proof” that the answer is yes can be given in the form of a path: we can check that the path is a permutation of the nodes, and that consecutive nodes are joined by edges, simply by walking down it. Note that we don’t need to be able to prove negative answers in polynomial time – there is a separate class called co-NP for problems where we can. (NP \cap co-NP contains P, as well as a couple of problems, like integer factorisation, that are not known to be in P; whether it is a strict subset of NP or co-NP is open.)

(The course also covered the underlying theory behind P and NP, concerning Turing machines – I won’t go into detail here.)

One of the biggest parts of the course involved the concept of many-one reduction; a problem V reduces to a problem W if we can translate an instance of V to an instance of W, within certain constraints. For example, the problem of determining whether all integers in a set are positive reduces to that of determining whether all integers are negative under a polynomial time constraint; the first problem (positive) holds for some set S if and only if the second holds for T, which is S with all elements multiplied by -1. In a sense, we’re showing that if we can solve W, then we can solve V as well.

We then defined the hardest problems in a given class to be the problems that are complete for that class; a problem is complete for a class if it is itself in the class, and every problem in the class reduces to it (and yes, complete problems reduce to themselves). For NP, the constraint on the reduction is that it can be carried out in polynomial time. This makes sense, as NP would not be closed under reductions allowing more generous bounds (for example, if V reduced to W under some exponential-time procedure and W was in NP, that wouldn’t mean V is in NP; however, if V reduces to W under a polynomial-time procedure and W is in NP, then so is V). The latter holds because if we had a guess for V, we could “translate” it using our reduction to a guess for W, which can be verified in polynomial time.

The Hamiltonian path problem (hereafter HP) I introduced earlier is NP-complete, and can be used to show that other problems are NP-complete (by reducing HP to the other problem). This works because from our definition of NP-completeness, any problem in NP can be reduced to HP, and then reduced to our problem. This involves potentially composing two polynomial time mappings – but, the composition of polynomials is polynomial too. Note that it’s not just a sum, because the “output” of the first mapping, which is the input to the second mapping, could be bigger than the initial input (though only polynomially so!).

(For showing HP itself is NP-complete, there is a reduction from SAT – the problem of determining whether a logical formula in conjunctive normal form is satisfiable – to HP, and SAT itself can be proven NP-complete from the definition of NP over Turing machines.)

The inspiration for this post came from some preparatory work I was doing for a reading group with James, on algorithms for test-suite minimisation. The authors comment that the problem of finding a minimal representative subset of a test-suite is NP-complete. As a starting point, removing some of the details, their specific problem (involving a mapping from tests to the requirements those tests verify) is easily reformulated as the Minimum Set Cover problem: given a collection C of subsets of a bigger set S, find the smallest subcollection of C whose union is S.

Strictly speaking, this isn’t a decision problem, but we can put it into decision form: given a collection C of subsets of a bigger set S and an integer k, is there a subcollection of C of size not greater than k whose union is S? Solving the original problem can then be performed by a binary search on k, at each step testing this decision problem for a different value of k. Let’s call this problem Set Cover (Decision), or SC(D).

We could proceed with trying to show HP reduces to SC(D), though I find it easier to do things a bit differently – we can pick any problem that’s NP-complete (so I’ll pick SAT), and we can also exploit transitivity of reductions as mentioned above. I’ll thus go via two intermediate problems:

  1. Reduce SAT to 3SAT (the problem of determining whether a formula in conjunctive normal form, where each clause has three literals, is satisfiable).
  2. Reduce 3SAT to Vertex Cover (VC – given a graph and some positive integer k, is there a set of nodes, of size up to k, such that every edge has at least one endpoint in the set?).
  3. Reduce VC to SC(D).

Step 3 is probably the easiest now – given an instance of VC, we map each edge to be an element of S, and each node to be a candidate subset containing all of the edges adjacent to it; the size bound k carries over unchanged. That mapping requires us to iterate through the graph perhaps a few times, but that’s definitely polynomial in the input. Comparing the definitions, it’s pretty clear that this should work (that said, if you were actually doing a complexity course you might need a more rigorous proof).
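
In code, the mapping might look something like this (a sketch in Java rather than anything formal; the edge identifiers and adjacency representation are my own choices):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VcToSetCover {
  // Each edge of the graph becomes an element of S; each node becomes a
  // candidate subset consisting of its incident edges. The bound k carries
  // over unchanged.
  static Map<Integer, Set<String>> reduce(int nodeCount, int[][] edges) {
    Map<Integer, Set<String>> candidates = new HashMap<>();
    for (int node = 0; node < nodeCount; node++) {
      candidates.put(node, new HashSet<>());
    }
    for (int[] edge : edges) {
      // A canonical identifier for the edge: this is an element of S.
      String id = Math.min(edge[0], edge[1]) + "-" + Math.max(edge[0], edge[1]);
      candidates.get(edge[0]).add(id);
      candidates.get(edge[1]).add(id);
    }
    return candidates;
  }
}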

Step 1 relies on the use of logical equivalences – the idea being that we can always rewrite a clause that doesn’t use three literals into one or more clauses that do. The one and two cases simply involve repeating one of the literals that’s already there; if we have more than three, we can introduce auxiliary variables to chain them together:

// SAT
A or B or C or D or E

// 3-SAT equivalent version
(A or B or X) and
(!X or C or Y) and
(!Y or D or E)

Consider that the 3-SAT version is satisfiable if and only if the SAT version is satisfiable. If any one of the original literals can be made true, then we can choose suitable values of X and Y (plus other auxiliary variables, in larger cases) to satisfy all of the other clauses, “rippling out” from the satisfied literal. However, if none of them can be, any assignment to the auxiliary variables will leave a conflict somewhere.

Finally, step 2 is somewhat trickier. At a high level, the idea is that we want to enforce some way of selecting a truth value for each variable that satisfies at least one of the literals in each clause. We construct a triangle for each clause, arranging for our cover to include exactly two of the nodes in each triangle (corresponding to ‘false’ literals); the unselected literal must then be “true” in some sense, but we cannot have conflicting literals in different clauses both being true. To handle this, we introduce an additional constraint linking the literals we pick as true across clauses: for each variable, an edge between a node for the variable and a node for its negation (at least one endpoint of which must be in the cover), with each triangle corner joined to the matching literal node.

The formula is satisfiable if and only if we can build a vertex cover picking two of the nodes in each clause triangle and one truth value for each variable – that is, with k = 2C + V, where C is the number of clauses and V the number of variables.

I think the course and especially exams on it were difficult because these constructions require nontrivial insight, which can be really difficult under pressure. Having a large set of known starting points would certainly have helped; this kind of three-stage derivation would probably not be needed in an exam, though constructions of a similar level of difficulty as step 2 were certainly present.

Another Look at Dynamic Programming

Whilst on the tube today, I overheard a mother teaching her child how to count, using a method likely to be extremely familiar to many – fingers. The child counted correctly from one to ten, and then the mother added her hands too and asked the child to count how many fingers there were now.

“One, two, three –”

And so on, till twenty. The mother then attempted to explain that it would have been faster if the child continued from ten, rather than starting again. Although it wasn’t really an example of the concept, the words dynamic programming immediately shot to the front of my mind. I initially found this to be a rather confusing concept to grasp (let’s say that up to high school programming contests, if a problem wasn’t solvable by exhaustive search or greedy algorithms I’d likely have struggled), so I figured a post on it might be worthwhile.

(This isn’t really an example of DP; I’d say it’s closer to divide and conquer plus the use of a cache. We’ve cached the answer that the child has ten fingers, and identified the problem as being taking the sum of the child’s and parent’s fingers. Note that because of the possibility of amputation or polydactyly, the subproblems are not the same – and, specifically, saying 2 * 10 = 20 isn’t generally correct.)

Essentially, the key idea behind dynamic programming (DP) is that we save time by not re-doing work that we’ve already done, by remembering the results to intermediate steps. Of course, this tends to mean that there’s a space overhead. This is generally useful in cases where a problem is too large to solve, yet it can be decomposed into smaller pieces, and importantly we can combine optimal solutions to these smaller pieces, to get a solution that is optimal for the original problem. (More formally, this is known as optimal substructure.)

Furthermore, we want to get some significant benefit out of actually remembering the answers (in practice, we want to use our previous work multiple times; this manifests in the form of overlapping subproblems). This is what would distinguish an approach as being a DP-based one, as opposed to divide and conquer.

Of course, the fingers example is trivial. There are many other natural examples (the ones that come to mind first for me are knapsack problems and route-planning), though I’m not sure I directly apply DP that much in a natural context (although quite a few days have task lists that could be ordered by solving a constrained TSP, the last time I used the Held-Karp algorithm was probably for my third-year project). It certainly does see many applications that are relevant to daily life (error correction in search queries / autocorrect via Levenshtein distance; I’m not sure how they are actually implemented, but routing applications like Citymapper and Google Maps are likely to involve such algorithms as well).

In terms of implementation, the cache-based “top-down” solution was what I learned first, and to me at least was intuitively easier to understand. When you encounter a subproblem, you check a lookup table to see if you’ve done the problem before; if you have, you just take the answer from that. If you haven’t, solve the problem the hard way (this may involve more subproblems – when solving these, it’s important to look at the table again), and then (important!) store the answer you obtained back in the table.
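
As a concrete minimal sketch in Java (using Fibonacci, which also features just below, as the example):

import java.util.HashMap;
import java.util.Map;

public class Fibonacci {
  private static final Map<Integer, Long> cache = new HashMap<>();

  // Top-down: consult the table first; otherwise do the work and remember it.
  static long fib(int n) {
    if (n <= 2) {
      return 1;  // base cases: F(1) = F(2) = 1
    }
    Long cached = cache.get(n);
    if (cached != null) {
      return cached;
    }
    long result = fib(n - 1) + fib(n - 2);
    cache.put(n, result);  // important: store the answer back in the table
    return result;
  }
}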

The alternative “bottom-up” method involves generating solutions to smaller subproblems, and using these to build up the solution to a bigger problem. I probably first used a method along these lines when introduced to the Fibonacci sequence (in year 4 or so) – I remember being asked to compute F_{13} and doing something like “1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, uh… 144, 233”. (This is linear time. It’s possible to do better via matrix exponentiation or Binet’s formula – bonus points if you pair the exponentiation with a fancy multiplication algorithm like Karatsuba or even Schönhage-Strassen.)
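
For comparison, a bottom-up sketch; this also shows the limited-memory trick mentioned in the bullets below, in that only the last two values are kept:

// Bottom-up: start from the base cases and iterate upwards.
static long fibBottomUp(int n) {
  long previous = 1;  // F(1)
  long current = 1;   // F(2)
  for (int i = 3; i <= n; i++) {
    long next = previous + current;
    previous = current;
    current = next;
  }
  return current;
}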

From a computing point of view there can be both benefits and disadvantages to this versus the “top-down” method.

  • Ease of understanding and/or code readability are likely to depend on the problem; for Fibonacci I would prefer bottom-up, but I usually find the top-down case to be more approachable (it’s more intuitive to me at least to think “here’s how I decompose this problem” as opposed to “here’s how I build a solution from smaller solutions”).
  • The top-down approach might be able to solve some problems without necessarily computing all smaller subproblems that a bottom-up solution counting up from 0 or 1 might deal with. You can, of course, implement this in a bottom-up solution… provided you know how to compute the required subproblems in a way that isn’t itself too costly. With a top-down approach you get this “avoidance” for free.
  • As an extension of the previous point: for bottom-up you’ll need to figure out a safe ordering to go through the subproblems (you can’t have a solution depending on something that hasn’t been computed yet). This is easy in most cases (*cough* Fibonacci), but can be extremely difficult in others (chess transposition tables come to mind; problems with online input, many base cases and a massive domain).
  • Recursive implementations (which tend to be top-down, though could plausibly be in either direction; it’s possible to maintain your own call stack on the heap, or pass some kind of lookup table around) incur the overhead of function calls, and can cause stack overflows for large problems.
  • Handling limited memory (there are many 2D array problems for which only the last row of results needs to be kept; alternatively with Fibonacci we only need the last two results) tends to be more naturally expressed with the bottom up method (though of course, you can clean the top-down cache). This is probably because you’ll have defined an order for solving the subproblems, which may not be as immediately clear with the top-down method.

Note that although this is a powerful tool, there are quite a number of cases where you don’t actually need to consider all of the ways of decomposing a problem into subproblems. A well-known example is the activity selection problem: given a set of mutually exclusive activities with start and end times, find the largest set of activities I can participate in. This can be solved optimally by sorting events by their ending time and greedily picking events to fill the schedule where feasible. The key differentiator here is what’s known as the greedy choice property: making a locally optimal choice at each step gives us the overall optimal solution.

In practice it’s highly unlikely that I’d weight my activities equally, so we then get to the weighted activity selection problem, and the greedy method no longer works (but we can still use dynamic programming – as before, sort the activities by their ending time, and for each activity, pick the better of not attending it, or attending it and behaving optimally before the start time of said activity; a sketch follows).
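
A bottom-up sketch in Java (the record type and the linear scan for the preceding compatible activity are my own simplifications; a binary search there would bring this down to O(n log n)):

import java.util.Arrays;
import java.util.Comparator;

public class WeightedActivitySelection {
  record Activity(int start, int end, int weight) {}

  // best[i] is the optimal total weight using only the first i activities
  // (in order of ending time).
  static int maxWeight(Activity[] activities) {
    Arrays.sort(activities, Comparator.comparingInt(Activity::end));
    int n = activities.length;
    int[] best = new int[n + 1];
    for (int i = 1; i <= n; i++) {
      Activity current = activities[i - 1];
      // Find the last activity ending no later than the current one starts.
      int j = i - 1;
      while (j > 0 && activities[j - 1].end() > current.start()) {
        j--;
      }
      // Either skip the current activity, or attend it and behave optimally
      // before its start time.
      best[i] = Math.max(best[i - 1], best[j] + current.weight());
    }
    return best[n];
  }
}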

Perverse Incentives


I do watch a couple of Let’s Plays on YouTube, and one game that a couple of channels have been playing is Trivia Murder Party from the Jackbox Party Pack. I’ve found the format to be interesting and relatively well-designed for building tension, though there are a few weird edge cases because of how the rules are laid out.

The final round has rules as follows:

  • One player is in the lead (the player with the fewest number of spaces left to the finish line).
  • Players are presented with a category, and need to decide if objects presented fit that category or not. The player in the lead has two objects; other players have three (this is a rubberbanding measure used to encourage them to catch up).
  • For example, a category could be “Asian Countries” with objects Russia, Shanghai and the USA. In this case players should only select Russia (Shanghai is a city, and the USA is in North America).
  • For each object correctly classified, players advance one space on the board. No penalties are issued for objects that were incorrectly classified.
  • The player with the lead moves first. If he is across the finish line, he wins (independent of what non-leading players do).
  • In the event of a tie, the highest ranked player (based on the previous rounds) who is not in the lead takes the lead. Furthermore, all tying players apart from the new leader (including the old leader, if relevant) move back 1 space.

One strategy for the final round, and an arguably sensible one, is to always try and answer the questions to the best of one’s ability. Yet, the aforementioned rubberbanding mechanism means this isn’t always optimal – suppose you’re playing against an incredibly knowledgeable opponent who implements this maximising strategy. Furthermore, suppose you’re in the lead, with 5 spaces to go before finishing, while your opponent has 7. Then, consider:

  • If you advance 2 spaces, you’ll have 3 spaces left to go whilst your opponent has 4. Then, no matter what you do, your opponent will take the lead on your next turn, and you’ll lose.
  • If you advance 1 space and your opponent advances 3, you’ll both be at 4 spaces to go. Because of the tie rule, he leads and you get pushed back one space. This brings you to (5, 4). Now, as a non-leader advance 3 spaces; no matter what happens you retake the lead. Finally, advance 2 spaces and win.

We can generalise this to solve the two-player case, by considering which positions are winning – in the case where both players have very strong trivia knowledge, and the opponent always advances the maximum number of spaces.

  • (1, n), n \geq 2 is clearly a winning position. Simply advance and exit.
  • (2, n), n \neq 1 is also winning for the same reasons (and (2, 1) is losing).
  • (3, n) is a bit more interesting. n \leq 2 is clearly lost. But (3, 4) is actually lost too, because it becomes (2, 1) or (3, 1) after one turn, both of which are losing positions. (3, n), n \geq 5 is won; advance to (1, n-3) and win.

Having considered a few cases, we can use what’s called induction to determine the rest of the winning positions. What this means is that we are defining winning positions in terms of whether other positions are winning. This can be dangerous if we produce circular definitions, but the relationships we use here are well-founded in that we maintain that the total number of spaces left to go for both players always decreases, for each of the sub-problems we look at.

We thus say that a position (p, q) is winning, if and only if any of the following are true:

  • p = 1, or p = 2, q \neq 1.
  • p > q + 1 and at least one of (p-3, q-2), (p-2, q-2) and (p-1, q-2) is winning.
  • p = q + 1 and at least one of (p-3, q-1), (p-2, q-2) and (p-1, q-2) is winning.
  • p = q - 1 and at least one of (p-1, q-3) and (p, q-3) is winning.
  • p = q - 2 and at least one of (p-2, q-3) and (p, q - 3) is winning.
  • p = q - 3 and at least one of (p-2, q-3), (p-1, q-3) and (p + 1, q-3) is winning.

We can derive a similar inductive rule for cases where one always maximises the number of answers: for each of the “at least one of” choices, always pick the first option. We can then consider when the (probably) “intended” strategy of trying to answer to the best of one’s knowledge fails to be optimal. In terms of implementation, the inductive definition above swiftly lends itself to a dynamic-programming implementation, whether “top-down” or “bottom-up”; a top-down sketch follows.
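
A minimal memoised sketch in Java, directly encoding the rules above (the handling of positions with p \leq q - 4, which the list above doesn’t cover, is my own extrapolation along the same lines):

import java.util.HashMap;
import java.util.Map;

public class FinalRound {
  private final Map<Long, Boolean> cache = new HashMap<>();

  // (p, q): spaces left for us and for an opponent who always advances the
  // maximum. Returns true iff the position is winning for us.
  boolean winning(int p, int q) {
    if (p < q && p <= 2) {
      return true;   // we lead, and our two answers take us across the line
    }
    if (q < p && q <= 2) {
      return false;  // the opponent leads and crosses before we can act
    }
    long key = ((long) p << 32) | (q & 0xffffffffL);
    Boolean known = cache.get(key);
    if (known != null) {
      return known;
    }
    boolean result;
    if (p > q + 1) {
      result = winning(p - 3, q - 2) || winning(p - 2, q - 2) || winning(p - 1, q - 2);
    } else if (p == q + 1) {
      result = winning(p - 3, q - 1) || winning(p - 2, q - 2) || winning(p - 1, q - 2);
    } else if (p == q - 1) {
      result = winning(p - 1, q - 3) || winning(p, q - 3);
    } else if (p == q - 2) {
      result = winning(p - 2, q - 3) || winning(p, q - 3);
    } else if (p == q - 3) {
      result = winning(p - 2, q - 3) || winning(p - 1, q - 3) || winning(p + 1, q - 3);
    } else {
      // p <= q - 4: no tie is reachable this turn (my extrapolation).
      result = winning(p - 2, q - 3) || winning(p - 1, q - 3) || winning(p, q - 3);
    }
    cache.put(key, result);
    return result;
  }
}

As a sanity check, winning(5, 7) comes out true, matching the walkthrough above, while winning(3, 4) comes out false.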

I’ve only considered the 2-player case because it’s relatively simple to analyse – things get substantially more complex once we go beyond that, especially factoring in the tiebreaking rules based on earlier-round performance. In practice, given that the questions can be difficult and unpredictable, it would probably be too risky (at least for me) to try to take a dive like this – but I can see it becoming relevant among expert players.

Sets of Sets

If I’m asked to think of a real-life example of a set of sets, the first example I’d come up with would be job lots of items being sold. Interestingly, it actually took a while – the ideas I came up with before that (companies I have investments in – mutual funds and ETFs own sets of companies – or messages to my friends) were really unions of multiple sets.

How would we model a set of sets in Java? Well, that’s not too hard, since a Set is a type after all:

Set<Set<Item>> items = Sets.newHashSet();
Set<Item> thingsBeingSold = Sets.newHashSet();
thingsBeingSold.add(Item.create("1 oz. Gold", 123456));
thingsBeingSold.add(Item.create("1 oz. Silver", 3456));
items.add(thingsBeingSold);

That said, we’ve got to be careful. If we continue the above with this:

Set<Set<Item>> items = Sets.newHashSet();
Set<Item> thingsBeingSold = Sets.newHashSet();
thingsBeingSold.add(Item.create("1 oz. Gold", 123456));
thingsBeingSold.add(Item.create("1 oz. Silver", 3456));
items.add(thingsBeingSold);
thingsBeingSold.add(Item.create("a chopstick", 20));
items.contains(thingsBeingSold);
items.stream().anyMatch(x -> x.equals(thingsBeingSold));
items.stream().anyMatch(x -> x == thingsBeingSold);

What should the output of lines 7, 8 and 9 be? Are they equivalent, even? Well, there are some possible arguments in each direction:

  • They should return true, because the reference to the set thingsBeingSold is still there.
  • They should return false, because the value of thingsBeingSold isn’t the same as the value inserted.

I would argue that the first point is stronger, and would have been my answer when I first learnt about Java. However, the actual answer is that lines 8 and 9 should return true while line 7 generally returns false, though there are a few special cases:

  • It will return true if thingsBeingSold.hashCode() before line 6 has the same hash as thingsBeingSold.hashCode() after line 6 i.e. there is a hash collision, or
  • if the Item created in line 6 equals either of the Items created in lines 3 and 4 (unlikely).

The operation of line 8 is relatively simple: we check each element of the set for equality, and assuming our set’s equals() method does what it’s supposed to, we will always find the match (we could potentially speed this up with parallelStream()). Line 9 does a reference-equality check on the elements instead, and would probably be faster as we don’t have to actually inspect the set contents; in this case, it also returns true. However, these are linear-time operations, at least. The point of using hash-based structures here, among other things, is to knock the time complexity of lookups down to amortised constant.

In a sense, we find the object only if we fortuitously (!) happen to be looking for the same hash code. Java HashSets are effectively a key-set view of a HashMap, and when we look up objects we actually locate them by their hashes first, only bothering to check values for equality once the hashes match. This allows for some fairly interesting optimisations, actually; I previously thought that HashMaps used an array of buckets of linked lists of mappings, resizing appropriately when the ratio of elements to buckets exceeded a load factor. This was correct up to Java 7, but Java 8 added a fun optimisation: a bucket that grows too large is converted into a red-black tree sorted by hash code (while the objects themselves might not be Comparable, their hashes are ints, which certainly are), cutting the worst-case complexity of operations on that bucket from linear to logarithmic. Still, when we inspect nodes, we look at their hashes first.

Note that this isn’t just a problem with HashSets; you can get into similar debacles with TreeSets, because the underlying red-black trees aren’t going to reorder themselves. In general, the Javadoc for Set does warn you that mutable objects shouldn’t be changed in ways that would affect their equals() method after they’ve been put into a set. HashSet and TreeSet are allowed to behave the way they do because of the documented contract between equals() and hashCode(), and comparison-based sets are supposed to use comparators that are consistent with equals(), i.e. a.equals(b) iff a.compareTo(b) == 0.

One way to get round this is of course to ensure that the element types used in these sets are immutable, or at least that the parts used in equals() and hashCode() are. If that’s infeasible, a good solution could be to remove the element that is about to be changed from the set, and add it back after the change; add is idempotent, so the set will (correctly) stay a set. Wiring this up seems rather messy, though; it looks like we would need to listen for state changes (some kind of publish/subscribe mechanism), with the stored objects publishing both their old and new states so that we can modify the set accordingly.
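
Applied to the earlier snippet, the pattern would look like this (a sketch; Item.create as before):

items.remove(thingsBeingSold);                        // remove under the old hash code
thingsBeingSold.add(Item.create("a chopstick", 20));  // now mutate freely
items.add(thingsBeingSold);                           // re-insert under the new hash code
items.contains(thingsBeingSold);                      // true, as expected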

A Software Engineer’s Perspective on Shenzhen I/O

I had some opportunity to have a bit of a break over the Christmas and new year period, and James recommended this title to me about a month or so ago, so I thought I’d give it a shot. It’s essentially a kind of competitive hardware programming game – you’re given a task to complete, and the challenge is to build a circuit exhibiting the specified behaviour. Of course, in practice this is wrapped in a pretty fun if frivolous story. To do this, you have to arrange various components on a circuit board, such as microcontrollers (which allow some form of assembly programming), bridges, logic gates, etc. The verification interface definitely reminds me of Quartus II.

I did some x86-64 assembly programming during my first year at Imperial, and thus initially hit a bit of a snag (I kept typing in commas between operands, for a bit wondered why cmp, jne or shl weren’t valid commands, and also tried to perform arithmetic operations with both source and destination registers – you have to use an accumulator). It didn’t take too long to figure things out, though.

I’d say the first puzzle which got tricky was this one (there was an earlier one on using a greedy algorithm for coin change which might have been a bit tricky too):

  1. You will receive a signal on a non-blocking IO channel that consists of three packets. The first indicates the location to which you must move a robot using a motor, the second indicates the number of pulses to clean the plants at that location (?) and the third indicates the number of pulses to spray them with nutrition.
  2. To control the robot, you have to use its motor. Fire a signal of 100 for one pulse and then pull back to 50 to move the robot forward (increasing numbers); fire a signal of 0 for one pulse and then pull back to 50 to move it backward (decreasing).
  3. The clean and feed steps must be done for precisely the correct number of cycles.
  4. (Implicit) You may assume that there’s no overlap in signals (i.e. the robot will always have finished its feed steps before it gets another packet).

To add to the challenge, the microcontrollers have limited amounts of memory, and thus our programs have to be pretty short. As tasks grow more complex, we need to break them down into smaller pieces; we then end up needing to define protocols between the components. For example, for the above problem, the workflow is as follows:

  1. The “main” (first) module, upon receiving a packet, dispatches the target location to the second module and then blocks until said second module has sent a signal back.
  2. The second module compares the robot’s current and target locations, and submits commands to a third module (the one in the top right hand corner) to drive the robot to the target.
  3. The third module simply drives the motor to move the robot one step forward on a 0 input, and back on any other input.
  4. Once the robot is in the right location, the second module signals back to the first, unblocking it.
  5. The first module reads the rest of the packet and executes the clean and feed steps appropriately.

While this is of course different from what most software engineers will be doing in practice, I think it is not unreasonable to use this as part of teaching computer architecture and/or assembly (hello first-year Imperial students!). I think the game does teach a couple of useful ideas.

  1. Abstraction. This becomes critical for some of the larger problems. Quite a number of the tasks I’ve completed so far have involved developing protocols that use message passing between two or three modules.
  2. Optimisation can be painful. Most of the time the cleanest factoring is not going to give you the best score, to say the least; I think most users will experience some pain rewriting assembly to fit in fewer lines and/or run with fewer instructions.
  3. Complex conditional structures and gotos. You can use jumps, but those tend to be expensive; it thus becomes pretty common to use nested tests (teq, for example, checks if its two operands have equal value). Tests set a three-valued flag; instructions can be preceded by “+” or “-” to be run only if the flag is set to true or false respectively. Rather nastily, the flag can be set to neither as well (there is a tcp instruction that indeed writes the three-valued output to the flag!)
  4. A reminder to consider edge cases. For the Coin Change circuit, the test cases did include a case with exact change, as well as a case where only the largest coins were needed. I’m not sure if the robot test cases included anything too sneaky beyond no movement being required.

Highly recommended; in addition to all the fiddly circuitry there’s also a pretty cool card game that’s a strange hybrid of FreeCell and Spider Solitaire. I’d give this a 9 out of 10, easily. That said, I think people with no prior programming experience would find the learning curve very steep. The game does pretty well at introducing some useful constructs via suitably contrived tasks (there was one on the ROM module, one on the I/O expander, and one about using the digits of an integer as flags) to help mitigate that, though.

An Unexpected Economic Computation

Here’s a calculation that I found pretty surprising. Have a guess as to its meaning:

 \dfrac{\left( \dfrac{60}{(24-15)} \times 2.40 \right)}{1 - (0.4 + 0.02)} \approx 27.59

If you guessed something relating to savings, you’d be on the right track; the denominator would probably have clued you in to something along those lines, since the 0.4 + 0.02 is the marginal tax rate that higher-rate earners in the UK face (as of the 2016/17 tax year: 40% income tax plus 2% National Insurance). You might then have figured out that the fraction on top probably refers to some form of time saving expressed in minutes. The 2.40 is probably a bit more puzzling, especially if you’re not from London; from the previous observations, it’s the cost of some convenience that saves nine minutes. That’s right: you might recognise it as the price of a single-trip Tube fare.

Putting that together, the computation is the effective gross hourly rate I’m being paid to walk to my office as opposed to taking the Tube. Specifically, I’m comparing (a) taking the Tube, which takes about 15 minutes plus a single-trip fare of 2.40, and (b) walking, which takes 24 minutes. We’re thus looking at a saving of 2.40 per 9 minutes. Of course, I pay for the Tube using post-tax money, so we need to factor that in, and to get an hourly rate we scale linearly.
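
In general terms (with t_w and t_t the walking and Tube journey times in minutes):

\text{gross hourly rate} = \dfrac{\dfrac{60}{t_w - t_t} \times \text{fare}}{1 - \text{marginal tax rate}}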

Now of course this doesn’t mean I’ll never take the Tube to work again, or even that I’ll take it much less often – the 9 minute difference can be important (say there’s a high priority issue going on, or I’m about to be late for a meeting), and walking can be pretty unpleasant (if I’m very tired, it’s late, or it’s too cold; that said, I’ve done this in the current ~0 degree weather and found it fine, at least). Probably, doing this in the mornings would generally be more reasonable and sustainable as I’m pretty tired by the end of the day. Saving one trip even occasionally is enough of a win, I’d say, and getting some time to clear my head in the mornings is probably not too bad as well.

A question could be why I don’t use a Travelcard for the computations instead. The weekly one for zones 1-2 costs 33, and since I largely stay within zones 1 and 2 we can assume that’s all I’ll need (I’ll ignore the odd trip to Heathrow). I think 16 or 17 is probably about the number of rides I’d take in a week if I used it aggressively (two a day, maybe three or four on the weekends). We can re-run the numbers with \frac{33}{16.5}, which means a trip costs 2.00. Our final answer is 22.99, which is still pretty solid (if you work 42 hours a week, that’s a gross annual income of just over 50,000). Anyway, as it turns out, because I use contactless, if I happen to have a week where I do travel a lot I’ll effectively be using one anyway, thanks to weekly fare capping.

Now let’s have a look at the monthly or annual tickets. These cost 126.80 for monthly, or 1,320 for annual. Assuming that we make 2.357 trips a day on average (taking the middle of the estimates above), and a month as 365.25/12 days, with the monthly card the cost of a trip falls to 1.767 and with the annual card it falls to 1.533. The “gross wage” calculations become 20.31 and 17.62, which while not as high are still solid amounts. 17.62 an hour corresponds to about just over 38,000 in gross income assuming a 42 hour week, which would put you in about the 78th percentile of earners according to the ONS data for 2013-2014. I guess this will probably be a bit lower now, with inflation/wage growth, but still decent.

Assuming usage is high, it seems that the annual card might be the way to go. However, an issue is that I sometimes need to travel overseas for work potentially a fair chunk (let’s say personal plus work travel adds up to two months a year), so the annual one is quite definitely a non-starter. Multiply the price of a trip by 6/5 to factor that in, and you end up with a per-trip figure that’s higher than the monthly card. I have been getting the monthly card in the past, especially when I was a student and it only cost 86.50, partly because walking to Imperial was going to take a very long time (that route yields savings more on the order of 30 minutes). Note that while losing a card is a legitimate concern, you can usually recover the travel passes using the TfL website.

(N.B. This idea was conceived independent of the currently ongoing Tube strike, though in a sense it’s a bit of a bonus that things aren’t affected.)