On Set Collecting

When I hear the word ‘set’, the first thing I think of is the computing data structure, and then the mathematical construct that that data structure is trying to model. Essentially, a set is an unordered collection of distinct objects. I’d probably then think of the Egyptian deity before thinking of the card game, or failing at a contract in bridge. That said, we recently had some interesting discussions about the card game over lunch at work.

The game consists of a deck of 81 distinct cards which each show a number of shapes. Each card has four attributes, and each of these attributes can take on three values – number of shapes (one, two or three), colour (red, purple or green), shading (shaded, striped or unshaded) and shape (diamond, rounded rectangle, ‘S’).

Initially, twelve cards are dealt to the table; players compete in real-time to identify a set of three cards where, for each attribute, the cards are either all the same or all different. The player collects these cards, and the dealer replenishes the cards on the table.

It is possible for twelve cards to contain no set. In physical games, the dealer will, after a while, deal three more cards to the table if no one can identify a set. We were playing on an app, though; this app waits for players to claim that there is no set available, and penalises players who make an erroneous claim. More often than not, even though all of us were looking carefully at the board, we were too quick to declare no-set with twelve cards. However, we had one round where even with fifteen cards (after one declaration of no-set), no one found anything. After a long while, someone pressed the no-set button, and indeed there wasn’t a set!

After the game, discussion turned towards finding the maximum number of cards that can contain no set. We quickly settled on 16 as a lower bound; this is achievable if you only use two of the three values for each of the four attributes (e.g. by throwing out all cards that have three symbols, are red, are shaded or are diamonds). With only two values in play per attribute, no attribute can be all-different across three cards, since that would need one occurrence of each of the three values; and because cards are unique, there is no set of three cards where all four attributes are the same (that would be three copies of the same card). This arrangement has 2^4 = 16 cards.
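
As a quick sanity check of this construction (not something we did over lunch), a brute-force sketch along the following lines enumerates every triple among those 16 cards and confirms that none of them forms a set. The class and method names here are mine, purely for illustration.

import java.util.ArrayList;
import java.util.List;

public class SixteenCardCheck {
    public static void main(String[] args) {
        // Restrict every attribute to two of its three values: 2^4 = 16 cards.
        List<int[]> cards = new ArrayList<>();
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++)
                    for (int d = 0; d < 2; d++)
                        cards.add(new int[] {a, b, c, d});

        for (int i = 0; i < cards.size(); i++)
            for (int j = i + 1; j < cards.size(); j++)
                for (int k = j + 1; k < cards.size(); k++)
                    if (isSet(cards.get(i), cards.get(j), cards.get(k)))
                        System.out.println("Found a set (should never happen)");
        System.out.println("Checked all triples among " + cards.size() + " cards.");
    }

    // A set requires every attribute to be all-same or all-different across the three cards.
    private static boolean isSet(int[] x, int[] y, int[] z) {
        for (int attribute = 0; attribute < 4; attribute++) {
            boolean allSame = x[attribute] == y[attribute] && y[attribute] == z[attribute];
            boolean allDifferent = x[attribute] != y[attribute]
                    && y[attribute] != z[attribute] && x[attribute] != z[attribute];
            if (!allSame && !allDifferent) {
                return false;
            }
        }
        return true;
    }
}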

Going beyond this is tricky, mainly because the number of possible groups of cards you have to pick from is large. There are 81 ways of selecting the first card, and 80 ways of selecting the second. The third has only 78 options, since there is exactly one card that would complete a set with the first two. However, later on you can’t just deduct one card for each already selected card; for a simple counterexample, suppose I have already selected:

  • (1) one red shaded diamond
  • (2) one green shaded rectangle
  • (3) one green shaded S

Now suppose I try to add the card (4) with one red shaded rectangle. This means that the card with one purple shaded rectangle would now become ineligible for selection, because it would form a set with (2) and (4). However, we had already eliminated this card from consideration, because it forms a set anyway with (1) and (3).

I turned to writing a quick program here. This is a standard algorithm based on depth-first search.

  • Each card is represented as a vector with four components.
  • There is an initial set of 81 vectors created.
  • To guide our search, whenever we consider adding a card to a working set, we iterate over the elements of the existing working set and determine which cards would, together with the new card, create a set. This relies on an observation that for any pair of (distinct) cards, there is one and only one card that would form a set with the first two. We then prune these set-creating cards from the search tree.
// ('best' is a static field holding the largest cap set found so far.)
private static void recursiveSearchForCapSet(Set<SetCard> possibleCards, Set<SetCard> workingSet) {
    if (possibleCards.isEmpty()) {
        if (workingSet.size() > best.size()) {
            System.out.println("New best found: " + workingSet.size() + "; " + workingSet);
            best = Sets.newHashSet(workingSet); // take a copy: workingSet is mutated as we backtrack
        }
        return;
    }

    for (SetCard candidate : possibleCards) {
        Set<SetCard> elimination = Sets.newHashSet();
        for (SetCard workingCard : workingSet) {
            SetCard newBadCard = computeLastElementOfSet(candidate, workingCard);
            elimination.add(newBadCard);
        }
        elimination.add(candidate);
        workingSet.add(candidate);
        recursiveSearchForCapSet(Sets.difference(possibleCards, elimination), workingSet);
        workingSet.remove(candidate);
    }
}

private static SetCard computeLastElementOfSet(SetCard first, SetCard second) {
    // XXX There's probably a cleaner way to compute 'all different'
    long targetP1 = first.p1 == second.p1 ? first.p1 : 3 - (first.p1 + second.p1);
    long targetP2 = first.p2 == second.p2 ? first.p2 : 3 - (first.p2 + second.p2);
    long targetP3 = first.p3 == second.p3 ? first.p3 : 3 - (first.p3 + second.p3);
    long targetP4 = first.p4 == second.p4 ? first.p4 : 3 - (first.p4 + second.p4);
    return new SetCard(targetP1, targetP2, targetP3, targetP4);
}

This found cap sets of size 17 quite quickly, but since the known maximum is 20, that wasn’t good enough. There were several implementation tricks that I used to help speed up the search.

  • This search is easily parallelisable; each branch of the search tree may be explored in parallel. A simple implementation (and what I used) could rely on Java’s parallel streams when doing the recursion, for instance. The orchestration around keeping track of the best answer needs a few small changes to be thread-safe, of course.
  • If the goal is merely to find some cap set of size 20, then we can exploit the symmetry of the problem and fix the first card to be any card (I used 0,0,0,0), since any cap set can be re-expressed as one that contains 0,0,0,0 by relabelling the values within each attribute. There may be more clever symmetry to exploit here than what I did.
  • Computing the last element of a set involves a bunch of jumps, branches and math; the result of this computation is deterministic, and can be cached so that it can be reused along different branches of the search tree (a sketch of one possible pre-computation follows this list). There’s probably something more clever I can do regarding overlapping sub-problems, though this improves the runtime only by a constant factor.
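
One hedged way to do that pre-computation is to encode each card as a base-3 integer from 0 to 80 and fill in a lookup table for every pair up front; the encoding and names below are assumptions for illustration rather than the code I actually ran.

// Pre-compute the unique completing card for every pair, assuming each card is encoded
// as an integer 0..80 whose base-3 digits are its four attribute values.
private static final int[][] THIRD_CARD = new int[81][81];

private static int thirdValue(int a, int b) {
    // For one attribute: equal values stay; different values force the remaining one.
    return a == b ? a : 3 - (a + b);
}

private static void precomputeThirdCards() {
    for (int first = 0; first < 81; first++) {
        for (int second = 0; second < 81; second++) {
            int result = 0;
            int f = first, s = second, factor = 1;
            for (int attribute = 0; attribute < 4; attribute++) {
                result += thirdValue(f % 3, s % 3) * factor;
                f /= 3;
                s /= 3;
                factor *= 3;
            }
            THIRD_CARD[first][second] = result;
        }
    }
}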

These optimisations were sufficient to get the tool to spit out many cap sets of size 20, though I unfortunately didn’t churn through every possible selection, so this doesn’t show that 20 is optimal.

Evolving Robots to Cut Stocks: A Coding #10YearChallenge

Ten years ago, I was in high school. I had some experience with coding, mostly through computer science lessons in school; I did Computer Science HL for the IB diploma programme, which had a sizable Java component. As part of the IB programme, students need to write an extended essay, which is basically a research paper.

I worked on Application of Evolutionary Algorithms to Solve the One-Dimensional Cutting Stock Problem. There are two terms to unpack here. First, an evolutionary algorithm is a biologically inspired heuristic for solving optimisation problems. The idea is that we first randomly generate a pool of feasible solutions. We then create ‘related’ solutions by mutating the existing ones in some way, assess their quality using a fitness function, and then form the next population by selecting randomly from the old and new solutions, with a bias towards better ones. In this way, the theory is that the population of solutions increases in fitness/quality over time.

Second, the “cutting stock problem” is an operations research problem from manufacturing, where raw materials are supplied in finite sizes (e.g. rolls of paper) and need to be cut to meet fixed orders. The assumption here is that offcuts which don’t correspond to any order can’t be glued together to fulfil one – any pieces that are too small are simply lost (this is called ‘trim loss’).
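
To make the first of these terms concrete, here is a minimal, hedged sketch of a single evolutionary iteration. The Solution type parameter S, the mutation operator, the fitness function and the simple rank-biased selection are illustrative assumptions; they are not the essay’s actual implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

class EvolutionLoop<S> {
    private final Random random = new Random();

    List<S> runOneIteration(List<S> population,
                            UnaryOperator<S> mutate,
                            ToDoubleFunction<S> fitness) {
        // Clone the current generation and mutate the clones.
        List<S> offspring = new ArrayList<>();
        for (S solution : population) {
            offspring.add(mutate.apply(solution));
        }
        // Evaluate old and new solutions together, fittest first.
        List<S> combined = new ArrayList<>(population);
        combined.addAll(offspring);
        combined.sort(Comparator.comparingDouble(fitness).reversed());
        // Select the next generation with a bias towards fitter solutions, keeping a
        // little randomness (the window of 3 is an arbitrary illustrative choice).
        List<S> next = new ArrayList<>();
        for (int i = 0; i < population.size(); i++) {
            int pick = Math.min(i + random.nextInt(3), combined.size() - 1);
            next.add(combined.get(pick));
        }
        return next;
    }
}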

In hindsight this seems decently ambitious for a high-school project. I did well and got an A (the highest grade for these), though I remember how long it took me to write a working (and, looking back, absolutely terrible) implementation.

Implementation

The project was implemented in Java, which the IB programme focuses on in its Computer Science course.

Looking back at how my code was structured, it was rather unsurprisingly amateurish and messy. There were just four source files for the entire project, including one 700-liner. That said, there was some semblance of organisation, at least in the sense of abstraction and factoring out logical units into functions. The main loop body of the runOneIteration() method I had was just four statements that ran the major steps of the evolutionary algorithm – clone the current generation, perform mutation, evaluate the new generation’s fitness, select the next generation.

My Java knowledge was poor at the time. Some of this was not knowing about the possibility of using external libraries – I had my own implementation of deep copying objects. Some of this didn’t even need external libraries; I had implementations of median-of-three quicksort and the Fisher–Yates shuffle, which was unnecessary (these are provided by Collections.sort() and Collections.shuffle() respectively).

I seem to have had near-zero knowledge of the Collections APIs. I also used a lot of primitives and arrays of primitives, which made things hard to read. One could argue this gives better performance (by avoiding function calls, boxing when performing arithmetic and so on), but there was no such discussion in the paper or in comments anywhere, so I’d chalk it up to lack of knowledge.

For an example, we can look at the implementation of a deep-copying routine I wrote back then.

public class DeepCopy {
    public static Object copy(Object a) {
        Object obj = null;
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(a);
            out.flush();
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            obj = in.readObject();
        }
        catch(IOException e) {
            e.printStackTrace();
        }
        catch(ClassNotFoundException e) {
            e.printStackTrace();
        }
        return obj;
    }
}

If I were to review this code today, there are quite a few things I’d highlight (a cleaned-up sketch follows the list):

  • I’m not sure Java serialization and deserialization is the best way to do this. The general serialize/deserialize approach makes sense, but there are more user-friendly and performant serialization libraries out there. I’d probably use Jackson now.
  • There are no tests! Amazingly I didn’t have any automated tests for my implementation at all. (Contrast with MCMAS, where one of the first things I did when working on that codebase was introducing an automated test framework.)
  • Swallowing exceptions and returning a potentially null object seems like an error-prone pattern. I’d prefer to have clients check for errors by catching (possibly unchecked) exceptions, rather than checking if the reference is null.
  • The ObjectOutputStream can be leaked if writeObject() or flush() throws, and the ObjectInputStream is never closed at all. We should use something like try-with-resources (though at the time that didn’t exist – in that case we would put the close() calls in a finally block).
  • The class is a utility class, but it has a public constructor (the default one Java generates). It’s silly, but someone could instantiate this class, which shouldn’t be allowed.
  • Using Object directly for the return type can be troublesome, especially since we know the object that’s being returned has the same type as the input.
  • The reason we have to initialise obj to null at the top is that it might otherwise be unassigned when it is returned at the end of the method. If we wanted to keep this contract (don’t throw, and return null on failures) we could avoid this by returning in.readObject() directly and returning null directly in the catch blocks, which is probably more readable.
  • This is not a fair criticism of code written in 2008 (the feature in question was only introduced in Java 7), but nowadays if we want to handle different types of exceptions in the same way we should use the multi-catch syntax to be concise.
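
For illustration, here is a hedged sketch of how the utility might look after applying several of the points above (generics, try-with-resources, multi-catch, and failing loudly instead of returning null). The class name is hypothetical, and it still leans on Java serialization rather than a dedicated library.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class DeepCopies {
    private DeepCopies() {
        // Utility class; not instantiable.
    }

    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T copy(T value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("Failed to serialize object", e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Failed to deserialize object", e);
        }
    }
}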

Algorithms

The cutting stock problem is difficult (it’s in FNP, and the decision version, “is there a solution using at most some number of stocks?”, is NP-hard), hence heuristic-based algorithms like this seem reasonable. The mutations I used then were swaps, which probably makes sense for this problem as they preserve feasibility (the multiset of ordered pieces is unchanged).

The algorithm was probably a bit too greedy in terms of how much it prioritised already-good solutions. I’m not sure I had heard of simulated annealing then, but interestingly I seem to have suggested something like it when discussing possible future work! Those words weren’t mentioned anywhere in the essay and the paragraph in question has no references, so it might well have been a thought from first principles.

The approach of selecting stocks with high wastage to swap pieces from is initially effective in reducing trim loss; yet, this also caused the algorithm to converge at local optima frequently. Conversely, selecting stocks with low wastage to swap pieces from seems to encourage the algorithm to search for the global optimum, though this may require significant amounts of time. A conditional approach, more biased towards swapping stocks with more wastage at the beginning than at the end might prove effective.

Documentation

The obvious difference was that this was written in Word, while nowadays I’d almost certainly use LaTeX for a document like this. Some of the formulas were awkwardly typeset, and rather frustratingly there wasn’t a consistent theme to the diagrams (I see Arial, Calibri, Times New Roman and LM Roman!).

There were some fair points that I made then; for example, the algorithm is flexible with resource constraints, in that terminating at any point will spit out a feasible solution, and much of the gain is obtained early on.

Conclusion

More generally, it’s good to see that things have changed, and that there has been real and significant improvement. I’m often frustrated about my learning and development as an engineer, and in the moment it’s often hard to see any improvement. Yet, viewed over a larger time-scale, across the Imperial degree (shout-out to Software Engineering Design), multiple internships and a battery of research papers and patentable work, things have improved quite a fair bit, which is somewhat gratifying.

Rebooting Ourselves

It was a fairly rough week. Thus, over this weekend I thought about re-examining and resetting various procedures and things I do, as opposed to actively filling my time with activities. This reminded me of a Humans of New York post I came across several years ago:

“I’m rebooting my life entirely, again. It’s time for Andrew 5.0.”

In computer science, semantic versioning is a system for identifying different versions of software products in a standardised way. Under this system, a product’s version is an ordered triple of nonnegative integers written major.minor.patch. This is typically used for software, though the definition does not seem to require it. The system discusses changes in terms of a public application programming interface (API), which specifies what functionality the product offers.

In terms of software, a SQL database’s API could include the types of queries that may be processed. For MCMAS-Dynamic, the public API would include the details of the modelling language it is able to verify properties for. A non-software example could include a simple kettle; the public API could include how one adds or removes liquid, how one turns it on or off, and possibly alarms or other feedback mechanisms for when the liquid has boiled.

When a new version of a product is released, the version number is increased (in lexicographic terms). How this increase is done depends on the types of changes since the previous version:

  • If the public API is ‘broken’, meaning that previously valid ways of using the API are no longer valid or accomplish different things, then the change requires a major version bump. To do this, the major version is incremented, and the minor and patch versions are reset to 0 (e.g. 7.5.1 → 8.0.0). For example, if the kettle used to play an alarm when the liquid boiled and this was a part of the public API, then the major version should be bumped if this functionality is removed. (In particular, if the API did not specify that the kettle had to play an alarm, this change might not warrant a major version bump.)
  • If new features are added without ‘breaking’ the API or there are non-trivial internal improvements, the change leads to a minor version bump. The minor version is incremented and the patch version is reset to 0 (e.g. 7.5.1 → 7.6.0). For example, if the new version of the kettle is substantially more energy-efficient, then that could be a minor version bump.
  • If something was broken and has been fixed (without changing the public API), then the patch version should be incremented (e.g. 7.5.1 → 7.5.2). For example, if the kettle previously rang an alarm twice when the liquid was boiled even though the kettle’s API specifies it should only ring once, then a change that makes the alarm ring only once could be part of a patch version bump.
  • Multiple changes should be evaluated in aggregate. In most cases the largest magnitude among the constituent changes applies, though this is not true in general (consider one bugfix plus two changes, one of which breaks the API and another which reverts that break – that is a patch bump, not a major bump). A small sketch of these bump rules follows this list.
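
Here is a minimal sketch of those bump rules; the representation (an array of {major, minor, patch}) and the string change categories are illustrative assumptions only.

static int[] bump(int[] version, String changeType) {
    int major = version[0], minor = version[1], patch = version[2];
    switch (changeType) {
        case "major": return new int[] {major + 1, 0, 0};         // e.g. 7.5.1 -> 8.0.0
        case "minor": return new int[] {major, minor + 1, 0};     // e.g. 7.5.1 -> 7.6.0
        case "patch": return new int[] {major, minor, patch + 1}; // e.g. 7.5.1 -> 7.5.2
        default: throw new IllegalArgumentException("Unknown change type: " + changeType);
    }
}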

Generally, making a more aggressive version bump than would be required for one’s change is acceptable, though it can confuse users. In particular, I tend to expect backward-incompatible changes when facing a major version bump; not finding any can be surprising and confusing.

The sentiment of the quote sounded like it was a major version bump. Defining an API for one’s life is obviously very difficult; even if one tries to use a lot of abstraction, I find that there are just too many facets. Rather loosely, our API might be split into a bunch of micro-services. We can treat physical needs and bodily functions like breathing and digestion as infrastructural. These services might then focus on the range of activities we involve ourselves in, or range of activities we could involve ourselves in. For me personally, this could include software engineering, getting along with other people, finance and budgeting, computer science, writing, puzzle solving and so on.

I would imagine that we chew through a lot of patch versions as we continue to improve our skills. Today’s release notes could include “Jeremy knows a little bit more about thread pools” (I read a chapter of Java Performance Tuning today over a post-lunch coffee). Minor versions would also be relatively common; this wouldn’t be from today specifically, but “Jeremy can vaguely attempt a Balance Loop puzzle” is probably pretty recent, extending the sudoku and other puzzle-solving features.

Depending on how we define the API, major version bumps could be very common. It is typically important to be relatively disciplined with breaking changes in an API in a software engineering context, as clients may often depend on one’s product in non-obvious ways. While others’ dependencies on us can indeed be non-obvious, I think one factor that changes things is that our systems seem to be ephemeral whilst program code is not. A codebase left as it is over years or centuries retains its capabilities (admittedly, finding suitable infrastructure to run the product might be an issue).

On the other hand, there is some evidence that we lose skills that are underutilised with time. I used to play Dance Dance Revolution quite a lot and could probably pass an arbitrary level 15 song as well as some 17s; I doubt I can do that today as I haven’t played in a few years. The ways we interact with others or manage our finances may change as our personal environments change as well; for example, if I moved away from the UK, I would not be able to allocate my investments the way I do now, because I would lose the ability to use ISAs (and probably most other forms of UK-specific tax-free savings). This may even happen without action (for example, if the UK government changes how ISAs or tax-free savings work) – though you could argue that declaring the use of specific vehicles in one’s API might be too specific and implementation-dependent (“I will use tax-advantaged accounts that are valid in my location appropriately” is maybe better).

In light of the above, I would be a bit laxer with what constituted a ‘breaking change’, which pulls things back toward subjectivity which I think semantic versioning was trying to avoid. I might regard myself as having major version 2 right now; I could consider everything up to and including my second year at Imperial as version 0, which is typically used in development to refer to a pre-release period of rapid iteration. Although National Service and/or moving to the UK for studies did bring about nontrivial changes, I really didn’t know what I wanted to do at that time (not that I know now, but there is at least a vague direction).

The Google internship was probably the turning point for version 1; that also coincided with several major changes with regard to finance, investment, philosophy and priorities. I’d call the second major change to be when graduating from Imperial and starting at Palantir; even then, I’d regard the first set of changes to be more fundamental. The re-examination I did over the weekend is actually probably a patch release (or maybe a minor that improves several non-functional characteristics); it certainly doesn’t warrant a major version bump.

Static Single Assignments (January Haskell Tests 2018 – Part 2)

Second in a two-part series; see Constant Propagation (January Haskell Tests 2018 Part 1) for the first part, which covered the assessed portion of the test.

In January this year, Imperial College’s first-year computing students were tasked with implementing a tool to handle constant propagation for statements in static single assignment (SSA) form. In particular, this means that each variable is only assigned to once, and there are no compound expressions. This is useful for optimising compilers to simplify the machine code that a program compiles down to.

Of course, being required to write everything in SSA form is painful; as much as I like immutable objects, mandating their use can be costly (consider a large set, for instance, that periodically receives updates). Banning compound statements can also be troublesome; hence, SSA compilation is typically automated.

The test concluded with a challenge tasking students to implement support for converting a more general program into SSA form. The students had been working with a simple imperative language with statements to assign values to integer variables (possibly doing mathematical operations), conditional statements (if-then-else) and do-while loops.

This problem naturally decomposes into three sub-tasks: supporting SSA translation for basic blocks, conditionals and then loops. The first problem can also readily be split into two sub-problems: ensuring variables are immutable and then handling compound expressions.

Basic Blocks

Immutable Variables

I opted to begin with immutability here, as this would allow for easy testing. This is done by creating multiple versions of a variable. For example,

a = 10;
b = a + 2;
b = a + b + 8;
c = a + b;
a = c + 3;

could become

a1 = 10;
b1 = a1 + 2;
b2 = a1 + b1 + 8;
c1 = a1 + b2;
a2 = c1 + 3;

Essentially, one could implement this by keeping track of the latest version of each variable used, and incrementing the version whenever processing an assignment. Whenever handling the right-hand side of an assignment, one needs to be able to know how to reference the latest version of a variable.

-- Usage Map for handling new instances of variables
type UsageMap = [(Id, Int)]

-- Retrieves a fresh identifier for the given raw variable.
getIdentifier :: Id -> UsageMap -> (Id, Int, UsageMap)
getIdentifier "$return" map
  = ("$return", 0, map)
getIdentifier id []
  = (id ++ "0", 0, [(id, 1)])
getIdentifier id (binding:bindings)
  | id == bindingId 
    = (id ++ show(bindingCount), bindingCount, (id, bindingCount + 1):bindings)
  | otherwise       
    = (returnedId, returnedCount, binding:returnedBindings)
  where
    (bindingId, bindingCount) = binding
    (returnedId, returnedCount, returnedBindings) = getIdentifier id bindings

I’m pretty sure I wouldn’t have bothered to write a specific UsageMap type when I was still in university, though working in industry seems to have made me think more carefully about typing.

Compound Statements

The version of SSA in the paper also doesn’t allow compound statements (e.g. a = ((b * c) + 5) / d). However, this could be handled reasonably as arithmetic expressions can be represented in tree form.

We can create temporary variables for each non-leaf node of the expression tree. For example, we define k1 = b * c, k2 = k1 + 5 and k3 = k2 / d. Some care is needed in cases where the tree is a bit more balanced (we need to know that the first argument of our plus above is k1).

Also, of course, k is only used for illustration above, as real programs could have a variable actually called k! It makes sense to have a counter separate from the rest of the UsageMap, though I relied on some limits on the variable names that were mentioned in the specification. I’m not sure if I would have worked this out in first year!

Conditional Expressions

One needs to consider which versions of a variable are in scope. Let’s consider the following block:

a = 7
b = a * 3
if (b > 20) {
  a = b + 3
  b = 0
} else {
  a = b - 3
  b = 1
}
return a + b

After reaching the end of the if block, we can’t just pass the UsageMap we have obtained from it into the else block (on an execution that goes into the else block, the if block will never have run). Similarly, for the code after the conditional we can’t just use the UsageMap from the end of the else block (we don’t know whether the else block actually ran).

It is probably easier to visualise this block of code as a tree. Each node represents a possible linear execution path that we may run.

The version of each variable that a branch reads on entry should be the version from just before the conditional. Thus, when entering the else block, we should reference the state of the UsageMap from before we entered the if block.

When returning, we somehow need to pick one of the final values for variables that were possibly modified. This is difficult to express directly, but we can approximate it with a phi function, where φ(a, b) nondeterministically returns either a or b. We can thus simply pick a4 = φ(a2, a3) and b4 = φ(b2, b3), and add these statements. Finally, we would want to return a4 + b4. Some care needs to be taken with the implementation (for example, one must correctly handle the case where a variable is modified in only one of the branches). This should, for the above block of code, yield:

a1 = 7 
b1 = a1 * 3 
if (b1 > 20) { 
  a2 = b1 + 3
  b2 = 0
} else { 
  a3 = b1 - 3
  b3 = 1 
} 
a4 = PHI(a2, a3)
b4 = PHI(b2, b3)
return a4 + b4

During the implementation I was running out of time, which unfortunately led to this monster of a helper function. It could probably be cleaned up by perhaps making a type for ‘return values of a traversal’. It’s also probably reasonable to do some kind of higher-order function on the mod-set of variables (which I computed elsewhere).

-- Compute values to be used for PHI for one variable at a time
plantIfPhi :: UsageMap -> UsageMap -> UsageMap -> UsageMap -> [Id] -> (Block, UsageMap)
plantIfPhi original afterIf afterElse final []
  = ([], final)
plantIfPhi original afterIf afterElse accumulator (id:ids)
  = ([Assign newId (Phi (Var varAfterIf) (Var varAfterElse))] ++ otherPhis,
     final)
  where
    oldVersion = lookupWithFailure id original
    versionAfterIf = lookupWithFailure id afterIf
    versionAfterElse = lookupWithFailure id afterElse
    realVersionAfterElse 
      = if versionAfterIf == versionAfterElse
        then oldVersion
        else versionAfterElse
    varAfterIf = id ++ show(versionAfterIf)
    varAfterElse = id ++ show(realVersionAfterElse)
    (newId, _, newMap) = getIdentifier id accumulator
    (otherPhis, final) 
      = plantIfPhi original afterIf afterElse newMap ids

Do-While Loops

Finally we also need to support loops. We could use a control-flow graph structure, similar to what we had for the if-then-else conditional, except that there is a cycle.

a = a + 2
b = 0
do {
  a = a + 1
  b = 5
} while (a < 10)

At first glance, this looks tricky. We don’t know the number of times the loop will execute, though we do know it’s at least once.

Which value of a are we reading in the loop body? Well, this is either the value from the initial set-up, or the value the previous loop left at its end. We could thus try to introduce phi-functions:

a2 = a1 + 2
b1 = 0
do {
  a3 = PHI(a2, a4)
  b2 = PHI(b1, b3)
  a4 = a3 + 1
  b3 = 5
} while (a4 < 10)

This program is impossible to execute, of course; the definitions are cyclic. However, we’re not actually trying to execute it. We’re only using this as an intermediate representation to try to statically optimise our code. Of course, if the two arguments to a phi-function are the same, we can eliminate it, mark that value as a constant if known, and then use the information about it being a constant to drive further optimisations.

Synthesis

I think this could be a fairly challenging question in and of itself for first-year computing students, though some degree of hand-holding would likely be required (in particular, the tree traversal for splitting out intermediate assignments can be quite fiddly to work with). It would also be worth introducing the concept of basic blocks early on, as reasoning about them is useful when handling conditionals and loops. It was a rather solid and satisfying conclusion to the test.

Constant Propagation (January Haskell Tests 2018 – Part 1)

The latest iteration of the January Haskell Tests is out (see Dr Tony Field’s page, and my post on the 2017 test). I think the last time I wrote any Haskell was eleven months ago when taking the 2017 test for a spin; nonetheless I was able to pick things up quickly. I guess the syntax (outside of monads) is pretty intuitive.

This test in some ways felt a fair bit easier than the last, at least in terms of the assessed portion (the last time round, it took me up to just past the two-hour mark to clear that; this time round, I was done with the questions that were actually worth marks by 1:15). However, the ‘bonus’ unassessed Part 4 seemed hard; finishing it in time looked to be a pretty tall order. Part 4 itself could be decomposed into three fairly chunky parts, and could even be a reasonable basis for a separate test!

The test dealt with optimising code that was already in static single assignment (SSA) form. I’d covered this topic before in a course on software reliability (within the context of static program verification), though I don’t remember covering it within the context of (possibly optimising) compilers. SSA form is fairly restrictive:

  • Each variable can only be assigned to once.
  • Operator arguments can only comprise constants and direct variable references.

For example, the assignment x = 4 * y + 7 * z is clearly not in SSA form; but it can be rewritten as a = 4 * y; b = 7 * z; x = a + b, where each statement is. In most programming languages variables can be assigned to multiple times; as part of conversion, we tend to add suffixes to variables for versioning. A more involved example follows:

long notSsa(x, y) {
  a = x + 100;
  b = 2 * y + 3; // compound expression
  b = b + x; // can't reassign!
  return a + b;
}

long ssaVersion(x, y) {
  a0 = x + 100;
  k0 = 2 * y;
  b0 = k0 + 3;
  b1 = b0 + x;
  return a0 + b1;
}

SSA also allows for the concept of a phi function, which has two expressions, and non-deterministically returns one or the other. This is useful for performing optimisations regarding control flow. For example, consider this function:

long veracity(x) {
  z = 1;
  y = 5;
  t = 0;
  if (x > 0) {
    z = 2;
    t = z + y;
  } else {
    z = 3;
    t = (2 * y) - z;
  }
  return x + t;
}

We can rewrite this using a phi function; after an if-block, the final value of a variable could be the value at the end of either branch.

long veracity(x) {
  z0 = 1;
  y0 = 5;
  t0 = 0;
  if (x0 > 0) {
    z1 = 2;
    t1 = z1 + y0;
  } else {
    z2 = 3;
    k0 = 2 * y0;
    t2 = k0 - z2;
  }
  z3 = PHI(z1, z2);
  t3 = PHI(t1, t2);
  return x0 + t3;
}

Although one had to work with phi functions as early as Part 2 (the function written at the end of part 2 should be able to simplify the above function significantly), thinking about the purpose of phi functions certainly wasn’t important until Part 4.

Part 1 involved writing an interpreter to execute statements in a (very) simple subset of C, where code has already been parsed into tokens. I think this was intended to set a baseline for students who were struggling with the material; I knocked through the 14 marks worth (which is enough for a pass; that’s 47%) in just 17 minutes.

Things got somewhat tougher in Part 2. The opening was easy, with two 3-mark questions inviting candidates to implement simple constant folding optimisations (in particular, that 0 + x = x and φ(x, x) = x, along with evaluating constant expressions) and then determining the value of an expression given knowledge of some variable values.

The tricky part then began, with a task to implement constant propagation and folding to actual code blocks in SSA form. Conceptually, this is not difficult – once the value of a variable is known, it can be substituted into all subsequent occurrences. A fix-point approach fits naturally, since knowing something about some variables can lead to knowledge about more variables. Nonetheless, I struggled for a while with implementing the suggested worklist algorithm; the aforementioned fix-point approach was suggested too, and would probably have been more elegant, if harder to implement. I finished this part at 1:01.

Part 3 involved removing phi functions if it was not possible to clean them up as part of the SSA optimisation, by pushing the variable assignments back into code. It seems a decision was made to state the rules for these explicitly in the specification, for some reason (to be fair, as Tony notes it is straightforward), which seemed to make for a pretty tame last assessed part. There was some tricky thinking required here, nonetheless; I think one has to be very careful with handling multiple phis!

Finally, Part 4 tasked students with translating a function to SSA form. This itself is weighty enough that I think another test could easily be written on it. This part can be decomposed into three (maybe four) separate challenges:

  1. Translate basic blocks to SSA form correctly.
    1. Maintain correct information about versioning of variables.
    2. Decompose compound statements (like x = 3 * y + 5 * z).
  2. Add support for conditionals.
  3. Add support for loops (do-while was a part of the simple C subset used).

Squeezing all of that into the remainder of the time after finishing the first three parts would have been tough; I finished the first two full tasks (basic blocks at 2:07 and conditionals at 2:46) and squeezed out a comment on what to do for the last sub-task. The last part of task 2 was also written rather hackily as I was racing the clock. For a quick preview:

lookupWithFailure :: Id -> UsageMap -> Int
lookupWithFailure id map
  = fromMaybe (0) (lookup id map) - 1 -- TODO hack

To be fair, I could probably have churned out all three assessed parts and the three bonus sub-tasks in Java within the time limit; it’s nonetheless useful to engage a different set of programming skills.

Last year, I wrote that I enjoyed revisiting Haskell; I enjoyed this year’s task too. I think I actually enjoyed this installment more, probably because of the outsize difficulty of Part 4! Although I haven’t written professional Haskell, I often find myself writing functional-style code in Java, and I’d still give Haskell an enthusiastic thumbs-up as a choice of first language to teach at university. There are some influences from professional development coming in here, too; a year ago I probably wouldn’t have batted an eye at using types like a Haskell [(String, [Int])] or a Java Map<String, List<Integer>>, whereas now I tend to give structures their own types more quickly. My tolerance for long functions (without good reason, in non-performance-critical scenarios) has also certainly decreased.

This post is Part 1… I’ll take a more in-depth look at Part 4 of the exam in Part 2.

Balloon Battle (Hotseat: UKIEPC 2017, Part 1)

Overview

The United Kingdom and Ireland Programming Contest (UKIEPC) 2017 took place a few weeks ago. I used to participate semi-actively and had a rather annoyingly consistent habit of finishing in third place at Imperial. I haven’t done contest programming for quite a good amount of time, though some of my experience from interviewing candidates at Palantir might have helped.

Teams are given five hours to solve somewhere from eleven to thirteen problems. These problems involve submitting a program (the source will do) to an online judge, which typically runs said program against multiple test cases and evaluates the output of the user’s program in some way. This can entail simply checking strings, or alternatively checking if the given solution achieves some task (consider problem D in this set).

This time round I managed to solve nine problems, rather coincidentally with a total “penalty time” of 999 minutes. When teams are ranked, the number of problems solved is compared first; to break ties, teams are compared on penalty time, where solving a problem T minutes after the start of the contest, with F failed attempts, scores T + 20F penalty minutes. The penalty for failed attempts does not accrue until the relevant problem is actually solved, so it’s still in one’s interest to keep working on a problem – even though solving it late may add an arbitrary amount of penalty time, it will never hurt your ranking. This score would have placed second at Imperial (maybe because some of the members of the usual team number 2 have graduated?) and, at the time of writing, 51st out of 522 overall.

Problems

This half of my write-up (I intend to discuss the contest in two parts) covers what I think are the easier seven problems, in the order I tackled them. Looking at the metadata on Codeforces, there is a pretty clear drop-off in solve counts after this. Interestingly, F was actually the hardest problem of the ones listed here in the sense that the fewest teams solved it, but I would only place it third in difficulty; I struggled more with D than with some of the later ones, mainly for implementation reasons.

C: Cued Up

Wrong answer at 9 minutes; correct answer at 18 minutes (10 minutes spent total).

Given a description of the number of snooker balls still on a table, determine the maximum number of possible points remaining to be scored. Initially any ball can be hit (it isn’t known whether the red or coloured balls are “on”), but thereafter players can’t pot 2 consecutive reds, and if a non-red is potted and reds remain on the table, the next ball potted must be red. Assume that only one ball is potted per shot.

This is mainly a straightforward implementation – the main snag is that if you have only red balls, then regardless of how many there are the right answer is 1. I think I messed up the first time because I fixed this edge case (which is the last of the sample tests), but accidentally ended up printing “1” before the correct answer on all of the other tests…

I: I Work All Day

Correct answer at 13 minutes (4 minutes spent).

This was basically decoding the problem statement and figuring out that “what’s left over after repeated subtraction” refers to the modulo operator. I probably could have got this one out faster if I was still comfortable with C++.

J: Just a Minim

Correct answer at 17 minutes (4 minutes spent).

Figure out the length of a tune given the lengths of its notes. Again, raw implementation. I’m surprised this even took 4 minutes, though that might well have been because I went through all the sample tests, and also took the “hacky” approach of hardcoding the strings. There was a solution that would have been slightly faster to write, involving special-casing 0 and writing 1/x for the other note values. I also might have wasted some time because I was concerned about floating-point rounding errors and so calculated things in terms of sixteenths. (That’s not needed because these numbers have exact binary representations, and the problem limits would be very unlikely to result in a loss of precision.)

F: Flipping Coins

Correct answer at 26 minutes (8 minutes spent).

You have n coins, all initially tails up, and k coin flips which you are obliged to exercise – but you can choose which coin to flip. At the end of the game you keep all coins that are heads up. What’s the expected number of coins you can keep (with optimal play)?

This one is figuring out the right strategy (pretty simple – always flip something that’s tail side up if you can) and then writing a recurrence relation. There are overlapping subproblems, so a quick bit of memoisation is needed, but it is not hard to implement. I’m surprised how much candidates appeared to struggle with this problem. I guess it requires a little bit more familiarity with thinking in terms of probabilities, which has tended to be one of my stronger points at contests like these.
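
For illustration, here is a hedged sketch of that memoised recurrence under the strategy above; the names and the array-based memo table are mine, not the contest solution. Here solve(t, f, ...) is the expected number of heads at the end, given t tails-up coins now and f compulsory flips remaining, and we start from expectedHeads(n, k) with all n coins tails up. For large k an iterative, bottom-up formulation would avoid deep recursion.

static double expectedHeads(int n, int k) {
    double[][] memo = new double[n + 1][k + 1];
    boolean[][] computed = new boolean[n + 1][k + 1];
    return solve(n, k, n, memo, computed);
}

static double solve(int tails, int flipsLeft, int n, double[][] memo, boolean[][] computed) {
    if (flipsLeft == 0) {
        return n - tails; // whatever is heads-up at the end is what we keep
    }
    if (computed[tails][flipsLeft]) {
        return memo[tails][flipsLeft];
    }
    double result;
    if (tails > 0) {
        // Flip a tails-up coin: it becomes heads with probability 1/2.
        result = 0.5 * solve(tails - 1, flipsLeft - 1, n, memo, computed)
               + 0.5 * solve(tails, flipsLeft - 1, n, memo, computed);
    } else {
        // All coins are heads; we are forced to flip one, and it may come up tails.
        result = 0.5 * solve(0, flipsLeft - 1, n, memo, computed)
               + 0.5 * solve(1, flipsLeft - 1, n, memo, computed);
    }
    memo[tails][flipsLeft] = result;
    computed[tails][flipsLeft] = true;
    return result;
}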

D: Deranging Hat

Wrong answers at 50, 53, 61 minutes. Correct answer at 89 minutes (63 minutes total spent).

A sorting network consists of comparators; a comparator takes two indices i and j (importantly, in that order) and, if the letter at index i is greater than the letter at index j, swaps the letters at those indices. Given a string, build a sorting network that would “sort” the traditionally sorted version of that string back into the original string. For example, given the string “dude”, you would need to transform “ddeu” into “dude”, perhaps by comparing indices 4 and 2 (and swapping their letters), and then 3 and 4 (and swapping again). The sorting network can’t be more than roughly 10 times the length of the string, in the worst case.

It took me quite a while to figure out what the problem statement meant. I came up with a reasonable idea (swap each letter into its right place, iterating through the string once), but was needlessly worried about performance and started doing a lot of fiddly book-keeping with Maps of SortedSets; there was a difficult-to-trace bug in there. I then gave up some time after 61 minutes and reimplemented things with a much simpler temporary char-array solution, which worked.

A: Alien Sunset

Correct answer at 104 minutes (15 minutes spent).

Given day lengths on multiple planets and times of sunrise/sunset, find the first time within a given time window where it is dark everywhere.

I avoided this at first because it looked like an interval-tree kind of problem. However, seeing that many teams had solved it, I had a closer look, and something running in O(planets × allowed time window) looked workable: simply start from time 1 and test each candidate time to see if it is indeed dark everywhere, returning the first one that works. There was a bit of a snag with some planets having daylight hours crossing zero, but nothing too difficult to manage.

E: Education

Correct answer at 140 minutes (36 minutes spent).

A school has N departments, each with some number of students Si; they need to move into a new set of buildings, each of which has capacity Ti and cost Ci (departments cannot share buildings). Minimise the total cost of the buildings used, while fitting all of the students in (if possible).

I figured I would sort the departments in descending order of size and then find some kind of dynamic-programming-based solution. I then realised I could do better: a greedy solution that picks the cheapest available building works, provided departments are considered in descending order of size. This led rather quickly to one possible solution: sort the departments in descending order of size, and then populate a priority queue (highest priority = lowest cost) with permissible buildings. As smaller departments come up, smaller buildings are added to the permissible pool, while large but cheap buildings already in the pool are still used first. If the queue ever runs dry, the assignment is impossible.
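
A hedged sketch of that greedy assignment might look as follows; the input format (an array of department sizes and an array of {capacity, cost} pairs) and the names are assumptions for illustration, and -1 stands in for “impossible”. It uses java.util.Arrays, java.util.Comparator and java.util.PriorityQueue.

static long minimumTotalCost(int[] departmentSizes, int[][] buildings) {
    // buildings[i] = {capacity, cost}
    int[] sortedDepartments = departmentSizes.clone();
    Arrays.sort(sortedDepartments); // ascending; we iterate from the back for descending order

    int[][] sortedBuildings = buildings.clone();
    Arrays.sort(sortedBuildings, Comparator.comparingInt((int[] b) -> b[0]).reversed()); // by capacity, descending

    PriorityQueue<Integer> availableCosts = new PriorityQueue<>(); // cheapest permissible building first
    long totalCost = 0;
    int nextBuilding = 0;
    for (int i = sortedDepartments.length - 1; i >= 0; i--) {
        int size = sortedDepartments[i];
        // Every building big enough for this department becomes permissible.
        while (nextBuilding < sortedBuildings.length && sortedBuildings[nextBuilding][0] >= size) {
            availableCosts.add(sortedBuildings[nextBuilding][1]);
            nextBuilding++;
        }
        if (availableCosts.isEmpty()) {
            return -1; // no building can house this department
        }
        totalCost += availableCosts.poll();
    }
    return totalCost;
}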

Five problems remain that are somewhat more difficult (at least in my opinion).

Thoughts

There are certain skills that I’m still reasonably comfortable with, like mathematics in terms of computation. There are others like computational geometry that I don’t remember too well (as we’ll see in part 2). The first part of these contests is typically a speedy rush to get credit for as many problems as possible; usually somewhere between the 5 and 7 problem mark things slow down a bit. The second part for me tends to be more interesting, though I find that the focus changes from speed to actually thinking hard about algorithms – and very often fretting over implementation as well.

Readability and Performance

I find myself faced with tradeoffs between readability and performance fairly often. This happens both in programming (where, apart from some specific situations, the tension usually isn’t too bad) and in English.

In programming, code that is readable is usually desirable. This often makes it easier to maintain, especially for other developers who subsequently need to understand it. For example, let’s suppose we have a Java hash map, and we need to return an Optional indicating the value with a key that matches a given Predicate, if it exists. However, if there are multiple keys matching the Predicate, we want to throw an exception (even if the values are the same).

I would probably write something like this:

private static <K, V> Optional<V> getMatchingValue(Map<K, V> data,
                                                   Predicate<K> keyMatchPredicate) {
  List<V> matchingValues = data.entrySet()
                               .stream()
                               .filter(entry -> keyMatchPredicate.test(entry.getKey()))
                               .limit(2L) // 2 because we need to catch multiples
                               .map(Map.Entry::getValue)
                               .collect(Collectors.toList());
  if (matchingValues.isEmpty()) {
    return Optional.empty();
  }
  if (matchingValues.size() == 1) {
    return Optional.of(matchingValues.get(0));
  }
  throw new IllegalArgumentException("Map had multiple matching keys.");
}

However, in terms of performance we could clearly do better. Using limit does allow us to short-circuit (so we don’t look at more elements than we need to), but creating stream objects could still add significant overhead, especially if the predicate is simple.

The code below should accomplish the same function, but more quickly.

private static <K, V> Optional<V> getMatchingValue(Map<K, V> data,
                                                   Predicate<K> keyMatchPredicate) {
  Optional<V> value = Optional.empty();
  for (Map.Entry<K, V> entry : data.entrySet()) {
    if (keyMatchPredicate.test(entry.getKey())) {
      if (value.isPresent()) {
        throw new IllegalArgumentException("Map had multiple matching keys.");
      }
      value = Optional.of(entry.getValue());
    }
  }
  return value;
}

This is still fairly readable, in my opinion, though there is a bit more state to keep track of. However, if the predicate is costly, we could do even better by taking our first approach and parallelising it, essentially framing the problem as a map-reduction. (The end result is likely to be slightly less readable. It might be substantially less so for people unfamiliar with concurrency).
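
As an illustration of that parallel direction, here is a hedged sketch that mirrors the first approach but evaluates the (costly) predicate across entries in parallel; the method name is mine and the details are deliberately untuned.

private static <K, V> Optional<V> getMatchingValueInParallel(Map<K, V> data,
                                                             Predicate<K> keyMatchPredicate) {
  List<V> matchingValues = data.entrySet()
                               .parallelStream()
                               .filter(entry -> keyMatchPredicate.test(entry.getKey()))
                               .limit(2L) // again, 2 is enough to detect multiple matches
                               .map(Map.Entry::getValue)
                               .collect(Collectors.toList());
  if (matchingValues.size() > 1) {
    throw new IllegalArgumentException("Map had multiple matching keys.");
  }
  return matchingValues.stream().findFirst();
}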

Of course, it is not always the case that readability and performance are negatively correlated. Optimising compilers implement many techniques that transform a program with relatively readable source code into one that is efficient for a machine. The tension does tend to appear, though, when an entirely different (and usually more complex) algorithm and/or implementation is selected because of performance.

It goes without saying that readability is also subjective (tabs versus spaces is a starting point). Readability does not always imply maintainability as well. For example, deeply nested structures where each layer focuses on one specific functionality may be readable, but may prove less maintainable given that they may produce longer, more convoluted stack-traces when exceptions occur.

Defining “performance” in natural languages is tricky. We could consider performance to be efficiency of communication, assuming a perfect ability to comprehend. Well-written compilers implement the source language specification faithfully. They should thus be able to understand valid source, even if it’s not in a human readable form. This happens in practice; processes like JavaScript minification are commonplace.

Of course, English does not quite have a specification. Nonetheless, I think efficiency under perfect comprehension is probably a reasonably close analogue to software performance. For example, if I was asked to explain the logical argument rule of modus tollens, I would probably say something like:

Suppose we have if A, then B. Suppose not B. Then, we have not A (since if we had A, then we would have B).

That explanation would probably make sense to readers vaguely familiar with formal logic and/or philosophy. Again, I could conceivably further increase “efficiency”, by removing the bracketed part. I could even boil things down further:

If A, then B. Not B. Thus not A.

Of course, if the person I was talking to was unfamiliar with logic and, let’s say, a friend, I would proceed very differently. This would, in a sense, be prioritising readability:

It’s a way of making an argument in logic. The idea is that if we know that if A is true, then B is true, and we know B is false, then we can say that A must also be false. Let’s use a concrete example. Let’s say that if it rains, I’ll wear a hoodie, and I’m not wearing a hoodie. Deducing that it must not be raining would be an example of modus tollens.

That said, there is also plenty of subjectivity in terms of readability for natural languages. There have been many attempts to compute estimates or measures of readability. These often use formulae that accept word length and syllable distributions and/or lookup tables. However, many of these fail to account for, among other things, the level of abstraction in text. Going back to our three explanations of modus tollens, they have approximate Flesch-Kincaid grade levels of 0.4, -2.6 and 5.5 respectively. However, I’m fairly confident that most three or four year olds (grade -2.6) would not quite understand the concept of modus tollens after reading “if A then B, not B, thus not A”. Even if 11 to 12 year olds (grade 5.5) were split into groups and given the three explanations, I suspect that the third would still be the most likely to help the students understand the concept.

I actually find the second explanation the most readable, owing to domain knowledge. In just nine words, it gives me a precise and accurate description of what the logical rule is. It might be appropriate for a general audience, but it probably would not be for an audience familiar with neither logic nor philosophy.

The first is good in that it explains why not A is reasonable. I could see myself appreciating it if the subject matter was somewhat more complex (e.g. results in modal and temporal logic).

The third features a concrete example, which is useful when learning. It also explicitly sets out some assumptions common in logic, such as “suppose X” meaning “suppose X is true“. However, I find it to be less readable, as I need to read a longer text to extract the main conclusion of the paragraph.

Earlier, I highlighted that readability and maintainability of software and source code are not one and the same. Let’s consider the notion of maintainability for English. If the work is spoken (especially extemporaneously), I’m not sure maintainability is relevant at all. If written, it could perhaps refer to the difficulty of proofreading the work, or of returning to it when trying to revise it.

In the software case, there are times when my hand has been forced and I’ve thus opted for a complicated, less readable implementation. Outside of word games like Taboo and exams or application processes with strict word limits, I don’t think I’ve had to for English. Thus, in general I would consider my target audience and aim to present it in what I expect would be the most efficient way possible, while still retaining the required readability.

Portmanteau (Spliterator)

For the most part, classes in the standard Java libraries consist of fairly normal words (e.g. a ScheduledExecutorService is an ExecutorService – a service that manages the execution of tasks – which allows for scheduling). There are exceptions (and I don’t mean the Java ones), of course, such as the OMGVMCID.

Java 8, specifically JSR-335, brought about a bunch of new classes, including Java versions of some of my Guava favourites (the function package, with Supplier and friends, and Optional) and Streams. Most of these, again, have fairly normal-sounding names; there are a few that might sound a bit hacked together, like BiConsumer, but I think the meaning is generally intuitive (a consumer of two arguments). There is one that was initially fairly puzzling, though – “Spliterator” (rendered as one word). The obvious reading of that is as a portmanteau of “split” and “iterator”, but I wasn’t sure at first how useful it would be. Of course I have written code to partition collections or other iterables, but this was usually done on an ad-hoc basis.

The API seems designed around easily scheduling parallel computation. The key method here is trySplit(). To quote the Java API docs:

If this spliterator can be partitioned, returns a Spliterator covering elements, that will, upon return from this method, not be covered by this Spliterator.

I’ll go over this in a bit more detail.

  • If the spliterator can currently be split, then trySplit() returns a new spliterator, and this “takes over” some of the elements in our existing spliterator. Our existing spliterator, if traversed directly after this method returns (e.g. with tryAdvance()), will not have these elements.
  • Otherwise, trySplit() returns null. Our existing spliterator retains its elements.

For example, this is one way of doing a parallel file deletion, which starts from one spliterator and splits it as necessary.

// Assume there exists an ExecutorService, EXECUTOR.
// (Also uses java.io.File, java.util.ArrayList/List, java.util.Spliterator and
// java.util.concurrent.Future/ExecutionException.)

class DeleteTask implements Runnable {
  private final Spliterator<File> filesToDelete;

  DeleteTask(Spliterator<File> filesToDelete) {
    this.filesToDelete = filesToDelete;
  }

  @Override
  public void run() {
    List<Future<?>> delegateFutures = new ArrayList<>();
    Spliterator<File> filesToDelegate = filesToDelete.trySplit();
    while (filesToDelegate != null) {
      delegateFutures.add(EXECUTOR.submit(new DeleteTask(filesToDelegate)));
      filesToDelegate = filesToDelete.trySplit();
    }
    filesToDelete.forEachRemaining(File::delete);
    for (Future<?> delegateFuture : delegateFutures) {
      try {
        delegateFuture.get(); // we can use a fancier way of managing completions if needed
      } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
  }
}

Viewing this in relation to parallel programming, there are obvious similarities to fork-join based workflows, especially in cases where the work involves a map or filter, or doing some reduction that’s commutative and associative. A traditional way of writing the above wouldn’t be too different; you might write some logic that splits the files into batches (which now lives in the spliterator), and then similarly dispatch deletion jobs to an ExecutorService.

From the above, it seems that one notable difference is where the responsibility for deciding how the stream should be split lives: the Spliterator itself has some responsibility for deciding whether it should split. To some extent this makes sense; in cases where we want batches to have rather specific sizes (e.g. if performance is highly sensitive to elements being processed in specific batch sizes), we can guard trySplit() with suitable calls to the estimateSize() or getExactSizeIfKnown() methods. This can also be useful for avoiding fiddly edge cases (where our size bounds pass, but we must work with less than a full batch of elements, for instance).
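
For example, a hypothetical guard along these lines could avoid creating tasks for tiny batches; MIN_BATCH_SIZE is an assumed tuning constant rather than anything from the library.

private static final long MIN_BATCH_SIZE = 1_000;

private static <T> Spliterator<T> maybeSplit(Spliterator<T> spliterator) {
  // Only hand work to another thread if there is (at least an estimate of) enough of it.
  return spliterator.estimateSize() >= 2 * MIN_BATCH_SIZE ? spliterator.trySplit() : null;
}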

Spliterators are also useful because they carry characteristics that describe, among other things, how their elements will be traversed. For example (though this doesn’t quite apply to the file deletion example) we may be able to use a certain optimisation if we can assume that we will see the data in sorted order; spliterators carry this information (whereas with a standard Iterator we might not be able to do this easily without some additional bookkeeping). The same might apply for cases where we can do better if we know elements are unique. The collections API also reports suitable characteristics on the spliterators it creates (e.g. a NavigableSet gives you a spliterator that already reports the SORTED and DISTINCT characteristics).
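As a quick illustration (using TreeSet, which is a NavigableSet):

// A TreeSet's spliterator already reports SORTED and DISTINCT
// (alongside others such as ORDERED and SIZED).
Spliterator<Integer> spliterator = new TreeSet<>(Arrays.asList(3, 1, 2)).spliterator();
boolean sorted = spliterator.hasCharacteristics(Spliterator.SORTED);     // true
boolean distinct = spliterator.hasCharacteristics(Spliterator.DISTINCT); // true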

I’m still not sure why this name was decided upon, as opposed to, say, SplittableIterator. Perhaps this was done in the interest of terseness and because the decomposition is fairly obvious, though the name still doesn’t really look like a proper word to me. It does seem that spliterators have use cases beyond standard iterators, even when parallel computation is out of the picture – the characteristics, and the ability to switch on them dynamically, seem potentially useful. That said, I still have not used a Spliterator directly in my professional work, and have no intention of doing so unless it fits the task I’m doing!

Iterative Dichotomiser (January Haskell Tests 2017)

I probably can’t really explain this much better than the source:

The January tests are on-line Haskell programming tests sat under examination conditions by first-year undergraduate students at Imperial College at the end of a six-week introductory programming course.

Dr. Tony Field has been publishing them for quite a few years now (the linked page goes back to 2009). I still remember my first-year Haskell courses to some extent; I was somewhat impressed by the rationale for choosing Haskell, even though my understanding of it at the time was rather clouded. I do remember a specific instance where Tony polled the class on past programming experience, noting hands for C++ and Java (I raised my hand for both), and then tossing in Haskell (a few people; probably because Imperial did say to prepare to study Haskell beforehand!). Besides this, I think having to worry less about persistent state (or race conditions, though I don’t think we covered concurrency at Imperial until second year), and being closer to the notion of mathematical functions that students should already have been exposed to, would also have helped.

This year’s test (2017) covered decision trees, culminating in a question inviting candidates to implement the information gain version of ID3 when building a decision tree from a dataset. It wasn’t too difficult as a whole, as Tony acknowledged on his page; despite probably last having written Haskell about a year ago when I attempted the 2016 test, I finished comfortably in 2 hours and 12 minutes (the time limit is 3 hours). I think this test as a whole required some fairly careful work, but didn’t demand anything in terms of special insight even at the very end (as some past tests have done). The elegance of my answers would probably leave a fair bit to be desired, though; I found I was building recursive traversals of the given datatypes very frequently.

That said, I found the first part somewhat more difficult than in the past. Typically Part I was extremely straightforward (rather awkwardly, there used to be a question asking students to implement lookUp almost every year); not so much this time. In particular, there was a rather fiddly function to write that involved navigating some data structures and building a frequency table; the spec featured a lot of type definitions that reminded me a bit of some experiments with Object Calisthenics (in particular, the “wrap primitives and strings in classes” rule). That said, completing Part I alone would already have let you pass (with a 47; the pass mark is 40). I think the frequency table was harder than anything in Part II, actually, which had a few fairly straightforward tree functions to write.

Moving on, Part III effectively introduced students to the Strategy pattern (in terms of an attribute selector for deciding which attribute to split the dataset on). Apparently, it was possible to solve partitionData with a suitable one-liner, though I unfortunately didn’t come up with anything along those lines, and went with a rather “direct” approach of dividing the rows by the element and removing the relevant attributes. Part IV introduced the concepts of entropy and information gain, and thus invited students to implement ID3; given the title of the paper I wasn’t too surprised to see this at the end.
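For reference, these are the usual textbook definitions that Part IV builds on (my notation, not necessarily the test paper’s): with $p_i$ the proportion of rows in dataset $D$ belonging to class $i$, and $D_v$ the subset of $D$ where attribute $A$ takes value $v$,

$$H(D) = -\sum_i p_i \log_2 p_i, \qquad \mathit{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, H(D_v).$$

ID3 then greedily splits on the attribute with the largest gain.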

I found it fairly interesting to revisit Haskell, and it was nice to see that I was still able to work with the language after not touching it for a year or so. While it might be fairly unlikely that I would work with functional languages in professional terms, concepts and reasoning that are more apparent in functional languages do certainly apply even when I work in Java or C++, whether in the obvious sense (streams/map/filter etc.) or less so (inductive/algebraic datatypes).

On the Practical Complexity of Efficient Testing

This is a part 2 to the previous post on the theoretical complexity of efficient testing. Recall that we modelled tests as being used to verify that code satisfied some requirements, and then modelled the problem of efficient verification as finding the smallest set of tests that covered all of the requirements.

Although the decision problem is NP-complete, we can still put forth a decent attempt at solving it. We can rewrite the set-covering problem as an integer linear programming problem (define an indicator variable for each test, indicating whether it is included in the test set or not, and define a constraint for each requirement, requiring that at least one of the tests satisfying it is selected; we then minimise the sum of all of the indicator variables). There are many solvers, such as GLPK or CBC, that can solve even fairly large instances of these problems. Similarly, we can also reformulate set cover as boolean satisfiability; there are many solvers that can handle large formulae with many variables as well.
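Concretely, with $T$ the set of tests, $R$ the set of requirements, and $x_t$ an indicator variable for whether test $t$ is kept, the integer program is roughly:

$$\min \sum_{t \in T} x_t \quad \text{subject to} \quad \sum_{t \in T \,:\, t \text{ covers } r} x_t \ge 1 \;\; \text{for all } r \in R, \qquad x_t \in \{0, 1\}.$$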

That said, although we can minimise the number of tests being used, it’s not entirely certain that we should, for several reasons. For example, suppose we wanted to test a function that returns all characters occurring exactly twice in a string. Below is one possible implementation – and I’d be fairly confident in saying that you can’t really do better than linear time (you can’t avoid examining parts of the string in general, though there are some cases where you can short-circuit, e.g. if you have examined a portion of the string where every allowable character has already appeared at least three times).

public static Set<Character> findPairs(String input) {
  Preconditions.checkNotNull(input, "findPairs called on null input");
  Map<Character, Integer> charCount = Maps.newHashMap();
  for (char ch : input.toCharArray()) {
    charCount.putIfAbsent(ch, 0);
    charCount.put(ch, charCount.get(ch) + 1);
  }
  return charCount.entrySet()
                  .stream()
                  .filter(entry -> entry.getValue() == 2)
                  .map(Map.Entry::getKey)
                  .collect(Collectors.toSet());
}

The first problem would obviously be whether the number of tests is even a good metric. I’ve written a few tests for the method above:

@Test
public void returnsCharacterAppearingTwice() {
  assertThat(findPairs("aa")).containsExactly("a");
}

@Test
public void doesNotReturnCharacterAppearingThrice() {
  assertThat(findPairs("aaa").isEmpty()).isTrue();
}

@Test
public void throwsOnNullString() {
  assertThatThrownBy(() -> findPairs(null))
          .isInstanceOf(NullPointerException.class);
}

@Test
public void canFindPairs() {
  assertThat(findPairs("aa")).containsExactly("a");
  assertThat(findPairs("aaa").isEmpty()).isTrue();
  assertThatThrownBy(() -> findPairs(null))
          .isInstanceOf(NullPointerException.class);
}

I’d certainly prefer having the three tests which each test something specific, as opposed to the single canFindPairs() test (in fact, if I came across the latter in a code review I would push back on it). The main problem here is that one way of reducing the number of tests is simply to merge existing tests or run large integration tests only, which is generally not great. Incidentally, this could lead to an extended heuristic, where we weight test methods by number of assertions.

But let’s suppose tests have disjoint assertions, and we don’t attempt to game the system in the way described above. The next issue is then how we define requirements. One possibility is to give methods well-defined postconditions and check that tests verify these, but this is unlikely to scale to large systems.
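As a small illustration, a postcondition for findPairs could be phrased as “a character is returned if and only if it occurs exactly twice in the input”; here is a sketch of checking that (the helper name is mine, and one could drive it with many inputs via a property-based testing library):

// Hypothetical postcondition check for findPairs.
static void checkFindPairsPostcondition(String input) {
  Set<Character> result = findPairs(input);
  Map<Character, Long> counts = input.chars()
      .mapToObj(c -> (char) c)
      .collect(Collectors.groupingBy(c -> c, Collectors.counting()));
  // A character is in the result exactly when its count is two...
  counts.forEach((ch, count) ->
      assertThat(result.contains(ch)).isEqualTo(count == 2));
  // ... and the result contains nothing that wasn't in the input.
  assertThat(counts.keySet()).containsAll(result);
}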

A common method, then, is to use code coverage as a proxy (this can be measured automatically by tracing test calls). Line coverage, perhaps adjusted for conditionals (i.e. branch coverage), could be a good starting point. However, this isn’t really a good metric either – either the three tests introduced above or the single canFindPairs() test actually achieves 100 percent coverage, by most definitions:

  • We have an input that violates the precondition, and two that pass it (line 2).
  • We do exercise the body of the for loop with the “aa” and “aaa” tests (lines 5-6).
  • We have both true and false outcomes of the predicate in the filter call (line 10); depending on the coverage tool, this might not even be required for the line to count as covered.

Yet, if someone submitted the above tests only for findPairs() and I did a code review, I would ask them to add more testing. I’d probably expect at least the following:

  • the empty string ("")
  • a single character ("a")
  • inputs returning multiple characters ("aabbbccdde")
  • non-consecutive pairs ("abab")

Furthermore, the above method is not actually correct for characters outside the Basic Multilingual Plane (which Java’s UTF-16 strings represent as surrogate pairs), so if (but only if) such input would be likely given the context of the application involved, I would ask for a test featuring that as well.

Thus, by optimising for code coverage and eliminating tests based on that, we run the risk of weakening our tests to the point where they couldn’t catch legitimate faults. For example, a test using characters beyond the Basic Multilingual Plane as described above would be unlikely to improve coverage at all, and thus might be pruned (allowing our implementation to pass even though it wouldn’t work). Approaches for evaluating whether pruning is worthwhile include having developers plant faults in the code and seeing if the pruned test suite can still catch them, or automatically mutating implementations (e.g. interchanging operations, changing the order of lines of code, etc.) and seeing if the test suites behave differently before and after pruning.
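To make the mutation idea concrete, here is one mutant I could imagine a tool producing for findPairs (this specific mutation is my own example, not the output of any particular tool):

// Hypothetical mutant of findPairs: the equality check in the filter is
// relaxed from "== 2" to ">= 2".
return charCount.entrySet()
                .stream()
                .filter(entry -> entry.getValue() >= 2) // mutated line
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

Notice that doesNotReturnCharacterAppearingThrice is exactly the test that kills this mutant, yet it adds no line coverage beyond the other two tests – so a pruner optimising purely for line coverage might well have dropped it.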

Coverage is still probably one of the least worst metrics in my opinion – I can’t really think of a good way of improving on it cheaply and scalably. Furthermore, studies have shown that although line coverage is a fairly blunt instrument, in several practical cases it can achieve decent reductions in test suite size without harming fault detection too much; that said, the most aggressive solutions (such as exact minimisation via integer linear programming) do seem to overfit to some extent, losing disproportionately more fault-detection ability than the size reduction alone would suggest.
