Readability and Performance

I find myself faced with tradeoffs between readability and performance fairly often. This happens both in terms of programming (where, apart from in some specific situations, the former usually isn’t too bad) and in English.

In programming, code that is readable is usually desirable. This often makes it easier to maintain, especially for other developers who subsequently need to understand it. For example, let’s suppose we have a Java hash map, and we need to return an Optional indicating the value with a key that matches a given Predicate, if it exists. However, if there are multiple keys matching the Predicate, we want to throw an exception (even if the values are the same).

I would probably write something like this:

private static <K, V> Optional<V> getMatchingValue(Map<K, V> data,
                                                   Predicate<K> keyMatchPredicate) {
  List<V> matchingValues = data.stream()
                               .filter(keyMatchPredicate)
                               .limit(2L) // 2 because we need to catch multiples
                               .collect(Collectors.toList());
  if (matchingValues.isEmpty()) {
    return Optional.absent();
  }
  if (matchingValues.size() == 1) {
    return Optional.of(matchingValues.get(0));
  }
  throw new IllegalArgumentException("Map had multiple matching keys.");
}

However, in terms of performance we could clearly improve. Using limit does allow us to shortcircuit (so we don’t look at more elements than we need to). However, creating stream objects could still add significant overhead, especially if the predicate is simple.

The code below should accomplish the same function, but more quickly.

private static <K, V> Optional<V> getMatchingValue(Map<K, V> data,
                                                   Predicate<K> keyMatchPredicate) {
  Optional<V> value = Optional.empty();
  for (Map.Entry<K, V> entry : data.entrySet()) {
    if (keyMatchPredicate.test(entry.getKey()) {
      if (value.isPresent()) {
        throw new IllegalArgumentException("Map had multiple matching keys.");
      }
      value = Optional.of(entry.getValue());
    }
  }
  return value;
}

This is still fairly readable, in my opinion, though there is a bit more state to keep track of. However, if the predicate is costly, we could do even better by taking our first approach and parallelising it, essentially framing the problem as a map-reduction. (The end result is likely to be slightly less readable. It might be substantially less so for people unfamiliar with concurrency).

Of course, it is not always the case that readability and performance are negatively correlated. Optimising compilers implement many techniques that transform a program with relatively readable source code into one that is efficient for a machine. This is often true, though, when an entirely different (and usually more complex) algorithm and/or implementation is selected because of performance.

It goes without saying that readability is also subjective (tabs versus spaces is a starting point). Readability does not always imply maintainability as well. For example, deeply nested structures where each layer focuses on one specific functionality may be readable, but may prove less maintainable given that they may produce longer, more convoluted stack-traces when exceptions occur.

Defining “performance” in natural languages is tricky. We could consider performance to be efficiency of communication, assuming a perfect ability to comprehend. Well-written compilers implement the source language specification faithfully. They should thus be able to understand valid source, even if it’s not in a human readable form. This happens in practice; processes like JavaScript minification are commonplace.

Of course, English does not quite have a specification. Nonetheless, I think efficiency under perfect comprehension is probably a reasonably close analogue to software performance. For example, if I was asked to explain the logical argument rule of modus tollens, I would probably say something like:

Suppose we have if A, then B. Suppose not B. Then, we have not A (since if we had A, then we would have B).

That explanation would probably make sense to readers vaguely familiar with formal logic and/or philosophy. Again, I could conceivably further increase “efficiency”, by removing the bracketed part. I could even boil things down further:

If A, then B. Not B. Thus not A.

Of course, if the person I was talking to was unfamiliar with logic and, let’s say, a friend, I would proceed very differently. This would, in a sense, be prioritising readability:

It’s a way of making an argument in logic. The idea is that if we know that if A is true, then B is true, and we know B is false, then we can say that A must also be false. Let’s use a concrete example. Let’s say that if it rains, I’ll wear a hoodie, and I’m not wearing a hoodie. Deducing that it must not be raining would be an example of modus tollens.

That said, there is also plenty of subjectivity in terms of readability for natural languages. There have been many attempts to compute estimates or measures of readability. These often use formulae that accept word length and syllable distributions and/or lookup tables. However, many of these fail to account for, among other things, the level of abstraction in text. Going back to our three explanations of modus tollens, they have approximate Flesch-Kincaid grade levels of 0.4, -2.6 and 5.5 respectively. However, I’m fairly confident that most three or four year olds (grade -2.6) would not quite understand the concept of modus tollens after reading “if A then B, not B, thus not A”. Even if 11 to 12 year olds (grade 5.5) were split into groups and given the three explanations, I suspect that the third would still be the most likely to help the students understand the concept.

I actually find the second explanation the most readable, owing to domain knowledge. In just nine words, it gives me a precise and accurate description of what the logical rule is. It might be appropriate for a general audience. It probably would not be for an audience familiar with neither logic nor philosophy.

The first is good in that it explains why not A is reasonable. I could see myself appreciating it if the subject matter was somewhat more complex (e.g. results in modal and temporal logic).

The third features a concrete example, which is useful when learning. It also explicitly sets out some assumptions common in logic, such as “suppose X” meaning “suppose X is true“. However, I find it to be less readable, as I need to read a longer text to extract the main conclusion of the paragraph.

Earlier, I highlighted that readability and maintainability of software and source code are not one and the same. Let’s consider the notion of maintainability. If spoken (especially extemporaneously), I’m not sure this is necessary at all. If written, that could refer to the difficulty of proofreading or returning to the work when trying to revise it, perhaps?

In the software case, there are times when my hand has been forced and I’ve thus opted for a complicated, less readable implementation. Outside of word games like Taboo and exams or application processes with strict word limits, I don’t think I’ve had to for English. Thus, in general I would consider my target audience and aim to present it in what I expect would be the most efficient way possible, while still retaining the required readability.


Leave a Reply