Browse Category


Another Look at Dynamic Programming

Whilst on the tube today, I overheard a mother teaching her child how to count, using a method likely to be extremely familiar to many – fingers. The child counted correctly from one to ten, and then the mother added her hands too and asked the child to count how many fingers there were now.

“One, two, three -“

And so on, till twenty. The mother then attempted to explain that it would have been faster if the child continued from ten, rather than starting again. Although it wasn’t really an example of the concept, the words dynamic programming immediately shot to the front of my mind. I initially found this to be a rather confusing concept to grasp (let’s say that up to high school programming contests, if a problem wasn’t solvable by exhaustive search or greedy algorithms I’d likely have struggled), so I figured a post on it might be worthwhile.

(This isn’t really an example of DP; I’d say it’s closer to divide and conquer plus the use of a cache. We’ve cached the answer that the child has ten fingers, and identified the problem as being taking the sum of the child’s and parent’s fingers. Note that because of the possibility of amputation or polydactyly, the subproblems are not the same – and, specifically, saying 2 * 10 = 20 isn’t generally correct.)

Essentially, the key idea behind dynamic programming (DP) is that we save time by not re-doing work that we’ve already done, by remembering the results to intermediate steps. Of course, this tends to mean that there’s a space overhead. This is generally useful in cases where a problem is too large to solve, yet it can be decomposed into smaller pieces, and importantly we can combine optimal solutions to these smaller pieces, to get a solution that is optimal for the original problem. (More formally, this is known as optimal substructure.)

Furthermore, we want to get some significant benefit out of actually remembering the answers (in practice, we want to use our previous work multiple times; this manifests in the form of overlapping subproblems). This is what would distinguish an approach as being a DP-based one, as opposed to divide and conquer.

Of course, the fingers example is trivial. There are many other natural examples (the ones that come to mind first for me include knapsack problems and route-planning), though I’m not sure I directly apply DP that much in a natural context (although quite a few days have tasklists that could be done solving an ordered constrained TSP, the last time I used the Held-Karp algorithm was probably for my third year project). It certainly does see many applications that are relevant to daily life (error correction in search queries / autocorrect via Levenshtein distance; not sure how they are actually implemented but routing applications like Citymapper and Google Maps are likely to involve such algorithms as well).

In terms of implementation, the cache-based “top-down” solution was what I learned first, and to me at least was intuitively easier to understand. When you encounter a subproblem, you check a lookup table to see if you’ve done the problem before; if you have, you just take the answer from that. If you haven’t, solve the problem the hard way (this may involve more subproblems – when solving these, it’s important to look at the table again), and then (important!) store the answer you obtained back in the table.

The alternative “bottom-up” method involves generating solutions to smaller subproblems, and using these to build up the solution to a bigger problem. I’d probably first actually used a method along these lines when introduced to the Fibonacci sequence (probably in year 4 or so) – I remember being asked to compute F_{13} and did something like “1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, uh… 144, 233”. (This is linear time. It’s possible to do better via matrix exponentiation, or Binet’s formula – bonus points if you pair the exponentiation with a fancy multiplication algorithm like Karatsuba or even Schonhage-Strassen.)

From a computing point of view there can be both benefits and disadvantages to this versus the “top-down” method.

  • Ease of understanding and/or code readability are likely to depend on the problem; for Fibonacci I would prefer bottom-up, but I usually find the top-down case to be more approachable (it’s more intuitive to me at least to think “here’s how I decompose this problem” as opposed to “here’s how I build a solution from smaller solutions”).
  • The top-down approach might be able to solve some problems without necessarily computing all smaller subproblems that a bottom-up solution counting up from 0 or 1 might deal with. You can, of course, implement this in a bottom-up solution… provided you know how to compute the required subproblems in a way that isn’t itself too costly. With a top-down approach you get this “avoidance” for free.
  • As an extension of the previous point: for bottom-up you’ll need to figure out a safe ordering to go through the subproblems (you can’t have a solution depending on something that hasn’t been computed yet). This is easy in most cases (*cough* Fibonacci), but can be extremely difficult in others (chess transposition tables come to mind; problems with online input, many base cases and a massive domain).
  • Recursive implementations (which tend to be top-down, though could plausibly be in either direction; it’s possible to maintain your own call stack on the heap, or pass some kind of lookup table around) incur the overhead of function calls, and can cause stack overflows for large problems.
  • Handling limited memory (there are many 2D array problems for which only the last row of results needs to be kept; alternatively with Fibonacci we only need the last two results) tends to be more naturally expressed with the bottom up method (though of course, you can clean the top-down cache). This is probably because you’ll have defined an order for solving the subproblems, which may not be as immediately clear with the top-down method.

Note that although this is a powerful tool, there are quite a number of cases where you don’t actually need to consider all of the ways of decomposing a problem into subproblems. A well-known example would be the activity selection problem; given a set of mutually exclusive activities with start and end times, find the largest set of activities I can participate in. I can solve this optimally by sorting events by their ending time, and aggressively picking events to fill my schedule where feasible. The key differentiator here is what’s known as the greedy choice property; that making an optimal choice at each step gives us the overall optimal solution.

In practice anyway it’s highly unlikely that I’d weight my activities equally, so we then get to the weighted activity selection problem, and the greedy method no longer works (but we can still use dynamic programming – as before, sort the activities by their ending time E, and for each activity, pick the better of not attending it, or attending it and behaving optimally before the start time of said activity).

Sets of Sets

If I’m asked to think of a real-life example of a set of sets, I think the first example I would come up with was about job lots of items being sold. Interestingly, it actually took a while – the ideas I came up with before that, which were companies I have investments in (note: mutual funds and ETFs own sets of companies), or messages to my friends were really unions of multiple sets.

How would we model a set of sets in Java? Well, that’s not too hard, since a Set is a type after all:

Set<Set<Item>> items = Sets.newHashSet();
Set<Item> thingsBeingSold = Sets.newHashSet();
thingsBeingSold.add(Item.create("1 oz. Gold", 123456));
thingsBeingSold.add(Item.create("1 oz. Silver", 3456));

That said, we’ve got to be careful. If we continue the above with this:

Set<Set<Item>> items = Sets.newHashSet();
Set<Item> thingsBeingSold = Sets.newHashSet();
thingsBeingSold.add(Item.create("1 oz. Gold", 123456));
thingsBeingSold.add(Item.create("1 oz. Silver", 3456));
thingsBeingSold.add(Item.create("a chopstick", 20));
items.contains(thingsBeingSold); -> x.equals(items)); -> x == items);

What should the output of lines 7, 8 and 9 be? Are they equivalent, even? Well, there are some possible arguments in each direction:

  • They should return true, because the reference to the set thingsBeingSold is still there.
  • They should return false, because the value of thingsBeingSold isn’t the same as the value inserted.

I would argue that the first point is stronger, and would have been my answer when I first learnt about Java. However, the actual answer is that lines 8 and 9 should return true while line 7 generally returns false, though there are a few special cases:

  • It will return true if thingsBeingSold.hashCode() before line 6 has the same hash as thingsBeingSold.hashCode() after line 6 i.e. there is a hash collision, or
  • if the Item created in line 6 equals either of the Items created in lines 3 and 4 (unlikely).

The operation of line 8 is relatively simple: we check each element in the set to see if it is equal, and assuming our set’s equals() method does what it’s supposed to do we will always find the match. We can potentially improve on this with parallelStream(). Line 9 does a reference-equality check on the elements instead and would probably be faster as we don’t have to actually inspect the set contents. In this case, it’s also going to be true. However, these are linear time operations, at least. The point of using hash-based structures here, among other things, would be to knock the time complexity of lookups down to amortised constant.

In a sense, we find the object only if we fortuitously (!) happen to be looking for the same hash code. Java HashSets effectively are a key-set based view of a HashMap, and when we look up objects we actually investigate them by their hashes first, only bothering to check the values if we have a match on the hashes. This allows for some fairly interesting optimisations, actually; I previously thought that HashMaps use buckets and store an array of linked lists of mappings + information, resizing appropriately when the ratio of elements to buckets exceeded a load factor. This was correct up to Java 7, but in Java 8 it appears there’s a fun new optimisation that builds red-black trees sorted by hash codes (while objects themselves might not be Comparable, their hashes are ints, which certainly are) if some buckets get too large relative to others in an attempt to cut the worst-case complexity from linear to logarithmic. Still, when we inspect nodes, we look at their hashes first.

Note that this isn’t just a problem with HashSets; you can get into similar debacles with TreeSets, because the underlying red-black trees aren’t going to reorder themselves. In general, the Javadoc for Set does warn you that mutable objects shouldn’t be changed in ways that would affect their equals() method after they’ve been put into a set. HashSet and TreeSet are allowed to behave the way they do because of subsequent invariants between equals() and hashCode(), and comparison-based sets are supposed to use comparators that are consistent with equals() i.e. a.equals(b) iff a.compareTo(b) == 0.

One way to get round this is of course to ensure that the types used in these sets are immutable, or at least the parts that are used in equals() and related methods are immutable. If that’s infeasible, a good solution could be to delete the set that is to be changed from the set, and add it back after changes; add is idempotent and so the set will (correctly) stay as a set. Wiring this up seems rather messy, though; it looks like we need to listen for state changes (so some kind of publish/subscribe mechanism), where the objects being stored publish both their old and new state so that we can modify the set accordingly.

  • 1
  • 2