
November 2019

Detecting Progress in Sequences

I often try to evaluate whether something that is difficult to measure directly and cleanly has improved. For example, I might like to evaluate how my German or logic puzzle skills have changed over time. I could try German past exams or look at logic contest results – however, one problem with these is that there is a lot of noise. For example, for the logic contest results, a score could be artificially low because I had a bad day or the contest was unexpectedly hard; it could also be high because I made multiple lucky guesses on difficult puzzles. Thus, a single measurement is unlikely to be sufficiently reliable.

One solution is then to use one’s average score, or take other statistical summaries of multiple data points. However, we probably don’t want to consider all of the data points we have as equally important. For example, I ranked 155th in a Sudoku GP contest in late 2017 – if I’m trying to evaluate my skill level at the end of 2019, that’s probably not relevant.

We could pick a cut-off point (for example, the beginning of 2019, or the last ten contests), discard all of the data from before that, and treat the remaining data as equally important. This is often the basis of sliding window algorithms; if we say that we’re interested in one’s average score over the last ten contests, we can track this metric over time by considering, at each point, only the part of the list that ends there. There are methods for calculating these metrics efficiently (taking time linear in the length of the data stream).
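As a rough sketch of what this might look like (the class and variable names are my own invention, not from any particular library), a rolling average over the last few observations can be maintained with constant work per new data point by keeping the window in a queue alongside a running sum:

```python
from collections import deque

class SlidingWindowAverage:
    """Average of the most recent `window_size` observations.

    Each update is O(1): add the new value to a running sum and, once the
    window is full, subtract the value that drops out of it.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.total = 0.0

    def observe(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.window_size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

# For example, tracking one's average over the last three contests:
tracker = SlidingWindowAverage(window_size=3)
averages = [tracker.observe(score) for score in [62, 55, 71, 68, 90, 64]]
```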

Unfortunately, choosing a suitable window can be difficult – small windows can be vulnerable to noise, while large ones may fail to account for trends present within an individual window. As far as I know, this selection is more of an art than a science.

We can use more complicated approaches as well. Instead of picking a hard cut-off, where data from before the cut-off is irrelevant, we can treat data points as becoming gradually less relevant over time. A method that’s often used is exponential weighting: we give the most recent observation a weight of \alpha (for some 0 < \alpha < 1), the second most recent a weight of \alpha (1 - \alpha), the third \alpha (1 - \alpha)^2 and so on. As \alpha approaches 0, we approach a simple historical average; as \alpha approaches 1, we approach remembering just the most recent element. I’m not sure if the underlying assumption that events become exponentially less relevant over time is appropriate.

In spite of possibly sounding complex, this method does have computationally favourable properties. If we’re keeping track of a stream of data, we don’t actually need more than constant additional memory. It’s enough to keep just the previous reported average, because incorporating a fresh data point D into our metric can be done by S_{new} = \alpha D + (1 - \alpha) S_{old}.

There are some dangers here as well. The first challenge is bootstrapping; how should one pick the initial value of S? One could use the first observation, or perhaps an average of the first few observations if short-term divergence from reality is unacceptable.
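A minimal sketch of this (again, the names are mine): it applies the update rule from above, using the first observation to seed S – one of the bootstrapping options just mentioned.

```python
class ExponentiallyWeightedAverage:
    """Exponentially weighted moving average, using constant memory.

    Only the previous average is stored; a fresh data point D updates it via
    S_new = alpha * D + (1 - alpha) * S_old.
    """

    def __init__(self, alpha):
        assert 0 < alpha < 1
        self.alpha = alpha
        self.value = None

    def observe(self, data_point):
        if self.value is None:
            # Bootstrap using the first observation as the initial value of S.
            self.value = data_point
        else:
            self.value = self.alpha * data_point + (1 - self.alpha) * self.value
        return self.value

tracker = ExponentiallyWeightedAverage(alpha=0.3)
estimates = [tracker.observe(score) for score in [62, 55, 71, 68, 90, 64]]
```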

I think there’s also a risk of massive outliers skewing the average (e.g. an API call which usually takes nanoseconds exceptionally taking an hour because of a system outage). This risk exists with any statistical technique, but if \alpha is small, our estimate will be “corrupted” by the exceptional data even after quite a few additional measurements. With the sliding window method, once the exceptional data point falls out of the window, it stops affecting the average altogether.

In general, the methods we’ve covered assign weighting functions to the data points – the simple average just assigns the same weight to everything, the sliding window assigns the same weight to everything in the window and 0 to things outside the window, while the exponentially weighted moving average (EWMA) weights each point differently based on how recent it is.

As an extension, there are techniques for maintaining constant-size reservoirs of values that can be used to approximate more general summaries like order statistics, standard deviations or skewness. These often rely on holding a subset of the values being observed in memory; the selection mechanism for which values should be kept can be written to bias towards more recent measurements. In some ways, our standard sliding-window-based moving average can be implemented as a special case of this, where new entries are always included and the oldest entry is evicted at each insertion. That said, we would probably not do this for just an average, as we can do that with constant memory (just remember the current average).
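For illustration, here is a sketch of the simplest such scheme I know of, uniform reservoir sampling; the recency-biased variants mentioned above would tweak the acceptance probability, and the function name and shape of the API are my own.

```python
import random
import statistics

def reservoir_sample(stream, capacity, rng=random):
    """Keep a uniform random sample of `capacity` items from a stream of unknown length.

    The i-th item (1-indexed) is kept with probability capacity / i; if kept, it
    replaces a uniformly chosen existing entry. Summaries such as percentiles or
    standard deviations can then be approximated from the reservoir alone.
    """
    reservoir = []
    for i, value in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            reservoir.append(value)
        elif rng.random() < capacity / i:
            reservoir[rng.randrange(capacity)] = value
    return reservoir

sample = reservoir_sample((x * x for x in range(10_000)), capacity=100)
print(statistics.median(sample), statistics.stdev(sample))
```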

It’s not a particularly scientific or deterministic method, but in practice I find it useful to plot the data with various transforms applied and draw conclusions from those graphs. Unfortunately, I don’t have sufficient statistical background or intuition to decide beforehand what would work well.

Lessons from a Purple Dragon: Exploring Game Search Spaces

I think I first played the first Spyro the Dragon game when I was about seven years old. I’m not sure what attracted me to the game. It might have been the idea of collecting bright, shiny gems, the rather gentle learning curve (unlike the Crash Bandicoot, Contra or Metal Slug games that I remember having access to on the PS1 at that time) or perhaps the exploration required to complete a 100% clear of the game.

More recently, a collection of all three games in remastered form was released on PC. I never actually got round to playing Spyro 3, for some reason, and the first two games were pretty enjoyable, so I purchased it. It was available on Steam for £34.99, though I searched elsewhere and managed to procure it for just £13.10.

In Spyro 1, the titular purple dragon has flame breath, is able to charge at enemies with his horns, and can jump and glide around (though he cannot fly). Levels in Spyro 1 generally feature a mostly linear path from a level’s starting point to its exit, with enemies and obstacles along the path; figuring out how to traverse this path can be interesting, in that one usually needs to understand how enemy attacks work, and/or how Spyro can navigate relevant obstacles, typically through judicious platforming.

However, reaching each level’s exit is rarely the end goal. Typically, levels contain a number of gems (and sometimes dragon eggs as well), and collecting these requires exploration beyond the main route. Although I certainly wasn’t familiar with the abstract concept at that time, when I first played these levels as a seven-year-old I would usually perform a depth-first search. I would head down an interesting path until it reached a dead end or looped back to somewhere I’d already explored, and then continue to a new path that hadn’t been searched yet.
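Modelled very loosely in code (the level layout below is entirely made up for illustration), that childhood strategy amounts to something like an iterative depth-first search over the areas of a level:

```python
def depth_first_exploration(level_map, start):
    """Visit every reachable area, always pushing deeper down the current path
    and only backtracking when an area is a dead end or already explored."""
    visited = set()
    stack = [start]
    order = []
    while stack:
        area = stack.pop()
        if area in visited:
            continue
        visited.add(area)
        order.append(area)
        # Explore newly discovered exits before returning to earlier branches.
        stack.extend(exit_ for exit_ in level_map[area] if exit_ not in visited)
    return order

# A made-up map of connected areas in a level, keyed by area name.
level_map = {
    "entrance": ["fountain", "ramparts"],
    "fountain": ["entrance", "exit vortex"],
    "ramparts": ["entrance", "hidden ledge"],
    "hidden ledge": ["ramparts"],
    "exit vortex": ["fountain"],
}
print(depth_first_exploration(level_map, "entrance"))
```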

Depth-first search is naturally a reasonable way to explore a game world. Since players can’t teleport freely, breadth-first search is generally not an option. Iterative-deepening-style approaches could make sense if paths are unidirectional, or if there are frequently very long routes that don’t need to be explored on the way from one area of interest to another, as it is usually possible to exit and re-enter levels from the beginning. However, this isn’t generally true of Spyro level designs. Thus, for Spyro I don’t think this core process has changed very much. What keeps Spyro interesting for me tends not to be about overcoming the obstacles on a given path (with some exceptions), but instead about identifying legitimate paths that can plausibly be explored. A simple example is in one of the areas in World 1, Town Square; there is an optional area accessed by gliding down a staircase and around a wall. This eluded me for a long time when I was seven, as clearing most of the levels (in the sense of reaching the exit, not 100% completion) can be done by gliding in mostly straight lines.

Typically, the game will give you hints that such hidden areas exist. Most obviously, each level has a number of gems to collect, and this number is known to the player – so shortfalls typically indicate that there are still hidden areas to be found (provided the player has meticulously collected everything in the areas they have already explored). It is also fairly common for some gems to be visible from one of the main paths, even if it is not made obvious how to reach them. With the Town Square example, one of the dragons in the area does actually give you a big hint on what to do, though I wasn’t aware of this as I skipped cutscenes then.

In the past, I usually had to resort to walkthroughs or online guides to find these answers. I’m pretty confident my approach then was quite different from the way I’d tackle these problems now – I think I mostly just tried a ton of different approaches and saw what worked. These days I rely on deductive reasoning more heavily; perhaps experience with logic puzzle contests has helped. Instead of simply exploring paths that come to light and trying things that look encouraging, I apply a somewhat more principled approach to identifying suitable routes once the obvious methods have been tried. I consider the usual platforming techniques that might be appropriate and test them against the level’s setup (e.g. Is there a high point I could glide from? Could there be non-obvious platforms that are actually safe to land on? If there are supercharge ramps that increase Spyro’s running and jumping speed, can I use them? Can I combine them?). With this, I’ve been able to solve quite a number of levels that I wasn’t able to in the past.

Some of this reminds me of a book I read earlier this year, Problem Solving 101: A Simple Book For Smart People by Ken Watanabe. The techniques introduced there for deciding how to approach problems are applicable to clearing a Spyro level – what one should do varies with the root cause of the problem, and it is probably more expedient to think through what needs to be considered in a principled way than to scurry off aggressively pursuing whatever solution first comes to mind. I don’t think I’ve ever been faced with a level difficult enough to need a full-blown logic tree drawn out, but even mapping out such a tree in my mind helps a lot.

Anatidaephobia: Ducks, Ponds and Probability

We discussed another interesting question at work, this time over Slack. This one seemed more mathematical than programming-based, though.

Four small ducks are in a large circular pond. They can be at any point in the circle, with equal probability. What is the probability that a diameter can be drawn so that all four ducks are in the same semicircle in the pond?

Naturally, there is a straightforward generalisation:

N small ducks are in a large circular pond. They can be at any point in the circle, with equal probability. What is the probability that we can fence off some sector of the pond which subtends an angle P, so that all N ducks are enclosed in the fenced area?

If I had to do this as part of an engineering problem, my first reaction would be to implement a Monte Carlo simulation. This would quickly reveal an answer of \frac{1}{2} for the first part, but in the second part things might become less obvious.
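A sketch of such a simulation (the function and parameter names are my own): only the ducks’ angles are sampled, since – as argued a couple of paragraphs below – a duck’s distance from the centre doesn’t affect whether a suitable sector exists. If all ducks fit in some sector, we can rotate that sector until its edge touches a duck, so it suffices to test sectors anchored at each duck.

```python
import random

def fits_in_arc(angles, arc_fraction):
    """True if some arc covering `arc_fraction` of the circumference contains every duck."""
    return any(
        all((theta - anchor) % 1.0 <= arc_fraction for theta in angles)
        for anchor in angles
    )

def estimate_probability(num_ducks=4, arc_fraction=0.5, trials=200_000):
    """Monte Carlo estimate of the probability that all ducks share one sector."""
    hits = sum(
        fits_in_arc([random.random() for _ in range(num_ducks)], arc_fraction)
        for _ in range(trials)
    )
    return hits / trials

print(estimate_probability())             # four ducks in a semicircle: roughly 0.5
print(estimate_probability(num_ducks=3))  # three ducks: roughly 0.75
```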

Usually for this kind of problem I tend to start by varying some of the parameters and trying to solve a simpler version of the problem. With one duck, drawing a suitable diameter is trivial; with two, drawing a diameter through one of the ducks is sufficient (since the second duck is on one side or the other – ‘small’ here means that we don’t consider the case where a duck lies exactly on the diameter). Going up to three, things get a little complicated; depending on the position of the first two ducks, the third duck can either be placed anywhere (if the first two ducks are at the same location, for example) or perhaps might be quite restricted (if the ducks are on almost opposite sides of the pond).

I then looked at other possible ways of simplifying the problem. For example, we don’t really care about where the ducks are relative to the centre of the pond: the angles of the ducks around the centre suffice to determine whether a suitable diameter can be drawn, and how far each duck is from the centre doesn’t affect this. We can thus treat the ducks as uniformly distributed random points on a looping one-dimensional continuum, such as the interval [0, 1).

Returning to three ducks, we can try to formalise the intuition that the range of positions allowed for the third duck varies depending on the positions of the first two ducks. Define the span S_k of k ducks to be the total space on the continuum that the ducks occupy. For base cases, we define S_1 = 0, and S_2 is just the smaller of the two arc distances between the first and second ducks, so it is uniformly distributed between 0 and 0.5.

If we fix the value of S_2 = x, we can attempt to find the range of allowable third-duck positions such that S_3 \leq n. Without loss of generality, suppose the two ducks are sitting at points 0 and x. Then, the lowest possible point the third duck can sit at is x - n, and the highest possible point is n (assuming of course that n \geq x; if this is not the case then there are clearly no possible positions). The length of this interval is n - (x - n) = 2n - x, and the probability that a duck lands in that interval is 2n - x. This makes the assumption that 2n - x is at most 1; if it exceeds 1, the probability would simply be 1. However, in our specific case, since x \geq 0 and n = \frac{1}{2}, we don’t have to worry about this.
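Putting the reasoning above into a single expression (valid when n \leq \frac{1}{2}, so that 2n - x never exceeds 1):

\displaystyle P(S_3 \leq n | S_2 = x) = \left \{ \begin{array}{lr} 2n - x & 0 \leq x \leq n \\ 0 & \text{otherwise} \end{array} \right.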

This then reduces to a problem about conditional probabilities. We want to weight each possible value of S_2 based on how likely it is; the relative likelihood of each value is given by the probability density function. For S_2, we have

\displaystyle f_{S_2}(x) = \left \{ \begin{array}{lr} 2 & 0 \leq x \leq 0.5 \\ 0 & \text{otherwise} \end{array} \right.

Then, weighting based on this density function, we have:

\displaystyle P(S_3 \leq n) = \int_{x} P(S_3 \leq n | S_2 = x) f_{S_2}(x) \text{d}x = \int_{0}^{n} (2n-x) f_{S_2}(x) \text{d}x

If n \leq 0.5 then that will be equal to

\displaystyle \int_{0}^{n} (2n-x) (2) \text{d}x = \left[ 4nx - x^2 \right]_{0}^{n} = 3n^2

Thus we can find a diameter in \dfrac{3}{4} = 0.75 of cases with three ducks. If n > 0.5, then we need to make several refinements – most obviously, the upper bound of the integral stops at 0.5, as the span of two ducks can never exceed that. Furthermore, there may be values of x where 2n - x > 1, in which case the conditional probability needs to be clamped to 1. For simplicity, we focus on the case where n \leq 0.5, which is sufficient for our purposes.

Moving on, to find P(S_4 \leq n), we can similarly formulate

\displaystyle P(S_4 \leq n) = \int_{x} P(S_4 \leq n | S_3 = x) f_{S_3}(x) \text{d}x

f_{S_3}(x) may not seem immediately obvious, but we can rely on the CDF of S_3, which was calculated earlier – the probability density function is simply its derivative. Thus f_{S_3}(x) = 6x, and the conditional probability can be handled with the same logic that we used to go from S_2 to S_3, bearing in mind that only the ducks at the two ends of the span matter – the middle duck(s) aren’t important. Hence

\displaystyle P(S_4 \leq n) = \int_{0}^{n} (2n-x) (6x) \text{d}x = 6 \int_{0}^{n} 2nx - x^2 \text{d}x = 6 \left[ nx^2 - \frac{x^3}{3} \right]_{0}^{n} = 6 \left( n^3 - \frac{n^3}{3} \right) = 4n^3

and we have the answer to the original question: substituting n = \frac{1}{2} gives an answer of \frac{1}{2}.

Interestingly, the CDFs of the various S_k seem to take the form P(S_k \leq n) = kn^{k-1} (for n \leq 0.5). We can prove this result by induction on k. It clearly holds for k \leq 4, based on our work so far – though note that k \leq 2 is enough for our proof. So we now take some arbitrary integer k, assume that P(S_k \leq n) = kn^{k-1}, and show that P(S_{k+1} \leq n) = (k+1)n^k. The argument is very similar to what we did for the concrete cases above.

\displaystyle P(S_{k+1} \leq n) = \int_{x} P(S_{k+1} \leq n | S_k = x) f_{S_k}(x) \text{d}x

Since we’re assuming that P(S_k \leq n) = kn^{k-1}, we have f_{S_k}(x) = k(k-1)x^{k-2}. The conditional probability term can be handled with the same argument as before: P(S_{k+1} \leq n | S_k = x) = 2n - x for 0 \leq x \leq n. Thus

\displaystyle P(S_{k+1} \leq n) = \int_{0}^{n} (2n-x) k(k-1)x^{k-2} \text{d}x = k(k-1) \int_{0}^{n} (2n-x) x^{k-2} \text{d}x

What follows is a bit of careful rewriting:

\displaystyle P(S_{k+1} \leq n) = k(k-1) \int_{0}^{n} 2n x^{k-2}- x^{k-1} \text{d}x = k(k-1) \left[ \frac{2n x^{k-1}}{k-1} - \frac{x^k}{k} \right]_{0}^{n}

And we can simplify this to:

\displaystyle P(S_{k+1} \leq n) = \left[ 2nk x^{k-1} - (k-1)x^k \right]_{0}^{n} = 2kn^k - (k-1)n^k = (k+1)n^k

which was what we expected. This also allows us to draw more specific conclusions – for example, for n = \frac{1}{2} the probability is simply \frac{k}{2^{k-1}} for k ducks.
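As a quick sanity check, this closed form lines up with the Monte Carlo sketch from earlier (reusing the hypothetical estimate_probability function defined there):

```python
for k in range(2, 8):
    print(k, k / 2 ** (k - 1), estimate_probability(num_ducks=k))
```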