Anatidaephobia: Ducks, Ponds and Probability

We discussed another interesting question at work, this time over Slack. This one seemed more mathematical than programming-based, though.

Four small ducks are in a large circular pond. They can be at any point in the circle, with equal probability. What is the probability that a diameter can be drawn so that all four ducks are in the same semicircle in the pond?

Naturally, there is a straightforward generalisation:

N small ducks are in a large circular pond. They can be at any point in the circle, with equal probability. What is the probability that we can fence off some sector of the pond which subtends an angle P, so that all four ducks are enclosed in the fenced area?

If I had to do this as part of an engineering problem, my first reaction would be to implement a Monte Carlo simulation. This would quickly reveal an answer of \frac{1}{2} for the first part, but in the second part things might become less obvious.

Usually for this kind of problem I tend to start by varying some of the parameters and trying to solve a simpler version of the problem. With one duck, drawing a suitable diameter is trivial; with two, drawing a diameter through one of the ducks is sufficient (since the second duck is on one side or the other – ‘small’ here means that we don’t consider the case where a duck lies exactly on the diameter). Going up to three, things get a little complicated; depending on the position of the first two ducks, the third duck can either be placed anywhere (if the first two ducks are at the same location, for example) or perhaps might be quite restricted (if the ducks are on almost opposite sides of the pond).

I then looked at other possible ways of simplifying the problem. For example, we don’t really care about where the ducks are relative to the centre of the pond. The relative angles between the ducks and the centre of the pond suffice to identify whether drawing the diameter is possible, and how far they are from the centre of the pond on a given axis won’t affect this. We can thus consider the ducks as uniform randomly occupying points on a looping one-dimensional continuum, such as the interval [0, 1).

Returning to three ducks, we can try to formalise the intuition that the range of positions allowed for the third duck varies depending on the position of the first two ducks. Define the span of n ducks S_n to be the total space on the continuum that the ducks occupy. For base cases, we define S_1 = 0, and S_2 is just the smaller distance between the first and second duck, so it has to be uniformly distributed between 0 and 0.5.

If we fix the value of S_2 = x, we can attempt to find the range of allowable third-duck positionings such that S_3 \leq n. Without loss of generality, suppose the two ducks are sitting at points 0 and x. Then, the lowest possible point the third duck can sit at would be x - n, and the highest possible point n (assuming of course n \geq x; if this is not the case then there are clearly no possible positions). The range of this interval is n - (x - n) = 2n - x, and the probability that a duck lands in that range is 2n - x. This of course makes the assumption that 2n - x is less than 1; if it is more than 1, then the probability would be 1; however, in our specific case since x > 0 and n = \frac{1}{2} we don’t have to worry about this.

This then reduces to a problem about conditional probabilities. We want to weight each possible value of S_2 based on how likely it is; the relative likelihood of each value is given by the probability density function. For S_2, we have

\displaystyle f_{S_2}(x) = \left \{ \begin{array}{lr} 2 & 0 \leq x \leq 0.5 \\ 0 & \text{otherwise} \end{array} \right.

Then, weighting based on this density function, we have:

\displaystyle P(S_3 \leq n) = \int_{x} P(S_3 \leq n | S_2 = x) f_{S_2}(x) \text{d}x = \int_{0}^{n} (2n-x) f_{S_2}(x) \text{d}x

If n \leq 0.5 then that will be equal to

\displaystyle \int_{0}^{n} (2n-x) (2) \text{d}x = \left[ 4nx - x^2 \right]_{0}^{n} = 3n^2

Thus we can find a diameter for \dfrac{3}{4} = 0.75 of cases with three ducks. If n > 0.5, then we need to make several refinements – most obviously, the upper bound of the integral stops at 0.5, as there’s no span with two ducks larger than that. Furthermore, there may be values of x where 2n - x > 1 in which case we need to clamp it down to 1. For simplicity, we focus on the case where n \leq 0.5 which is sufficient for our purposes.

Moving on, to find P(S_4 \leq n), we can similarly formulate

\displaystyle P(S_4 \leq n) = \int_{x} P(S_4 \leq n | S_3 = x) f_{S_3}(x) \text{d}x

f_{S_3}(x) may not seem immediately obvious, but we can rely on the CDF of S_3 which was calculated earlier – the probability density function is simply its derivative. Thus f_{S_3}(x) = 6x, and the conditional probability can be handled with the same logic that we used to go from S_2 to S_3, bearing in mind that the middle duck(s) aren’t important. Hence

\displaystyle P(S_4 \leq n) = \int_{0}^{n} (2n-x) (6x) \text{d}x = 6 \int_{0}^{n} 2nx - x^2 \text{d}x = 6 \left[ nx^2 - \frac{x^3}{3} \right]_{0}^{n} = 4n^3

and we have the answer to the original question, by substituting n = \frac{1}{2} we obtain an answer of \frac{1}{2}.

Interestingly, the CDFs of the various S_k seems to take on the form kn^{k-1}. We can prove this result by induction on k. This clearly holds for k \leq 4, based on our work so far – though note that k \leq 2 is enough for our proof. So we now take some arbitrary integer k, and assume that P(S_k \leq n) = kn^{k-1}. We need to show that P(S_{k+1} \leq n) = (k+1)n^k. The way we can do this is very similar to what we did for the concrete steps.

\displaystyle P(S_{k+1} \leq n) = \int_{x} P(S_{k+1} \leq n | S_k = x) f_{S_k}(x) \text{d}x

Since we’re assuming that P(S_k \leq n) = kn^{k-1}, f_{S_k}(x) = k(k-1)x^{k-2}. We can simplify out the conditional probability term, based on a similar argument on the conditional probability as before. Thus

\displaystyle P(S_{k+1} \leq n) = \int_{0}^{n} (2n-x) k(k-1)x^{k-2} \text{d}x = k(k-1) \int_{0}^{n} (2n-x) x^{k-2} \text{d}x

What follows is a bit of careful rewriting:

\displaystyle P(S_{k+1} \leq n) = k(k-1) \int_{0}^{n} 2n x^{k-2}- x^{k-1} \text{d}x = k(k-1) \left[ \frac{2n x^{k-1}}{k-1} - \frac{x^k}{k} \right]_{0}^{n}

And we can simplify this to:

\displaystyle P(S_{k+1} \leq n) = \left[ 2nk x^{k-1} - (k-1)x^k \right]_{0}^{n} = 2kn^k - (k-1)n^k = (k+1)n^k

which was what we expected. This also allows us to draw more specific conclusions – for example, in general for n = \frac{1}{2} the probability is just simply \frac{k}{2^{k-1}} for k ducks.


Leave a Reply