The birthday paradox is a fairly well-known but unintuitive observation that although there are normally 365 days in a year (for simplicity, ignoring leap years and assuming a uniform distribution for birthdays), with as few as 23 people it is more likely than not that two of them will share a birthday. One of my teammates proposed an inverse of this problem: what is the minimum number of people are needed so that it is more likely than not that it is someone’s birthday every day?
Contents
Modelling A Specific Instance
There’s a lot to take in here, so let’s try to solve a simpler problem first. We could attempt to solve: given a specific number of people, how would we calculate the probability that it is someone’s birthday every day? If we have this method, we can reconstruct the answer to the original problem via a search algorithm.
The more general problem here would be: suppose we have people. Each person draws one of
events with replacement – these are equally probable and independent. What is the probability that all events occur at least once? The more general introductory problem has
, but of course that’s tricky to work with right off the bat, so let’s examine a couple of the smaller cases.
Algorithmic Ideation
Base Case Analysis
If , since we said that we always have at least one person, the one possible event will happen and the probability is
. Moving on to
, here
is a special case with probability of
. For larger
, there are two specific situations we want to avoid, where every person draws the same event (remember that there were two events). Each of these happens with probability
so the probability of everything occurring once would be
.
We can now analyze – if
we definitively have probability
. For
, the first person will receive a novel outcome with certainty, but the second and third person also need novel outcomes, and this happens with probability
. It gets a little trickier with
:
- The first person to draw an event will get a novel event. This leaves 3 people to draw events, and 2 out of 3 events undrawn. We can call this state
.
- The second person to draw an event will get a novel event with probability 2/3, putting us in the state
. Alternatively, he/she will draw the same event as the first person with probability 1/3, getting us in the state
.
- If we’re in
, there’s a 1/3 probability that the third person draws the last event (in which case we are done), and a 2/3 probability he doesn’t, putting us in
.
- If we’re in
we have one person who needs to draw the last event with probability 1/3.
- If we’re in
, the third person will get a novel event with probability 2/3, putting us in
. He might draw the same event as the first two people, in which case we definitely failed.
More generally, we can model the problem using a state machine (a technique which I discussed two and a half years ago). The states of this machine can be written as 3-tuples which means we have
people to draw events, there are
events of which
still need to be drawn. Notice that
, and that for our problem, we have
and we generally want to find the probability of success for the state
.
For we can draw this as follows.
Recurrence Relations
We can define as the probability that transitions from the state
will eventually end up in a SUCCESS state, defined as
for all values of
and
(as that means all events were drawn). We make some observations on simple cases:
. We are already in a success state by definition.
. If everyone has drawn an event and there are still events that haven’t occurred, there is no chance for us to draw those events.
That’s all we need, and we can look at the recursive case. In the state , there is a
probability that we draw a new outcome, putting us in the state
. There is also a
probability that that doesn’t happen – in that case we’re in the state
. It thus follows that
Notice that these states are well-formed, since we handled the and
base cases earlier. It is possible to optimise this by adding some more sophisticated base cases, but this still gives us a clear enough method for working out the probability. For example,
Solving The Specific Instance
The answer to our original problem, for a given number of people is then
. Working this out by hand is likely to be painful, so we should avoid that. It’s easy to code something simple up that is based on the recurrence relation above:
private static double findSuccessProbability(long remainingPeople, long remainingEvents, long totalEvents) { if (remainingEvents == 0) { return 1.0; } if (remainingPeople == 0) { return 0.0; } double probabilityOfNewEvent = ((double) remainingEvents) / totalEvents; double successProbabilityOnNewEvent = findSuccessProbability(remainingPeople - 1, remainingEvents - 1, totalEvents); double successProbabilityOnOldEvent = findSuccessProbability(remainingPeople - 1, remainingEvents, totalEvents); return probabilityOfNewEvent * successProbabilityOnNewEvent + (1 - probabilityOfNewEvent) * successProbabilityOnOldEvent; }
We can test this and verify that indeed . However, if we try to call this with, say
the program doesn’t seem to give the answer quickly.
One solution here is to use memoisation. The naive version written above ends up computing the answer for the same state potentially exponentially many times, but we can cache the results of previous computations and use them. This could be done with a map with the 3-tuple as a cache key; alternatively, a bottom-up dynamic programming solution can be written, since there’s a clear order in which we process the subproblems. In any case, these methods should get the runtime down from exponential to .
Reconstructing The General Solution
We want to find the smallest such that
. We could start from
and then keep trying larger values of
: since
is constant this is actually not that bad as we can reuse the results of computations from previous attempts when trying new values of
.
public static void main(String[] args) { long numPeople = 365; double successProbability = findSuccessProbability(numPeople, 365, 365); while (successProbability <= 0.5) { numPeople++; successProbability = findSuccessProbability(numPeople, 365, 365); } System.out.println(numPeople); }
This gives us our answer: 2287. We have while
.
This approach looks a bit primitive – and normally we’d use a modified binary search where we first search for an upper bound via something that grows exponentially, and then binary search up to our bound. This works, since increases as
increases for constant
.
However, for this specific problem and more generally for recurrences where we end up reducing parameters by 1, we still need to burn through almost all of the smaller subproblems we skip over by isolating the bound, so it won’t actually give us that much here.
Usually when working with recurrences it is nice to find a closed-form solution for our expression (i.e. we have a general expression for that doesn’t itself reference
), though I didn’t really see how to make progress with this one.