# On Set Collecting

When I hear the word ‘set’, the first thing I think of is the computing data structure, and then the mathematical construct that that data structure is trying to model. Essentially, a set is an unordered collection of distinct objects. I’d probably then think of the Egyptian deity before thinking of the card game, or failing at a contract in bridge. That said, we recently had some interesting discussions about the card game over lunch at work.

The game consists of a deck of 81 distinct cards which each show a number of shapes. Each card has four attributes, and each of these attributes can take on three values – number of shapes (one, two or three), colour (red, purple or green), shading (shaded, striped or unshaded) and shape (diamond, rounded rectangle, ‘S’).

Initially, twelve cards are dealt to the table; players compete in real-time to identify a set of three cards where, for each attribute, the cards are either all the same or all different. The player collects these cards, and the dealer replenishes the cards on the table.

It is possible that no set is possible with twelve cards. In physical games, the dealer will, after a while, deal three more cards to the table if no one can identify a Set. We were playing on an app, though; this app waits for players to claim that there is no set available, and penalises players if they make an erroneous claim. More often than not, even though all of us were looking carefully at the board, we were too quick to declare no-set with 12 cards. However, we had one round where even with 15 cards (after one declaration of no-set), no one found anything. After a long while, someone pressed the no-set button, and indeed there wasn’t a set!

After the game, discussion turned towards finding the maximum number of cards where no set was possible. We quickly settled on 16 as a lower bound; this is possible if you only use two of the values for each of the four attributes (e.g. by throwing out all cards that have three symbols, are red, are shaded or are diamonds). Now it becomes impossible to perform matches along any attributes being different, since you need one occurrence of each of the three values; also, because cards are unique there is no set of three cards where all four attributes are the same (that would be three copies of the same card). This arrangement has $2^4 = 16$ cards.

Going beyond this is tricky, mainly because the number of possible groups of cards you have to pick from is large. There are $81$ ways of selecting the first card, and $80$ ways of selecting the second. The third would have $78$ ways, since there is one specific card that won’t go with the first and second. However, later on you can’t just deduct one card for each already selected card; for a simple counterexample, suppose I have already selected:

• (1) one red shaded diamond
• (2) one green shaded rectangle
• (3) one green shaded S

Now suppose I try and add the card (4) with one red shaded rectangle. This means that the card with one purple shaded rectangle would now become ineligible for selection, because it would form a set with (2) and (4). However, we have already eliminated this card from consideration, because it would form a set anyway with (1) and (3).

I turned to writing a quick program here. This is a standard algorithm based on depth-first search.

• Each card is represented as a vector with four components.
• There is an initial set of 81 vectors created.
• To guide our search, whenever we consider adding a card to a working set, we iterate over the elements of the existing working set and determine which cards would, together with the new card, create a set. This relies on an observation that for any pair of (distinct) cards, there is one and only one card that would form a set with the first two. We then prune these set-creating cards from the search tree.
```private static void recursiveSearchForCapSet(Set<SetCard> possibleCards, Set<SetCard> workingSet) {
if (possibleCards.isEmpty()) {
if (workingSet.size() > best.size()) {
System.out.println("New best found: " + workingSet.size() + "; " + workingSet);
best = workingSet;
}
return;
}

for (SetCard candidate : possibleCards) {
Set<SetCard> elimination = Sets.newHashSet();
for (SetCard workingCard : workingSet) {
}
recursiveSearchForCapSet(Sets.difference(possibleCards, elimination), workingSet);
workingSet.remove(candidate);
}
}

private static SetCard computeLastElementOfSet(SetCard first, SetCard second) {
// XXX There's probably a cleaner way to compute 'all different'
long targetP1 = first.p1 == second.p1 ? first.p1 : 3 - (first.p1 + second.p1);
long targetP2 = first.p2 == second.p2 ? first.p2 : 3 - (first.p2 + second.p2);
long targetP3 = first.p3 == second.p3 ? first.p3 : 3 - (first.p3 + second.p3);
long targetP4 = first.p4 == second.p4 ? first.p4 : 3 - (first.p4 + second.p4);
return new SetCard(targetP1, targetP2, targetP3, targetP4);
}
```

This found sets of size 17 quite quickly, but knowing that the bound was 20, I knew that wasn’t good enough. There were several implementation tricks that I used to help speed up the search.

• This search is easily parallelisable; each branch of the search tree may be explored in parallel. A simple implementation (and what I used) could rely on Java’s parallel streams when doing the recursion, for instance. The orchestration around keeping track of the best answer needs a few small changes to be thread-safe, of course.
• If the goal is merely to find some set of size 20, then we can exploit the symmetry of the problem and fix the first card to be any card (I used 0,0,0,0), since any cap set could be re-expressed as one that contains 0,0,0,0 by permuting the assignment of attributes to variable values. There may be more clever symmetry to exploit here than what I did.
• Computing the last element of a set involves a bunch of jumps, branches and math; the result of this computation is deterministic, and can be cached so that it can be reused along different branches of the search tree. There’s probably something more clever I can do regarding overlapping sub-problems (this improves the runtime only by a constant factor).

This was sufficient to get the tool to spit out many sets of size 20, though I unfortunately didn’t churn through every possible selection, so this doesn’t show that 20 is an optimum.