Elements of Binary in the NCAA Basketball Tournament

If you’re a sports fan, you think of basketball when you see this:

Diagram of the NCAA Basketball Tournament
Basketball or Binary?

If you’re like me, you also think of powers of two, binary trees, logarithms, laws of exponents, geometric sequences, geometric series, and Bernoulli trials — in short, the elements of binary numbers, binary code, and binary logic.

I hope you’re thinking: “wow — all that math stuff I hated actually has a practical use!”


(As of 2011, the format of the tournament has changed. Four play-in games, known as the “First Four,” determine the last four entrants to the 64-team field. My description below applies to the six-round, 64-team part of the tournament, although the NCAA has renumbered the rounds — the old first round is now the second round, etc.)

Powers of Two

The NCAA Basketball Tournament, depicted in the diagram above, is a 64-team, single-elimination tournament. Why 64 teams? Because 64 is a power of two. Powers of two halve into powers of two, which gives the tournament symmetry: all 64 teams play in the first round, and teams that meet in rounds thereafter have won the same number of games.

Powers of two that pop up in the tournament are: the nonnegative powers of two 1, 2, 4, 8, 16, 32, 64, and 263, and the negative powers of two 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/263.

Binary Trees

A binary tree is a model of things that have binary structure. The tournament has binary structure because its games are binary; each has one of two outcomes. You can see binary trees in the brackets. Although they are laid out as four separate trees, they are actually part of one big binary tree. This diagram connects them:

Bracketless Representation of the NCAA Basketball Tournament
Bracketless Representation of the NCAA Basketball Tournament.

You can see the nested structure in the tree. It is made up of subtrees, each of which is effectively a power of two sized sub-tournament. One 64-team tournament is composed of two 32-team sub-tournaments, each of which is composed of two 16-team sub-tournaments, etc. This symmetry, due to the initial power of two number of teams, makes the tree a perfect binary tree.

A computer scientist would draw the tree like this:

The NCAA Tournament Represented As a Binary Tree
The NCAA Tournament Represented As a Binary Tree (click for higher resolution).

Each circle, called a node, represents a team. The 64 nodes at the bottom, called leaf nodes, represent the 64 teams selected. The 63 nodes above that, called internal nodes, represent teams that have won one or more games. The 63 internal nodes correspond to the 63 games played in the tournament. Each level of internal nodes represents winners of one round. Winners advance up the tree until one team remains at the top node, called the root node.

The internal nodes are labeled by level, from round 1 to round 6. These represent the status of the tournament after the given round is complete. I’ve labeled the leaf nodes round 0, but they don’t really represent a round; the 64 teams get there by selection. Round 6 marks the end of the tournament — the lone team remaining has won all 6 of its games.

Logarithms

A logarithm is a fancy word for power, or exponent. You’re probably familiar with logarithms in base 10, but you can compute logarithms in any base. Logarithms in base 2 are useful when dealing with powers of two. The logarithm in base 2 of a power of two is the exponent corresponding to that power of two. For example, the logarithm in base 2 of 64, or log2(64), is 6, which means 64 is 26. In other words, it takes 6 rounds to halve 64 teams down to 1.

Laws of Exponents

You can use the laws of exponents to determine how many games are played in any round k:

\mbox{\footnotesize{\displaystyle{\frac{2^6}{2^k} = 2^{(6-k)}}}}

That is a compact way of expressing the halving of 64 teams k times.

Applying the formula, 2(6-3) = 23 = 8 games are played in round 3. How many games are played in round 6? That’s easy: 2(6-6) = 20 = 1.

Geometric Sequences

A geometric sequence, also known as a geometric progression, is a sequence of numbers that are related by a constant multiple, like sequences of powers of two. In sequences of powers of two, the multiple is either 2 or 1/2. For example, in the sequence 1, 2, 4, 8, 16, 32, 64, each number is double the previous; in the sequence 64, 32, 16, 8, 4, 2, 1, each number is half the previous.

Here are sequences of powers of two that arise in the tournament:

  • 64, 32, 16, 8, 4, 2. The number of teams that play, by round.
  • 32, 16, 8, 4, 2, 1. The number of games played, by round. Equivalently, it’s the number of winning teams per round, or the number of losing teams per round.
  • 1/2, 1/4, 1/8, 1/16, 1/32, 1/64. The fraction of teams that remain, by round.

Geometric Series

A series is a sum, and a geometric series is a sum of numbers from a geometric sequence. The total number of games played in the tournament is a geometric series: 32 + 16 + 8 + 4 + 2 + 1 = 63. Writing it in reverse order — 1 + 2 + 4 + 8 + 16 + 32 = 63 — makes it clearer that this notational shortcut applies:

\mbox{\footnotesize{\displaystyle{\sum_{i=0}^{5}2^i = 63}}}

The \mbox{\footnotesize{\sum}}} symbol means sum, and the way you read it is the sum from i = 0 to 5 of 2i is 63. In other words, the sum of the first 6 nonnegative powers of two is 63.

There is a nice formula that gives the sum without any adding:

\mbox{\footnotesize{\displaystyle{\sum_{i=0}^{n}2^i = 2^{n+1} $-$ 1}}}

It says the sum of the first n+1 nonnegative powers of two is one less than the n+2nd nonnegative power of two. For us, n is 5, so the sum is 26 – 1, or 63.

(There’s another — some say simpler — way to count the number of games. Since there are 64 teams, and all but one team loses exactly one game, there must be 63 games. It works, but I find the geometric series explanation more satisfying.)

The formula might be overkill for a basketball tournament, but it comes in handy in other contexts. What’s the largest value that can fit in a 32-bit binary number? The largest value is represented in binary as 32 1 bits, which represents the first 32 nonnegative powers of two. Applying the formula, we know the sum is 232 – 1, or 4,294,967,295.

Games Played Through Round k

There’s a more general formula to compute the number of games played through any round, not just the last. Doing it by hand, you’ll see that the total number of games played through each of the 6 rounds is 32, 48, 56, 60, 62, and 63, respectively. That’s 32, 32 + 16, 32 + 16 + 8, 32 + 16 + 8 + 4, 32 + 16 + 8 + 4 + 2, and 32 + 16 + 8 + 4 + 2 + 1. Except for the last one, those series don’t start at 1, so the summation formula above does not apply.

That gives me a good excuse to introduce series of negative powers of two. Look at the sum 32 + 16, for instance. That’s equal to 64 * (1/2 + 1/4). The sum 32 + 16 + 8 + 4 is equal to 64 * (1/2 + 1/4 + 1/8 + 1/16). Those sums all include a series of negative powers of two starting at 2-1, and conveniently, there’s a formula for them:

\mbox{\footnotesize{\displaystyle{\sum_{i=1}^{k} 2^{-i} = \frac{2^{k}-1}{2^{k}}}}}

(For those of you that care for such things, that represents the value of a binary fraction consisting of k 1s following the radix point — but I digress.)

To use that formula, just plug in your value for round k and then multiply by 64, the original number of teams. For example, the number of games played through round 3 is 64 * (23 – 1)/23, or 64 * (7/8), or 56.

Bernoulli Trials, Outcomes, and Probability

A Bernoulli trial is an event that has one of two possible random outcomes — flipping a coin is a good example. Each game can be considered a Bernoulli trial; either team A wins or team B wins. From that you can derive the number of different outcomes for the entire tournament. That is, the number of ways the brackets can be filled out: 9,223,372,036,854,775,808.

That’s nine quintillion, two hundred twenty-three quadrillion, three hundred seventy-two trillion, thirty-six billion, eight hundred fifty-four million, seven hundred seventy-five thousand, eight hundred and eight. Or, if you prefer, 263.

That’s a big number, but it’s arrived at fairly simply. One way is to notice that there are 63 games with two outcomes each. By the rules of counting, that’s 263. But there’s a longer explanation, one I find more intuitive, and it’s based on the number of outcomes per round.

There are 232 outcomes in round 1, since there are 32 games; there are 216 outcomes in round 2, since there are 16 games; etc. But those outcomes are based on only one selection of teams entering that round; you must account for all possible selections. You do that by multiplying. For example, for each of the 232 round 1 outcomes, there are 216 round 2 outcomes. This means there are 232 * 216 = 2(32+16) = 248 outcomes (by the laws of exponents) through round 2. Putting all 6 rounds together, you get:

232 * 216 * 28 * 24 * 22 * 21 = 2(32+16+8+4+2+1) = 263 .

(I resisted putting a summation symbol in the exponent 🙂 .)

The Office Pool

If you’re filling in brackets for your office pool, your odds of guessing all outcomes correctly, in theory, are 1 in 263. In reality, your odds are much better than that, because not all teams are evenly matched.

How about the odds of picking just the winner of the tournament? Those are about 18 orders of magnitude better, at 1 in 64. Again, that’s theoretical, since it assumes all teams are evenly matched.

What if the lowest numbered seeds were guaranteed to win all their games? Well, it wouldn’t be much of a tournament, because you might as well skip the first four rounds. The four number one seeds would face each other in the final four, and at that point you’d have a 1 in 4 chance of picking the winner, and a 1 in 8 chance of filling out the brackets correctly (4 round 5 outcomes multiplied by 2 round 6 outcomes).

Application to Other Tournaments

The analysis above applies directly to other 64-competitor tournaments, like the World Golf Championship match play tournament and the Wimbledon doubles tennis tournaments. The analysis also generalizes to other power of two sized tournaments, like the NCAA National Invitational Tournament (32 teams, 31 games, and 5 rounds) and the Wimbledon singles tennis tournaments (128 players, 127 matches, and 7 rounds).

(Caution: just because a tournament starts with a power of two number of competitors doesn’t mean it has to be laid out symmetrically. For example, in the 16-team Big East tournament, there are two rounds of byes before it becomes a symmetric, 8-team tournament.)

Elements of the above analysis apply to many other single-elimination tournaments, even when they don’t start with a power of two number of competitors. The Big Ten Conference Men’s Basketball Tournament, for example, starts with eleven teams; it becomes a symmetric 8-team, 7 game, 3 round tournament after the first round.

Elements of the above analysis also apply to other tournament formats, like the winner’s bracket in double-elimination tournaments, and the upper bracket in triple-elimination tournaments.

Tournaments in just about every sport have elements of binary.

Dingbat

7 comments

  1. I don’t think this is correct: “f you’re filling in brackets for your office pool, your odds of guessing all outcomes correctly, in theory, are 1 in 2^63.” If you were picking the winner before each round, that would be correct. However, most predictions are done before the entire tournament. So, in round 3, for example, there are essentially 8 teams that could win that game; it is no longer binary.

  2. @Tom,

    There are 263 possible ways to fill in a bracket. You will pick one of those ways before the tournament starts. This gives 1 in 263 odds of picking the correct one.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

(Cookies must be enabled to leave a comment...it reduces spam.)

Copyright © 2008-2024 Exploring Binary

Privacy policy

Powered by WordPress

css.php