Employing MaxDiff to Predict the 2024 March Madness Results

Cameron Halversen

Last updated: 23 Apr 2024

Introduction

Right before this year’s (2024) NCAA March Madness tournament, Sawtooth Software put out a survey to find out how well we could predict the winning teams and build the ultimate bracket. The survey used a technique called maximum difference scaling (or MaxDiff), a powerful research tool that calculates preference scores for anything from brands and product features to ad messages and basketball teams.

Now that March Madness has wrapped up, we wanted to take a look to see how our MaxDiff results compare to the actual games.

MaxDiff Overview: What Is MaxDiff and How Does it Work?

MaxDiff is a useful research tool whenever we need to understand the relationships among a large set of items: for example, which products are more or less interesting to customers, which features are more or less important to consumers, or which teams we think are more or less likely to win the championship.

In our case of the basketball teams, we could ask our survey respondents to rate every team’s likelihood of winning on a standard ratings scale. But with 68 teams competing, even on a 10-point scale we’ll end up with a lot of teams that have the exact same rating. So maybe we could have respondents rank all of the teams from #1 (most likely to win it all) to #68 (least likely to go all the way).

While a professional sportswriter might have the patience to rank every team one by one, I don’t know of many survey respondents who have that kind of time on their hands. Plus, ranking that many items at a time quickly becomes a difficult task, especially as you move lower down the list. Good luck figuring out the difference between the 33rd and 34th most likely winner.

MaxDiff is a more efficient way to get the sort of measurements that we’re looking for. In a MaxDiff survey, we take our long list of items (68 teams) and break it into smaller, carefully randomized subsets. Then all you have to do is choose the best and the worst option out of each of the subsets.

For example, in this question I can consider just these 5 teams at a time. A survey taker might say that of these 5, they thought that North Carolina was the most likely to win it all while Clemson was the least likely.

The magic efficiency of MaxDiff is that with just those two selections, we’ve already learned a ton about the relationships between all 5 teams. Now I know that this respondent believes that:

North Carolina beats Gonzaga
North Carolina beats Marquette
North Carolina beats Illinois
North Carolina beats Clemson
Gonzaga beats Clemson
Marquette beats Clemson
Illinois beats Clemson

In fact, of the 10 possible head-to-head matchups that we could see between these 5 teams, we’ve already made a prediction about 7 of them. And all it took was 2 clicks!
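To check that count, here is a minimal Python sketch that enumerates the matchups implied by a single best/worst answer. The function name `implied_pairs` is made up for illustration; the teams are the five from the example question.

```python
from itertools import combinations

def implied_pairs(shown, best, worst):
    """Return the (winner, loser) pairs implied by one best/worst answer."""
    pairs = set()
    for team in shown:
        if team != best:
            pairs.add((best, team))   # the best pick beats every other team shown
        if team != worst and team != best:
            pairs.add((team, worst))  # every remaining team beats the worst pick
    return pairs

shown = ["North Carolina", "Gonzaga", "Marquette", "Illinois", "Clemson"]
pairs = implied_pairs(shown, best="North Carolina", worst="Clemson")
total = len(list(combinations(shown, 2)))  # all possible head-to-head matchups
print(f"{len(pairs)} of {total} matchups implied")  # 7 of 10 matchups implied
```

The three matchups left open are the ones among the middle teams (Gonzaga, Marquette, and Illinois), which later questions with different subsets can fill in.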

We’ll repeat the question with a new subset of teams, and after a few more clicks we can learn about the relationships among all 68 teams in the rotation. We can analyze the answers from this and every other survey respondent together by running a multinomial logistic regression (MNL) model, usually using a hierarchical Bayesian (HB) engine, which will distill all our collected answers into a set of “scores”: higher scores for the teams that were chosen as “More Likely” to win and lower scores for the teams that were chosen as “Less Likely”.
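To give a feel for how best/worst answers turn into scores, here is a rough sketch of a pooled MNL fit in plain Python. It is not the HB estimation described above: it fits one set of utilities for everyone by gradient ascent, and it adds a small ridge penalty (an assumption of this sketch) to keep the estimates finite with so little data. The function name, the settings, and the three answers are all invented for illustration.

```python
import math

def fit_maxdiff_mnl(tasks, items, steps=500, rate=0.1, ridge=0.1):
    """Fit one utility per item by gradient ascent on the best/worst log-likelihood."""
    u = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in items}
        for shown, best, worst in tasks:
            best_total = sum(math.exp(u[t]) for t in shown)
            worst_total = sum(math.exp(-u[t]) for t in shown)
            for t in shown:
                # "Best" pick: observed choice minus predicted choice probability
                grad[t] += (t == best) - math.exp(u[t]) / best_total
                # "Worst" pick: modeled with negated utilities, so the sign flips
                grad[t] += math.exp(-u[t]) / worst_total - (t == worst)
        for item in items:
            # Ridge term shrinks utilities toward zero (sketch-only assumption)
            u[item] += rate * (grad[item] - ridge * u[item])
    return u

teams = ["North Carolina", "Gonzaga", "Marquette", "Illinois", "Clemson"]
# Hypothetical answers: (teams shown, most likely to win, least likely to win)
tasks = [
    (teams, "North Carolina", "Clemson"),
    (teams, "North Carolina", "Illinois"),
    (teams, "Gonzaga", "Clemson"),
]
utilities = fit_maxdiff_mnl(tasks, teams)
for team in sorted(utilities, key=utilities.get, reverse=True):
    print(f"{team}: {utilities[team]:+.2f}")
```

Teams picked as most likely more often end up with higher utilities, and teams picked as least likely end up with lower ones. An HB engine goes further by estimating a separate set of utilities for each respondent.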

The 2024 March Madness MaxDiff Results

So, after all that, what did we find? Here are the results of our MaxDiff analysis, pooling the opinions of all our survey respondents together:

Note: These results have been rescaled so that the scores for all teams sum to 100. This is a commonly used MaxDiff scale for reporting, although many others are available.
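The note doesn’t spell out the transformation, so treat the following as one plausible sketch rather than the exact method used here. It converts each raw coefficient into the chance of being picked as best against the other items in a typical question (assumed here to have zero utility), then normalizes so the scores sum to 100. The function name, the `items_per_set` parameter, and the raw coefficients are all assumptions for illustration.

```python
import math

def rescale_to_100(raw, items_per_set=5):
    """Map raw logit coefficients to positive scores that sum to 100."""
    # Chance of being picked best against (items_per_set - 1) zero-utility items
    probs = {k: math.exp(v) / (math.exp(v) + items_per_set - 1) for k, v in raw.items()}
    total = sum(probs.values())
    return {k: 100 * p / total for k, p in probs.items()}

# Hypothetical zero-centered raw coefficients
raw = {"Team A": 2.0, "Team B": 0.5, "Team C": -0.5, "Team D": -2.0}
scores = rescale_to_100(raw)
for team, score in scores.items():
    print(f"{team}: {score:.1f}")
```

Whatever the rescaling, the order of the teams stays the same; only the spacing between the scores changes.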

By putting the teams in order from highest MaxDiff score to lowest we can produce our own ranking from 1 (most likely to win) to 68 (least likely).

And how did our ranking do? If we used these results to fill out an entry for the official ESPN Tournament Challenge, that bracket would have scored in the top 84% of all 22.6 million entrants.

But wait, there’s more

MaxDiff results give us more than just the rankings of the teams. The actual “size” of the MaxDiff scores also shows how strongly a team is favored relative to any of the other teams. In this way, MaxDiff can also quantify the magnitude of the preferences between items. Items with scores close together are closer in preference than items with scores far apart. We can apply our MNL-calculated MaxDiff coefficients (the unscaled original versions of our reported “scores”) to a logit choice rule to calculate the estimated likelihood of any one team winning in a matchup with any other team, according to the survey responses.

P[W1] = e^U1 / (e^U1 + e^U2)

Where:

P[W1] = probability of Team 1 winning
U1 = coefficient for Team 1
U2 = coefficient for Team 2

Let’s take 3 teams as an example: Baylor, Washington State, and Illinois. In the actual tournament, none of these teams played each other, but our MaxDiff model lets us test the “what-if” matchups. Averaged across all respondents who took the survey, Baylor received a calculated coefficient of 6.1, Washington State received a coefficient of 1.4, and Illinois got 5.4. In a projected matchup between Baylor and Washington State, Baylor has an e^6.1/(e^6.1+e^1.4) = 99% chance of winning, which seems like a good bet. On the other hand, in a simulated game between Baylor and Illinois, our survey respondents think that Baylor only has an e^6.1/(e^6.1+e^5.4) = 67% chance of coming out on top, making it a closer call.
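The same arithmetic as a short Python sketch, using the three coefficients quoted above. The function name `win_probability` is just a label chosen for this example.

```python
import math

def win_probability(u1, u2):
    """Logit choice rule: probability that the team with coefficient u1 wins."""
    return math.exp(u1) / (math.exp(u1) + math.exp(u2))

coef = {"Baylor": 6.1, "Washington State": 1.4, "Illinois": 5.4}
p_wsu = win_probability(coef["Baylor"], coef["Washington State"])
p_ill = win_probability(coef["Baylor"], coef["Illinois"])
print(f"Baylor over Washington State: {p_wsu:.0%}")  # 99%
print(f"Baylor over Illinois: {p_ill:.0%}")          # 67%
```

Only the difference between the two coefficients matters: a gap of 4.7 gives about 99%, a gap of 0.7 gives about 67%, and a gap of zero would be a coin flip.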

Now that all 67 games of the tournament have been played and the winner of every match-up is already known, we can compare our model’s predictions to the actual results. Let’s look at which games were our best bets, and which ended up being our worst upsets.

The ten games that our MaxDiff model predicted best, that is, the games in which the actual winner had the highest predicted likelihood of winning, were:

Winning Team	Losing Team	Likelihood of Winner
North Carolina	Wagner	99.99%
Purdue	Grambling State	99.99%
Houston	Longwood	99.99%
UCONN	Stetson	99.99%
Tennessee	St. Peter's	99.99%
Iowa State	South Dakota State	99.99%
Arizona	Long Beach State	99.99%
Marquette	Western Kentucky	99.99%
Purdue	NC State	99.97%
Baylor	Colgate	99.96%

The 10 worst-predicted games, that is, the games where our model really missed the mark and gave the actual winner the lowest predicted likelihood of winning, were:

Winning Team	Losing Team	Likelihood of Winner
Oakland	Kentucky	0.00%
Yale	Auburn	0.01%
NC State	Marquette	0.05%
NC State	Duke	0.09%
Duquesne	BYU	0.13%
James Madison	Wisconsin	0.21%
Clemson	Arizona	0.58%
Grand Canyon	St. Mary's	0.58%
Alabama	North Carolina	0.62%
Clemson	Baylor	1.47%

Which brings us to the winner of our little Sawtooth Software contest. Earlier we mentioned that we can estimate the MaxDiff model using a hierarchical Bayesian (HB) engine. One of the most useful features of HB model estimation is that not only will it calculate the MaxDiff scores according to the pooled responses of all survey takers, but it will also calculate a separate set of MaxDiff scores from the responses of each individual survey taker. Using those individual scores, we can repeat the process to calculate, for every respondent, the estimated likelihood they gave to the actual winner of every game.

Looking across all 67 games, each respondent’s answers produce a total likelihood for the entire bracket. In other words, from each survey taker’s MaxDiff answers we can calculate the probability they would have given to this year’s championship tournament going exactly the way that it did. And the one survey taker who gave the highest likelihood to the games shaking out exactly as they did walked away with the Sawtooth Software prize.
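Here is a minimal sketch of how that scoring could work. It assumes the total likelihood is the product of the per-game win probabilities (summed as logs to avoid tiny numbers), which is our reading of the description rather than a confirmed detail. The respondent labels and their coefficients are invented; the two game results are taken from the worst-predicted table above.

```python
import math

def bracket_log_likelihood(coef, results):
    """Sum of log win probabilities over (winner, loser) game results."""
    total = 0.0
    for winner, loser in results:
        # log of e^Uw / (e^Uw + e^Ul), rewritten in a numerically stable form
        total += -math.log1p(math.exp(coef[loser] - coef[winner]))
    return total

# Actual results from the tournament: (winner, loser)
results = [("Clemson", "Baylor"), ("Clemson", "Arizona")]

# Hypothetical individual-level coefficients for two respondents
respondents = {
    "respondent_1": {"Clemson": 1.0, "Baylor": 3.0, "Arizona": 4.0},
    "respondent_2": {"Clemson": 2.5, "Baylor": 2.0, "Arizona": 3.0},
}

log_liks = {name: bracket_log_likelihood(c, results) for name, c in respondents.items()}
contest_winner = max(log_liks, key=log_liks.get)
print(contest_winner, {name: round(ll, 2) for name, ll in log_liks.items()})
```

The respondent whose coefficients leave the most room for Clemson’s upsets scores highest. With all 67 games in `results`, the same comparison picks the overall contest winner.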

In Conclusion

So is MaxDiff the best tool for predicting March Madness results? Honestly, probably not.

In the end, the Likelihood of Winning scores calculated for each team are just a way of quantifying the opinions of our survey takers. When we use MaxDiff to measure brand preference or purchase likelihood, those opinions are exactly what we’re after. But it turns out that your opinion about the most likely winner might not be the best predictor of the most likely winner. That is part of what makes March Madness so much fun: the tournament results are famously hard to predict, and no one has ever nailed a perfect bracket.

Maybe a MaxDiff model built from interviewing just the basketball experts would be a better predictor? Or maybe not. Our survey results showed that the self-identified “experts” did no better than anyone else at picking the winners.

Self-Proclaimed Expertise	Average Number of Games Called Correctly
I know nothing about college basketball	45.2
I tune in to a game from time to time	46.2
I follow a team or two pretty closely	46.7
I have a pretty broad knowledge of what's what in NCAA hoops	45.5
I intently follow and analyze the NCAA with a passion	45.3

MaxDiff does add some fun functionality to our insights though. The MaxDiff scores let us go beyond the actual bracket and expand our predictions to any of the over 2200 matchups that you can imagine between these teams, and all that it took was a 5-minute survey!
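The “over 2200” figure is just the number of distinct pairs you can draw from 68 teams, which is quick to verify:

```python
import math

teams = 68
matchups = math.comb(teams, 2)  # distinct head-to-head pairings: 68 * 67 / 2
print(matchups)  # 2278
```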

If you’re interested in exploring how you can use MaxDiff in your own research, go to https://sawtoothsoftware.com/discover to see for yourself.
