March is here. It’s finally warm enough in Chicago to go outside without feeling raw from the bitter cold, and people are celebrating like it’s much warmer than 35 degrees. It’s not. But this time of year also means the return of the NCAA Tournament. March Madness is definitely my favorite sporting event, one where I get to root for random schools I have never heard of every year on Thursday afternoons.
Everyone is looking for an edge to help their bracket win their pool. Some people pick based on gut feel, some have watched some games, and others pick based on mascots. Some intrepid gamblers find good sources of data and use that to their advantage. Websites like Five Thirty Eight and KenPom provide good publicly available data on win probabilities. But an issue with using this data is that a lot of other people are also using it. Even if you pick the best value from these sites, you are likely not differentiating your bracket much from others.
This led me to do some analysis. I wanted to know where the values in this year’s NCAA Tournament are. What teams should I pick if I want to win my bracket pool? Are there teams that may provide a lower expected score for me, but a higher probability of winning a pool? What is the ideal final four to win a bracket pool? And are there individual games early on in the tournament that are key to winning a pool?
To do this I combined ESPN’s team popularity data with 538’s team value data. I ran a Monte Carlo simulation where I simulated a result and 25 brackets for each trial to create a simulated pool. To simulate the result, I found the probability of each team winning each round, given the other team they were playing, and used a random number generator in Google Sheets. To find the probability that Team A would beat Team B in a round, I divided the chance Team A would win in that round by the chance Team A would win plus the chance Team B would win. If the random number was greater than the probability of Team A winning the round, then Team B won. Otherwise Team A won.
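For anyone who would rather not do this in a spreadsheet, here is a minimal Python sketch of the same game-by-game logic. It is not the Google Sheets model itself. The function names, the `round_probs` layout (one dictionary of round-win chances per round), and the 50/50 fallback when both chances are zero are all assumptions made for illustration.

```python
import random


def head_to_head(p_a, p_b):
    """Chance Team A beats Team B, from each team's chance of winning the round."""
    total = p_a + p_b
    if total == 0:
        return 0.5  # assumption: treat two zero-chance teams as a coin flip
    return p_a / total


def simulate_game(team_a, team_b, round_win_prob, rng):
    """Play one game; round_win_prob maps team -> chance of winning this round."""
    p = head_to_head(round_win_prob[team_a], round_win_prob[team_b])
    # If the random number is greater than Team A's probability, Team B wins.
    return team_b if rng.random() > p else team_a


def simulate_bracket(teams, round_probs, rng):
    """Simulate every round.

    teams: team names in bracket order (adjacent teams play each other).
    round_probs: hypothetical layout, one {team: round-win chance} dict per round.
    Returns a list with the winners of each round, in bracket order.
    """
    winners_by_round = []
    alive = list(teams)
    for probs in round_probs:
        alive = [
            simulate_game(alive[i], alive[i + 1], probs, rng)
            for i in range(0, len(alive), 2)
        ]
        winners_by_round.append(alive)
    return winners_by_round


if __name__ == "__main__":
    # Made-up numbers for a four-team mini bracket, purely for illustration.
    probs = [
        {"A": 0.9, "B": 0.1, "C": 0.6, "D": 0.4},
        {"A": 0.6, "B": 0.02, "C": 0.25, "D": 0.13},
    ]
    print(simulate_bracket(["A", "B", "C", "D"], probs, random.Random()))
```

Feeding in value data would produce a simulated tournament result, and feeding in popularity data would produce a simulated bracket pick.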
I repeated this method but used team popularity data instead of team value data to simulate bracket picks. Then I ranked those brackets by how they did with the common result. After running all my trials, I calculated the expected value of picking each team per round in the tournament. Teams that were good value picks had higher average ranks, while poor value teams had lower average ranks.
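The ranking step could be sketched in Python roughly like this. The point values assume the usual doubling per round (1, 2, 4, 8, 16, 32), and the data layout (a list of winners for each round) and function names are hypothetical, not taken from my spreadsheet.

```python
# Assumed scoring: points per correct pick double each round.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]


def score_bracket(picks, result, round_points=ROUND_POINTS):
    """Score one bracket against the simulated result.

    picks and result are lists with one collection of winners per round.
    """
    return sum(
        points * len(set(picked) & set(actual))
        for points, picked, actual in zip(round_points, picks, result)
    )


def rank_pool(brackets, result):
    """Return each bracket's rank in the pool (1 = best, ties share a rank)."""
    scores = [score_bracket(b, result) for b in brackets]
    return [1 + sum(other > s for other in scores) for s in scores]


if __name__ == "__main__":
    # Tiny made-up pool: a four-team tournament with two rounds.
    result = [["A", "D"], ["A"]]
    pool = [
        [["A", "D"], ["A"]],
        [["A", "C"], ["C"]],
        [["B", "D"], ["D"]],
    ]
    print(rank_pool(pool, result))
```

Any bracket with rank 1 counts as a pool winner. How to split ties is a choice the sketch leaves open.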
I calculated the expected value by finding all the brackets that picked a given team to win a given game, and calculated their chance of winning their pool. I did this by dividing the number of pool winners that had picked that team by the number of brackets that had picked that team in that game. I only care about first place in my model because in many pools, only the winner takes home money. In other pools it’s slightly different, but it’s always important to finish at or near the very top. It’s like Ricky Bobby said in Talladega Nights, “If you ain’t first, you’re last.” The link to my spreadsheet is here.
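In code, that division looks something like the sketch below. The input format, a flat list of `(picked_team, won_pool)` pairs for one game across all simulated brackets, is an assumption for illustration, as is the function name.

```python
from collections import Counter


def pick_values(entries):
    """Chance of winning the pool for each team picked in a given game.

    entries: iterable of (picked_team, won_pool) pairs, one per simulated
    bracket across all trials (hypothetical layout).
    """
    picked = Counter()
    won = Counter()
    for team, won_pool in entries:
        picked[team] += 1
        if won_pool:
            won[team] += 1
    # Pool winners that picked the team / all brackets that picked the team.
    return {team: won[team] / picked[team] for team in picked}


if __name__ == "__main__":
    # Made-up entries, not real simulation output.
    sample = [
        ("Duke", False), ("Duke", True), ("Duke", False), ("Duke", False),
        ("Gonzaga", True), ("Gonzaga", False),
    ]
    print(pick_values(sample))
```

A useful baseline: in a 25-bracket pool with a single winner, a pick with no particular edge would win about 1/25, or 4%, of the time, so values above that are good and values below it are poor.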
As an anecdotal example, everyone loves Duke this year. Zion Williamson is back for the tournament, and they are the overwhelming favorite to win it all. Analysts love them, and statisticians love them too. Bovada, Five Thirty Eight, and Ken Pomeroy all rank them as the best team in the country. According to ESPN, 41% of the country has picked Duke to win the NCAA Tournament. But the NCAA Tournament is inherently fickle, and it is difficult for anyone to win the whole thing. Often, the best team loses early. As a result, the odds of Duke winning the tournament are much less than 41%. Five Thirty Eight, for example, has their title chance around 19%.
This means that there is not a lot of value to picking Duke to win the championship this year. Even if they do win, several other people in your pool probably picked Duke too. This means that you may not win your pool. And if they do not win, which has a high likelihood of happening, you will definitely not win your pool. The pool winner almost always picks the right champion, since the championship game is valued so highly. I found that the value of picking Duke to win the championship (the chance you would win your bracket pool if you picked Duke to win the championship) was only 3.2%.
Most brackets that won their pool in my simulation did pick Duke. But Duke provides relatively low value this year as a potential champion, because so many people are picking them. It is a good idea to pick them to go far, but you should pick another team to win. The best values for the title are Gonzaga (9.6%), Michigan State (7.2%), and Virginia (6.9%). Tennessee (2.3%), Duke (3.2%), North Carolina (3.6%), and Kentucky (3.6%) are relatively poor values. This makes sense to me because all of these are popular teams with large fan bases. On the other hand, Virginia lost in the first round last year, and Gonzaga is a small school in a mid-major conference. It makes sense why people would doubt them. Michigan State is also flying under the radar a bit this year, though they did win the Big Ten.
I have the values for the earlier rounds calculated as well, but the values are much closer together. It does not seem to really matter who you pick early on, since the game values are so low. You only get 1 point for picking a first round game and 2 for a second round game on ESPN, which is peanuts compared to the 32 you get for the championship game. You just don’t want to pick a really low seed with no chance of winning, since they will prevent you from getting points in later rounds. Once you get to the Final Four, there is some separation in the picks. The best value Final Four is Michigan State (6.4%), Kentucky (5.7%), Gonzaga (5.5%), and Virginia (5.0%). Duke (3.7%) and North Carolina (3.6%) provide decent value. The worst value Final Four picks among high seeds are Tennessee (2.8%) and Michigan (2.9%). You may see some different numbers than these because I am having the spreadsheet execute every minute to try and generate more trials. Hopefully I can gain some insights from more trials that I can apply next year.
There do seem to be some issues with my model. The ways I simulate games and picks both tend to favor the higher-rated team, perhaps a bit more than they should. I ended up having more Duke national championships, both picked and actual, than Five Thirty Eight projected. This is likely due to an error with my model. I would like to refine it more next year. Also, my computer could not handle running too many cases in Google Sheets, so I could not run as many trials as I would have liked. I was only able to run 287 trials at the time of publication, which could lead to some inaccuracies. This is preventing me from digging too far into cases involving teams other than the 1 and 2 seeds, because too few simulated brackets pick them. There is some weird data further down the pick values as a result. All in all, it was a good first run and I am happy with the results.