Your chance of winning the Powerball jackpot is about one in 292 million. Your chance of filling out a perfect March Madness bracket is far slimmer than that. The challenge then becomes developing mathematical models to improve these dismal odds as much as possible.
March Madness draws quite the crowd: an estimated 47 million Americans filled out a bracket this year and spent an estimated $8.5 billion betting on the outcome of the championship. Everyone is trying to flex their college basketball knowledge while making a few bucks and earning bragging rights in their friends’ betting pool. For the mathematically inclined, accurately predicting March Madness brackets is a technical problem in search of a solution.
In the past few years, machine learning tools and publicly available datasets have added a technological twist to March Madness. Statisticians and data scientists are now competing to create the most accurate machine learning model for bracket prediction. The catch is that knowing too much about basketball might hurt your odds.
The odds are absolutely stacked against you. You can let your dog choose the winning team, throw darts, or even go with the best-looking uniforms, but none of it will help much, and the odds have only grown longer as the tournament has expanded. In 1939, only eight teams competed in the NCAA tournament, so a bracket picked by coin flip had about a one in 128 chance of being perfect. The tournament expanded to 16 teams in 1951, which lengthened the odds to one in 32,768, but that’s still pretty good compared to today. Filling out a perfect 64-team bracket by coin flip gives you odds of around one in 9.2 quintillion.
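The figures above follow from simple arithmetic: a single-elimination tournament with n teams plays n - 1 games, and if every game is treated as a coin flip, a perfect bracket has a one in 2^(n - 1) chance. Here is a minimal Python sketch of that calculation. The function name is made up for illustration, and the coin-flip assumption ignores the fact that real games are not 50/50.

```python
def perfect_bracket_odds(teams: int) -> int:
    """Return N, where a coin-flip bracket is perfect with probability 1 in N."""
    # Single elimination: every game knocks out exactly one team,
    # so crowning a champion takes (teams - 1) games.
    games = teams - 1
    # Each game has two possible outcomes, so there are 2**games distinct brackets.
    return 2 ** games


for teams in (8, 16, 64):
    print(f"{teams} teams: 1 in {perfect_bracket_odds(teams):,}")
```

For 64 teams that works out to 2^63, which is where the 9.2 quintillion comes from.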
Machine Learning Madness invites participants to use machine learning techniques to create their tournament brackets. The contest is hosted on Kaggle, a Google-owned platform for data scientists that is a cross between Stack Exchange and GitHub. This year, 955 competitors are vying for the $25,000 prize. Before the tournament begins, there is a massive data dump in which participants get access to basic information such as the scores for every Division I basketball game dating back to 1984. Regardless of the technique used, participants must predict the outcome of each of the roughly 2,000 possible NCAA tournament matchups. Not only do they have to pick the winner and loser, they also have to declare how certain they are of that outcome on a scale from zero to one.
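To make the task concrete, here is a rough Python sketch of two pieces of it: counting the possible matchups, and scoring probability forecasts. The article does not say how submissions are scored, so log loss is used here purely as an assumption; it is one common way to grade zero-to-one confidence values. The function names and the sample numbers are invented for illustration.

```python
import math


def count_matchups(teams: int) -> int:
    """Number of distinct pairings among `teams` teams (n choose 2)."""
    return math.comb(teams, 2)


def log_loss(predictions, outcomes, eps=1e-15):
    """Average log loss; lower is better.

    predictions: probability (0 to 1) that the first team in each pairing wins
    outcomes:    1 if the first team actually won, 0 otherwise
    """
    total = 0.0
    for p, y in zip(predictions, outcomes):
        # Clamp so that a forecast of exactly 0 or 1 cannot produce log(0).
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(predictions)


# A 64-team field has 2,016 possible pairings, in line with "roughly 2,000".
print(count_matchups(64))

# Hypothetical forecasts for three games that the first team won.
outcomes = [1, 1, 1]
print(log_loss([0.9, 0.9, 0.9], outcomes))  # confident and right: small penalty
print(log_loss([0.5, 0.5, 0.5], outcomes))  # coin flip: moderate penalty
print(log_loss([0.1, 0.1, 0.1], outcomes))  # confident and wrong: large penalty
```

Under a rule like this, overconfidence is expensive: a model that is sure of itself and wrong loses far more than one that hedges, which is why the certainty value matters as much as the pick.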
Machine learning is full of promise, but it could easily be overhyped. The outcome of the NCAA championship will determine whether it helped create a more accurate official ranking, but if Machine Learning Madness has proved anything, it’s that the future of college basketball is as much about building networks as cutting down the nets.