This project proved to be an interesting exercise in the important process of choosing a meaningful question that the data at hand can feasibly help answer. In this case, we began by trying to predict Super Bowl winners. Between the relatively few instances of Super Bowl games and the imbalance of teams represented (two out of the thirty-two teams in the league, or two of the twelve teams in the playoffs), this quickly proved to be a question for which this particular collection of data was not well suited.
We then turned our attention to predicting wins and losses of individual games within the playoffs. This provided the benefit of substantially more instances for training our models, but because teams appear multiple times across the various rounds of the playoffs within the same year, data leakage became inevitable: the model learned to classify game outcomes based on the known outcomes of other games rather than on each team's seasonal performance.
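For context, one common way to guard against this sort of leakage is to keep every game from a given season on the same side of the train/test split. The sketch below is not the approach we ultimately took; it simply illustrates the idea using scikit-learn's GroupKFold and hypothetical column names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold

# games_df: one row per playoff game; 'season' identifies the playoff year,
# 'home_win' is the game outcome, and the remaining columns are features.
# All column names here are hypothetical placeholders.
X = games_df.drop(columns=["home_win", "season"])
y = games_df["home_win"]
groups = games_df["season"]

# GroupKFold keeps all games from the same season on the same side of the
# split, so the model cannot peek at other results from the same bracket.
gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    model = RandomForestClassifier(random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    print(model.score(X.iloc[test_idx], y.iloc[test_idx]))
```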
Finally, we made a series of data transformations to create a dataset featuring each team from each year of the playoffs exactly once, with their playoff performances encoded as binary outputs for predicting the various levels of the competition. With this in place, we were able to begin answering questions about a team's likelihood of progressing to various games, with a specific focus on the NFC and AFC conference championship games. This round provided the sweet spot of exclusivity (four teams out of twelve) without a class imbalance that would negatively impact the model's performance.
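As a rough sketch of that reshaping (with hypothetical column and round names), the playoff games can be unpivoted so that each team appearance becomes its own row, then aggregated into one row per team per season with binary flags for how far the team advanced:

```python
import pandas as pd

# playoff_games: one row per playoff game, with the season, the round, and
# the two participating teams. Names below are hypothetical placeholders.
appearances = pd.concat([
    playoff_games[["season", "round", "home_team"]].rename(columns={"home_team": "team"}),
    playoff_games[["season", "round", "away_team"]].rename(columns={"away_team": "team"}),
])

# One row per team per season, with binary targets for each playoff level.
team_years = (
    appearances
    .assign(made_conference=lambda df: (df["round"] == "conference").astype(int),
            made_super_bowl=lambda df: (df["round"] == "super_bowl").astype(int))
    .groupby(["season", "team"], as_index=False)
    .agg(made_conference=("made_conference", "max"),
         made_super_bowl=("made_super_bowl", "max"))
)
```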
As alluded to above, this project required a great deal of experimentation to transform the data into a final shape that would prove beneficial for classification without allowing any unintentional data leakage to inflate the model's performance on unseen data. Further, with the playoff game data arriving as one game (and thus two teams) per row and the seasonal data consisting of one row per team per year, the process included splicing, transforming, and concatenating the game results so that merging the playoff and seasonal data would work as intended. As is the case with many data mining projects, this process, while relatively straightforward in its final form, consumed a strong majority of the time poured into this endeavor and informed many of the processes which followed.
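Once both tables share a team-per-season grain, the merge itself is short. A minimal sketch, continuing with the hypothetical frames above (team_years for the playoff labels, seasonal_stats for the regular-season metrics):

```python
# seasonal_stats: one row per team per season of regular-season metrics
# (again, hypothetical column names). Merging onto the playoff labels
# keeps exactly one row per playoff team per year.
dataset = team_years.merge(
    seasonal_stats,
    on=["season", "team"],
    how="left",
    validate="one_to_one",  # guard against accidental duplicate rows
)
```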
With what was expected to be a difficult problem to classify correctly (if it were easy, NFL analysts and the sports betting industry wouldn't be as successful as they are), we decided to test a handful of models to see how each performed relative to the others.
In the first round of model selection, we chose to test a single Decision Tree and a more robust Random Forest. We also compared both models with and without Principal Component Analysis to determine how feature reduction might harm or benefit them. As seen in the slides above, the two models performed similarly to one another, predicting teams that do not proceed to the conference round with high accuracy but struggling to correctly classify the teams that would ultimately be conference contenders. Even so, both models performed better on the PCA components than on the raw data. This may be attributed to PCA condensing the seasonal data into a smaller set of linear combinations, or one might consider how combining offensive and defensive performance into fewer features can create attributes that explain more of the data's variance.
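A minimal sketch of that comparison, assuming X holds the seasonal features and y the conference-round labels; the hyperparameters below are illustrative rather than the values we tuned:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

candidates = {
    "tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, clf in candidates.items():
    # Same classifier, with and without a PCA step in front of it.
    raw = make_pipeline(StandardScaler(), clf)
    with_pca = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    print(name, "raw:", cross_val_score(raw, X, y, cv=5).mean())
    print(name, "pca:", cross_val_score(with_pca, X, y, cv=5).mean())
```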
At this stage in the project, we also realized we needed to reevaluate our methods for measuring a model's performance. While accuracy is a good top-level metric, given the unique application of our project (namely, how valuable is the list of teams we predict to be conference contenders), it can become misleading when a model has high accuracy but low precision. Ultimately, our goal is to provide a list of teams worth betting on to reach the conference round, and that list should contain as many hits relative to misses as possible. Thus, we shifted our focus to evaluating models primarily on precision, while ensuring the other metrics were not overly degraded by the tuning parameters we changed.
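Concretely, this shift is just a change of scoring function. Assuming y_test holds the true conference-round labels and y_pred a model's predictions, the two metrics compare as follows:

```python
from sklearn.metrics import accuracy_score, precision_score

# Precision answers the question we actually care about: of the teams we
# flagged as conference contenders, what fraction really got there?
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
```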
With the baseline of performance provided by these two models, we tuned and tested another pair of models: Support Vector Machines and Neural Nets. Perhaps surprisingly, the best performance in terms of precision came from a simple Neural Net with one hidden layer consisting of only one node. This particular test produced a list of conference contenders of which more than 60% of the teams did indeed end up playing in the penultimate playoff game. While this success rate may not be particularly impressive, it does appear to contend with the average sports analyst's performance over multiple seasons.
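For reference, a one-node hidden layer is simple to express in scikit-learn. The sketch below uses illustrative hyperparameters and a standard train/test split, not our exact configuration:

```python
from sklearn.metrics import precision_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A single hidden layer with a single node; feature scaling matters for
# neural nets. max_iter and random_state here are illustrative only.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=42),
)
net.fit(X_train, y_train)
print("precision:", precision_score(y_test, net.predict(X_test)))
```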
Perhaps more so than any other project to date, this particular challenge provided a number of invaluable lessons and learning opportunities. First, as discussed at the onset of this project, defining the question and determining whether the data can accurately and realistically provide meaningful insight towards answering it requires a great deal of contextual knowledge, statistical rigor, and thoughtful exploration of both the problem space and the dataset itself. Equally crucial is understanding and defining the key metrics that will be used for evaluation and the trade-offs of prioritizing one particular evaluation metric over others.
When considering modeling and optimization parameters, it is clear these steps are crucial and necessary, but they can only help to the extent that the dataset itself is separable and predictable enough for the model to function as intended. Further, complexity is not always superior, as simpler models can often generalize better than more sophisticated techniques. Even so, there can be valuable trade-offs when black-box models such as Neural Nets offset their difficult-to-interpret structure with raw processing power and improved evaluation metrics. Thus, it is crucial to understand the unique parameters of each model in order to ensure optimal performance and to troubleshoot when necessary.
Working in an industry as data-rich as sports, there are a number of future considerations and improvements to be made to this project, which will hopefully be explored at another time. These include a myriad of further data transformations, one example of which would be normalizing seasonal data within each year to standardize the yearly metrics relative to the era of the game, given how football's meta has shifted from rushing to passing over the last thirty years. In addition, new metrics could be created relative to league averages so as to make the seasonal data comparable between playoff teams and those who did not qualify for the post-season. Other new features could include third-party data such as Vegas betting lines and favorites, as well as other industry evaluations of teams such as power rankings provided by content creators.
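As a sketch of the first of those transformations (with hypothetical column names), each seasonal metric could be standardized within its own year so that, for example, passing yards in 1995 and 2020 are each measured against their respective eras:

```python
# Hypothetical metric columns; z-score each metric within its season.
metric_cols = ["pass_yards", "rush_yards", "points_for", "points_against"]

normalized = seasonal_stats.copy()
normalized[metric_cols] = (
    seasonal_stats
    .groupby("season")[metric_cols]
    .transform(lambda col: (col - col.mean()) / col.std())
)
```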
Ultimately, the project highlighted the unique features and predictive power of each model deployed, showed the unsurprising difficulty of predicting the outcomes of sporting events, and provided an opportunity to work through a data mining project end to end: from manually gathering data, to cleaning and processing it, to training and optimizing models, and finally to evaluating the results.