I spent more time last year worrying about predicting all 68 teams in the field than I did about graduating college. My bracket was updated daily, I watched the NET like a hawk, and I conducted personal blind resumes to make sure I didn’t carry any bias. Selection Sunday came, and the entirety of my work came down to an hour-long show. I was a mediocre bracketologist by any measure: I finished in the middle of the Bracket Matrix, scoring 350.
Taking SMU over Notre Dame was still the right choice
Ideology
Bracketology is one of the more objective exercises in sports prediction. We can disagree with Duke being on the 2-line and Tennessee being on the 3-line, but the goal is to predict what the ten people sitting in a room will decide, not what is most accurate. The 68 teams could be seeded automatically by a computer, but that would strip away the excitement and uncertainty of a bubble team improving its resume. Still, bracketology is a bit like a formula: we understand the tools used, and we have access to the same numbers as the people who fill out the bracket. The process could be simplified by taking those tools and building a formula that replicates the likely thinking of the Selection Committee.
Pulling the Data
Above all else, bracketologists look toward performance against each quadrant. Quadrant wins and losses are a binary means of evaluating teams and their seeding. Though we aren’t privy to the exact reasons teams land in specific spots, it is easier to point to a bubble team being left out for losing a close game than for sitting three spots too low in a computed metric.
Thus, in line with the thinking of the Selection Committee, I based my initial model directly on quadrant performance. This was done by using BartTorvik.com and the website’s ability to sort teams based on their performance against each quadrant.
The results for performance against each quadrant were copied into a sheet, and the sheets were separated based on each quadrant's performance.
Though the NET was only created in 2018, Bart has quadrant-performance data going back to 2008, likely by mirroring the RPI for that timeframe. The 2019 and 2021 seasons were used as validation sets, with 2008-2018 as training data. Each team’s seed was regressed against its wins and losses against Q1A, Q1B, Q2, Q3, and Q4.
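The regression step can be sketched as follows. This is a minimal stand-in, not the actual pipeline: the column layout and the synthetic records are assumptions, since the real data was copied from BartTorvik.com team sheets.

```python
# Sketch of regressing seed on quadrant wins/losses (synthetic stand-in data;
# the real inputs were quadrant records scraped from BartTorvik.com).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_teams = 300  # roughly ten seasons of seeded teams

# Ten features: wins then losses against Q1A, Q1B, Q2, Q3, Q4
X = rng.integers(0, 8, size=(n_teams, 10)).astype(float)

# Stand-in seeds: quality wins push the seed number down (better seed)
weights = np.array([-0.9, -0.6, -0.4, -0.2, -0.05,   # wins lower the seed
                     0.05,  0.1,  0.3,  0.5,  0.2])  # losses raise it
seed = np.clip(8 + X @ weights + rng.normal(0, 1, n_teams), 1, 12)

model = LinearRegression().fit(X, seed)
print(dict(zip(
    ["Q1A_W", "Q1B_W", "Q2_W", "Q3_W", "Q4_W",
     "Q1A_L", "Q1B_L", "Q2_L", "Q3_L", "Q4_L"],
    model.coef_.round(2),
)))
```

The fitted coefficients read directly as seedlines gained or lost per game of each type.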
The Initial Model
What this model lacks in complexity, it makes up for in quantity of data.
A strong correlation was shown between seed and quadrant wins/losses. The most impactful variable was Q1A wins. There was very little penalty for a Q1A loss, as losing to an elite team should not harm a resume: it would take 13 Q1A losses to cost a team a single seedline, compared to just two Q3 losses.
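The "13 Q1A losses versus two Q3 losses per seedline" claim implies per-loss coefficients we can work through directly. The numbers below are reconstructed from that claim, not the exact model output:

```python
# Implied per-loss seed penalties (reconstructed from the "13 Q1A losses
# vs. two Q3 losses per seedline" claim, not the exact fitted values).
q1a_loss_coef = 1 / 13   # ~0.077 seedlines per Q1A loss
q3_loss_coef = 1 / 2     # 0.5 seedlines per Q3 loss

# Example: a team with four Q1A losses and one Q3 loss
penalty = 4 * q1a_loss_coef + 1 * q3_loss_coef
print(round(penalty, 2))  # -> 0.81 seedlines
```

In other words, a single bad loss outweighs a whole season of losses to elite opponents.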
Still, winning a Q2 game should not be twice as impactful as losing a Q4 game, as the model would claim. Automatic qualifiers from single-bid conferences, the teams most likely to compile Q4 wins and losses, aren’t jockeying for higher seeds; they’re just playing to qualify. The resulting lack of Q4-loss volume weakens the model outright.
Precision is key when it comes to bracketology, and relying on an imprecise model would make this process difficult.
Quadrant 4 losses are not going to improve anyone’s resume; nothing positive can come from playing a Q4 game, only the risk of losing it. Based on this analysis and the earlier Q1A-loss analysis, a reduced model can be viewed without these variables.
Reducing the model revealed losses to have less of an effect on the seed, while wins carried higher coefficients and a correspondingly stronger correlation. Next, performance was checked against the validation set.
The reduced model performed slightly worse, on average a tenth of a seedline less accurate than the full model. For context, my manual bracketology last year was off by 0.4 seedlines on average, compared to 1.12 for this model.
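The "average seedlines off" score used throughout is, in effect, mean absolute error between projected and actual seeds. A minimal sketch with made-up seed numbers:

```python
# "Average seedlines off" is mean absolute error between projected
# and actual seeds (the seed values here are illustrative only).
import numpy as np

actual    = np.array([1, 2, 4, 6, 8, 10, 11])
projected = np.array([1, 3, 4, 5, 9, 10, 12])

mae = np.mean(np.abs(projected - actual))
print(round(mae, 2))  # -> 0.57 seedlines off on average
```

A perfect bracket scores 0.0; the full quadrant model above scored 1.12 by this measure.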
Matching Methodologies
To understand the selection process and teach our model to improve on it, we can look toward the other metrics, specifically the computer-generated ones. These split into resume-based statistics and predictive statistics. There is the NET, on which the quadrants are based; BPI, Sagarin, and KenPom are predictive metrics, while KPI and Strength of Record are based on a team’s resume.
Metrics under the same umbrella tend to agree. The first dataset pulled a large quantity of quadrant data, and despite the wide pool, the model was not very precise. With the NET only half a decade old, and one tournament in that span canceled, this dataset sits at the other end of the spectrum: fewer data points, but more variables. Quadrant record, KenPom, KPI, NET, SOS, and average NET win/loss for 2018-2019 and 2020-2023 comprise the next dataset.
Four separate models were created. The first has the ten quadrant variables, both the rankings and the values of the metrics, the SOS, and the average NET win and loss. The second is reduced, without SOS or average win/loss. Having both the ranking and the value for each metric was redundant, so the third model keeps the rankings and omits the values. The final, smallest model looks only at the top 12 seedlines, rather than expanding the field to the automatic qualifiers from single-bid leagues.
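The nested feature sets can be sketched like this. All column names and the random stand-in data are assumptions; the real sheet mixed quadrant records with KenPom, KPI, and NET team-sheet numbers.

```python
# Sketch of the nested feature sets (column names assumed; data synthetic).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
quad_cols   = [f"Q{q}_{s}" for q in ("1A", "1B", "2", "3", "4") for s in ("W", "L")]
metric_cols = ["KenPom_rank", "KPI_rank", "NET_rank"]
extra_cols  = ["SOS", "avg_NET_win", "avg_NET_loss"]
all_cols    = quad_cols + metric_cols + extra_cols

df = pd.DataFrame(rng.random((200, len(all_cols))), columns=all_cols)
df["seed"] = rng.integers(1, 13, 200)

feature_sets = {
    "full":       all_cols,                   # quads + metric ranks + SOS/averages
    "reduced":    quad_cols + metric_cols,    # drop SOS and avg NET win/loss
    "ranks_only": metric_cols,                # keep rankings, omit values
}
for name, cols in feature_sets.items():
    fit = LinearRegression().fit(df[cols], df["seed"])
    print(name, len(cols), "features")
# The fourth model reuses these features but restricts rows to the top 12 seedlines.
```

The same regression call runs over each column subset, which makes the four-model comparison a few lines of bookkeeping rather than four separate scripts.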
The 2018-19 and 2020-21 seasons were used as the training set, validated against 2021-22.
The average seedline error across the four models ranged between 0.76 and 0.97, far improved from the previous model. The two most precise were the model with the initial metrics but without average NET win/loss, and the reduced top-12-seed model. This is still not close to the 0.4 earned through manual bracketology, but it is much improved, despite using a quarter of the data of the previous model.
A tree model was built for the sake of visualization. Seeding is too fine-grained a target for a tree model to be a serious candidate, but it is interesting nonetheless that the splits land almost entirely on the metrics, not quadrant victories.
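For illustration, here is how such a tree can be fit and printed. The data is synthetic and deliberately rigged so the seed depends on a metric rank, echoing the observation that the real tree split on metrics rather than quadrant wins:

```python
# A small regression tree for visualization (synthetic data; the seed here
# is driven by a NET-style rank, mirroring the metrics-dominated splits).
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(2)
X = np.column_stack([
    rng.integers(1, 100, 150),   # stand-in NET rank
    rng.integers(0, 8, 150),     # stand-in Q1 wins
])
seed = np.clip(np.round(X[:, 0] / 8), 1, 12)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, seed)
print(export_text(tree, feature_names=["NET_rank", "Q1_wins"]))
```

With the seed a function of rank alone, every split lands on NET_rank; the interesting part of the real result is that actual tournament data behaved much the same way.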
Checking the Model’s Relevance
We now have two very similarly performing models, so let’s see how applicable they look in this year’s bracket. Averaging the two models and putting a bracket together provides a bit of relevance. Bracketology is not evergreen, and these results may be stale within an hour of this article being published, but there are some takeaways.
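The blending step is simple: average the two models' projected seed values per team and round to the nearest seedline. The team names and numbers below are placeholders, not the actual model outputs:

```python
# Blending two seed projections per team (values illustrative only).
model_a = {"Texas": 1.2, "Houston": 2.4, "Gonzaga": 2.6, "Arizona": 3.9}
model_b = {"Texas": 0.9, "Houston": 1.8, "Gonzaga": 3.0, "Arizona": 4.3}

# Average the two projections and round to the nearest seedline
blended = {team: round((model_a[team] + model_b[team]) / 2) for team in model_a}
print(blended)
```

Rounding a continuous prediction onto 16 discrete seedlines is itself a source of error, which is worth keeping in mind when reading the seedline-error numbers above.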
Our model has Texas on the 1-line and Houston on the 2-line, which isn’t likely to occur; Houston’s impressive metrics do not outweigh its lack of quality wins. Gonzaga sits on the 3-line above Arizona, reflecting how Arizona has been punished for its poor losses.
Most noteworthy is how Mountain West teams are seeded:
San Diego State - 4 seed
Utah State - 6 seed
Boise State - 7 seed
Nevada - 9 seed
New Mexico - 11 seed
The top three of these all have a KPI in the top 20, with Nevada’s in the top 30s. The top four are also top 40 in the NET. Despite few high-level wins, these teams are rewarded more generously by the model than by any other projection.
Utah State and Saint Mary’s are similar cases: high metrics despite few impressive wins, with Saint Mary’s around 10th in most metrics. That lands Saint Mary’s on the 4-line, despite most having them as a 5 or 6 seed.
The effect of the metrics needed to be tempered. This was most clearly seen in the differences between conferences: despite similar metrics, low and high majors are evaluated differently, in part due to the opportunity for high-level wins. Thus, a binary low/high-major variable was added. I marked all of the power-six conferences, plus Gonzaga and Houston, with a 0, and every other conference’s teams with a 1.
I pulled only the top 12 seedlines, so as not to include the single-bid conferences, which would artificially inflate the data. From this, we can see that being in a low-major conference costs about one full seedline.
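The indicator variable works like any other regressor: its coefficient reads directly as the seedline cost of the low-major flag. A minimal sketch with hand-built rows where otherwise-identical teams differ only by the flag:

```python
# Low/high-major dummy variable: identical metrics, flag flipped, so the
# fitted flag coefficient is exactly the seedline penalty (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: NET_rank, Q1_wins, low_major_flag
X = np.array([
    [10, 7, 0], [10, 7, 1],
    [25, 4, 0], [25, 4, 1],
    [40, 2, 0], [40, 2, 1],
], dtype=float)
seed = np.array([2, 3, 5, 6, 8, 9], dtype=float)  # low major seeded one worse

fit = LinearRegression().fit(X, seed)
print(round(fit.coef_[2], 2))  # -> 1.0 seedline penalty for the flag
```

This mirrors the roughly one-seedline difference found in the real data, though there the penalty emerges statistically rather than by construction.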
This model was not much more precise than the previous one.
Notes
The top three seedlines remain the same, apart from Gonzaga and Arizona flip-flopping.
San Diego State, Boise State, and New Mexico remain the same despite the attempted penalty; Saint Mary’s dropped from a 4 to a 5.
Drake moved out of the at-large picture (the model doesn’t recognize AQs) following the added penalty.
In a final attempt at a correction, BPI was added, a metric that does not value mid-majors as highly as the others. For context, Utah State is top 25 in most metrics but barely top 50 in BPI.
This new metric did not improve the precision.
Concluding Thoughts
With these tools and methods in hand, a value of around 0.7 seems to be as good as a linear bracketology can achieve. A more advanced, nonlinear, multivariable model might do better, but the ultimate goal here is a simplification of the selection process.
Even beyond a score that isn’t comparable to manual bracketology, there are drawbacks, beginning with the fact that this model doesn’t select which teams are in the field; it only seeds those that are in.
For a reference point, my optimal model’s seedlines are below, as of the Wednesday before Selection Sunday.
Though this is comparable to a fair number of other bracketologists’ bubbles, North Texas and New Mexico both being comfortably in the field goes back to the issue of overseeding mid-majors.
Building an unsupervised model from linear variables was the challenge, but some manual adjustments would still be needed before submitting a final bracket, specifically trimming down mid-majors’ seedlines. The tools used by bracketologists offer no ready solution to this issue. Some supervision is necessary, at least with the models developed here.
Though the process of bracketology can be reverse-engineered, the human element, a key part of the Selection Committee, cannot be tracked through linear methodologies. This model has not matched the complexities of human thought.
Resources Utilized
https://barttorvik.com/# - Pulling Quadrant Performance Data
https://kenpom.com/ - Team sheet rankings
https://faktorsports.com/ - Team sheet rankings (KPI)
https://www.espn.com/mens-college-basketball/bpi - Team sheet rankings
https://www.warrennolan.com/basketball/2023/net - Team sheet rankings (NET/SOS)