Predicting NBA Three-Point Shooting Part 2: Random Forests

A few weeks ago, I published a model projecting three-point shooting numbers in the NBA for this year’s draft class. I used a linear model based on college statistics, and it was pretty good, but it did have some problems, particularly around upperclassmen and players who did not take a lot of free throws (which is somewhat common among three-point specialists).

To improve my projections, I switched from a linear model to a tree-based one. In a linear model, you create an equation like y = mx + b (think back to algebra). y is what you are predicting (in this case, NBA three-point percentage), and x is the value (or values; there can be more than one) you are using to predict. The x values (or predictor variables) in the first model were college numbers for three-point percentage, three-point rate (the percentage of field goal attempts that are threes), two-point jump shot percentage, free-throw percentage, and position. Then, you pick the m and b values that best fit the data.
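
To make the y = mx + b idea concrete, here is a minimal Python sketch that fits a line with a single predictor using ordinary least squares. The numbers are invented for illustration and are not from my data set, and the names fit_line and predict are just placeholders. The actual first model used several predictors, not one.

```python
# Hypothetical toy data (NOT the article's data set):
# college free-throw percentage and NBA three-point percentage.
college_ft = [0.70, 0.75, 0.80, 0.85, 0.90]
nba_3p = [0.33, 0.34, 0.36, 0.37, 0.39]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(college_ft, nba_3p)


def predict(x):
    """Apply y = mx + b with the fitted slope and intercept."""
    return m * x + b


print(f"m = {m:.3f}, b = {b:.3f}")
print(f"Projected NBA 3P% for an 80% FT shooter: {predict(0.80):.3f}")
```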

A tree-based model uses decision trees to create predictions. Decision trees are a lot like flow charts. Think of this classic one by Twitter god Shea Serrano, but using statistics instead. In a shooting decision tree, we would divide up data into groups using decisions on the chart, and then calculate the average three-point shooting percentage of all the shooters in each bucket. This average becomes the prediction for shooters whose college statistics place them in that group. Here’s a visual representation of a simple tree for shooting:
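
A tree like this can also be written out as a few nested checks. Here is a minimal Python sketch: the cutoffs (80 percent on free throws, 38 percent on threes), the bucket names, and the player numbers are all made up for illustration. A real tree learns its cutoffs from the data.

```python
from statistics import mean

# Hypothetical players (NOT real data):
# (career college FT%, career college 3P%, NBA 3P%)
players = [
    (0.88, 0.41, 0.40),
    (0.85, 0.39, 0.38),
    (0.82, 0.34, 0.36),
    (0.81, 0.35, 0.34),
    (0.72, 0.36, 0.33),
    (0.68, 0.30, 0.31),
]


def bucket(ft, three):
    """Walk the flow chart: each 'if' is one decision on the chart."""
    if ft >= 0.80:
        if three >= 0.38:
            return "good FT, good 3P"
        return "good FT, poor 3P"
    return "poor FT"


# Group the NBA results by bucket, then average within each bucket.
groups = {}
for ft, three, nba in players:
    groups.setdefault(bucket(ft, three), []).append(nba)
bucket_means = {name: mean(values) for name, values in groups.items()}


def predict_tree(ft, three):
    """A new player's prediction is the average of the bucket he lands in."""
    return bucket_means[bucket(ft, three)]


print(bucket_means)
print(predict_tree(0.86, 0.40))
```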

For this model, I used a random forest, which builds a large number of these trees and combines their results to get a more accurate prediction. Random forests follow the wisdom of the crowd: the idea that the opinion of the masses is better than the opinion of one expert.
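
Here is a rough Python sketch of the "many trees, one answer" idea. It is a simplified stand-in, not the model I actually ran: each tree here makes only a single split (a "stump"), the data are invented, and the function names fit_stump, fit_forest, and predict_forest are placeholders. The two ingredients it does share with a real random forest are that each tree sees a random resample of the players and a random choice of statistic, and that the final prediction is the average across trees.

```python
import random
from statistics import mean


def fit_stump(rows, targets, rng):
    """Fit a one-split regression tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [t for r, t in zip(rows, targets) if r[feature] < threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] >= threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors if each side predicts its own average.
        sse = sum((t - left_mean) ** 2 for t in left)
        sse += sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every sampled row had the same value: no split possible
        overall = mean(targets)
        return lambda row: overall
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] < threshold else right_mean


def fit_forest(rows, targets, n_trees=200, seed=0):
    """Train each stump on a bootstrap resample of the players."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        trees.append(fit_stump(sample_rows, sample_targets, rng))
    return trees


def predict_forest(trees, row):
    """The forest's prediction is the average of every tree's prediction."""
    return mean(tree(row) for tree in trees)


# Hypothetical data (NOT real players): (college FT%, college 3P%) -> NBA 3P%
rows = [(0.68, 0.30), (0.72, 0.33), (0.78, 0.35),
        (0.82, 0.36), (0.85, 0.39), (0.88, 0.41)]
targets = [0.31, 0.33, 0.34, 0.36, 0.38, 0.40]

trees = fit_forest(rows, targets)
print(predict_forest(trees, (0.88, 0.41)))
print(predict_forest(trees, (0.68, 0.30)))
```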

The random forest model is more accurate than the linear model. Its inputs were career three-point percentage, career free-throw percentage, career two-point jump shot percentage, and career three-point rate, plus the same four statistics from the player’s last college season (for freshmen, these were identical to the career numbers). On the test set, I ended up with a .26 R-squared value (the share of the variance in the data that the model explains), compared to .15 before, with a .026 mean absolute error (previously .028) and a .035 root-mean-squared error (previously .039). As before, all data came from Bart Torvik’s site.
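
For anyone who wants to see how those three accuracy numbers are defined, here is a small Python sketch. The function name regression_metrics and the three example players are hypothetical; the formulas themselves are the standard definitions of R-squared, mean absolute error, and root-mean-squared error.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return R-squared, mean absolute error, and root-mean-squared error."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Mean absolute error: the average size of a miss.
    mae = sum(abs(e) for e in errors) / n
    # Root-mean-squared error: like MAE, but big misses count for more.
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    # R-squared: 1 minus (model's squared error / squared error of
    # always guessing the average).
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Hypothetical test set of three players (NOT real results).
actual = [0.30, 0.35, 0.40]
predicted = [0.32, 0.35, 0.38]
metrics = regression_metrics(actual, predicted)
print(metrics)
```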

I found that career college free-throw percentage is the most predictive statistic for NBA three-point percentage, with a relative feature importance of .19. The second-most predictive is free-throw percentage in a player’s last college season (.17). This is consistent with most other models out there predicting NBA shooting success; after all, a free throw is an isolated indicator of shooting ability, with no defense or anything else to complicate the shot.

This model did a pretty good job of predicting past players. Duncan Robinson had the best prediction in the set (40.3 percent) and ended up as the second-best shooter in the set (43.7 percent, above his prediction, but within his 95 percent prediction interval). Stephen Curry (40.1 percent predicted, 43.5 percent actual), Gary Trent Jr. (39.7 percent predicted, 40.5 percent actual), and Buddy Hield (39.3 percent predicted, 39.0 percent actual) also did well in both predictions and actual shooting.

Like the first model, this model missed on Michael Porter Jr. due to his limited sample size of three games in college (35.6 percent predicted, 42.2 percent actual). It also missed pretty heavily on Khris Middleton, predicting him to be only a 33.5 percent shooter in the NBA based on poor college three-point shooting numbers (32.8 percent for his career, only 25 percent in his last season), which his 75 percent career free-throw percentage was not enough to offset.

Utah State senior Sam Merrill projects to shoot 38.1 percent from three in the NBA, the best value in this draft class. His 95 percent prediction interval is 32.2 percent to 44.0 percent, meaning the model gives a 95 percent chance that his career NBA three-point percentage falls in that range.
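
One rough way to get a range like this out of a forest is to look at how much the individual trees disagree and take the middle 95 percent of their predictions. The Python sketch below shows that idea only as an assumption about how such a range could be built; it is not necessarily how the intervals in this post were calculated, and the spread between trees tends to understate the true uncertainty for a single player. The names percentile and interval_from_trees and the tree predictions are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation; q is between 0 and 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def interval_from_trees(tree_predictions, level=95):
    """Cut off equal tails so that `level` percent of trees fall inside."""
    tail = (100 - level) / 2
    lower = percentile(tree_predictions, tail)
    upper = percentile(tree_predictions, 100 - tail)
    return lower, upper


# Hypothetical per-tree predictions for one player (NOT real model output):
# 101 evenly spaced values from .300 to .400.
tree_predictions = [0.300 + i * 0.001 for i in range(101)]
lower, upper = interval_from_trees(tree_predictions)
print(f"{lower:.4f} to {upper:.4f}")
```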

Tyrell Terry (37.8 percent), Aaron Nesmith (37.8 percent), Markus Howard (37.8 percent), and Nate Darling (37.6 percent) all also project well. So does the top shooter in the old model, Immanuel Quickley (37.5 percent).

Anthony Edwards projects decently (34.5 percent), with his strong 76.9 percent free-throw percentage outweighing his poor 29.1 percent three-point percentage.

Interestingly, Devin Vassell (34.2 percent) and Obi Toppin (33.1 percent) do not project well, despite the fact that both carry some regard as outside shooters. In both cases, a mediocre free-throw percentage is the culprit (72.0 percent for Vassell, 70.6 percent for Toppin). Neither relied on sharpshooting to build a draft case, but these poor projections should give teams pause.

Here are the numbers for everyone in the class, with a 95 percent prediction interval:

Originally published at http://theplaygrounder.com on November 18, 2020.
