Last April, I presented a statistical model (NHLP) that helps predict the future NHL potential of first-year draft-eligible CHL forwards. The predictors were points-per-game and age (though age was only found to be a predictor for power play scoring). To improve the model’s predictive power, I also researched percentage contribution to team scoring. The study found that, as with points-per-game, there is a moderately strong positive correlation between percentage contribution to team scoring and NHL scoring. In a nutshell, forwards who carry the scoring load in junior are more likely to post a higher points-per-game in their career NHL season. Therefore, when evaluating which forwards to draft, a team wants not only forwards who can put up points, but also those who can do so while being the main focus of their team’s offense.
For the 2015 Draft the NHLP formula used was:
(0.231 + 0.404 * Non-PP Pts/G) + (1.694 + 0.363 * PP Pts/G - 0.089 * Age on Sep 15 of draft year)
Non-Power Play R-sq(adj) 0.336
Power Play R-sq(adj) 0.315
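To make the arithmetic concrete, here is a minimal Python sketch of the 2015 formula. The function name, the argument names and the sample forward are hypothetical and used only for illustration. It also assumes that age is entered in years and that the two bracketed terms are simply added together, as the formula is written.

```python
def nhlp_2015(non_pp_ppg: float, pp_ppg: float, age_sep15: float) -> float:
    """Evaluate the 2015 NHLP formula for one forward.

    non_pp_ppg -- CHL non-power play points per game
    pp_ppg     -- CHL power play points per game
    age_sep15  -- age in years on Sep 15 of the draft year (assumed unit)
    """
    # Non-power play component of the formula
    non_pp = 0.231 + 0.404 * non_pp_ppg
    # Power play component, which is the only part that uses age
    pp = 1.694 + 0.363 * pp_ppg - 0.089 * age_sep15
    return non_pp + pp


# Made-up forward: 0.90 non-PP pts/G, 0.40 PP pts/G, exactly 18 years old
print(round(nhlp_2015(0.90, 0.40, 18.0), 3))
```

Note that the age coefficient is negative, so of two forwards with identical scoring, the younger one receives the higher power play component.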
The sample used for the model has since been expanded to include forwards from the 2007 draft, forwards who passed the 250-game threshold during the 2014-15 season, and forwards who had a career season in 2015 in terms of points-per-game. The result for the expanded sample of players is:
Using the updated sample improves the R-sq(adj) by over 7% for both non-power play and power play scoring. Does adding percentage contribution to team scoring improve the model further? The results are:
R-sq(adj) increases slightly for both non-power play and power play scoring. However, R-sq(pred), which measures the predictive quality of the model, went down when percentage contribution to team scoring was added. While percentage contribution to team scoring is a factor on its own, there is multicollinearity between points-per-game and percentage contribution, meaning that these predictor variables are highly correlated with each other. If there were no correlation between the predictors we would expect a variance inflation factor (VIF) of 1, while highly correlated predictors would show a VIF of over 5. In this case, it makes sense that percentage contribution to team scoring and points-per-game are correlated, as both measure points in different ways. A forward who puts up a lot of points will be involved in a large percentage of team scoring, and a player who accounts for a large percentage of team scoring will put up a lot of points.
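For readers who want to see how a VIF is obtained, here is a small Python sketch for the simplest case of exactly two predictors. In that case the VIF of either predictor is 1 / (1 - r²), where r is the correlation between the two. The function names and the numbers are made up for illustration. They are not taken from the NHLP sample, and the actual model also includes age for power play scoring.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up values: points-per-game and share of team scoring for four forwards
ppg = [1.0, 2.0, 3.0, 4.0]
share = [1.1, 1.9, 3.2, 3.9]          # moves almost in lockstep with ppg
unrelated = [1.0, -1.0, -1.0, 1.0]    # no linear relationship with ppg

print(vif_two_predictors(ppg, share) > 5)   # highly correlated predictors
print(vif_two_predictors(ppg, unrelated))   # uncorrelated predictors
```

With more than two predictors, each VIF comes from regressing that predictor on all of the others, which statistical packages report directly.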
Therefore, we need a different approach to deal with the multicollinearity problem. One solution is to run partial least squares regression, a method designed for correlated predictors. Using this method, the results are:
Partial least squares regression slightly improves the predictive ability of the model: R-sq(pred) rises from .3453 to .3474 for non-power play scoring and from .3154 to .3212 for power play scoring. While not a huge improvement over the previous NHLP model, it is the most predictive model yet for determining a first-year draft-eligible CHL forward’s future NHL potential. Therefore, for the 2016 draft the NHLP formula will be:
(0.201645 + 0.220766 * Non-PP Pts/G + 0.563435 * Non-PP Contribution %) + (0.699759 + 0.179796 * PP Pts/G - 0.036095 * Age on Sep 15 of Draft Year + 0.250900 * PP Contribution %)
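As a rough illustration, the 2016 formula can be written as a short Python function. The function name, argument names and sample values are hypothetical. The sketch assumes that the contribution percentages are entered as fractions (0.30 for 30%) and that age is in years. The size of the coefficients suggests this, but the formula itself does not state the units.

```python
def nhlp_2016(non_pp_ppg: float, non_pp_share: float,
              pp_ppg: float, pp_share: float, age_sep15: float) -> float:
    """Evaluate the 2016 NHLP formula for one forward.

    non_pp_share and pp_share are the forward's share of team scoring,
    entered as fractions between 0 and 1 (an assumption, see above).
    """
    # Non-power play component: points per game plus share of team scoring
    non_pp = 0.201645 + 0.220766 * non_pp_ppg + 0.563435 * non_pp_share
    # Power play component: points per game, age, and share of team scoring
    pp = (0.699759 + 0.179796 * pp_ppg
          - 0.036095 * age_sep15 + 0.250900 * pp_share)
    return non_pp + pp


# Made-up forward: 0.90 non-PP pts/G with a 30% share,
# 0.40 PP pts/G with a 35% share, exactly 18 years old
print(round(nhlp_2016(0.90, 0.30, 0.40, 0.35, 18.0), 3))
```

Holding points-per-game fixed, a larger share of team scoring raises the result, which is the effect the contribution terms were added to capture.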
While the formula is helpful in analyzing the 2016 draft, it is of little use unless we also apply it to forwards from previous seasons, so that we have comparable players to measure this year’s class against. Come back next week as the formula is applied to CHL forwards picked in the top 3 overall since the 1998 draft. In the coming months I will slowly continue to add to the list.