The ability to predict the success of hockey prospects at young ages has long been a goal in the business of hockey. Now more than ever, the success of young players is directly tied to the success of an NHL team. For the most part, rosters are built through the draft rather than through trades and free agency. Knowing who is likely to succeed and who is likely to fail can be the difference between winning and losing in the future – and that, in turn, can be the difference between employment and unemployment for the person choosing the players.
It wasn’t that long ago (although it seems like ages) that Canucks Army had access to such a tool. PCS, the Prospect Cohort Success project developed by Money Puck and Josh Weissbock, used historical data to project players in the here and now. Unfortunately, we lost access to the system when those two were hired by the Florida Panthers.
As you may have noticed, we’ve been using comparable percentages to assess prospects again over the past couple of months, beginning with this article here. It’s been a bit of a mystery until now, but it’s time to pull back the curtain. Draft and prospect analytics are returning to Canucks Army and the Nation Network. This is not a rebirth of PCS, but instead an alternative, using similar underlying principles.
This is pGPS: the prospect Graduation Probabilities System.
First and foremost, I have to make note of the fact that pGPS and PCS are entirely separate entities. I had absolutely no input in Money Puck and Josh’s system, and likewise they had no input in mine. Of course I was inspired by their work – it was an objectively brilliant creation. At the time that it went offline, I was relatively new at Canucks Army. My knowledge of their system came only from reading their literature rather than direct contact. I found the idea incredibly fascinating, and I was gutted by losing it. And so I set out to create my own system, not with the intention of achieving fame or monetizing the system, but simply to satisfy my own curiosity.
With the invaluable help of Canucks Army programmer Dylan Kirkby, I was able to gather statistical data for a variety of leagues dating back decades and began to compile a massive database. Like PCS, the goal was to compare present players with past players based on a few key factors that are known to correlate with NHL success: age, stature and production.
This is where PCS and pGPS likely diverge – while the concepts are similar, it’s likely that the underlying math and formulas are entirely different. There are several steps beyond the basic premise that lead to the final numbers, and any difference in adjustment, scaling, weighting, and so on will yield different results. Without having seen PCS numbers for some time, I cannot comment on the degree to which the final numbers are similar. From here on out, I can only account for what I’ve done personally.
PCS is a tough act to follow, and while there are other projection systems out there, I wholeheartedly believe that PCS was the gold standard. There are models like the Projection Project that use NHLe to compare players. Draft by Numbers uses a Poisson Generalized Additive Model and measures results in interesting ways, like Time on Ice. Another model, called DEV, uses Euclidean distance to measure similarity, as PCS did, though with a considerably smaller database. In the interest of giving pGPS its own flavour, I have experimented with different ways of inputting data, measuring outputs, and making adjustments here and there.
The Advantage of pGPS
Though we’re only unveiling it by name today, we’ve been kicking pGPS around behind the scenes for quite some time. In that time, I’ve built the database to a massive size that includes over 150,000 player-seasons from over 25 leagues, dating back over 30 years (in some cases). Of course, goal scoring rates have changed considerably since that time, so every single player-season is rigorously era-adjusted.
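The article doesn't spell out how the era adjustment is done. One common approach, shown here purely as an assumption, is to rescale a player's points per game by the ratio of a reference scoring level to the league's scoring level in that season. The names `Season`, `era_adjusted_ppg`, `league_gpg` and `reference_gpg` are hypothetical, and any numbers fed into it are illustrative rather than pGPS data.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One player-season (hypothetical structure)."""
    player: str
    league: str
    year: int
    games: int
    points: int


def era_adjusted_ppg(season, league_gpg, reference_gpg):
    """Scale raw points per game to a reference scoring environment.

    league_gpg:    dict mapping (league, year) -> league average goals per game
    reference_gpg: dict mapping league -> goals per game of the reference era
    Assumption: a simple ratio adjustment, not necessarily what pGPS uses.
    """
    raw_ppg = season.points / season.games
    factor = reference_gpg[season.league] / league_gpg[(season.league, season.year)]
    return raw_ppg * factor
```

A ratio adjustment is the simplest choice; a higher-scoring season shrinks production and a lower-scoring one inflates it.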
As with any projection model, pGPS is simply a tool to be used in addition to traditional scouting applications, rather than in place of them. It makes it far simpler to compare the likelihood of success of players between leagues, and quickly identifies players who are piling up points simply because of their age and size relative to their peers.
Of course, it’s not without its biases. pGPS inherits whatever biases are already built into the historical record. Historically speaking, the likelihood of success for players diminishes greatly the further they get under six feet. But how much of this is a result of their lack of abilities and how much is due to biases from coaches, managers, and scouts that prevent them from even getting a fair chance? Clearly there is a lot of both involved, but we may never know the degree to which each affects the end result. That is why context is critically important.
When a percentage is given by pGPS, it is up to the scout or analyst to determine which side the player will fall on. If a player’s comparable group achieves NHL success 33 per cent of the time, it’s their job to determine whether the player will be among the 33 per cent that makes it, or the 67 per cent that fails, as well as whether there are roadblocks that can be overcome with solid prospect development practices. pGPS percentages can also tell you which players are better bets before you even go about determining development strategies at an individual level.
Components of pGPS
In pGPS, similarity between players is measured by distance in Euclidean space: each player-season is a point whose three coordinates are age, stature and production. The closer two players are in that space, the more similar they are. Players with a high degree of similarity are deemed to be compatible matches. Each match is cross-referenced with the NHL’s all-time data, and a series of results is formed. Of all the players deemed a match, how many made it to the NHL? How many forged careers as NHL regulars? How many goals did they score, how many points? Each of these questions boils down to a single statistic designed to project young players in a myriad of ways.
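As a rough sketch of the matching step, assume each player-season is reduced to age, height in inches and era-adjusted points per game. The distance between two players is then a scaled Euclidean distance compared against a cutoff. The scale factors, the `threshold` value and the names `Profile`, `distance` and `find_matches` are all hypothetical; as noted above, choices like these are exactly where pGPS and PCS may differ.

```python
import math
from typing import NamedTuple


class Profile(NamedTuple):
    name: str
    age: float     # age in years (assumed measurement date)
    height: float  # stature in inches
    ppg: float     # era-adjusted points per game


# Hypothetical scale factors: dividing by these puts the three axes on a
# roughly comparable footing before squaring.
SCALES = {"age": 1.0, "height": 2.0, "ppg": 0.15}


def distance(a, b, scales=SCALES):
    """Scaled Euclidean distance between two player-seasons."""
    return math.sqrt(
        ((a.age - b.age) / scales["age"]) ** 2
        + ((a.height - b.height) / scales["height"]) ** 2
        + ((a.ppg - b.ppg) / scales["ppg"]) ** 2
    )


def find_matches(subject, history, threshold=1.0):
    """Return historical player-seasons within `threshold` of the subject."""
    return [p for p in history if distance(subject, p) <= threshold]
```

Tightening `threshold` trades sample size (n) for closer comparables.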
pGPS is measured by a series of different numbers, each of which indicates something different about the relationship between the subject and the historical sample.
- pGPS n: The number of matches between the subject and the player-seasons (one season by a single player, e.g., John Tavares 2008 OHL) in the historical sample.
- pGPS s: The number of statistical matches that became NHL regulars, defined as playing at least 200 NHL games.
- pGPS %: The bread and butter. Simply s divided by n, this is the percentage of statistical matches that successfully became NHL players.
- pGPS P/GP: The NHL points per game of successful matches.
- pGPS R: A bit of a hybrid number, this pGPS Rating combines the percentage and points per game to produce a number that includes both likelihood of success and potential upside.
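Given a subject's matches, the numbers in the list above can be computed along these lines. This is a minimal sketch under stated assumptions: each match carries its eventual NHL games and points, success means at least 200 NHL games, and P/GP is pooled across all successful matches. The exact pGPS R formula isn't given here, so `rating` is only a hypothetical placeholder (percentage times P/GP), not the real pGPS R.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Eventual NHL career of one matched player (hypothetical structure)."""
    nhl_games: int
    nhl_points: int


REGULAR_THRESHOLD = 200  # games needed to count as an NHL regular


def pgps_summary(matches):
    n = len(matches)
    successes = [m for m in matches if m.nhl_games >= REGULAR_THRESHOLD]
    s = len(successes)
    pct = s / n if n else 0.0
    # Pooled points per game over all successful matches (an assumption).
    total_gp = sum(m.nhl_games for m in successes)
    ppg = sum(m.nhl_points for m in successes) / total_gp if total_gp else 0.0
    rating = pct * ppg  # hypothetical stand-in for pGPS R
    return {"n": n, "s": s, "pct": pct, "ppg": ppg, "rating": rating}
```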
To assess the capabilities of pGPS, I ran the entire 2007-08 OHL season, using data from previous seasons as the comparison sample. I found a meaningful correlation (R² = 0.35) between the players’ pGPS % and their eventual NHL games played.
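The article doesn't say exactly how the regression was run. For a single predictor, the usual approach is an ordinary least-squares line of NHL games played on pGPS %, with R² = 1 − SS_res / SS_tot. The sketch below shows that computation with hypothetical names; it is not the author's code, and, as commenters point out below, plain least squares ignores issues such as heteroscedasticity.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```

Here `xs` would be each player's pGPS % and `ys` their NHL games played.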
Within the 2007-08 season, pGPS was very impressed with John Tavares and Steven Stamkos, giving them percentages of 100 and 87.5 per cent, respectively. It wasn’t fooled by 19-year-old Justin Azevedo, who led the OHL in scoring that year. Azevedo was assigned a pGPS% of 0.0%, and subsequently went on to play zero NHL games.
When the OHL database for that season is arranged by pGPS R, many of the most successful players rise to the top, with 15-going-on-16-year-old Taylor Hall at the top of the list, still two years away from being drafted first overall in 2010. Of the top 38 players by pGPS R, only three failed to play a game in the NHL, while 21 players in that group played at least 200 games. Here’s a list of the top 11 in pGPS R:
Obviously there are a couple of whiffs here and there, but by and large pGPS was able to identify eventual NHLers with impressive consistency.
pGPS R was also a very strong indicator of eventual NHL production.
The Future
This is just a small taste of what pGPS has to offer. Over the next several weeks, you’ll see these percentages making appearances in the Nation Network’s Draft Profiles series, which runs from now until the week of the draft. We’ve already been using this metric in assessing potential free agents, both of the CHL and NCAA variety, and we will continue to do so moving forward.
As the draft gets nearer, we can use it for one of its most valuable benefits – picking out potential late-round steals, especially in lesser-known leagues. While the major North American leagues were standard inclusions and the European elite leagues were must-haves, the European junior and second-tier leagues are new additions to the pGPS family.
pGPS needs more rigorous statistical testing to determine significance and validity. That will be something that I get into as we head into the summer and will be updating the masses on as we go along.
In the future, we’ll experiment with different ways of displaying pGPS data, analysis of drafting by the Canucks and by teams around the National Hockey League, as well as explanations of anomalies and the relationships between the world’s various hockey leagues. Buckle up stats fans, this should be quite an adventure.
Boom!
Looks great, I loved PCS because it gave you a separate perspective on a prospect and how previous prospects with similar stats fared. It’s one piece of the puzzle that has been missing (or very hard to dig up) since it departed, I’m looking very much forward to it being used for this draft!
It all makes sense now…
The reason the quality of the work on this site has been trending downward for the last couple of years is because NHL teams are less inclined to hire just any random basement blogger for $50,000 per year at best and as a free intern at worst.
Dimitri Filipovic is a good example since he could not even explain the gambler’s fallacy to delusional Canuck fans…
He had his chance, was exposed, and is doomed to spend the rest of eternity pondering where it all went wrong.
Rhys, Josh & Cam are the type of talents that legitimately deserve to be pilfered for NHL employment if for no other reason than to gain popularity among the online sheeple.
Perhaps Jeremy is good enough as well…
What the hell is your problem?
The demon is a liar.
He will lie to confuse us.
But he will also mix lies with the truth to attack us.
The attack is psychological, Drance, and powerful…
Serious question, how has NM00 not been hired on yet?
Love it! So glad that something has been developed to fill the void left by PCS. I know it’s unlikely, but are you going to make it public, à la the Projection Project?
This is awesome. Looking forward to seeing pGPS do its thing leading up to the draft!
This is good, got to give credit where credit is due.
This takes a long time to do, and good effort and initiative so far.
Very nice.
Good work, I’m interested to see it in action in the lead-up to draft day, and in the after-draft analysis of JB’s picks.
I did not realize that all of these league logos drove in London UK!
Love that you’re doing this work and that seems like a pretty extensive database to test it out on. I still have questions about one of the three legs of the basic model — namely stature. As with the PCS model while I get why height rather than say weight (since the latter is much more prone to variation and player behavior and control) becomes a proxy for a certain kind of body type I still wonder if that component (rather than age or production) is more a reflection of a deeply embedded set of beliefs regarding size held by the coaching/scouting/management establishment in hockey than it is an actual accurate predictor of future success. The other two components aren’t nearly as vulnerable to confirmation bias in this sense.
At any rate, really curious to see how this plays out in thinking about the upcoming draft, great work.
No one questions the trend to bigger players in the NFL. No one questions it in the NBA. For some reason, bright people continue to question this trend in the NHL. Because of the speed of the NHL game there will be a place for otherworldly skilled outliers, but when comparing two prospects with similar skills, pick the bigger one – always. Just don’t reach for a bigger, less skilled player.
Nah. Changes in direction and speed are too fundamental in hockey.
Big men can change direction and speed too. What really matters is what a player can do with the puck. If two players are equally efficient with the puck and skill set, choose the one that’s harder to move.
It’s not a slight on the smaller player’s skill, but a natural advantage that the larger player was born with. As you already know, not everyone was born physically equal, and that’s nobody’s fault.
Am I missing something here?
Of the 11 sampled players 7 were top 10 picks.
Of the other 4, 3 have played 41 games or fewer, and one, Dustin Jeffrey, has 131 games and is by no means a draft find.
What does this tell us?
I’m on board with the idea and not knocking it; I just genuinely don’t see what it tells us.
Does it find any late first rounders that turn into high end NHL talent?
Great work Jeremy!
As a summer intern over the years in the NHL, let me tell you that one of the most tedious jobs I have is compiling scouting lists from all the notes and ratings I get from a bunch of crusty old hockey scouts. Who cares if you “saw him good”. Am I right?
One year, I had a little mishap involving the OHL scouting reports, some coffee, and some cat videos on YouTube... but I digress. The reports were unusable and I had to come up with something quick. Like every intern, I like to do as little work as possible, so I came up with a quick and dirty rating system for junior hockey. Basically it works like this:
1) Sort all players by points per game.
2) Exclude any player under 5’10. Hockey isn’t for smurfs.
3) Exclude any undrafted 20 year olds. Already been passed over twice, no one will miss them from the list.
4) Remove anyone who didn’t play at least 30 games. I remember from my statistics class that 30 is sort of a magic number in statistics.
I was looking back at my list from 2008, and my top 38 had 19 players that have played 200+ NHL games. Not bad for a few minutes work, eh?
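For comparison with pGPS, the commenter's four-step filter is simple enough to write out. A sketch, assuming heights are stored in inches (5’10 = 70 in) and that "20 year olds" means players aged 20 or older; the `JuniorPlayer` type and field names are hypothetical. Filtering before sorting gives the same ordering as sorting first.

```python
from dataclasses import dataclass


@dataclass
class JuniorPlayer:
    name: str
    games: int
    points: int
    height_in: int
    age: int
    drafted: bool


def intern_list(players, min_height_in=70, min_games=30):
    """Apply the three exclusions, then rank by points per game."""
    keep = [
        p for p in players
        if p.height_in >= min_height_in           # drop anyone under 5'10
        and not (p.age >= 20 and not p.drafted)   # drop undrafted 20-year-olds
        and p.games >= min_games                  # drop small samples
    ]
    return sorted(keep, key=lambda p: p.points / p.games, reverse=True)
```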
God bless you.
This reminds me of Halt & Catch Fire when they got Cameron to develop a programming language after dragging Cardiff into the personal computing game by reverse engineering an IBM PC.
In other words, good stuff!!
Looks like pGPS will be an interesting tool. I for one am looking forward to the results.
I will be curious to know how pGPS rates Adam Mascherin.
I found this piece on thehockeywriters.com website.
Adam Mascherin – Draft Rankings: NHL Central Scouting (57) – TSN Craig’s List (34)
Position: Left Wing/ Center
Height: 5’9″
Weight: 200 lbs
Shoots: Left
A former second overall pick in the Ontario Hockey League Draft, Adam Mascherin is widely known for both his on-ice intelligence and unrelenting compete level. In his second season with the Kitchener Rangers, Mascherin notched 35 goals and 81 points, a total which tied him for the team lead and was a drastic improvement from his rookie season totals. Despite his size, or lack thereof, Mascherin is incredibly strong, a quality which allows him to protect the puck from his opponents and release his incredible shot, one that has been regarded as NHL caliber.
Ranked 57th among North American Skaters by Central Scouting and 34th among all Skaters by Craig Button, Mascherin could be scooped up by the Leafs with their natural second round pick, which will likely fall anywhere from 31st to 35th overall, or possibly with the selection they acquired from the Capitals in exchange for Daniel Winnik which will fall likely in the 50-60th overall range.
A bulldog style player with incredible skills and determination, Mascherin is an incredibly mature player for his age whose hockey commitment, both on and off the ice, would be a perfect match for Mike Babcock’s desired style of play.
Couple of questions and comments (also, don’t get mad). First, that R squared you’re reporting had better be an adjusted R squared. Also, .35 is not a high correlation. The more variables you add to your model, the higher the R squared value (unless you’re reporting an adjusted R squared; if so, say so). Also, how did you deal with era effects, and how did you deal with players who were drafted after the seventh round or not drafted at all? There seems to be bias built into the model. I also assume you used an estimator that fixed any heteroscedasticity problems you ran into… What stats package did you use for analysis: STATA, SAS, or something else? I’d understand your hesitance to post what I’m going to ask for next, but… could we see the full model you’re using? What specific estimator were you using (non-linear LS? LS? 2SLS? ML? Was it a ridge regression model?) Also, is this testable (in the statistical sense)? Essentially, I’d just like a lot more information before I ask any more questions. Maybe you posted these things and/or maybe I just suck at finding them. Grad school taught me to be skeptical… If you have any questions for me or want to answer the stuff directly (like why the hell is this guy firing all this stuff at me), email me (I’m assuming contributors (writers) can see my e-mail).
Don’t see how R^2 adjusted is needed there, the R^2 of 0.34 was from a straight linear regression in the plot, so it doesn’t have a lot of variables. And no, they probably won’t share the model, as they didn’t with PCS. It makes sense when you consider that one of the reasons the authors of PCS were hired by an NHL team was to get access to the inner workings of the model (and probably have them continue to make more adjustments and other models… my understanding anyway).
That said, I would like to propose a better way of evaluating your pGPS metric. Since it outputs a probability of success for each individual, and you are interested in whether these players achieved success or not, you can evaluate it as follows:
Sum up all the probabilities for all players in the 2008 OHL season. This sum will give you an expected number of players that will make the NHL. Then look at how many players actually made the NHL as your ‘observed’ value. If observed is much greater than expected, your model is underestimating probabilities. If observed is less than expected, your model is overestimating success. This is a probability distribution called the Poisson binomial distribution, which you can read about here:
https://en.wikipedia.org/wiki/Poisson_binomial_distribution
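The commenter’s calibration check can be sketched as follows, assuming each player’s pGPS % has already been converted to a probability between 0 and 1. The expected count is the sum of the probabilities, and the exact Poisson binomial distribution of the number of successes is built with a simple dynamic program, so you can see how surprising the observed count is in either direction. The function names are hypothetical, and no 2007-08 results are implied.

```python
import math


def poisson_binomial_pmf(probs):
    """pmf[k] = P(exactly k successes) for independent Bernoulli(p_i) trials."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this player does not make it
            nxt[k + 1] += mass * p     # this player makes it
        pmf = nxt
    return pmf


def calibration_check(probs, observed):
    """Compare an observed success count with the model's predictions."""
    pmf = poisson_binomial_pmf(probs)
    return {
        "expected": sum(probs),
        "sd": math.sqrt(sum(p * (1 - p) for p in probs)),
        "p_at_most": sum(pmf[: observed + 1]),  # P(X <= observed)
        "p_at_least": sum(pmf[observed:]),      # P(X >= observed)
    }
```

A very small `p_at_most` would suggest the model overestimates success; a very small `p_at_least` would suggest it underestimates.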
My gut, just from looking at the figures, is that your model is somewhat systematically over-estimating the probability that many players will make the NHL (there seem to be a lot of dots along the left side of the plot, but hard to say because they overlap so much). I could be wrong, but would be interested in seeing this metric.
Be very skeptical about statistical models people share with the public for free.
The best information is within the confines of NHL front offices where skilled employees are compensated appropriately.
Also, what were the population and sample sizes you were/are working with? Are they representative? I’ll be quiet now.
Off topic and quite random, but why does an article looking at prospect graduation have St. Petersburg of all cities as the map in the cover picture?
This is very interesting, but it reminds me of NHLe in that it could produce some very skewed results. You don’t want to start throwing it in as the basis of every analysis when it’s still very flawed. Otherwise it raises more questions than answers and weakens the overall message.
Gee, I was hoping for a simpler bottom line. Such as what parameters have been under-appreciated for players that performed better than their draft position. And conversely the parameters that were over-valued for draft flops. Parameters that can be measured such as size (height & weight), skating, scoring and subjective rankings that can be eyeballed but not measured, like “compete level” that scouts are prone to use.
Someone please contact Benning and Linden and get them to pay attention to this.
Do you actually think I run my empire based on the whims of the casual fan?
Our proprietary information is galaxies ahead of the publicly available stuff you lap up.
We have three analysts whose primary function is to search the internet for useful information in the public domain.
In addition, we employ eight analysts that make six figure salaries.
All have graduated from some of the top universities in Canada, England and the United States.
You probably watched Good Will Hunting and thought Matt Damon played a convincing genius.
Yet, you are here.
Don’t rip on the casual fan for being just that, a casual fan. That’s like writing you off for just being you, which is not cool. If you are some big wig, you need to chill out and let fans be just that, fans. If you are too sensitive to fan critiques, do what the players do, and stay away.
I’m guessing this is a teaser of sorts. I presume this program has been offered to NHL teams (or maybe most teams already have similar models?) but no one has bitten, hence it appears on Canucks Army. I can’t even imagine the volume of work that goes into it, but I suppose it will be refined as time passes.
I look forward to further results
Thks
@TC thanks man. To build off your idea, maybe a Bayesian econometric model should be used. Also, the r squared value is s*** (the errors are f’d up, so you can’t use r squared), because the data they would have been using would have been heteroscedastic in nature. Using a straight linear regression would violate some of the most basic assumptions needed for this model to be valid and used for analysis. Also, besides being biased, the model lacks consistency (biased and inconsistent equals no good).
Hell, if I were attempting something like this I’d probably start with an IV approach and see what the results were, then try a Bayesian two-stage approach (read “Bayesian and classical approaches to instrumental variable regression” from the Journal of Econometrics) and look at the results.
Sports and Econometrics equals fun!!
Or just use structural equation modelling (probably the best bet, though you’d have to acknowledge the lack of goodness-of-fit analysis/error analysis).
On a related note, for people who are like what the hell is he talking about read the following:
Start with
“Statistics for dummies” and “Econometrics for dummies”
then move on to
“Introductory Econometrics: A Modern Approach” by Wooldridge
then move on to Greene’s “Econometric Analysis”
and for a good forecasting intro book, read Diebold’s “Elements of Forecasting”
Happy cinco de mayo off to drink some tequila!!
In the line where I wrote “analysis”, I meant “inference”.