# Why a simple statistical method is more successful than a complex scouting program

Cam Charron
May 21 2014 11:10AM

If you haven't read Rhys' dirty ditty exposing the Vancouver Canucks scouting office for being, among other things, complete frauds, then go ahead and do that. It interested me because in Thinking, Fast and Slow, Nobel Prize-winning psychologist Daniel Kahneman devotes two full chapters to the benefits of simple statistical measures when assessing performance.

The thing that Rhys' piece exposes is that, for the longest time, people in the game of hockey have been determined to believe that hockey is a more complex game than it actually is. Winning teams, though, simply score more goals than the opposition. It is impossible to determine how many goals a player prevented, so counting how many goals the player contributed to the cause is basically half of our equation. Whatever scouts try to add to the equation usually bogs down the process.

Kahneman, like another of my favourite thinkers, Nassim Nicholas Taleb, is very skeptical of experts. He won the Nobel Prize in economics, which is notable because he's not an economist. His research on biases and the way people think is not only fascinating but has a lot of useful market applications.

In his book, Kahneman cites the work of Paul Meehl, a psychologist who performed several studies to determine whether "experts" in a certain field were better at predicting future success than a simple formula.

In a typical study, trained counselors predicted the grades of freshmen at the end of the school year. The counselors interviewed each student for forty-five minutes. They also had access to high school grades, several aptitude tests, and a four-page personal statement. The statistical algorithm used only a fraction of this information: high school grades and one aptitude test. Nevertheless, the formula was more accurate than 11 of the 14 counselors.

[snip]

The range of predicted outcomes has expanded to cover medical variables such as the longevity of cancer patients, the length of hospital stays, the diagnosis of cardiac disease, and the susceptibility of babies to sudden infant death syndrome; economic measures such as the prospects of success for new businesses, questions of interest to government agencies, including assessments of the suitability of foster parents, the odds of recidivism among juvenile offenders, and the likelihood of other forms of violent behaviour; and miscellaneous outcomes such as the evaluation of scientific presentations, the winners of football games, and the future prices of Bordeaux wine. Each of these domains entails a significant degree of uncertainty and unpredictability. We describe them as "low-validity environments." In every case, the accuracy of experts was matched or exceeded by a simple algorithm.

It isn't just the Vancouver Canucks who are susceptible to this. The world is beginning to realize the importance of data, but it's important not to be paralyzed by analysis. When evaluating a young junior hockey player, scouts like to weigh the player's proficiency in both the offensive and defensive zones, how big he is, where he was born, how hard he works (or appears to work), the quality of his skating stride, the quality of his shot, and how much his teammates respect him on the bench, and they even subject the poor kid to an interview. In an effort to prove their usefulness by identifying the parts of the game that the common fan can't see, scouts get outperformed by statistics year after year after year after year.

There's an issue with resistance to the obvious choice. As Kahneman notes: "Several studies have shown that human decision makers are inferior to a prediction formula even when they are given the score suggested by the formula!" Hidden in plain sight.

Kahneman also writes:

Facts that challenge basic assumptions—and thereby threaten people's livelihood and self-esteem—are simply not absorbed. The mind does not digest them. This is particularly true of statistical studies of performance, which provide base-rate information that people generally ignore when it clashes with their personal impressions from experience.

In other words, "have any of you nerds ever even PLAYED the game?"

Hockey is fun to watch. It's why we do it. We wouldn't be doing any of this if we didn't like watching hockey and the speed and the skill and the personalities that go along with it. It's entertaining television. I have to admit I feel for anybody who likes to tell me that they rely on "watching the game" for analysis. I picture them judiciously compiling mental notes while their friends around them are drinking beer and having a good time. It's as if there's a higher purpose to this whole experiment, which is sort of silly. We spend dozens of hours a week caring about a game played by people we don't know and will never know, for no reason other than it's fun. The best hockey writers aren't the ones who provide the best analysis, but the self-aware writers who can still contextualize the game in the realm of "fun".

But that doesn't mean we can't glean lessons from it. People inside the Vancouver Canucks have not fared as well as my diabolical evil twin Sham Sharron in predicting the future success of NHL players, and I doubt it stops there. I would also caution current scouts and managers that your conclusions are only as good as your inputs, and you aren't as objective as you think you are.

Cam Charron is a BC hockey fan who writes about hockey on many different websites, including this one.
#51 Spiel
May 22 2014, 10:16AM
nateb123 wrote:

Isn't what Sham did essentially the same as using a list like the ISS ranking? It only has to identify guys who were viewed as the consensus best player available (BPA) within that 29-pick range. Basically, yes, things like size were taken into account by draft rankings, but that doesn't imply Sham talked to scouts or that Vancouver even had a scouting department. He just used listed information available before the draft.

I would say it is close, but not quite. Teams diverge pretty drastically from the "consensus" lists like ISS or Central Scouting after the first round, and even within the first round, and Sham getting one decision right makes a big difference to the outcome.

Use the 2000 entry draft as an example. Central Scouting had Justin Williams ranked #19, and he had 83 points. But Central Scouting also had Yanick Lehoux (92 pts) and Carl Mallette (125 pts) at #35 and #37. Williams was taken in the first round; Mallette and Lehoux were taken in the 3rd and 4th rounds. Neither had an NHL career. Clearly teams knew something about these players that stats and Central Scouting didn't catch. Sham's method gives him access to the rankings of all teams, which he would not otherwise have had.

Replacing Williams with one of these other guys drops 800 games played and 500 points from Sham's total. I wouldn't doubt that Sham still ends up on top, but the difference is likely less. Still not a glowing endorsement of the Canucks drafting acumen.

#52 NM00
May 22 2014, 10:36AM
Cam Charron wrote:

You could. I mean, Rhys' method was imperfect due to the time he had and his computer capabilities. I'm sure if you ranked 17-year-old forwards by CHL points, the record would be even better.

But that's time-consuming, and we've already proved our point here.

Wow.

You actually believe you have made a logical point here about the flaws within NHL front offices.

To be clear, I'm sure there are a number of flaws within the NHL.

But nothing in your piece is useful in shedding light on that issue.

I'm not going to run a simulator for every draft.

But hopefully looking at one draft will show you the Swiss-cheese holes in your thinking and Rhys' thinking if you were to run the simulator WITHOUT the benefit of hindsight and the work of NHL front offices.

I'll use the 2004 NHL draft since it is 10 years old.

291 players were selected.

http://www.hockeydb.com/ihdb/draft/nhl2004e.html

If NHL front offices were to ignore all talent aside from draft-eligible CHL forwards, here is who would be missed among 1st rounders alone:

Ovechkin, Malkin, Wheeler, Smid, Stafford, Radulov, Zajac, Meszaros, Schneider & Green.

Along with Booth, Grossmann, Goligoski, Krejci, Sekera, Emelin, Regin, Edler, Franzen, Porter, GRABOVSKI, Santorelli, Polak, Hunwick, Campoli, Rinne, Streit, Winnik, Clitsome & Hansen.

For argument's sake (you are free to count the CHL forwards for an accurate number), let's say that 146 of the 291 players drafted were CHL forwards.

Virtually every single one of these 146 forwards would have been amongst the top 146 draft-eligible scoring forwards in the CHL.

Implicit in your "I'm sure if you ranked 17-year-old forwards by CHL points, the record would be even better" absurdity is that you would (pretty much) put up the draft-eligible CHL scoring forwards ranked #146 to #291 against ALL goalies, ALL defensemen, and all non-CHL forwards in your competition against NHL front offices.

The talent pool on Sham's team would get destroyed by NHL front offices and it wouldn't even be particularly close...

#53 nateb123
May 22 2014, 11:43AM
Spiel wrote:

I would say it is close, but not quite. Teams diverge pretty drastically from the "consensus" lists like ISS or Central Scouting after the first round, and even within the first round, and Sham getting one decision right makes a big difference to the outcome.

Use the 2000 entry draft as an example. Central Scouting had Justin Williams ranked #19, and he had 83 points. But Central Scouting also had Yanick Lehoux (92 pts) and Carl Mallette (125 pts) at #35 and #37. Williams was taken in the first round; Mallette and Lehoux were taken in the 3rd and 4th rounds. Neither had an NHL career. Clearly teams knew something about these players that stats and Central Scouting didn't catch. Sham's method gives him access to the rankings of all teams, which he would not otherwise have had.

Replacing Williams with one of these other guys drops 800 games played and 500 points from Sham's total. I wouldn't doubt that Sham still ends up on top, but the difference is likely less. Still not a glowing endorsement of the Canucks drafting acumen.

I don't disagree. Sham is hardly a genius after all, but I believe his method was a proxy for using Central Scouting lists. His approach just happened to save time. However, I wouldn't be surprised if picking the top point producer from Central Scouting's list, but only down to a certain point (say, from the top 15 possible picks), would yield similar results.

I would love to see the simulation run again considering only those on Central Scouting lists within a certain range.

I also wonder how robust these rules are, i.e., whether the model is on the right track or purely lucky (as your Justin Williams example attempts to point out). One test would be to vary the parameters slightly (e.g., picking from a slightly larger or smaller pool of players at each selection) and see whether the results change wildly. If they do, then Sham is just the beneficiary of beginner's luck.
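A minimal sketch of the windowed rule and the robustness check suggested above, assuming a pre-draft consensus list, draft-year points for each prospect, and (as a loud simplification) that every other team just takes the best-ranked player left. The names `Prospect`, `windowed_pick`, `simulate` and `sweep` are hypothetical, the data is made up, and this is not Rhys' actual simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prospect:
    name: str
    consensus_rank: int      # spot on a public pre-draft list (e.g. Central Scouting)
    draft_year_points: int   # scoring in the draft-eligible season, known before the draft
    nhl_games: int           # career outcome, used ONLY to grade the result afterwards


def windowed_pick(available, k):
    """Take the top point producer among the k best-ranked players still available."""
    # `available` is kept sorted by consensus_rank, so the window is just a slice.
    return max(available[:k], key=lambda p: p.draft_year_points)


def simulate(pool, our_picks, k):
    """Replay a draft where we hold the overall picks in `our_picks`.

    Assumption: every other team simply takes the best-ranked player left.
    """
    available = sorted(pool, key=lambda p: p.consensus_rank)
    ours = []
    for overall in range(1, max(our_picks) + 1):
        if not available:
            break
        if overall in our_picks:
            choice = windowed_pick(available, k)
            ours.append(choice)
        else:
            choice = available[0]
        available.remove(choice)
    return ours


def sweep(pool, our_picks, window_sizes):
    """Total NHL games from our picks for each window size k."""
    return {
        k: sum(p.nhl_games for p in simulate(pool, our_picks, k))
        for k in window_sizes
    }


# Toy, made-up prospects; not real players or real numbers.
pool = [
    Prospect("A", 1, 70, 600),
    Prospect("B", 2, 95, 300),
    Prospect("C", 3, 60, 800),
    Prospect("D", 4, 110, 50),
    Prospect("E", 5, 80, 400),
    Prospect("F", 6, 90, 0),
    Prospect("G", 7, 40, 200),
    Prospect("H", 8, 100, 100),
]
our_picks = {2, 6}

print(sweep(pool, our_picks, [1, 3, 5]))  # {1: 300, 3: 150, 5: 150}
```

With `k=1` the rule reduces to taking the consensus pick; larger `k` lets points override the list. In this toy pool the total halves between `k=1` and `k=3`, which is exactly the kind of sensitivity the sweep is meant to expose. Run on real draft data, a result that swings wildly with `k` would point to luck rather than a sound method.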

We're just giving CanucksArmy suggestions for new features now. I suspect royalties shall ensue haha

#54 nateb123
May 22 2014, 11:48AM

@NM00

If I facepalm any harder, I'll have a concussion.

This is an "all else being equal" approach. That's the empirical method. Running the sim using Sham's rules for all 30 teams would be like changing EVERY variable and then trying to discern what the underlying equation is. That's not how it's done. I don't know what else to say to you except "get an education and stop wasting our time".

#55 antro
May 22 2014, 12:07PM

@Spiel: great and very reasonable discussion.

@NM00:

Several of us have already said that Sham is not a model for drafting. It's a baseline, and a deliberately bad one! If the Canucks scouts using all leagues, etc., had been able to find better forwards, using metrics like GP and Pts, then you would have a point. But they didn't. And again, they didn't even if you include the Dmen and goalies that the Canucks did choose into Sham's selections.

There's only certain kinds of data available for prospects. I agree with you that the results aren't conclusive, but they are suggestive. An interesting first step, to repeat myself. I'd love to see better stuff, so the invitation to put up still stands.

#56 NM00
May 22 2014, 12:22PM

@nateb123

Sigh.

You should understand that ANY method, such as your previous "proxy for using central scouting lists" has a chance to beat a FEW of the 30 teams in the NHL.

In no way does that validate "why a simple statistical method is more successful than a complex scouting program".

Even though the statistical method is highly flawed and dependent on the same people that Cam is belittling...

Even if Rhys adjusted the flaw in his design so that it was not dependent on hindsight, it cannot be taken seriously without looking at ALL 30 teams.

Consider the example that Spiel provides as well as what would have happened in the Sedin draft using the "highest scoring forward in the CHL" method.

"get an education and stop wasting our time".

Thank you for including the quotes for me...

#57 NM00
May 22 2014, 12:44PM

@antro

"If the Canucks scouts using all leagues, etc., had been able to find better forwards, using metrics like GP and Pts, then you would have a point. But they didn't. And again, they didn't even if you include the Dmen and goalies that the Canucks did choose into Sham's selections."

For starters, they did in 1999 as another commenter pointed out (Sedin, Sedin vs Brendl & Sparikyn IIRC).

The fact that this arbitrarily begins in 2000 (the Delorme stuff can't be taken seriously without access to his board) in no way validates this simulation.

Also, you absolutely cannot take the results of Rhys' simulation seriously based on criterion #4.

IF he adjusted this so that Sham simply took the best scoring CHL forward left on the board every time the Canucks made a draft pick, then that would be a start.
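A minimal sketch of the hindsight-free rule proposed here, assuming the real draft order is replayed for every other team and that Sham knows only each CHL forward's draft-year points. `Forward` and `sham_picks` are hypothetical names, the data is made up, and this is not Rhys' actual simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    name: str
    draft_year_points: int  # known on draft day, so no hindsight is involved


def sham_picks(chl_forwards, actual_order, our_slots):
    """Greedy, hindsight-free version of Sham.

    chl_forwards: draft-eligible CHL forwards (the only players Sham considers)
    actual_order: name taken at each overall pick in the real draft, 1st overall first
    our_slots:    1-based overall picks that belonged to the Canucks
    """
    taken = set()
    ours = []
    for overall, actual_name in enumerate(actual_order, start=1):
        if overall in our_slots:
            # Ignore who the Canucks really took; grab the best scorer left.
            remaining = [f for f in chl_forwards if f.name not in taken]
            if remaining:
                choice = max(remaining, key=lambda f: f.draft_year_points)
                ours.append(choice.name)
                taken.add(choice.name)
        else:
            # Other teams replay their real pick. If Sham already took that
            # player, adding the name again changes nothing.
            taken.add(actual_name)
    return ours


# Toy data; "X", "Y", "Z", "W" stand for picks that were not CHL forwards.
forwards = [Forward("P", 80), Forward("Q", 120), Forward("R", 95), Forward("S", 60)]
actual_order = ["X", "Q", "Y", "Z", "R", "W"]
our_slots = {3, 6}

print(sham_picks(forwards, actual_order, our_slots))  # ['R', 'P']
```

One simplification to note: when Sham grabs a player another team really took later, that team's replacement pick is not modeled. Comparing against all 30 teams, as argued below, would mean running the same function once per team with that team's pick slots.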

Not nearly good enough, mind you, as more than 1 of the 30 teams in the NHL would need to be examined to take this seriously.

For example, if a slightly altered, hindsight free version of Sham beat 7 of 30 NHL teams, what would be your opinion of this model?

Using hindsight, I'm sure a few teams every single year (Canucks in 2007 being one example) would have been better off simply taking the highest player remaining on a publicly available ranking system.

That has nothing to do with "a simple statistical method is more successful than a complex scouting program".

Now if Rhys or anyone else has found a system that can beat more than 15 of the 30 NHL teams at the draft table, THAT would be something...

#58 nateb123
May 22 2014, 02:14PM
NM00 wrote:

Sigh.

You should understand that ANY method, such as your previous "proxy for using central scouting lists" has a chance to beat a FEW of the 30 teams in the NHL.

In no way does that validate "why a simple statistical method is more successful than a complex scouting program".

Even though the statistical method is highly flawed and dependent on the same people that Cam is belittling...

Even if Rhys adjusted the flaw in his design so that it was not dependent on hindsight, it cannot be taken seriously without looking at ALL 30 teams.

Consider the example that Spiel provides as well as what would have happened in the Sedin draft using the "highest scoring forward in the CHL" method.

"get an education and stop wasting our time".

Thank you for including the quotes for me...

If only your mind worked as quickly as your fingers. I agree that other teams should be examined (it would certainly be interesting) but given the current analysis, the title is still perfectly accurate.

As for your fascination with picking out specific examples (namely successful European players) as proof against this model, that is a mind-blowingly stupid approach. You could make the same argument with literally every draft program ever. This is what every person who complains about draft selections does: "Oh, you picked X instead of Y, therefore your drafting is crap". For a guy who keeps making claims about the overuse of hindsight in Sham's model, it's hilarious that your entire counterargument is based on NOTHING BUT hindsight.

#59 NM00
May 22 2014, 02:32PM

@nateb123

"As for your fascination with picking out specific examples (namely successful European players) as proof against this model, that is a mind-blowingly stupid approach."

Not only European players.

Also defensemen, goalies & imports playing in the CHL.

I happened to choose 2004.

You can use any recent draft and you will find the exact same thing.

"You could make the same argument with literally every draft program ever."

Really...

It's surprising that you can't even comprehend that we agree on this point.

If Cam and co want to show that "a simple statistical method is more successful than a complex scouting program", they actually have to, you know, show us that it is more successful.

They haven't even come close to doing so.

All they are doing is limiting the talent pool from which they are making selections which opposing general managers would quite enjoy...

#60 Locky
May 22 2014, 07:13PM
Cam Charron wrote:

You could. I mean, Rhys' method was imperfect due to the time he had and his computer capabilities. I'm sure if you ranked 17-year-old forwards by CHL points, the record would be even better.

But that's time-consuming, and we've already proved our point here.

I don't know if you have though?

I absolutely agree with you that it is likely that CHL points (maybe even league adjusted PPG or something like that) will be better than "expert" scouting. The thrust of Rhys' piece and your article is fine. But you seem to both be using the specific 'simple statistical method' as a strong piece of evidence to support that, when it is inherently flawed. Why not be a bit more rigorous in your proof?

#61 TruthObserver
May 23 2014, 05:58AM
Cam Charron wrote:

You could. I mean, Rhys' method was imperfect due to the time he had and his computer capabilities. I'm sure if you ranked 17-year-old forwards by CHL points, the record would be even better.

But that's time-consuming, and we've already proved our point here.

So instead of trying to actually make meaningful conclusions from peer-tested statistical analysis, you decide to half-ass it and call it "good enough"? Any moron from Hfboards knows that Vancouver has bad scouting. How about actually trying to prove it instead of whatever that mess of an article was?