Journal of Hockey Analytics: Volume I Issue 3


Courtesy of @GoalieWays

Welcome back to this week’s Journal of Hockey Analytics, here to help with all your boring Mondays at work and all your future hockey analytics research. This week we look at how simple drafting rules can beat scouts, review some raw season-long zone entry data, and even use some junior-level analytics to analyze Griffin Reinhart.

For all your links, continue past the jump!

I won’t spend too long on this, but over the weekend it was learned that Tore Purdy, aka JLikens of Objective NHL, had passed away. He was one of the pioneers of hockey analytics, and his early work built up many of the ideas that we use in our day-to-day work now. Many people on Twitter have been paying their respects, and summaries of the situation have been quite well written by The Edmonton Journal, Hockey Buzz and Puck Prediction.

On to this week’s analytics work:

  • The most interesting article of the week came from Rhys J, who compares the results of the Vancouver Canucks scouting staff over the last decade to extremely simple drafting rules.  The results are very surprising [Canucks Army]
  • Cam Charron explains why the simple rules can beat scouts and gives some interesting opinions on the subject [Canucks Army]
  • Unhappy with the method that Rhys used, Daniel Wagner re-ran the experiment, this time making his selections based on the CSS rankings [Pass it to Bulis]
  • Garik of Lighthouse Hockey watched every NY Islanders game of the season and has released his data on the Islanders’ neutral zone play. He has made his raw data available here for your future work, if interested [Lighthouse Hockey]
  • Similarly, Sens fan Manny has released raw data in Google spreadsheet format for the Ottawa Senators. It includes zone entry stats, raw game data with every 5v5 zone entry/exit, and game and season-total stats for the team and players. The info includes individual and on-ice player stats, including time shares and average time in zone for both entry types. A lot of hard work went into this [Google Docs]
  • CanesAndBluesFan looks at how players’ Corsi changes with age [St. Louis Game Time]
  • Adam Gretz takes a look at how much of the salary cap each team should spend on their stars []
  • Rob Vollman analyzes the career of Teemu Selanne [Bleacher Report]
  • Last week I posted a link to an academic study on the “Hot Hand”.  Eric T pointed out this counter article on the subject. Check it out – always worth looking at opposing pieces [Sabermetric Research]
  • The Hockey News released an article on how visor usage varies with players’ age and birthplace [The Hockey News]
  • Continuing the debate from last week, Ryan Lambert weighs in on whether Jonathan Toews is better than Sidney Crosby [Yahoo Sports]
  • Megan writes about Zone Entries with stats she tracked during the WHL Portland/Edmonton series [Shinny Stats]
  • The UBC author who recently wrote that NHL players peak at 29, using +/-, thinks Marty St. Louis is an exception to the rule [NY Times]
  • In some academic work, Mills and Rosentraub look at the NHL and cross-border fandom [Journal of Sports Economics]
  • Justin Bourne analyzes the systems the Rangers are using to beat the Montreal Canadiens [The Score]
  • Tyler Dellow analyzes the Habs’ chances in their series against the Rangers when starting Tokarski [Sports Net]
  • Travis Yost looked at usage of players by age [Sporting News]
  • Yost (again) also looks at why Chris Stewart would be a terrible trade target [Hockey Buzz]
  • Nick Emptage looks at what we can predict and what we cannot [Puck Prediction]
  • Eric Tulsky shows why you shouldn’t use small sample sizes to analyze players [Outnumbered]
  • Neil Paine looks at which NHL goalie has been the hottest during these playoffs [FiveThirtyEight]
  • Garik uses some CHL fancystats to try and predict what type of NHL player Griffin Reinhart may become (if he becomes one at all) [Lighthouse Hockey]
  • Eric Tulsky looks at neutral zone play and why Chicago is having difficulty against L.A. [FiveThirtyEight]
  • While Sean McIndoe is better known as a comedy writer, he has done a great job of combining analytics with humour. Most recently he looks at how long the Kings/Blackhawks can sustain their strong cores [Grantland]
  • Eric Tulsky analyzes scouts and veteran players [Outnumbered]
  • An AMA was run with Andy Andres who is running an SQL/R/Sabermetrics MOOC [Reddit]
  • Phil Curry claims that possession is 3/4 of the playoffs [DoHA]
  • I use some fancystats in the American Hockey League to try and analyze their conference finals as well as review the Calder Cup quarter finals [NHL Numbers]
  • Remember the racist tweets that came out of Boston when Subban scored? Bill Speros analyzed the volume of those tweets looking at original work vs. retweets []
  • Travis Yost analyzed playoff Corsi% and the Conn Smythe Trophy [Hockey Buzz]
  • Anton Stralman is touted as a really good player who puts up great Corsi numbers but never scores [mc79hockey]
  • Looking at basketball (the same ideas can apply to hockey), Nate Silver looks at when you should sign a basketball player to a max contract [FiveThirtyEight]
  • This isn’t so much a problem with the online hockey analytics community as with the secret statistics offices: the willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results [PLOS|one]
  • The Montreal Canadiens use numbers to show that Galchenyuk has been “clutch” in the playoffs []
  • Garret Hohl asks if you should judge Dustin Byfuglien on his Corsi% or his Goal Differential [Arctic Ice Hockey]
  • Scott Cullen uses some analytics in his piece on the Islanders signing Halak, nice to see those being used in main-stream media pieces [TSN]
  • Byron Bader analyzes how teams have done at drafting through the different rounds over the years [Flames Nation]

I don’t typically want to link to opinion pieces since a lot of the arguments repeat themselves every time and little new comes out of it. There are a few opinion pieces that kept coming up this week so I decided to include them.

  • While writing about the use of stats within baseball rather than hockey, Bob Ryan makes the argument that the average fan just doesn’t care about stats. It has some good carry-over to hockey [Boston Globe]
  • A large number of responses, from various sources, were written to address the Bob Ryan piece.
  • In hockey, Steve Simmons wrote a piece about how fancystats just don’t make sense to him []
  • Much like Bob Ryan’s article, this provoked responses from everyone and their dog.
  • Eric Fingerhut analyzes a Washington Post article by Neil Greenberg on the interest in fancystats and sports [The Fingerman]
  • These anti-stat columns have been around for as long as stats have. This historical piece from the 1950s in Sports Illustrated argued that stats were ruining baseball [Sports Illustrated]

I will be away for work in Europe this summer, and with the time change it will be difficult for me to keep up to date with Twitter. If you write anything about hockey analytics, or you have seen anything interesting, please send it my way so I can keep these updated throughout the summer!

      • SmellOfVictory

        Very true. They should’ve gone with just the highest-scoring forward taken after each pick, rather than between the two picks.

        I think a combination of points as well as ranking would probably have produced results close to what they got with their psychic potato.

        • Parallex

          For giggles I went and made my own fake scouting director for Vancouver (since that was the team in question)… I don’t have enough time on my hands to do all the years, but I did a redraft of 2000 with my own fake intern. I call him “Lazy Cheapskate”, and his marching orders were thus: we’re cheap and don’t want to spend any money on in-house scouts, so we’re just going to use the CSS rankings of N.A. skaters (because goalies are voodoo, and we’re also ethnocentric and don’t want guys from Euro leagues) as our list. The results…

          Brad Boyes 
          Brett Nowak 
          Lou Dickenson 
          Matthew Lombardi 
          John Eichelberger 
          Eric Johansson 
          Andrew Downing 

          … assuming I didn’t mess up when scanning for names, and I don’t know if it would hold up once you include the other years but Lazy Cheapskate clearly beats Delorme at least in year one.

      • SmellOfVictory

        Thanks for posting that. I like CA more than most hockey blogs but between the total lack of moderation to get rid of the conspiracy theorists and juvenile trolls and the idiotic doubling down on this “simple drafting method” it’s been a little sad.

    • Derzie

      I’d like to know what teams beat the potato. The Nucks are low-hanging fruit, as they have the worst drafting record of anybody. What about the Detroits and the California teams? We know how Sutter did 🙁

      • SmellOfVictory

        You know, the most interesting man in the world, the Dos Equis man, once said of success: “Find the things in life that you’re not good at… and then stop doing those things.”

        The Canucks are not good at almost anything they do. So it should be advised that they stop doing whatever the hell they’ve been doing for the last 44 years.

        But that would be like trying to convince Americans that they should eat more vegetables.

    • Parallex

      The flaw in the Sham model is really too bad, because it undermines what I think is a very good point.

      I also think the flaw would be pretty easy to correct. A rule like “of the next 15 players on the ISS NA rankings, select the one with the most points per game” would probably produce solid results. And that’s before you get into other pretty simple measures like controlling for scoring levels across the CHL leagues and mixing in defensemen.

      • mattyc

        I suspect that even if you modified Sham, the results will not be particularly good IF one runs the simulation for all 30 teams independently.

        The majority of top scoring junior forwards that become quality NHLers are drafted in the first place.

        Just look at some older drafts after the 1st round and you’ll see from where the “steals” often come: Europe, defenseman & goalies.

        Sham beating one team during an arbitrarily selected time period is worthless.

        Sham would need to beat at least half the teams in the NHL to be taken seriously…

        • Parallex

          Except running it for all 30 teams would completely defeat the point. If you’re looking for an edge over the other 29 teams, you’re going to do something differently from them, pretty much by definition.

          The 30 teams argument is a non-sequitur.

          • mattyc

            No I mean run an “all else being equal” simulation for all 30 teams independently.

            For example, let’s just say that Sham was going to take the highest scoring draft eligible CHL forward remaining on the board.

            So when Sham is pretending to be the Canucks GM, he simply selects the highest scoring junior forward available.

            And when Sham is pretending to be the Predators GM, he selects the highest scoring junior forward available.

            That would mean no Jones, Ellis, Josi, Weber, Suter & Hamhuis among others…

            And when Sham is pretending to be the Red Wings GM, he selects the highest scoring junior forward available.

            That would mean no Datsyuk, Zetterberg, Franzen. The overwhelming majority of gems that the organization has found in the last couple of decades.

            Can Sham beat more than a handful of the 30 NHL teams?

            Because if he can’t, it’s hardly evidence that a simple formula can beat a complex scouting system or some other such nonsense…

            • mattyc

              I see what you mean.

              First, I suspect the Sham model would beat more than a handful of teams.

              Secondly, obviously the European and defensemen issues are why no one would ever seriously consider adopting Sham outright. The D issue, though, would be relatively easy to address – just value each defenseman point as worth, say, 1.3 of a forward point. (In other words, a d-man who scored 60 points would get credit for 78 points, levelling the playing field between forwards and D. 1.3 might not be the right number, but you get the idea.)

              Finally, make the simple change I suggested above, and allow the model to select from a range on the CSS rankings.

              I’d be willing to bet that a model like that would outperform a large portion of the league.
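
The defenseman weighting proposed above is simple enough to sketch. Here's a minimal illustration; the 1.3 multiplier is the commenter's own illustrative guess, and all names and point totals are invented:

```python
# Sketch of the proposed defenseman adjustment: scale D points by 1.3 so
# they can be ranked against forwards. 1.3 is an illustrative guess, not
# a fitted value.

def adjusted_points(points, position, dman_weight=1.3):
    """Return points, scaled up if the player is a defenseman."""
    return points * dman_weight if position == "D" else points

# hypothetical prospects: (name, position, junior points)
prospects = [("Forward A", "F", 75), ("Dman B", "D", 60)]

# rank by adjusted points: 60 * 1.3 = 78, so Dman B now edges Forward A
ranked = sorted(prospects, key=lambda p: adjusted_points(p[2], p[1]), reverse=True)
print([name for name, _, _ in ranked])  # ['Dman B', 'Forward A']
```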

            • mattyc

              Define large.

              More than 15 teams?

              I suspect that part of the appeal of Sham is that CHL scoring forwards are (presumably) the easiest thing to predict.

              I’m not sure that Sham wants to touch Europe, defensemen or goalies and be (further) exposed as a fraud…

            • mattyc

              More than 15? No idea for the Sham model, which was basically cheating anyways.

              But yeah, I’m fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams.

              I’ve never seen any in-depth statistical analysis of drafting from Europe, US high schools or the USNTDP, but would be interested to see it attempted. You definitely have to rely much more heavily on scouting in those areas.

            • Parallex

              “I’m fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams.”

              I’m skeptical.

              Not because teams don’t miss a lot (they absolutely do) for some seemingly foolish reasons (such as gritty selections in the Bryan Allen/Luke Schenn mold).

              Mostly because, on the whole, I suspect teams do well enough on the easier-to-predict players that tear up junior hockey and, even with the misses, make up for it with the harder to predict gems taken later.

              That’s where I suspect teams would beat a system.

              Though I’d love to see something that changed my thinking on this…

            • Actually,
              pop me an email, Jamie. I’d be up for collaborating.
              (anyone else that wants to help out can too)

              I’ve been talking to Canucks management, and this would be an interesting thing to show them.

        • mattyc

          And of course I should add that no one is suggesting the Canucks or any other team should adopt the Sham model as their exclusive drafting tool.

          I read the post as a thought experiment meant to illustrate that 1) the Canucks drafting has been very poor, and 2) statistics are an under-utilized drafting tool.

          • mattyc

            Was it really that effective a thought experiment, though? Was it revelatory that the Canucks’ drafting has been historically bad? It would have been an interesting exercise to me if it showed a method that said “here’s who we missed because of our reliance on guesswork and hunches.” Statistics are already utilized in scouting and drafting, just not advanced ones and not particularly well.

            I’ve always hated the hunches: let’s take Libor Polasek because he’s huge, or Antoski because he fought Lindros to a standstill, or Patrick White because he was a finalist for the Minnesota Mr. Hockey award, or Honzik because… no idea.

            But this whole exercise still seems lazy to me. Critiquing the current system means more than just assuming that all current scouts do is play a hunch. That sets up as much of a straw man as saying that all analytics can do is provide a sterile numbers-based approach.

            • mattyc

              Sure, I mean, as I said I thought the flaws in the method were unfortunate and undermined the larger point.

              But I still think the concept of applying a basic drafting model to the Canucks, and showing that it can greatly surpass their actual record, can be very helpful in showing just how flawed the team’s drafting record has been.

            • mattyc

              You know, I think the actual tongue-in-cheek part of the original post, that the Canucks’ (or any other NHL team’s) drafting is so haphazard or flawed that you might as well use a nonsensical method, is a good one. But I think it’s being taken up quite literally and seriously, and to do that, it’s not only that the “method” is problematic; there also has to be a much more rigorous critique of the existing system(s) to establish a baseline against which we can compare actual draft records, if any of this is going to make sense. I agree that there’s value to this, but we can’t start off with the assumption that the only way teams currently draft is on the basis of gut instinct.

              I think teams have all kinds of strategies going into drafts, and they have different methods of valuing player potential. Right now we’re talking about drafting outcomes in the most basic (and I’d argue the most useless) of ways: looking only at a small group of scoring forwards from only one of the available pools tells us very little. And focusing only on points tells us nothing about the various skills you need to actually ice a competitive team. Adjusting defense value by 1.3, as someone suggested, does nothing to evaluate the potential of a shut-down defenseman. Basing potential on age-17 points also tells us nothing about growth spurts or the skill development of late bloomers like Tanev or Lack, for example. Using the Sham method leaves you with a team like the Oilers, and we know how successful they’ve been at collecting shiny point producers.

              I think a lot of teams are terrible at drafting. But it’s not necessarily because they are doing everything on gut instinct. They might be like Gillis, drafting according to a certain rhyme (with little reason). They might be sticking to a particular pipeline they like. It would be good to know what needs to be replaced rather than coming up with a whole new method that might be just as half-baked. I mean, I could say that tossing all the draft-eligible names into a hat and picking that way would be better than the draft records of many a team.

            • mattyc

              To be honest, I actually don’t think the Canucks’ drafting from 1999 – 2006 (or even 1998 – 2007) was particularly bad (relative to 29 other teams) considering the draft position and the asterisk that has to be put next to Bourdon (RIP).

              By and large this was the foundation for the best Canucks team ever and there were quite a few Canuck selections that COULD have been part of the 2011 team:

              Allen (obviously glad he was traded, though), Umberger, Bourdon & Grabner.

              Along with Sedin, Sedin, Bieksa, Kesler, Schneider, Edler, Hansen & Raymond.

              Things would have been quite a bit different if Sham was the GM in 1999 and had selected Brendl & Sparikyn or whomever…

          • mattyc

            Except that the reason you believe the Canucks drafting has been poor is largely based on some fudged numbers and hindsight drafting that Rhys used to beat the Canucks during an arbitrary time period.

            Beating 1 of 30 teams is entirely meaningless even if Sham used legit stats and overcame his dependence on hindsight.

            Beat 15+ teams and it would be a worthy discussion…

          • Parallex

            Except that the rudimentary system is worthless if it cannot beat at least half the teams in the NHL.

            Beating 1 of 30 teams is completely meaningless.

            Would a non-hindsight, modified version of Sham have beaten Nashville, Ottawa & Detroit, for example?

            You have to understand that any “system” has a chance of beating a handful of the 30 NHL teams.

            So what?

            Show me a system that can, at minimum, beat 15 of the 30 NHL teams and then it would be something to consider…

            • mattyc

              No one is arguing for the “rudimentary system” to be used, though. It’s worth nothing, which is kinda the point (it’s funny), as it beat Delorme’s picks.

              Or am I missing something?

            • mattyc

              The system only beat the picks made during the Delorme era (not necessarily Delorme’s picks, mind you) based on fudged numbers and hindsight.

              It tells us nothing about a simple statistical method beating a complex scouting system or whatever Cam was illogically going on about…

        • Parallex

          Right, but the PitB model JUST used the CSS rankings, which I think are prone to a lot of different biases, especially towards big players.

          My proposal was basically a mash-up of Sham and the CSS. At each spot, look at the next 15 players on the CSS rankings, and take the one with the most points per game. So for example if the top 5 CSS players were gone, you’d pick the highest scoring player from spots 6-20.

          I really think a basic system like that (with tweaks here and there) would produce really strong results.
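
That mash-up rule is concrete enough to sketch in a few lines. A rough illustration, assuming the CSS list arrives as (name, points-per-game) pairs in rank order; all names and numbers here are invented:

```python
# Sketch of the proposed rule: at each pick, look at the next `window`
# still-available players on the CSS rankings and take the one with the
# highest points per game among them.

def pick_from_window(css_ranked, drafted, window=15):
    """css_ranked: list of (name, ppg) in CSS rank order;
    drafted: set of names already off the board."""
    available = [p for p in css_ranked if p[0] not in drafted]
    return max(available[:window], key=lambda p: p[1])[0]

# hypothetical board: top CSS player already taken, window of 3
css = [("P1", 0.9), ("P2", 1.4), ("P3", 1.1), ("P4", 0.7)]
print(pick_from_window(css, drafted={"P1"}, window=3))  # P2 (highest ppg left)
```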

          • mattyc

            This would certainly be interesting.

            Though, again, beating the Canucks (if your proposed system did, in fact, beat the Canucks) wouldn’t tell us anything.

            What if, for example, 23 of 30 teams were better off with their own methods as opposed to your strategy?

            The only way to find out, of course, is to look at how a given strategy would have helped/hurt each of the 30 teams if they were to have used Sham over their own strategy…

    • Spiel

      @NM00, Jamie, & others: Great discussion! This is really interesting. I still think it’s important to say that Rhys’ original point is useful because it brought up a baseline to evaluate your scouting, not replace it. Very valid criticisms now shown by PITB and Churko. I like the idea of some points/CSS/ISS mash-up for the baseline.

      I think one point that you need to add to your evaluation is one that the Bourdon (RIP) and maybe Sauve picks bring up: luck and draft position. Meaning: whatever system you come up with, you have to run it on 30 teams and then come up with a standard deviation of games played or points, or some other metric. Over the last 13 drafts, what is the average GP/points a team has gotten out of its drafting, and what was expected based on their draft position. Then you can know if some scouting group is really underperforming.

      Btw, in evaluating the NHL careers of draft picks, it would probably be best to go with TOI, somehow normalizing between dmen and forwards. Or at least a composite metric of TOI, points, and GP. Someone once noted that coaches are pretty reliable experts on players, and who they give ice-time to on the aggregate is a good indication of the talent of those players.
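
The league-wide baseline Spiel describes might look something like this: compare each team's realized games played against what its draft slots should have produced, then look at the spread. The expected-GP curve and all pick data below are invented placeholders, not real draft results:

```python
# Sketch of the baseline: sum each team's games played above/below
# expectation for its draft slots, then compare the spread across teams.
from statistics import mean, stdev

def expected_gp(slot):
    """Hypothetical expectation curve: earlier slots yield more NHL games."""
    return max(0.0, 600.0 - 20.0 * slot)

# picks[team] = [(overall slot, actual NHL games played), ...] -- invented
picks = {
    "TeamA": [(1, 700), (31, 50)],
    "TeamB": [(5, 300), (35, 0)],
    "TeamC": [(10, 450), (40, 100)],
}

surplus = {team: sum(gp - expected_gp(slot) for slot, gp in ps)
           for team, ps in picks.items()}
print(surplus)

# With all 30 teams you would then take the mean and standard deviation of
# these surpluses to see which scouting staffs sit meaningfully above or
# below the league, rather than eyeballing one team in isolation.
print(mean(surplus.values()), stdev(surplus.values()))
```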

      • Spiel

        While I agree with most of what you say, what was Rhys’ original point?

        And what did cheating at the draft have to do with making this point?

        And was this “point” something thought provoking or something many of us already suspected?

        I’m never going to agree with the idea that Delorme (the Canucks in the Delorme era, really) should be expected to beat Sham.

        That was always parochial & absurd…

    • Spiel

      For all the criticism of the Canucks scouts, I would like to see the CA staff call their shot on this upcoming draft.

      It’s pretty simple. On or before the draft, come up with CA’s ranked list of players using whatever rudimentary method they decide.
      Then after the draft, take each Canucks pick and say who the Canucks should have taken instead.

      Then keep a running tally of the relevant stats or progress to compare each set of selections.

      Maybe make it even more interesting by agreeing to donate $1 to the Canucks for Kids Fund for every extra game played by your picks versus the Canucks’ picks for that season.

      Seems like material to fill articles for years to come. You’re welcome.

    • mattyc


      Rhys’ original point is that some algorithm that is purposely built to be a bad evaluator, or to take only one thing into consideration, should not beat a paid team of scouts plus the GM’s input. It’s a baseline to see whether scouts and GM are finding the right players or only drafting coke machines that can hopefully skate. It’s okay to suspect that the Canucks have been below average at drafting, but it’s entirely another thing to figure out how to show it with data, instead of just believing in the narrative.

      Now, Rhys’ original algorithm has been smartly criticized for sneaking in scouting without admitting it. Fair enough. A better algorithm against expected performance would offer a better baseline.

      Rhys’ choice of 17-year-old scoring is based on a couple of assumptions that he’s written about before: players that are good at scoring are probably good at a lot of the aspects of the game. (Remember, this is one of his concerns with the Horvat pick.) The second assumption is one that Cam Charron also talked about in his follow-up post: GMs and scouts *might* put too much stock in “good character” plus a bunch of measurables like size and speed. The question is whether NHL teams are over-valuing those and under-valuing how well the player actually plays.

      These are really good questions. Nothing absurd about the intent, even if you disagree with the specifics.

      Thought-provoking? Rhys did a good job of bringing up his points. His post has already led to some good discussion here, and to two follow-up posts by other Canucks bloggers. A bunch of analytics bloggers outside of Canucksland have read it, and who knows, maybe they’ll follow up with some better algorithms. And, judging by Twitter, Rhys and Josh themselves are planning a follow-up with a more complex algorithm in response to the criticism.

      So, the proof is in the pudding: Rhys was thought-provoking enough to provoke a lot of good discussion.

      • mattyc

        “Rhys’ original point is that some algorithm that is purposefully built to be a bad evaluator, or only take one thing into consideration, should not beat a paid team of scouts, along with the GM’s input.”


        Flaws in the method aside, is it really surprising that a system (any system) can beat 1 of 30 teams?

        While I agree that a lot of good discussion has come out of this, by and large it has been based on criticisms of the methodology.

        That does not necessarily make it thought provoking…

    • Parallex

      “Is it really surprising that a system (any system) can beat 1 of 30 teams?”

      Well… yes. How much do you imagine an NHL team spends on scouting? Let’s say $1,000,000.00 per year, which seems reasonable when you take into account salaries and expenses. Now take that number and multiply it by 10 (for a ten-year period, 2000–2009, since those are the years you could expect to see results from): that’s $10,000,000.00 total spent. If you’re investing ten million dollars in something, it should be able to beat a system that costs $0.00. Sure, any system could get lucky and beat out a team over a single draft, but 10 drafts? That shouldn’t happen.