Overfitting, trends, and small samples

Cam Charron
December 02 2013 01:30PM


The Canucks have not won a game where Kevin Bieksa has scored a goal.

For those of you that followed a certain conversation on Twitter yesterday...

I'm a little annoyed at the focus on "The Canucks are X-0-Z when they've scored at least A goals, but are 0-Y-Z when they've scored B goals or fewer." You know the one. The problem is that people attribute this to a flaw in the way the team plays games, rather than to simple sampling error.

There are a lot of reasons why records are where they are, and when you're a third of the way through the season, some scenarios haven't presented themselves yet. It's rare that a team will win a game 1-0 or 2-1, and given the rarity of those events, should it really concern us that the Canucks have not yet won a game by either score?

Here's a comic from XKCD on attempting to forecast elections based on electoral precedent. I've taken the liberty of applying this to the Canucks season thus far...

The Vancouver Canucks are undefeated!

Until they lost Game 1 to the San Jose Sharks.

The Vancouver Canucks haven't won a game!

Until they beat Edmonton 6-2 in Game 2.

The Vancouver Canucks haven't won consecutive games!

Until they beat Calgary in Game 3.

The Vancouver Canucks have not won without scoring the first goal!

Until New Jersey in Game 4.

The Vancouver Canucks have never lost a game after a win!

Until losing to San Jose in Game 5.

No team that isn't the Sharks has beaten the Canucks!

But Montreal did in Game 6!

The Canucks have yet to win a road game in regulation!

But in Game 7, they won in Philadelphia.

…and they've conceded a goal in every single game!

But not against Buffalo.

The Canucks have yet to pick up a point against a playoff opponent!

But they did in Pittsburgh.

But they have yet to be beaten in regulation by an Eastern Conference opponent on the road!

Oh, but in Columbus they were.

The Canucks have yet to win when Roberto Luongo allows more than three goals!

But they beat New York 5-4.

Every team that has played the Canucks at least once this season has won the rematch.

But New Jersey didn't.

The only Western teams the Canucks can beat are the ones in Alberta!

That was true, until their overtime win against St. Louis.

But can they win in regulation if they take more than three penalties? They haven't been able to do that yet…

Just ask Washington.

The Canucks have won every game they've held their opponents to two or fewer goals…

But Detroit beat them 2-1 on October 30th.

The Canucks have yet to beat an Original Six team. San Jose has already beaten three of them!

Yeah, but Toronto got whipped by the Canucks on November 2nd.

The Canucks haven't won a game without stretching a win streak to at least two games.

Then it may come as a shock that the Phoenix Coyotes beat them in a shootout that night.

Can the Canucks beat the Sharks?

They did, quite convincingly, on November 7.

Here's a fun stat… the Canucks have picked up a point in every game in which at least six goals have been scored.

Hope you didn't try to use that statistic after the Kings beat the Canucks 5-1 that night…

No worries. The Canucks have won every game on the second half of a back-to-back.

They lost in Anaheim on November 10…

Okay, but they've won all their home games after coming back from a road trip thus far.

Until they lost on November 14 to the Sharks (again).

They've beaten every team below them in the Western Conference standings (based on where they are today [November 17])

But the 10-7-2 Stars dispatched the 11-7-3 Canucks.

The Canucks can't lose to American goalies born in states that begin with the letter "M"…

…until Tim Thomas from Flint, Michigan managed it.

Vancouver are so dependent on the Sedins. They can't win when Henrik takes fewer than 20 shifts.

But they beat Columbus 6-2.

The good news is that the Canucks have picked up a point every time I'm in the building this year.

Alas, my perfect streak ended at two games, as I was in attendance to see Chicago defeat Vancouver 2-1.

The Canucks are 5-0 when Jason Garrison registers a point, provided he scored a point in the previous game as well.

Despite Jason Garrison's efforts, the Canucks lost to Los Angeles.

The Canucks have yet to win a regulation game where they've been out-shot…

But they beat Ottawa 5-2 while being out-shot 39-28.

New York should be worried if the Canucks score twice. They've picked up a point in every game they've scored at least two.

Yeah, before they lost 5-2 to the Rangers.

If the Canucks don't score more goals at even strength, they lose.

But against Carolina, the Canucks and Hurricanes both scored two even strength goals, and the Canucks won…

Anyway.

…the point is that goal posts can be set at any arbitrary point to fit a narrative. This is a process called 'overfitting'. To use the Wikipedia definition, "overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship".
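To put a rough number on how easy it is to find one of these "perfect" splits, here is a minimal sketch. It assumes, purely for illustration, that every game is an independent coin flip and that each arbitrary split (opponent, shift count, goalie's home state) covers the same number of games and is independent of the others. Neither assumption is true of a real schedule, and the function names are my own, so treat the output as a ballpark.

```python
def perfect_record_probability(games: int, win_prob: float = 0.5) -> float:
    """Chance that `games` independent results are all wins or all losses."""
    return win_prob ** games + (1.0 - win_prob) ** games


def chance_of_some_perfect_split(games: int, splits: int, win_prob: float = 0.5) -> float:
    """Chance that at least one of `splits` independent splits looks perfect.

    Assumes (hypothetically) that every split covers `games` games and that
    the splits do not overlap or influence each other.
    """
    p = perfect_record_probability(games, win_prob)
    return 1.0 - (1.0 - p) ** splits


if __name__ == "__main__":
    # The more goal posts you are willing to try, the likelier one "works".
    for splits in (1, 10, 50):
        chance = chance_of_some_perfect_split(games=5, splits=splits)
        print(f"{splits:>2} candidate splits of 5 games: {chance:.1%}")
```

A single five-game split is rarely perfect under these assumptions, but once you allow yourself dozens of candidate splits, finding at least one spotless record is close to a sure thing.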

This can also be called the Black Swan Effect. The statement "all swans are white" can be immediately disproven by one observation of a swan that is not white. As it happens, NHL teams overwhelmingly win games when they score at least three goals. Since the possibility of a 2-2 tie no longer exists, all of those games artificially become "3-2" games in the Bettman NHL.

"Home Teams" in the NHL are 154-20-20 when scoring 3 or more, and 34-150-24 when scoring two or fewer. When the difference between your team and the NHL average can be met with just one or two observations, it's not worth bringing up from any sort of analytical perspective.
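The arithmetic behind "one or two observations" can be sketched quickly. The two league records below are the ones quoted above; I am assuming the third number is overtime/shootout losses. The 0-5-1 team record is a made-up example, not the Canucks' actual split, and the helper names are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    wins: int
    losses: int
    other: int  # assumed to be overtime/shootout losses

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.other

    @property
    def win_rate(self) -> float:
        return self.wins / self.games


def extra_wins_to_match(team: Record, target_rate: float) -> int:
    """Smallest number of additional wins that lifts `team` to `target_rate`."""
    if team.games == 0 or not 0.0 <= target_rate < 1.0:
        raise ValueError("need at least one game and a target rate in [0, 1)")
    extra = 0
    while (team.wins + extra) / (team.games + extra) < target_rate:
        extra += 1
    return extra


# League-wide home records quoted in the article.
HOME_THREE_OR_MORE = Record(154, 20, 20)
HOME_TWO_OR_FEWER = Record(34, 150, 24)

# Hypothetical team that is winless when scoring two or fewer.
hypothetical_team = Record(0, 5, 1)

if __name__ == "__main__":
    print(f"League, 3+ goals:     {HOME_THREE_OR_MORE.win_rate:.3f}")
    print(f"League, 2 or fewer:   {HOME_TWO_OR_FEWER.win_rate:.3f}")
    needed = extra_wins_to_match(hypothetical_team, HOME_TWO_OR_FEWER.win_rate)
    print(f"Wins needed to catch the league rate: {needed}")
```

Under those assumptions a winless 0-5-1 team is two wins away from being at or above the league rate in low-scoring games, which is the sense in which the gap is only an observation or two wide.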

Sample size means a lot. "X team has earned 67% of the shots in this game!" means a heck of a lot more when a team is out-shooting the other 40-20 than it does at 2-1. It's true that over hundreds of observations, a team that picks up two of every three shots in a game will probably wind up being the team that out-shoots its opponent, but with one or two observations there's way more noise than signal.
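One way to see the difference between 2-1 and 40-20 is to put an interval around the shot share. The sketch below uses the Wilson score interval, which is my choice for illustration and not something the shot counts themselves dictate. It also treats each shot as an independent trial, which real shots in a hockey game are not, so read the ranges as rough.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same 67% shot share, very different amounts of evidence.
    for shots_for, shots_against in ((2, 1), (40, 20)):
        total = shots_for + shots_against
        low, high = wilson_interval(shots_for, total)
        print(f"{shots_for}-{shots_against}: plausible share {low:.0%} to {high:.0%}")
```

At 2-1 the plausible range comfortably includes an even split; at 40-20 it does not, even though the headline percentage is identical.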

Ultimately, you want any statistic you read off to not be perfect, because as soon as you hit that black swan, the analysis becomes flawed. One of the benefits of attempting to prognosticate a team's future record using things like Corsi is precisely that it's not perfect all the time, so nobody working closely with the data is ever under the illusion it's going to work 100% of the time.

A real-world example would be when I was apartment-hunting back in April. The south-facing window only let sunlight about a quarter of the way into the living room in the morning. "That's good," I thought. "I like natural lighting but too much sunlight is uncomfortable." Now, as we approach winter, I'm beginning to realize I hadn't taken into account that the sun runs a lower trajectory across the sky now that our hemisphere has tilted away from the sun. Now on a sunny day, between 9 and 11 in the morning my couch is drenched in sunlight! The one observation I made in April was not sufficient, since I failed to take into account how that would work out in the future.

Working in absolutes can ruin you because you're then committed to ignoring the statistic once it is no longer an absolute. There are many, many, many reasons to be fearful of the Sharks, but a team's record in games that end 2-1 is not one of them.

Cam Charron is a BC hockey fan that writes about hockey on many different websites including this one.
#1 PB
December 02 2013, 01:53PM

This is an effective and exhaustive way of drawing out the basic point that Blake Price is an idiot.

#2 NM00
December 02 2013, 02:20PM

@PB

It could have been more concise.

Like a twitter exchange, for example...

#3 Graham S.
December 02 2013, 02:26PM

Appreciate a "sporting" website that can step away from its core focus to make a point like this, and keep it entertaining.

Nicely done.

#4 Dustin
December 02 2013, 02:45PM

Nicely put. In the wise words of Homer Simpson "You can come up with statistics to prove anything, Kent. Forty percent of all people know that."

#5 aaron
December 02 2013, 02:59PM

How'd you get that much natural sunlight into a basement?

#6 pleiadian jim
December 02 2013, 04:05PM

Couldn't you argue, though, that if Vancouver had a higher save percentage in goal, say a .927 like a certain somebody has in the East, the odds are the Canucks would have gotten at least one or two more points out of those low-scoring one-goal games?

#7 acg5151
December 02 2013, 04:07PM

Cam, do you keep your windows closed? If so, according to NM00, you have something in common with the Canucks.

#8 Lemming
December 02 2013, 04:53PM

@acg5151

I can only assume the blinds have closed.

#9 Nat
December 02 2013, 05:00PM

This is great - you got the point across, in an amusing way.

#10 Senrik Hedin
December 02 2013, 11:15PM

keep calm and beat the dead horse

#11 The man
December 02 2013, 11:56PM

Cam.

You sir are a genius.

That is all.

Comments are closed for this article.