Sunday, 16 August 2015

League 2 Stats: Can We Trust the Data?

There were eight goals at Brunton Park as Cambridge drew 4-4 with Carlisle, while two of the pre-season title and promotion fancies, Portsmouth and Leyton Orient, continued their 100% starts to the season with comfortable wins. Oxford United were unable to back up their 4-0 cup win against Brentford in the league, while Yeovil and Dagenham & Redbridge continue to struggle. The second of my weekly reviews looks at a couple of the matches and teams from this week. It also takes a more in-depth look at the goals in the Carlisle against Cambridge match to see to what extent we can trust the data that drives the model and the subsequent analysis. Any model is only as good as its inputs, so how good are the inputs in League 2?

Crawley Town 1 - 2 AFC Wimbledon

Bookmaker Odds: Crawley Town 2.97, Draw 3.35, AFC Wimbledon 2.56
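As a point of reference, decimal odds like these can be turned into rough implied probabilities by taking reciprocals and scaling them so that they sum to one. The sketch below does this for the prices above. The function name is purely illustrative, and proportional scaling is only one of several ways of stripping out the bookmaker margin, so the output should be treated as approximate.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1.

    Assumption: the bookmaker margin is removed by simple proportional
    scaling. Other methods exist and give slightly different answers.
    """
    raw = [1.0 / price for price in decimal_odds]
    overround = sum(raw)  # exceeds 1.0 by the bookmaker margin
    return [r / overround for r in raw], overround


# Crawley Town, Draw, AFC Wimbledon (prices quoted above)
probs, overround = implied_probabilities([2.97, 3.35, 2.56])

for label, p in zip(["Crawley Town", "Draw", "AFC Wimbledon"], probs):
    print(f"{label}: {p:.1%}")
print(f"Overround: {overround:.3f}")
```

On these prices the market had AFC Wimbledon as slight favourites, which is the same direction as the ExpG scoreline.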

AFC Wimbledon were mentioned last week as being unfortunate not to turn a good ExpG advantage into three points. This week, they again had an impressive ExpG scoreline of 2.1-0.7 in their favour, and goals from Adebayo Akinfenwa and Andy Barcham ensured that they took full advantage.
Andy Barcham has started life well for AFC Wimbledon

Six of their 11 chances came from inside the danger zone, including a headed chance from 'Very Close Range' coming with a high 0.44 ExpG value, which Andy Barcham duly converted. In contrast, they restricted Crawley to just three danger zone opportunities and 10 shots in total, half of which came from outside the penalty area.

AFC Wimbledon were expected to finish mid-table this season, although early signs suggest they could push further up the table. Andy Barcham has been a good signing thus far and his 1.3 ExpG in the first two matches is higher than that of any other player in the division. Crawley, by contrast, are simply hoping to survive this season, and they will be desperate to create more in attack: their 1.4 ExpG through the first two matches is the lowest of all 24 teams in League 2.

Plymouth Argyle 1 - 2 Portsmouth

Bookmaker Odds: Plymouth 2.67, Draw 3.30, Portsmouth 2.88

For the third consecutive season, Portsmouth were installed as the pre-season title favourites by the bookmakers. In the past two seasons, it has felt as though they were being priced purely on past reputation rather than actual ability, a view backed up by their finishing in the bottom half in both of those seasons. However, there are signs that the bookmakers might have it correct this time around.

After victory over Dagenham and Redbridge last week, they faced a potentially tricky trip to Plymouth, who reached the playoffs last season. Overall, they had the ExpG advantage by 2.5-1.2, although they were helped by the two penalties that they were awarded. The slight worry might be a struggle to create good chances from open play, with just two of their 10 opportunities coming inside the danger zone. Having said that, they created five danger zone chances last week and it will take time for their multiple new signings to gel, particularly in an attacking sense.

However, the concerns continue to mount for Plymouth. After they took a pounding on the ExpG last week against AFC Wimbledon, despite winning that match, this week did not improve matters and their ExpG for the first two matches stands at 1.8-5.7.

Dagenham and Redbridge 1 - 3 Leyton Orient

Bookmaker Odds: Dagenham and Redbridge 4.51, Draw 3.68, Leyton Orient 1.88

Before the season started, I wrote about how Dagenham and Redbridge had defied the stats in the past two seasons to secure mid-table finishes, despite horrific shot statistics. Admittedly, they have faced two very decent sides in Portsmouth and Leyton Orient so far, but these are troubling times for the club.

The ExpG scoreline for this match fairly accurately represented the final scoreline as Leyton Orient dominated 2.9-1.2. Orient created no fewer than six chances inside the danger zone as well as three further chances inside the penalty area and fully deserved their three goals, even if one did come from the penalty spot. Whilst Dagenham and Redbridge also created five danger zone chances, the majority were headed chances with a lower ExpG value.

For Leyton Orient, it has been an excellent start to life in League 2 and they fully deserve their six points, while Dagenham and Redbridge will be hoping to get their season going against slightly easier opposition in Exeter next time out.

Morecambe 1 - 0 Accrington Stanley

Bookmaker Odds: Morecambe 2.20, Draw 3.54, Accrington Stanley 3.47

Accrington Stanley have just one point from their two matches so far, but they might feel slightly aggrieved by this return. With a 1.8-0.6 ExpG scoreline from this match following their 1.6-0.6 advantage against Luton in the first game, they have been creating chances and restricting the opportunities for their opponents, but without converting this advantage into points. The only criticism might be that they are taking more shots from outside the area than any other side so far, which may be partially padding their ExpG with a number of low value efforts.

They have restricted their opponents to just three danger zone chances in their two matches thus far, two in this match against Morecambe. Unfortunately for them, Morecambe converted one of those chances through Aaron Wildig.

Accuracy of Data

Obviously, in the Premier League and other top leagues around the world, there is now plenty of data available and with the ability to watch full matches, it is very simple to ensure that the data that is being used is accurate. However, in League 2, aside from the odd Sky Sports televised match, it is virtually impossible to get full matches or even extended highlights to check the accuracy of the data.

The Carlisle v Cambridge United match today was interesting in this regard. With a final score of 4 - 4 compared to an ExpG scoreline of 1.4 - 1.7, there seemed to be a huge discrepancy between the two. This could be accurate, this could be a sign of problems in the model or it could be an issue with the data. Having watched the highlights on the Football League Show, I decided to look at comparing some of the goals with the descriptions from the data.

As an example, the second goal for Cambridge was described in this way in the data - 'Goal! Carlisle United 0, Cambridge United 2. George Taft (Cambridge United) left footed shot from the centre of the box to the centre of the goal'. There is no mention of any chances in the lead-up to the goal.
The second goal came after a goalmouth scramble from a free-kick. The first chance in this series of events came from Barry Corr, circled in red above. A chance from about three yards out would have a reasonably high expectation of a goal, but was denied by an excellent save by the Carlisle keeper. However, there was no mention of this attempt in the data.
Moments after the Barry Corr chance, the ball rebounded back into the centre of the penalty area, where Cambridge captain, Mark Roberts, had a shot from six yards out. This beat the keeper, but was cleared off the line by Charlie Wyke, who we can see on the far left of the picture getting back to cover. Again, this should be classed as a good chance, but is not mentioned in the data.
That was still not the end of the chances in this series of play. The keeper made another save to deny a Cambridge player before George Taft finally scored via a deflection. There were four shots in this sequence of play, but in the data this was simply recorded as one shot from 'Centre of Box', and thus it may not accurately represent the ExpG from this sequence. Obviously, there is a danger in counting rebounds and multiple shots in the same sequence, but by counting only one, the data almost certainly underestimates the chance of Cambridge scoring here.
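One common way of handling several shots in the same move, without simply adding them up, is to work out the chance that at least one of them goes in. The sketch below is a rough illustration of that idea rather than what my model actually does. The four shot values are hypothetical placeholders, since the data recorded only one shot for this sequence, and the calculation treats each shot as an independent attempt, which is a simplification.

```python
from math import prod


def sequence_expg(shot_values):
    """Chance that at least one shot in a sequence is scored.

    Assumption: each value is the chance of that shot going in, and the
    shots are treated as independent attempts. Summing the values instead
    could push a single move above 1.0 expected goals.
    """
    return 1.0 - prod(1.0 - p for p in shot_values)


# HYPOTHETICAL values for the four Cambridge attempts in the scramble.
# None of these numbers come from the data or from the model.
hypothetical_shots = [0.40, 0.30, 0.20, 0.15]

combined = sequence_expg(hypothetical_shots)
print(f"Sum of shots:     {sum(hypothetical_shots):.2f}")
print(f"Combined chance:  {combined:.2f}")
```

Whatever the true values were, the combined figure would sit above the best single chance and below the simple sum, which is a reasonable middle ground between ignoring the rebounds and double counting them.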

While that may have been slightly inaccurate, the first goal for Carlisle is rather concerning in terms of the accuracy of the data. That goal was described as follows - 'Goal! Carlisle United 1, Cambridge United 2. Jabo Ibehre (Carlisle United) right footed shot from outside the box to the bottom right corner. Assisted by Jason Kennedy.'

From that description, it would seem that Jabo Ibehre has drilled one in from distance. In other words, he has scored a goal that would generate a fairly low ExpG in the model. However, if we look at the highlights, we see differently.
We can see here that Jabo Ibehre is certainly not outside the penalty area. In fact, he is scoring from virtually on the penalty spot with the keeper well out of position. You would consider that this is a chance that should have a pretty high ExpG value. However, based on the data, my model will class this as a chance with an ExpG value of just 0.036. Unfortunately, this is just plain wrong and seriously undermines my trust in the data that is available.
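To get a feel for how much one misclassified shot matters, the short sketch below swaps the recorded 0.036 for a few alternative values and recalculates Carlisle's match total of 1.4. The alternative values are hypothetical, chosen only to show the size of the effect, and the 1.4 total is itself rounded, so the results are indicative at best.

```python
RECORDED_TEAM_EXPG = 1.4    # Carlisle's ExpG for the match, as reported
RECORDED_SHOT_EXPG = 0.036  # value given to the 'outside the box' description


def reclassified_total(team_total, old_value, new_value):
    """Replace one shot's ExpG value within a team total."""
    return team_total - old_value + new_value


# HYPOTHETICAL values for a shot from around the penalty spot.
# These are illustrative only and are not taken from the model.
for candidate in (0.15, 0.25, 0.35):
    total = reclassified_total(RECORDED_TEAM_EXPG, RECORDED_SHOT_EXPG, candidate)
    print(f"Shot valued at {candidate:.2f} -> Carlisle ExpG {total:.2f}")
```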

Jabo Ibehre's second goal of the match was classed as a header from Very Close Range. Again, the screenshot of the goal is shown below.
It is certainly a chance from a good position, but given the other shooting area locations, classing this as Very Close Range might not be entirely accurate. I had assumed that 'Very Close Range' probably had to be inside the six-yard box and pretty much between the width of the goalposts. It would seem that more information on precisely how shot locations are classed in the data might be required.

The final equaliser from Carlisle also clinched Ibehre's hat-trick. Technically in this case, the description of 'Outside Box' is just about correct.
However, it is right on the very edge of the penalty area and in the model, this would be lumped in with all other shots from outside the penalty area. This is one particular area in which I think that the model is certainly lacking, as this shot clearly has a far better chance of going in than a speculative effort from 40 yards, although both would be given the same value in the model.

The reality is that it was always going to be unlikely that an ExpG model would accurately predict this match. They tend to struggle with extremes and generally in matches like this, the actual goals will be far ahead of the expected goals. However, we have seen in this match some real limitations in the data, especially in the first Carlisle goal, where the shooting position simply seems to have been incorrectly classified. Had the Cambridge shots in the lead-up to their second goal been counted and had Ibehre's first goal been correctly classified, might we have seen an ExpG scoreline that was closer to the final scoreline? And these were only the goals: there were 25 shots in total in the match, so might some of the others have been incorrectly classified as well?
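To put a rough number on how extreme a 4-4 is against an ExpG scoreline of 1.4 - 1.7, one option is to treat each team's goals as a Poisson count with the ExpG figure as its mean. That is an assumption on my part rather than something built into the model, and it also treats the two teams as independent, so the sketch below is a back-of-the-envelope check only.

```python
from math import exp, factorial


def prob_at_least(goals, mean):
    """P(X >= goals) for a Poisson-distributed goal count with the given mean."""
    below = sum(exp(-mean) * mean**i / factorial(i) for i in range(goals))
    return 1.0 - below


# Assumption: goals follow independent Poisson distributions whose means
# are the reported ExpG totals for the match.
carlisle = prob_at_least(4, 1.4)
cambridge = prob_at_least(4, 1.7)
both = carlisle * cambridge

print(f"Carlisle 4+ goals:  {carlisle:.1%}")
print(f"Cambridge 4+ goals: {cambridge:.1%}")
print(f"Both scoring 4+:    {both:.2%}")
```

Under these assumptions, both sides scoring four or more is a very rare outcome, which fits with the idea that either the match was a genuine outlier or the inputs understated the quality of the chances.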

Unfortunately, until we have either more robust data or greater access to full highlights or full matches in League 2, we simply have to make do with the limited data that we have access to and we have to put a certain level of trust in it, even if we must put question marks around our models based upon that flawed data.

FULL RESULTS

*ExpG results in brackets

Notts County 0 - 2 Mansfield Town (1.0 - 1.0)
Barnet 0 - 2 Wycombe Wanderers (1.7 - 2.2)
Carlisle 4 - 4 Cambridge United (1.4 - 1.7)
Crawley Town 1 - 2 AFC Wimbledon (0.7 - 2.1)
Dagenham and Redbridge 1 - 3 Leyton Orient (1.2 - 2.9)
Luton Town 2 - 2 Oxford United (1.3 - 1.3)
Morecambe 1 - 0 Accrington Stanley (0.6 - 1.8)
Newport County 2 - 2 Stevenage (1.6 - 1.5)
Northampton Town 3 - 0 Exeter (1.7 - 0.5)
Plymouth Argyle 1 - 2 Portsmouth (1.2 - 2.5)
Yeovil 0 - 1 Bristol Rovers (0.3 - 1.7)
York City 1 - 2 Hartlepool United (1.3 - 1.4)
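For anyone wanting to check the round as a whole, the sketch below totals the actual goals and the ExpG from the results above, with and without the Carlisle v Cambridge match. The figures are copied from the list and the variable names are just for illustration.

```python
# (home, away, home goals, away goals, home ExpG, away ExpG), from the list above
results = [
    ("Notts County", "Mansfield Town", 0, 2, 1.0, 1.0),
    ("Barnet", "Wycombe Wanderers", 0, 2, 1.7, 2.2),
    ("Carlisle", "Cambridge United", 4, 4, 1.4, 1.7),
    ("Crawley Town", "AFC Wimbledon", 1, 2, 0.7, 2.1),
    ("Dagenham and Redbridge", "Leyton Orient", 1, 3, 1.2, 2.9),
    ("Luton Town", "Oxford United", 2, 2, 1.3, 1.3),
    ("Morecambe", "Accrington Stanley", 1, 0, 0.6, 1.8),
    ("Newport County", "Stevenage", 2, 2, 1.6, 1.5),
    ("Northampton Town", "Exeter", 3, 0, 1.7, 0.5),
    ("Plymouth Argyle", "Portsmouth", 1, 2, 1.2, 2.5),
    ("Yeovil", "Bristol Rovers", 0, 1, 0.3, 1.7),
    ("York City", "Hartlepool United", 1, 2, 1.3, 1.4),
]


def totals(matches):
    """Return (total goals, total ExpG) across the given matches."""
    goals = sum(m[2] + m[3] for m in matches)
    expg = sum(m[4] + m[5] for m in matches)
    return goals, round(expg, 1)


all_goals, all_expg = totals(results)
rest = [m for m in results if m[0] != "Carlisle"]
rest_goals, rest_expg = totals(rest)

print(f"All matches:        {all_goals} goals vs {all_expg} ExpG")
print(f"Excluding Carlisle: {rest_goals} goals vs {rest_expg} ExpG")
```

Taking out the one outlier brings the actual and expected totals much closer together, although a single round of 12 matches is far too small a sample to read much into.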
