Sunday, 16 August 2015

League 2 Stats: Can We Trust the Data?

There were eight goals at Brunton Park as Cambridge drew 4-4 with Carlisle, while two of the pre-season promotion fancies, Portsmouth and Leyton Orient, continued their 100% starts to the season with comfortable wins. Oxford United were unable to back up a 4-0 cup win against Brentford in the league, while Yeovil and Dagenham & Redbridge continue to struggle. The second of my weekly reviews looks at a couple of the matches and teams from this week. It will also look in more depth at the goals in Carlisle against Cambridge and see to what extent we can trust the data that drives the model and the subsequent analysis. Any model is only as good as its inputs, so how good are the inputs in League 2?

Crawley Town 1 - 2 AFC Wimbledon

Bookmaker Odds: Crawley Town 2.97, Draw 3.35, AFC Wimbledon 2.56

AFC Wimbledon were mentioned last week in terms of being unfortunate in not turning a good ExpG advantage into three points. This week, they again had an impressive ExpG scoreline of 2.1-0.7 in their favour, and goals from Adebayo Akinfenwa and Andy Barcham ensured that they took full advantage.
Andy Barcham has started life well for AFC Wimbledon

Six of their 11 chances came from inside the danger zone, including a headed chance from 'Very Close Range' coming with a high 0.44 ExpG value, which Andy Barcham duly converted. In contrast, they restricted Crawley to just three danger zone opportunities and 10 shots in total, half of which came from outside the penalty area.

AFC Wimbledon were expected to finish mid-table this season, although early signs suggest they could push further up the table. Andy Barcham has been a good signing thus far and his 1.3 ExpG in the first two matches is higher than any other player in the division. For Crawley, they are simply hoping to survive this season and they will be desperate to create more in attack with their 1.4 ExpG through the first two matches being the lowest of all 24 teams in League 2.

Plymouth Argyle 1 - 2 Portsmouth

Bookmaker Odds: Plymouth 2.67, Draw 3.30, Portsmouth 2.88

For the third consecutive season, Portsmouth were installed as the pre-season title favourites with the bookmakers. In the past two seasons, it has felt as though they were being priced purely on past reputation, rather than actual ability, which was backed up by their finishing in the bottom half in both of those seasons. However, there are signs that the bookmakers might have it correct this time around.

After victory over Dagenham and Redbridge last week, they faced a potentially tricky trip to Plymouth, who reached the playoffs last season. Overall, they had the ExpG advantage by 2.5-1.2, although they were helped by the two penalties that they were awarded. The slight worry might be a struggle to create good chances from open play, with just two of their 10 opportunities coming from inside the danger zone. Having said that, they created five danger zone chances last week and it will take time for their multiple new signings to gel, particularly in an attacking sense.

However, the concerns continue to mount for Plymouth. After they took a pounding on the ExpG last week against AFC Wimbledon, despite winning that match, this week did not improve matters and their ExpG for the first two matches stands at 1.8-5.7.

Dagenham and Redbridge 1 - 3 Leyton Orient

Bookmaker Odds: Dagenham and Redbridge 4.51, Draw 3.68, Leyton Orient 1.88

Before the season started, I wrote about how Dagenham and Redbridge had defied the stats in the past two seasons to secure mid-table finishes, despite horrific shot statistics. Now, they have faced two very decent sides in Portsmouth and Leyton Orient so far, but it is troubling times for the club.

The ExpG scoreline for this match fairly accurately represented the final scoreline as Leyton Orient dominated 2.9-1.2. Orient created no fewer than six chances inside the danger zone as well as three further chances inside the penalty area and fully deserved their three goals, even if one did come from the penalty spot. Whilst Dagenham and Redbridge also created five danger zone chances, the majority were headed chances with a lower ExpG value.

For Leyton Orient, it has been an excellent start to life in League 2 and they fully deserve their six points, while Dagenham and Redbridge will be hoping to get their season going against slightly easier opposition in Exeter next time out.

Morecambe 1 - 0 Accrington Stanley

Bookmaker Odds: Morecambe 2.20, Draw 3.54, Accrington Stanley 3.47

Accrington Stanley have just one point from their two matches so far, but they might feel slightly aggrieved by this return. With a 1.8-0.6 ExpG scoreline from this match following their 1.6-0.6 advantage against Luton in the first game, they have been creating chances and restricting the opportunities for their opponents, but without converting this advantage into points. The only criticism might be that they are taking more shots from outside the area than any other side so far, which may be partially padding their ExpG with a number of low value efforts.

They have restricted their opponents to just three danger zone chances in their two matches thus far, two in this match against Morecambe. Unfortunately for them, Morecambe converted one of those chances through Aaron Wildig.

Accuracy of Data

Obviously, in the Premier League and other top leagues around the world, there is now plenty of data available and with the ability to watch full matches, it is very simple to ensure that the data that is being used is accurate. However, in League 2, aside from the odd Sky Sports televised match, it is virtually impossible to get full matches or even extended highlights to check the accuracy of the data.

The Carlisle v Cambridge United match today was interesting in this regard. With a final score of 4 - 4 compared to an ExpG scoreline of 1.4 - 1.7, there seemed to be a huge discrepancy between the two. This could be accurate, this could be a sign of problems in the model or it could be an issue with the data. Having watched the highlights on the Football League Show, I decided to look at comparing some of the goals with the descriptions from the data.

As an example, the second goal for Cambridge was described in this way in the data - 'Goal! Carlisle United 0, Cambridge United 2. George Taft (Cambridge United) left footed shot from the centre of the box to the centre of the goal'. There is no mention of any chances in the lead-up to the goal.
The second goal came after a goalmouth scramble from a free-kick. The first chance in this series of events came from Barry Corr, circled in red above. A chance from about three yards out would have a reasonably high expectation of a goal, but was denied by an excellent save by the Carlisle keeper. However, there was no mention of this attempt in the data.
Moments after the Barry Corr chance, the ball rebounded back into the centre of the penalty area, where Cambridge captain, Mark Roberts, had a shot from six yards out. This beat the keeper, but was cleared off the line by Charlie Wyke, who we can see on the far left of the picture getting back to cover. Again, this should be classed as a good chance, but is not mentioned in the data.
That was still not the end of the chances in this series of play. The keeper made another save to deny a Cambridge player before George Taft finally scored via a deflection. There were four shots in this sequence of play, but in the data, this was simply recorded as one shot from 'Centre of Box' and thus it may not accurately represent the ExpG from this sequence. Obviously, there is a danger in counting rebounds and multiple shots in the same sequence, but in only counting one, it almost certainly underestimated the chance of Cambridge scoring here.
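One common way to handle several shots in the same move, without simply adding their values together, is to work out the chance that at least one of them goes in. The sketch below is not part of my model and the four shot values are invented purely for illustration; it also assumes each shot can be treated as a separate chance that only matters if the earlier ones failed.

```python
from math import prod


def sequence_goal_probability(shot_values):
    """Chance that at least one shot in a single sequence of play is scored."""
    return 1 - prod(1 - p for p in shot_values)


# Invented values for a four-shot goalmouth scramble (not taken from the data).
scramble = [0.4, 0.3, 0.2, 0.1]

naive_sum = sum(scramble)                        # can exceed one goal per move
combined = sequence_goal_probability(scramble)   # always stays below 1
last_shot_only = scramble[-1]                    # what a one-shot record captures
```

With these made-up numbers, the combined value sits well above the single recorded shot but below the simple sum, which is the behaviour you would want from a sequence that can only ever produce one goal.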

While that may have been slightly inaccurate, the first goal for Carlisle is rather concerning in terms of the accuracy of the data. That goal was described as follows - 'Goal! Carlisle United 1, Cambridge United 2. Jabo Ibehre (Carlisle United) right footed shot from outside the box to the bottom right corner. Assisted by Jason Kennedy.'

From that description, it would seem that Jabo Ibehre has drilled one in from distance. In other words, he has scored a goal that would generate a fairly low ExpG in the model. However, if we look at the highlights, we see differently.
We can see here that Jabo Ibehre is certainly not outside the penalty area. In fact, he is scoring from virtually on the penalty spot with the keeper well out of position. You would consider that this is a chance that should have a pretty high ExpG value. However, based on the data, my model will class this as a chance with an ExpG value of just 0.036. Unfortunately, this is just plain wrong and seriously undermines my trust in the data that is available.

Jabo Ibehre's second goal of the match was classed as a header from Very Close Range. Again, the screenshot of the goal is shown below.
It is certainly a chance from a good position, but given the other shooting area locations, classing this as Very Close Range might not be entirely accurate. I had assumed that 'Very Close Range' probably had to be inside the six-yard box and pretty much between the width of the goalposts. It would seem that more information on precisely how shot locations are classed in the data might be required.

The final equaliser from Carlisle also clinched Ibehre's hat-trick. Technically in this case, the description of 'Outside Box' is just about correct.
However, it is right on the very edge of the penalty area and in the model, this would be lumped in with all other shots from outside the penalty area. This is one particular area in which I think that the model is certainly lacking, as clearly this shot has a far better chance of going in than a speculative effort from 40 yards, although they would be given the same value in the model.

The reality is that it was always going to be unlikely that an ExpG model would accurately predict this match. They tend to struggle with extremes and generally in matches like this, the actual goals will be far ahead of the expected goals. However, we have seen in this match some real limitations in the data, especially in the first Carlisle goal, where the shooting position simply seems to have been incorrectly classified. Had the Cambridge shots in the lead-up to their second goal been counted and had Ibehre's first goal been correctly classified, might we have seen an ExpG scoreline that was closer to the final scoreline? These were only the goals in the match as well - there were 25 shots in total in the match, so might some of the others have been incorrectly classified too?

Unfortunately, until we have either more robust data or greater access to full highlights or full matches in League 2, we simply have to make do with the limited data that we have access to and we have to put a certain level of trust in it, even if we must put question marks around our models based upon that flawed data.

FULL RESULTS

*ExpG results in brackets

Notts County 0 - 2 Mansfield Town (1.0 - 1.0)
Barnet 0 - 2 Wycombe Wanderers (1.7 - 2.2)
Carlisle 4 - 4 Cambridge United (1.4 - 1.7)
Crawley Town 1 - 2 AFC Wimbledon (0.7 - 2.1)
Dagenham and Redbridge 1 - 3 Leyton Orient (1.2 - 2.9)
Luton Town 2 - 2 Oxford United (1.3 - 1.3)
Morecambe 1 - 0 Accrington Stanley (0.6 - 1.8)
Newport County 2 - 2 Stevenage (1.6 - 1.5)
Northampton Town 3 - 0 Exeter (1.7 - 0.5)
Plymouth Argyle 1 - 2 Portsmouth (1.2 - 2.5)
Yeovil 0 - 1 Bristol Rovers (0.3 - 1.7)
York City 1 - 2 Hartlepool United (1.3 - 1.4)

Monday, 10 August 2015

League 2 Stats: Better Lucky Than Good?

There were mixed fortunes for the potential promotion candidates on the opening day of the season. Impressive victories for Portsmouth, Cambridge and Leyton Orient would have pleased their fans, while Luton and, in particular, Oxford fans will have viewed the opening day as points dropped. Obviously, it is incredibly early stages and we cannot draw any real conclusions yet, but we shall just take a quick look at some of the performances on the opening day of the season.

Cambridge United 3 - 0 Newport County

Bookmaker Odds: Cambridge 1.76, Draw 3.79, Newport County 5.15

As mentioned last time, Cambridge are a team that have spent plenty of money in the summer and will have been pleased to get off to a good start, albeit against a Newport County side that are one of the favourites for relegation following major budget cuts over the summer. While Cambridge were outshot 10-13, their six chances from the danger zone were bettered by just two other teams in the division on the opening day.

Barry Corr showed why Cambridge were so keen to bring him in as he scored twice, once from a header from a corner and the other from a one-on-one after some poor Newport County defending. My model classes these two chances as combining for 0.4 ExpG, although due to lack of data, it is unable to class one-on-one chances any differently from a normal chance from that area, so the reality is that it should have been worth more.
Two goals from Barry Corr got Cambridge United off to a good start in League 2

While Newport County had 13 shots on goal, Cambridge were able to restrict them to just two shots from the danger zone, with seven of those 13 shots coming from outside the area. They mounted up to give an ExpG scoreline for the match of 1.2-1.0, but as mentioned earlier, the nature of Corr's second goal means this could well have slightly underestimated Cambridge's dominance in the match.

A solid start for Cambridge, although there will undoubtedly be tougher challenges still to come.

Stevenage 0 - 2 Notts County

Bookmaker Odds: Stevenage 2.42, Draw 3.36, Notts County 3.18

Newly relegated Notts County were not necessarily expected to push for automatic promotion, but will be optimistic of challenging for a playoff spot and they will have been encouraged by their performance on the opening day of the season. They outshot Teddy Sheringham's side 11-8 and will have been particularly pleased to have had two chances from very close range, converting both of them. In terms of ExpG, they had the advantage 1.8-1.4, so while the match might have been slightly closer than the scoreline suggested, they were certainly good value for the win.

One minor concern may be the three chances conceded by Notts County in the danger zone and the three further chances from inside the penalty area, compared with the four danger zone chances and one further in the penalty area that they created, but it is an encouraging start for their Dutch manager, Ricardo Moniz.

AFC Wimbledon 0 - 2 Plymouth Argyle

Bookmaker Odds: AFC Wimbledon 2.58, Draw 3.28, Plymouth Argyle 3.01

Plymouth got off to a good start with a victory at AFC Wimbledon, but the scoreline certainly does not give an accurate representation of the match. AFC Wimbledon were able to outshoot Plymouth 14-11 and in terms of the quality of chance created, the home side had a clear advantage. In terms of ExpG, AFC Wimbledon had the advantage 1.7-0.6, so will count themselves very unlucky to have lost this match.

Plymouth created zero chances from the danger zone and of their two goals, one came from outside the penalty area, while one was volleyed in from a difficult angle. It was not a promising start in attack and this was compounded by the seven chances that AFC Wimbledon created in the danger zone, three from headers and four with the feet. Andy Barcham had two good chances from the centre of the box, while Adebayo Akinfenwa had several good headed chances as he continues to show that he is a real handful at this level.

Plymouth will certainly be looking to create more in the attacking areas as the season continues, while AFC Wimbledon will need to look to finish off their chances more efficiently if they are to improve on last season's mid-table finish.

Shooting Areas

In terms of calculating shot quality, each shot taken in the match is recorded with four pieces of information. The first two are obvious and they are the player that took the shot and the outcome of the shot (goal, saved, missed or blocked). The third is the shot type - whether the shot was with the left foot, the right foot or a header. Clearly, a shot from the same location has a different probability of resulting in a goal if it is a shot with the foot, rather than a header, so it is important to differentiate between the two. The final piece of information is the shot location. This is broken down into six different categories. These are 'Very Close Range', 'Centre of Box', 'Left/Right Side of 6-Yard Box', 'Left/Right Side of Box', 'Difficult Angle' and 'Outside Box'.

With these four pieces of information, we can combine this season's data with that from last season to see the probability of any type of shot resulting in a goal. For example, we can see that a headed chance from the centre of the box is scored 10.3% of the time or it is worth 0.103 of a goal. Calculating these values across every possible shot type, we can create a basic expected goals model.
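As a rough illustration of the idea, the sketch below builds a lookup table of conversion rates by shot type and location and then adds up the values for a set of shots. It is a minimal sketch rather than my actual model: the dictionary keys, the function names and the handful of shots in the history are all made up for the example.

```python
from collections import defaultdict


def build_expg_table(shots):
    """Return {(shot_type, location): goals / attempts} from past shots."""
    counts = defaultdict(lambda: [0, 0])  # key -> [goals, attempts]
    for shot in shots:
        key = (shot["shot_type"], shot["location"])
        counts[key][1] += 1
        if shot["outcome"] == "goal":
            counts[key][0] += 1
    return {key: goals / attempts for key, (goals, attempts) in counts.items()}


def match_expg(shots, table):
    """Add up the table value of each shot; unseen categories count as 0."""
    return sum(table.get((s["shot_type"], s["location"]), 0.0) for s in shots)


def make_shot(outcome, shot_type, location, player="Example Player"):
    """Bundle the four recorded pieces of information into one record."""
    return {"player": player, "outcome": outcome,
            "shot_type": shot_type, "location": location}


# Made-up history, for illustration only (not real League 2 data).
history = [
    make_shot("goal", "header", "Centre of Box"),
    make_shot("saved", "header", "Centre of Box"),
    make_shot("missed", "header", "Centre of Box"),
    make_shot("blocked", "header", "Centre of Box"),
    make_shot("goal", "right foot", "Outside Box"),
    make_shot("missed", "right foot", "Outside Box"),
]
table = build_expg_table(history)

# Two headed chances from the centre of the box in a hypothetical match.
team_shots = [
    make_shot("saved", "header", "Centre of Box"),
    make_shot("missed", "header", "Centre of Box"),
]
team_expg = match_expg(team_shots, table)
```

With only a few shots in each category the rates are obviously meaningless, which is why combining two seasons of data matters for the real thing.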

Clearly, this is very basic, but it has to be so due to the lack of further information at this level. It treats a shot from just outside the area as identical to a shot from 45 yards and a one-on-one chance from the centre of the box as identical to a shot squeezed away under intense pressure from the same position, but without video highlights of every shot taken, it is impossible to really go into further depth. It is also debatable as to how relevant ExpG can be for individual matches as opposed to aggregated up over the season, but it is one of many tools that we can look at to get an overview of a match.

Hopefully though, this will help to give us an idea of which teams are overperforming or underperforming their underlying stats. Looking at last season, the teams that had the three highest ExpG differences across the whole season finished inside the top 3 and gained automatic promotion, while Southend, who were promoted through the playoffs, were fourth by this measure, so it clearly represents the quality of teams relatively well.

Thursday, 6 August 2015

League 2 Stats: An Introduction and the Season Ahead

League 2 gets underway on Saturday and I decided it would be an interesting idea to write a regular set of articles as the season progresses looking deeper into some of the stats from the league. There are plenty of people out there that write regularly about the Premier League and the Championship, as well as numerous other leagues around Europe, but I have not yet come across a regular stats-based article for League 2. Hopefully this might prove to be of some interest to people as the season progresses.
Which team will succeed Burton Albion as champions of League 2?

One of the difficulties of using stats to make predictions across seasons in League 2, as is the case in many other lower divisions around the footballing world, is the significant turnover of players that we see. While Premier League clubs may be classed as busy in the transfer window if they bring in three or four new faces, League 2 clubs often see wholesale changes with key personnel moving up the divisions or to rivals, whilst fresh new faces from the academy or released from clubs higher up the pyramid make their first real foray into professional football. As a result, I am not going to make any precise predictions at this stage, but I will briefly talk about a couple of teams and storylines to watch as the season progresses.

1. Cambridge United

Expectations are high at the Abbey Stadium this coming season as the club has invested heavily in a number of talented new acquisitions. Having used their first season since promotion to set the foundations, they have shown great ambition, particularly funded by the pair of FA Cup ties against Manchester United last season, in bringing in the likes of Barry Corr and their former player, Luke Berry, to drive the club's promotion charge.

Last season, they were very much lower mid-table from a statistical viewpoint. They had a TSR (total shots ratio) of 46.9, which ranked 19/24 in League 2, while their SoTR (shots on target ratio) was slightly better at 48.4, which ranked at 14/24. Their 61 goals scored was the sixth highest, but they were let down defensively as their 66 goals conceded was the sixth worst in the division.
Barry Corr experienced promotion with Southend last season and will be hoping to do the same at Cambridge

This poor defensive record was borne out in the fact that only five teams conceded fewer shots per game than Cambridge's 10.96 and their ExpG conceded was 54.4, which ranked 17/24 in the league. Arguably though, they may have overachieved going forward with their 47.4 ExpG scored ranking just 14/24 in the league. This would appear to be partially down to a 75.0% conversion rate from close range headers, compared to an overall league average conversion rate of 43.0%.

The key players to look out for are likely to be Barry Corr, whose 11 non-penalty goals helped to fire Southend into League 1, while Mark Roberts and Elliott Omozusi have both dropped down from League 1 to help shore up the defence.

The future looks promising for Cambridge, although plenty will depend on how the team gels. They have a relatively easy start to the season, but if they struggle to click early on, it could be a rough ride for one of the fancied teams in the division.

2. Dagenham and Redbridge

Generally there is a pretty reasonable correlation between TSR/SoTR and total points, but in the two seasons since Wayne Burnett took over at Dagenham and Redbridge, they have been a clear anomaly. In his first season, he secured an excellent finishing position of 9th with 60 points, despite having the lowest SoTR in the division and the second lowest TSR. A PDO of 105.2, the fifth highest in the division, seemingly helped.
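For anyone unfamiliar with these measures, the sketch below shows how they are usually calculated. The definitions are the commonly used ones (a team's share of all shots for TSR, its share of all shots on target for SoTR, and scoring % plus save % for PDO, each on a 0-100 scale) and the function names and input numbers are made up for illustration, so treat it as a guide rather than the exact calculation behind the figures quoted here.

```python
def share(for_count, against_count):
    """Team's share of the combined total, as a percentage (used for TSR and SoTR)."""
    return 100 * for_count / (for_count + against_count)


def pdo(goals_for, sot_for, goals_against, sot_against):
    """Scoring % plus save %, both measured against shots on target."""
    scoring_pct = 100 * goals_for / sot_for
    save_pct = 100 * (1 - goals_against / sot_against)
    return scoring_pct + save_pct


# Hypothetical season totals (not any real team's numbers).
tsr = share(for_count=400, against_count=600)     # total shots ratio
sotr = share(for_count=150, against_count=250)    # shots on target ratio
team_pdo = pdo(goals_for=30, sot_for=100, goals_against=25, sot_against=100)
```

Across a whole league the scoring and save percentages average out to 100 between them, so a PDO well above 100 is often read as a sign that a team's results have run ahead of its shot numbers.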

Just when you might expect the team to regress back, they recorded a comfortable finish of 14th with 59 points, just one point lower than the previous season. This was despite seeing their TSR and SoTR fall even further to 41.9 and 37.8 respectively, both of which ranked 24/24 in the league. This time, they were aided by a huge PDO of 115.1, the second highest in the past 12 years in League 2. Their average shots against of 12.07 was 23/24 in the division, their ExpG conceded of 60.7 was 24/24, their ExpG scored of 42.6 was 20/24 and they conceded more shots from the most dangerous areas than any other team in the division. Really, there appears to be little evidence for how they achieved the points total that they did.
Jamie Cureton will be key to Dagenham & Redbridge defying the relegation odds once more

The only hint might be that they were very effective at creating chances from 'Very Close Range', ranking 5/24 in that category and were very efficient at converting those chances. Jamie Cureton's five chances from very close range were only beaten by three other players in the division - Matt Tubbs (6), Adebayo Akinfenwa (8) and Jake Hyde (9).

It would appear that continuing to create these very good chances for Jamie Cureton will be key for Dagenham and Redbridge to continue to defy the statistics and maintain their place in the division. If we see a decrease in the chance quality for the Daggers, their fans will be nervously looking over their shoulder at the drop zone.

3. Oxford United

If there is one thing that Michael Appleton will be hoping for this season, it is an improvement in performances at the Kassam Stadium.

Last season, Oxford's stats looked pretty solid. Their TSR and SoTR of 53.9 and 55.5 both ranked 6/24, but a PDO of just 93.8 was 23/24 in the division and was a big problem for them. The concern would be that it was not driven by issues in just one of the components, but in both, with their scoring % ranking 17/24 and their save % at 20/24. One major area of concern would have been the 53.0% save percentage at the Kassam Stadium, which ranked 24/24, and the fact that the next lowest was 59.3% shows how poor this was. Conversely, their save % away from the Kassam was 78.8%, which ranked 3/24 for away sides, which could either be down to different styles of play at home compared to away or simply bad luck.
Can Oxford United improve their record at the Kassam Stadium?

Looking at the shot locations, we can see that they conceded 16 goals from 23 shots on target in the central area of the penalty box at the Kassam Stadium and given that they allowed just 51 shots on target at home all season, the fact that 29 of those came from the danger zone will be a concern. Having said that, the average conversion rate for chances from the central area in the division was just 16.1%, so whether this was down to very bad luck or poor defending and keeping is tough to tell from the stats. However, having spoken to an Oxford fan, he informed me that they tended to play a very high line early on in the season and that there were a number of mistakes from defenders that meant that plenty of the chances conceded were one-on-one with the keeper, which hints at the poor defending approach.

Away from home, just 44.6% of the chances that they conceded were from the danger zones and they conceded just 8/25 in the central area, so still above the league average, but far fewer than at the Kassam Stadium. Given the majority of teams tended to concede higher quality chances away from home, this is a striking difference and might hint at a different style of play adopted away from the Kassam.

The arrival of George Baldock appeared to mark a turning point defensively for Oxford last season, so they will be delighted to have brought him back in on loan again. Sam Slocombe is an upgrade in goal, while the return of Kemar Roofe (0.47 NPG/90) and Ryan Taylor (0.3 NPG/90) should boost their options going forward. They will also be looking to involve 19-year-old academy graduate, James Roberts, more this season after he recorded an impressive 0.48 ExpG/90 from 616 minutes last season.
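The per-90 figures simply scale a player's total by the minutes he played, which makes it possible to compare regular starters with those who have had limited game time. A minimal sketch of the conversion in both directions is below; the function names are my own for the example.

```python
def per_90(total, minutes):
    """Convert a season total (goals, ExpG, etc.) into a per-90-minutes rate."""
    return total * 90 / minutes


def total_from_per_90(rate, minutes):
    """Recover the underlying total from a per-90 rate and minutes played."""
    return rate * minutes / 90


# James Roberts: 0.48 ExpG/90 from 616 minutes, as quoted above.
roberts_total_expg = total_from_per_90(0.48, 616)
```

Working backwards from the quoted rate, 0.48 ExpG/90 over 616 minutes corresponds to roughly 3.3 ExpG in total, a reminder that per-90 numbers from small samples of minutes should be read with some caution.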