Saturday, 21 May 2016

Tennis Stats Mailbag - Edition 1

It has been a while since I wrote much about stats in tennis. There are a number of reasons for this, one being that I was struggling to think of topics to write about. So, it seemed like an interesting idea to ask my Twitter followers what they might like to know. I got a number of responses and have picked a selection of them to answer in this article.
The idea that there is plenty of tanking going on the week before a Grand Slam event has been around for a long time. The theory is that higher ranked players will generally want to prioritise the slam events due to the better chance of picking up prize money and ranking points.

Firstly, if we address the actual question of whether there are more second set bagels, we find that in ATP events not in the week before a slam, 3.3% of matches that finish in straight sets see a second set bagel. In the week before a slam, 3.0% of matches that finish in straight sets see a second set bagel. There is very little difference here, and indeed, we also see virtually no difference in the number of 6-1 second sets or even in the percentage of matches that finish in straight sets.

However, one difference that we do see is the number of favourites that lose in straight sets. In an average ATP tournament not in the week before a slam event, we see 45.8% of wins by outsiders being completed in straight sets. However, in the week before a slam, this jumps to 59.1%, suggesting that there may be an element of favourites not putting everything into trying to fight back if they go down by a set and a break.

From a betting perspective, can we see any profit in backing against the higher ranked players in this week before a slam? Well, if you had £10 on the lower ranked player in every match since the start of 2010 where a non-top-20 player was against a top-20 player in the week before a slam, you would have won £189.70 at an impressive return of 10.4%. However, if we break this down further, backing non-top-20 players against top-10 opponents would actually have lost £46.20. It seems that either the top-10 players only enter these events if they intend to try and win them or that even if the top-10 players are not giving it everything, they are still good enough to win. Instead, it is the players ranked between 11 and 20 that are the ones to oppose in this week - a strategy that would have returned £235.90 from £10 stakes at a 19.5% return since 2010.
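As a rough check on these figures, the sketch below works backwards from the quoted profits and returns. It assumes that "return" means profit divided by total stakes and that every bet was a flat £10; the function names are made up for illustration, and because the quoted percentages are rounded, the implied bet counts are only approximate.

```python
def implied_total_stake(profit, roi):
    """Total amount staked implied by a profit figure and a return on investment."""
    return profit / roi


def implied_bets(profit, roi, stake_per_bet=10.0):
    """Approximate number of bets, assuming a flat stake per bet."""
    return round(implied_total_stake(profit, roi) / stake_per_bet)


# Figures quoted in the text (GBP, flat 10 stakes, since the start of 2010).
all_top20_profit, all_top20_roi = 189.70, 0.104
top10_profit = -46.20
rank_11_20_profit, rank_11_20_roi = 235.90, 0.195

# The two sub-groups should add back up to the overall top-20 profit.
combined = round(top10_profit + rank_11_20_profit, 2)

print(combined)
print(implied_bets(all_top20_profit, all_top20_roi))
print(implied_bets(rank_11_20_profit, rank_11_20_roi))
```

Under those assumptions, the two sub-groups sum back to the overall £189.70, and the figures imply roughly 182 qualifying matches overall, of which around 121 involved a player ranked 11 to 20. These are small samples, so the returns should be treated with some caution.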

Novak Djokovic and Andy Murray are arguably the two best returners in the game and their fitness and ability gives them an advantage in long rallies. However, the question of which player tends to come out on top in long rallies between them is an interesting one.

Using the data from the Match Charting Project, we find 13 matches between the pair with information on rally lengths. The data can be summarised as below:
We can see that in the longer rallies of 10-20 shots, there is virtually nothing between the two players. Of the 452 rallies of between 10 and 20 shots in the 13 matches, the players have won precisely 226 points each. Djokovic appears to have a slight edge on the seriously long rallies, but a sample of only 109 points means that this figure is less reliable.
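To put a rough number on how much less reliable 109 points is than 452, the sketch below uses the standard normal approximation for the margin of error on a proportion. It assumes each point is an independent trial with a true win share near 50%, which is a simplification for tennis rallies; the function names are hypothetical.

```python
import math


def win_share(points_won, points_played):
    """Fraction of points won."""
    return points_won / points_played


def margin_of_error(points_played, share=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / points_played)


medium_long = win_share(226, 452)  # 10-20 shot rallies quoted in the text
moe_452 = margin_of_error(452)
moe_109 = margin_of_error(109)

print(f"{medium_long:.1%} share, +/- {moe_452:.1%} on 452 points")
print(f"+/- {moe_109:.1%} on 109 points")
```

On these assumptions, the margin of error is a little under five percentage points for the 452-point sample and a little over nine for the 109-point sample, so a slight edge in the longest rallies sits comfortably within the noise.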

Instead, it is when Djokovic is able to keep the points shorter that he has the advantage over Murray. Of points between 4 and 9 shots in length, Djokovic wins 54.7%, which is a significant difference and is where his success over Murray has come from - Djokovic is able to attack early in the point and finish it quickly.

If we look at the ATP, we find that the player serving at the first change of ends in a tiebreak goes on to win the tiebreak on 47.5% of occasions. This could possibly suggest that the change of ends has a very slight impact.

However, if we compare the percentage of points won by the server on the point before the change of ends and on the point after it, we see very little difference. The server wins 64.2% of the points immediately before the first change of ends compared to 64.6% of the points immediately after it.

Indeed, we actually see that the percentage of aces increases from 8.0% to 9.6% following the first change of ends and the percentage of double faults drops from 3.3% to 2.7%, suggesting that the slightly longer break may actually benefit the server, giving him the opportunity to focus his mind on his upcoming serve.

The five players in the top 50 of the world rankings with the biggest increases in return points won are Guido Pella, Ricardas Berankis, Milos Raonic, Richard Gasquet and Nick Kyrgios.

This is an interesting list. Guido Pella has won 6.9% more return points at ATP level in 2016 than he did in 2015, although he only played a handful of matches last year, mostly on his less favoured surface. On clay, he dominated at Challenger level in 2015, racking up a 45-14 record with four titles, and he has carried this form into 2016, reaching the final in Rio de Janeiro, the quarters in Nice and Bucharest and the third round at Indian Wells.

As to whether he can keep up these improved numbers, it is debatable. He has been very efficient at beating players ranked below him this year, with a 9-3 record and 40.8% of points won on return. However, against players ranked above him, he is just 3-8 with 34.8% of points won on return. So, it seems as though he has found his level and, while he may be able to keep up his improved return stats, there is not much to suggest that he can improve them significantly further.

It is tough to feel that the numbers for Ricardas Berankis have not benefited from the level of opposition that he has faced thus far in 2016. The average ranking of his opponents last year was 66.4 compared to 90.7 this year and in matches where he has won a significant number of return points, he has been aided by very poor first serve percentages from his opponents, such as Seppi's 49.5% 1st serves in Doha and 55.6% by Fritz in Memphis.

The improvements on return for Milos Raonic and Nick Kyrgios are impressive though. Two big servers that are still developing their return game, they have both posted significant improvements in their return numbers despite facing a higher average level of opposition in 2016. They are already two of the most effective servers in the game and if they can add an improved return game, they could be very dangerous.

However, they are still starting from a low level. Even with a 4.4% improvement, Milos Raonic still only has the 41st highest return points won percentage in the top 50, while Nick Kyrgios' 3.2% improvement moves him to 34th in the top 50. They both still have plenty of time to improve this aspect of their game and I see no reason why this improvement cannot be sustained.

Saturday, 14 May 2016

How do run-rates change across a T20 innings?

In the 13 years since its inception, T20 cricket has arguably become the most popular format of cricket across the world. Love it or hate it though, it has undoubtedly had a huge impact on how batsmen play across all formats.

Over the past five months, I have been building a database of ball-by-ball data for T20 events around the world. It currently consists of information for just under 290,000 deliveries covering all T20 matches since the start of 2014 as well as a number of matches from international level and the IPL and CPL from before 2014.

Across all of the matches, the average score is 144.9 runs in the 20 overs, suggesting an average run-rate of 7.25 through the innings. However, as anyone that has watched T20 cricket will know, the run-rate is not constant throughout the innings, so I thought it would be interesting to look at how it changes as the innings progresses.
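For anyone wanting to try something similar, the sketch below shows one way that per-over averages could be computed from ball-by-ball records. The tuple layout (match, innings, over, runs) and the function name are assumptions for illustration, not the structure of the actual database, and the sample deliveries are made up purely to show the shape of the input.

```python
from collections import defaultdict


def average_runs_by_over(deliveries):
    """Average runs scored in each over number, across all innings.

    `deliveries` is an iterable of (match_id, innings, over, runs) tuples,
    a hypothetical layout; `runs` includes any extras on that ball.
    """
    # Total runs for each (match, innings, over) combination.
    over_totals = defaultdict(int)
    for match_id, innings, over, runs in deliveries:
        over_totals[(match_id, innings, over)] += runs

    # Average those totals by over number; an over only counts for
    # innings in which at least one ball of it was bowled.
    sums = defaultdict(int)
    counts = defaultdict(int)
    for (_, _, over), total in over_totals.items():
        sums[over] += total
        counts[over] += 1
    return {over: sums[over] / counts[over] for over in sorted(sums)}


# Made-up deliveries, not real match data.
sample = [
    ("m1", 1, 1, 0), ("m1", 1, 1, 4), ("m1", 1, 2, 6),
    ("m2", 1, 1, 1), ("m2", 1, 1, 1), ("m2", 1, 2, 2),
]
print(average_runs_by_over(sample))

# Overall run-rate implied by the average score quoted in the text.
print(round(144.9 / 20, 3))
```

One design choice to be aware of is that an over cut short by the end of an innings still counts as a full over here, which would pull the averages down slightly for the later overs. Whether to exclude or scale those overs is a judgment call.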


We can see that the first over of a T20 is generally the lowest scoring of all the overs as the opening batsmen get used to the surface and do not want to risk giving away an early wicket. Indeed, it is the only over in a T20 match that averages a run-rate of less than six runs per over. This reticence from batsmen to attack early on may explain why many teams look to get an over from one of their weaker bowlers out the way here.

Once the first two overs are out of the way, the run-rate for Overs 3-6 in the powerplay is pretty constant, ranging from 7.55 in Over 3 to 7.75 in Over 6. During this powerplay period, only two fielders are allowed outside the circle, which explains why batsmen are more aggressive and able to score more runs in this period.

There is a significant drop-off once the powerplay ends with the runs-per-over dropping from 7.75 in Over 6 to 6.40 in Over 7. Indeed, the run-rate does not return to the powerplay levels until it reaches 8.05 in the 15th over of the innings. We can also see the acceleration as teams attack at the end of the innings as the graph continues upwards.

The drop-off in the last two overs of the second innings can probably be explained by teams either being in a comfortable position to win and not needing to go mad, or being well behind and having no real incentive to attack.

The next question is whether the pattern is similar across different tournaments or whether teams structure their innings differently in different competitions. The graph below shows the same data, but broken down by tournament instead of innings.


There are some interesting features to note here. Firstly, we can see that there is a significantly higher run-rate in the powerplay overs in the T20 Blast in England than in either the IPL or the CPL. The T20 Blast averages 47.0 runs at 7.84 runs-per-over during the six powerplay overs, compared to 42.0 in the IPL and 39.3 in the CPL. Whether this is a feature of weaker bowling in the T20 Blast or whether it is a strategy to attack early is unclear, but it is something worth considering.
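The powerplay totals quoted above can be converted into runs-per-over with a quick calculation, sketched below. It simply divides each quoted total by six overs; the variable names are made up for illustration.

```python
POWERPLAY_OVERS = 6

# Average powerplay runs quoted in the text.
powerplay_runs = {"T20 Blast": 47.0, "IPL": 42.0, "CPL": 39.3}

# Runs-per-over during the powerplay, rounded to two decimal places.
powerplay_rates = {
    name: round(runs / POWERPLAY_OVERS, 2)
    for name, runs in powerplay_runs.items()
}
print(powerplay_rates)
```

This gives roughly 7.83 for the T20 Blast, 7.00 for the IPL and 6.55 for the CPL. The small gap between 7.83 and the 7.84 quoted for the T20 Blast is presumably down to the 47.0 figure itself being rounded.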

We can also see that batsmen accelerate more toward the end in the IPL, particularly in the final five overs of the innings. Throughout most of the innings, we see the T20 Blast ahead of the IPL, but once we hit Overs 16-20, the IPL starts to move clear. Could the higher run-rate earlier in the innings in the T20 Blast leave teams with fewer wickets in hand and fewer established batsmen later on? That is something to look at in the future potentially.

The CPL is an interesting tournament. It lags way behind the IPL and T20 Blast in terms of run-rate throughout most of the innings, before batsmen start properly hitting in the final over or two. Indeed, the 10.87 in the final over in the CPL is far ahead of any other over in either of the other competitions, but it is still only enough to lift the overall innings average just over the 7.0 runs-per-over mark. Does this mean that the grounds are tougher to score on in the Caribbean? Maybe, although they are apparently easy enough to score on at the end of the innings. Does it mean that West Indian teams need to work on rotating the strike more often in the middle overs? Possibly.

This is only a brief first look at how innings are structured in T20 matches. There are obviously plenty of other factors to build into this. How many wickets teams have remaining at different stages and whether the grounds are tougher to score on in certain areas are two, but there are many more. However, it gives an overview of how the run-rate changes across an innings and it may give some ideas of periods in which teams can look to improve their strategies.