Tuesday, 14 October 2014

Understanding the Tennis Radars

Recently, you may have noticed some of the player radars that I have been posting on Twitter. They do seem to have garnered some interest, but there have been a number of questions about them, so I have put together a brief guide as to how they work, what we can learn from them and, just as importantly, their limitations.

The first thing to mention is that they do not really convey any new information. Rather, they are simply a way to visualise statistics more easily. Many people find reading numbers quite dull, or find it difficult to see the significance of certain statistics. The radars are an attempt to give an immediate, more accessible impression of a player’s statistics. Nothing on them goes beyond the standard information that is readily available on the internet. The idea for them came from the excellent @mixedknuts on Twitter, who has used them successfully to visualise football statistics.

Statistics and data in tennis are really quite poor compared to the majority of major sports. Compared to a sport of similar size, such as cricket, there is significantly less useful data out there; the availability and complexity of cricket data puts it on a separate continent from tennis. Compared to football, which has undergone a revolution over the past few years in the availability and quantity of data, tennis is in a whole different world. Compared to the American sports, such as baseball, the NFL and hockey, it is simply incomparable. The data available to analyse those sports is quite simply outstanding; it is a separate galaxy.

So, how do the radars work? The statistics shown on the radar may change over time as more data or new measures become available, but the overall theory behind them will remain the same. Including the inner circle and the outer edge, there are ten rings that make up the radar. The mid-point of each axis represents the ATP or WTA mean value for that particular statistic. So, the perfectly average ATP or WTA player will have a radar that joins up the mid-points of all the axes.

The inner circle represents a value that is two standard deviations below the mean. Similarly, the outer edge of the radar represents a value that is two standard deviations above the mean. Getting slightly more complex, if we roughly assume that the values of each statistic across the sample follow a normal distribution, then 95.45% of all players will lie within two standard deviations of the mean on that statistic. Thus, if a player reaches the outer edge on a statistic, we can roughly suggest that he is within the top 2.3% in the ATP/WTA in that attribute. Similarly, if a player is on or inside the inner circle, we can suggest that he is within the bottom 2.3% in that attribute.
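As a rough sketch of the scaling described above, the Python below maps a raw value to a position along an axis, expressed as a fraction of the radius: 0 at the inner circle (mean minus two standard deviations), 0.5 at the mid-point (the ATP or WTA mean) and 1 at the outer edge (mean plus two standard deviations). It also checks the normal-distribution percentages. It assumes that values beyond two standard deviations are simply clipped to the edge of the radar, and it does not model the individual rings; `radar_position` and `share_beyond` are hypothetical names, not part of any existing tool.

```python
import math


def radar_position(value: float, mean: float, sd: float) -> float:
    """Map a statistic to a fraction of the radar radius.

    0.0 = inner circle (mean - 2 SD), 0.5 = mid-point (tour mean),
    1.0 = outer edge (mean + 2 SD). Values further out than +/-2 SD
    are clipped to the edges (an assumption about how they are drawn).
    """
    z = (value - mean) / sd          # standard score of the player's value
    z = max(-2.0, min(2.0, z))       # clip to the radar's +/-2 SD range
    return (z + 2.0) / 4.0           # rescale [-2, 2] onto [0, 1]


def share_beyond(z: float) -> float:
    """Share of a normally distributed population lying above z SDs."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


within_two_sd = 1.0 - 2.0 * share_beyond(2.0)
print(f"within +/-2 SD: {within_two_sd:.2%}")       # within +/-2 SD: 95.45%
print(f"above +2 SD:    {share_beyond(2.0):.2%}")   # above +2 SD:    2.28%
```

The 2.28% tail is where the "roughly top 2.3%" figure for the outer edge comes from; by symmetry the same share lies below the inner circle.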

So, that is roughly how the radar itself works. Now, we will look at each of the current attributes. Most of them are fairly self-explanatory, but one or two of them are slightly more unusual.

‘% Won on 1st Serve’ and ‘% Won on 2nd Serve’ are fairly straightforward and simply represent the percentage of points won on each of the player’s two serves. Similarly, ‘% Won on Return’ represents the percentage of points won on the opponent’s serve. Related to the serve, ‘Aces/Game’ and ‘DF/Game’ are exactly what they suggest: the average number of aces and double faults per service game. Dividing by the number of service games helps to negate the effect of having played lots of long matches, in which one would expect a player to serve more aces in total than over a series of shorter matches. ‘BP Faced/Game’ and ‘BP Created/Game’ work on the same principle, for break points faced on the player’s own serve and created on his opponent’s serve.
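A minimal sketch of the per-game normalisation, assuming raw season totals are available. The `SeasonTotals` fields are hypothetical, and dividing break points created by return games is an assumption about how ‘BP Created/Game’ is normalised; the post only says it follows the same principle.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    # Hypothetical raw season totals for one player.
    aces: int
    double_faults: int
    service_games: int
    bp_faced: int
    bp_created: int
    return_games: int


def per_game_rates(t: SeasonTotals) -> dict[str, float]:
    """Convert raw totals into per-game rates so long matches do not inflate them."""
    if t.service_games <= 0 or t.return_games <= 0:
        raise ValueError("need at least one service game and one return game")
    return {
        "Aces/Game": t.aces / t.service_games,
        "DF/Game": t.double_faults / t.service_games,
        "BP Faced/Game": t.bp_faced / t.service_games,
        # Assumption: break points created are normalised by return games.
        "BP Created/Game": t.bp_created / t.return_games,
    }


# Hypothetical season: 300 aces and 60 double faults in 600 service games.
rates = per_game_rates(SeasonTotals(300, 60, 600, 240, 180, 600))
print(rates)
```

Two players with the same serve but very different numbers of five-set matches should end up with similar per-game rates, even though their raw ace counts differ.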

The two slightly more unusual statistics are the Break Point Save Rate and the Break Point Conversion Rate. The BP Conversion Rate is calculated by dividing the percentage of break points a player wins on his opponent’s serve by his overall percentage of return points won, and multiplying by 100, to determine whether he performs better or worse on a break point than on an average return point. A value of 100 means he performs exactly the same whether or not it is a break point; a value greater than 100 means he performs better on break points than on an average point; and a value lower than 100 means he under-performs on break points. In the same way, the BP Save Rate divides the percentage of break points saved on a player’s serve by the percentage of points won on serve, again scaled so that 100 is neutral, to determine whether he performs better or worse when facing break point.
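The two ratios can be sketched as below. The function names and the example percentages are hypothetical; the only requirement is that numerator and denominator use the same units (both percentages or both fractions), since the ratio is scaled so that 100 means "no different from an average point".

```python
def bp_conversion_rate(bp_won_pct: float, return_pts_won_pct: float) -> float:
    """100 * (% of break points converted) / (% of all return points won)."""
    if return_pts_won_pct <= 0:
        raise ValueError("return points won must be positive")
    return 100.0 * bp_won_pct / return_pts_won_pct


def bp_save_rate(bp_saved_pct: float, serve_pts_won_pct: float) -> float:
    """100 * (% of break points saved) / (% of all service points won)."""
    if serve_pts_won_pct <= 0:
        raise ValueError("service points won must be positive")
    return 100.0 * bp_saved_pct / serve_pts_won_pct


# Hypothetical player who converts 45% of break points but wins 40% of return points.
print(bp_conversion_rate(45.0, 40.0))  # 112.5: better than on an average return point
# Hypothetical player who saves 60% of break points but wins 64% of service points.
print(bp_save_rate(60.0, 64.0))        # 93.75: slightly worse when facing break point
```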

Now, let us look at a couple of standard radar shapes. The first is the typical 'servebot', who has a huge serve and bangs down plenty of aces, but has very little on return. The example here is Ivo Karlovic on grass, pretty much the definition of a typical 'servebot'. We can see plenty of area filled at the top and on the left-hand side, where the serve statistics dominate, but very little on the right-hand side, in the return and break point creation areas of the radar.

The second example is a player with a very weak serve, whose return game is crucial to remaining competitive in matches. Here, we have a young Argentine player, Diego Sebastian Schwartzman, on all surfaces. The right-hand side of the radar is now dominant, with high values in the return and break point creation areas, while the top and left-hand side are virtually unfilled, illustrating his lack of serving ability.

In reality, the vast majority of players will lie somewhere between these two extremes. It is also important to remember that different abilities can produce high values in a given area: the top players are likely to generate high serving statistics even if their serve is not that great, simply because of their superior ability in rallies.

So, the radars can give an idea of the style of game that a player adopts, as well as of his overall quality. Some of the statistics are likely to be highly repeatable: a big server is likely to score strongly on first-serve points won, aces and break points faced in each separate year, while top returners are likely to score strongly on return points won and break point creation. However, without further work, one can only speculate as to whether break point conversion and save rates are repeatable across years.
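One common way to do that further work, which the radars themselves do not attempt, is to correlate each player's value of a statistic in one season with his value in the next. A correlation close to 1 would suggest a repeatable skill, while one close to 0 would suggest the statistic is mostly year-to-year noise. The sketch below assumes a hypothetical `{player: {year: value}}` structure and uses no real figures; `pearson` and `year_on_year` are illustrative names.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def year_on_year(values: dict[str, dict[int, float]], year_a: int, year_b: int) -> float:
    """Correlate each player's value in year_a with his value in year_b.

    Players missing either season are skipped.
    """
    players = [p for p, by_year in values.items() if year_a in by_year and year_b in by_year]
    xs = [values[p][year_a] for p in players]
    ys = [values[p][year_b] for p in players]
    return pearson(xs, ys)
```

In practice one would run this separately for each radar statistic (for example, BP Save Rate in consecutive seasons) over a reasonably large group of players before drawing any conclusion about repeatability.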
