Understanding the Tennis Radars
Recently, you may have noticed some of the player radars
that I have been posting on Twitter. They do seem to have garnered some
interest, but there have been a number of questions about them, so I have put
together a brief guide as to how they work, what we can learn from them and,
just as importantly, their limitations.
The first thing to mention is that they do not really convey
any new information. Rather, they are simply a way to try and generate an easy
visualisation of statistics. Many people find looking at or reading numbers
quite dull or find it difficult to read the significance into certain
statistics. The radars are simply an attempt to make it easier to get an
immediate impression of a player’s statistics in a more accessible manner.
There is nothing new outside of the standard information that is relatively easily
accessible on the internet. The idea for them came from the excellent @mixedknuts on Twitter, who has used them successfully in visualising football statistics.
Statistics and data in tennis are really quite poor compared
to the majority of major sports. Compared to a similar sized sport, such as
cricket, there is significantly less useful data out there. The availability
and complexity of cricket data is like a separate continent from
tennis. Football, which has undergone a revolution over the past
few years in terms of availability and quantity of data, is in a whole
different world. As for the American sports, such as baseball, American football and
hockey, there is simply no comparison. The data available to analyse those sports
is quite simply outstanding. It is equivalent to a separate galaxy.
So, how do the radars work? The statistics available on the
radar may change over time as more data or new measures become available, but
the overall theory behind them will remain the same. Including the inner circle
and the outer edge, there are ten rings that make up the radar. The mid-point
of each axis represents the ATP or WTA mean value for that particular
statistic. So, the perfectly average ATP player will have a radar that joins up
the midpoint of each axis.
The inner circle represents a value that is two standard deviations
below the mean. Similarly, the outer edge of the radar represents a value that
is two standard deviations above the mean. Getting slightly more complex, if we
roughly assume that the values in the sample follow a normal
distribution, then 95.45% of all players will lie within two standard
deviations of the mean on each statistic, leaving roughly 2.3% in each tail.
Thus, if a player reaches the outer edge on a statistic, we can roughly suggest
that he or she is within the top 2.3% in the ATP/WTA in that attribute.
Similarly, if a player reaches the inner circle, we can suggest that they are
within the bottom 2.3% in that attribute.
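For anyone who wants to reproduce the scaling, here is a minimal sketch in Python of how a single statistic could be placed on its axis. The function names and the choice to clip values beyond two standard deviations are my assumptions for illustration, not a description of the exact code behind the radars.

```python
import math

def radar_position(value, mean, sd):
    """Map a statistic to a radial position: 0.0 = inner circle, 0.5 = tour mean, 1.0 = outer edge."""
    z = (value - mean) / sd        # number of standard deviations from the tour mean
    z = max(-2.0, min(2.0, z))     # assumption: anything beyond +/-2 SD is pinned to the edge
    return (z + 2.0) / 4.0

def upper_tail_share(z):
    """Share of a normal distribution lying more than z standard deviations above the mean."""
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
```

A player exactly at the tour mean lands at 0.5, the mid-point of the axis, and upper_tail_share(2.0) comes out at about 0.0228, which is where the figure of roughly 2.3% for the outer edge comes from.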
So, that is roughly how the radar itself works. Now, we will
look at each of the current attributes. Most of them are fairly
self-explanatory, but one or two of them are slightly more unusual.
‘% Won on 1st Serve’ and ‘% Won on 2nd
Serve’ are fairly straightforward and simply represent the percentage of points
won on each of the player’s two serves. Similarly, ‘% Won on Return’
represents the percentage of points won on the opponent’s serve. Related to the
serve, ‘Aces/Game’ and ‘DF/Game’ are as their names suggest: the average
number of aces and double faults per service game. Dividing through by the
number of service games helps to negate the effect of having played lots of
long matches, where one might expect a player to serve more aces than in a
series of shorter matches. Similarly, ‘BP Faced/Game’ and ‘BP Created/Game’ are
based around the same principle for break points faced on the player’s own
serve and created on the opponent’s serve.
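As a quick sketch of the per-game calculation, the Python below divides season totals by the number of games played. The totals are invented purely for illustration, and dividing break points created by return games (rather than service games) is my assumption about the natural denominator.

```python
def per_game(total, games):
    """Average number of events per game; returns 0.0 when no games were played."""
    if games == 0:
        return 0.0
    return total / games

# Invented season totals, purely for illustration.
aces, double_faults, bp_faced, service_games = 300, 100, 150, 500
bp_created, return_games = 200, 500

aces_per_game = per_game(aces, service_games)
df_per_game = per_game(double_faults, service_games)
bp_faced_per_game = per_game(bp_faced, service_games)
# Assumption: break points created are averaged over return games.
bp_created_per_game = per_game(bp_created, return_games)
```

With these made-up totals, a player who had served the same 300 aces over twice as many service games would see his Aces/Game halve, which is exactly the correction for long matches described above.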
The two slightly more unusual statistics are the Break Point
Save Rate and Break Point Conversion Rate. The BP Conversion Rate is calculated
by dividing the % of break points won by the player's % won on return, with the
result expressed as a percentage, to determine whether he performs better or
worse on break points on his opponent's serve than on an average return
point. A value of 100 corresponds to performing exactly the same,
whether it is break point or not, a value greater than 100 corresponds to
performing better on break point than on an average point, and a value lower
than 100 corresponds to under-performing on break point. In the same way, the BP
Save Rate divides the % of break points saved on a player's serve by the % of
points won on serve to determine whether he performs better or worse when
facing break point.
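The two ratios can be sketched in a few lines of Python. The function name and the percentages are invented for illustration; the only thing taken from the definitions above is the ratio itself and the scaling that makes 100 mean "no difference".

```python
def relative_rate(pct_on_break_points, pct_overall):
    """Break point performance relative to an average point, scaled so that 100 means no difference."""
    return 100.0 * pct_on_break_points / pct_overall

# Invented percentages, purely for illustration.
# Converts 40% of break points, wins 38% of all return points.
bp_conversion_rate = relative_rate(40.0, 38.0)
# Saves 60% of break points, wins 64% of all service points.
bp_save_rate = relative_rate(60.0, 64.0)
```

Here the conversion rate comes out a little above 105, so this hypothetical player returns slightly better than usual on break point, while the save rate of 93.75 says he serves slightly worse than usual when facing one.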
Now, let us look at
a couple of standard shapes of radars. The first is the typical 'servebot', who has a huge serve, bangs down plenty of aces, but has very little on return. The example of this is Ivo Karlovic on grass - pretty much the definition of a typical 'servebot'. We can see plenty of area filled at the top and on the left-hand side where the serve statistics dominate, but very little on the right-hand side in the return and break point creation areas of the radar.
The second example is a player with a very weak serve, but whose return game is crucial to remaining competitive in matches. Here, we have a young Argentinean player - Diego Sebastian Schwartzman - on all surfaces. We can see the right-hand side of the radar is now dominant with high outcomes in the return and break point creation areas, while the top and left-hand side are virtually unfilled, illustrating the lack of ability on serve.
In reality, the vast majority of players will lie somewhere between the two extremes. It is also important to remember that different abilities can go into generating high statistics in certain areas - the top players are likely to generate high serving statistics, even if their serve is not that great, simply due to their superior ability in rallies.
So, the radars can give an idea of the style of game that certain players adopt and they can give us an idea of the overall quality of a player. Some of the statistics are likely to be highly repeatable - a big server is likely to have high values for the first serve statistic and for aces, and to face few break points, in each separate year, while top returners are likely to have high values in return and break point creation. However, without further work, one can only speculate as to whether break point conversion and save rates are repeatable across years.