Using Clustering to Predict the Career Potential of ATP Players
There has arguably been a dearth of young talent coming through on the ATP Tour in recent times. Until the emergence of the current crop of youngsters, there were few players that were realistically expected to push toward the top level of the game. One aspect of the emergence of young players that people love to discuss is their potential and seemingly every young player is judged as to whether he has the talent to win a Grand Slam title in the future.
I thought it would be interesting to see whether the statistics of former and current players can tell us how the current crop of young players might progress in the future. Using all of the statistics from the ATP Tour going back to 1991, I took a look at the numbers for every player in the database during their teenage years. Removing those players that had played fewer than 25 matches as a teenager left a sample size of 63 players.
A K-means cluster analysis with 8 clusters allows us to group these players and see whether similar types of player during their teenage years go on to show a similar career progression.
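For readers who want to try something similar, the grouping step can be sketched in plain Python. This is only an illustration under stated assumptions, not the exact procedure behind the clusters in this post: the function names, the choice of statistics per player and the z-score scaling are all hypothetical, and the clustering shown is the standard Lloyd's algorithm form of K-means.

```python
import math
import random


def filter_players(players, min_matches=25):
    """Keep only players with at least `min_matches` teenage matches."""
    return {name: row for name, row in players.items()
            if row["matches"] >= min_matches}


def standardise(rows):
    """Z-score each column so no single statistic dominates the distance.

    Scaling is an assumption; the post does not say whether it was applied.
    """
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen players as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each player joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With a dictionary of players, each holding a hypothetical `"matches"` count and a list of serve and return percentages, the steps would be `filter_players(...)`, then `standardise(...)` on the statistics, then `kmeans(scaled, k=8)`. K-means depends on its random starting points, so different seeds can produce slightly different groupings.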
Cluster 1
(Rafael Nadal)
Rafael Nadal is genuinely unique as a teenager in the sample. No other player comes remotely close to his ability. His return game was outstanding for a player so young and while his serve was not top level, his ability in rallies meant that he was able to limit the number of break points faced.
Cluster 2
(Gael Monfils, Bernard Tomic, Evgeny Korolev, Ryan Harrison, Jose Acasuso, Kei Nishikori, Hendrik Dreekmann, Nicolas Almagro, Donald Young, Galo Blanco and Robby Ginepri)
Within this second cluster, only three of the eleven players have reached the top 10 in the ATP rankings and they have just one Grand Slam final and two Grand Slam semi-finals between them. There are plenty of names in this cluster that have shown promise as young players, but really struggled to turn that potential into actual top level results. The likes of Gael Monfils and Bernard Tomic often suggest they have the talent, but not the mental side of the game, whilst Ryan Harrison, Robby Ginepri and Donald Young have all struggled to meet the expectations of the American public, who are desperate for a new top level American hero.
A feature of this cluster as teenagers is a pretty solid first serve, but struggles on second serve and return, which indicates a potential reason for the difficulties in maintaining and progressing on the tour.
Cluster 3
(Andy Murray, Michael Chang, Carlos Moya, Albert Costa, Guillermo Coria, Juan Carlos Ferrero and David Nalbandian)
This third cluster is a high level collection of players, all but one of whom have reached the top 3 in the ATP rankings and all but two of whom have won Grand Slam titles. The two players without Grand Slam titles are Guillermo Coria, who really should have won the 2004 French Open final, and David Nalbandian, who is often considered the best player never to have won a Grand Slam title. This group might not have an outstanding first serve, but they are very strong behind the second serve and on return, showing their excellent ability in rallies.
Cluster 4
(Roger Federer, Richard Gasquet, Marat Safin, Marin Cilic, Tomas Berdych, Robin Soderling, Ernests Gulbis, Borna Coric, Mario Ancic, Dominik Hrbaty, Goran Ivanisevic, Nicolas Kiefer, Alexander Zverev, Thanasi Kokkinakis and Lars Rehmann)
This is a large cluster, but if we exclude the current youngsters (Coric, Zverev and Kokkinakis), there are only two of the remaining twelve players that did not reach the top 10 in the ATP rankings (Hrbaty and Rehmann). However, with the exception of Roger Federer and Marat Safin, none of the other players in this cluster reached the number one ranking. If we exclude Roger Federer, there are only four of the twelve that reached Grand Slam finals, although only three failed to reach the semi-final of a Grand Slam, and Mario Ancic almost certainly would have done so had it not been for the illness that eventually forced him to retire from tennis.
For the three current players in this group, there are mixed signals. Their current statistics suggest that they could quite feasibly make it deep into the second week of Grand Slams in the future, but it could be a stretch to see them regularly challenging for titles, although the achievements of Roger Federer and Marat Safin will give them hope. It would be a surprise if Coric, Zverev and Kokkinakis do not reach the top 10 at some stage in their careers and maybe one of them might even win a Grand Slam title one day.
Cluster 5
(Andy Roddick, Mark Philippoussis, Sam Querrey, Richard Krajicek, Pete Sampras, Kenneth Carlsen and Nick Kyrgios)
It is clear to see the similarity between the seven players in this cluster. They all have a booming first serve and their second serve is powerful enough to win them an above average number of points, but they do not quite have the ability on return to be the complete player. However, there are positive signs for Nick Kyrgios, with four of the other six players in this collection reaching Grand Slam finals, and there are plenty of ATP titles as well. Both Pete Sampras and Andy Roddick reached the number one spot in the rankings, while Richard Krajicek and Mark Philippoussis both reached the top 10.
With the number of top quality returners at the top level of the men's game at the moment, it is unclear whether the serve is still as big a weapon as it has been in the past. It will always be a threat, though, and there is every chance that Nick Kyrgios will reach a Grand Slam final in the future. Whether he can become a multiple slam winner remains to be seen.
Cluster 6
(Mikhail Youzhny, Thomas Enqvist, Andreas Vinciguerra, Tommy Robredo, Sjeng Schalken, Tommy Haas, Xavier Malisse, Yevgeny Kafelnikov and Luis Herrera)
This is not one of the stronger clusters, although four of the nine players have reached the top five in the rankings. There is just one Grand Slam winner in this collection, Yevgeny Kafelnikov, and Thomas Enqvist is the only other player to have reached a final. Players in this cluster may reach the odd Grand Slam semi-final, but they are more likely to be players that lurk around the 10-20 mark in the rankings.
Cluster 7
(Fabrice Santoro, Alex Corretja, Alberto Berasategui, Alberto Martin, David Prinosil, Marcos Ondruska and Mariano Zabaleta)
The weakest of the clusters, this group struggle behind their first serve, have below average second serve statistics and an average return game. Just three of these players have ever broken into the top 20, with Alex Corretja topping the group with a career high ranking of number two in the world. There are no Grand Slam titles here and just two finalists.
Cluster 8
(Andrei Medvedev, Lleyton Hewitt, Novak Djokovic, Juan Martin Del Potro, Marcelo Rios and Wayne Ferreira)
This is a genuinely top class cluster. Four of the six players reached the number one spot in the rankings, while Del Potro and Medvedev both reached the top four. All but one of the group have reached Grand Slam finals, three have won Grand Slam titles and there are no fewer than 148 ATP titles between the six players. They have above average first serve statistics, a very good second serve and above average ability on return.
Other Young Players
Let us take a quick look at where a few current younger players would have fitted in if they had met the minimum 25-match criterion.
The first is Grigor Dimitrov. He would have joined Coric, Kokkinakis and Zverev in Cluster 4, which would seem to fit with how his career has progressed thus far. He has shown plenty of potential, has a Grand Slam semi-final under his belt, but is yet to really suggest that he will be a fixture at the very top of the rankings.
Jack Sock would have joined his fellow countrymen in Cluster 2. His strong first serve is a threat, but he is yet to demonstrate that he has the all-round game to really challenge for a top 10 or even a top 20 ranking. Elias Ymer also falls into Cluster 2.
Andrey Rublev interestingly falls into Cluster 5 with the big servers, although this seems to be based predominantly on a strong second serve and an ability to limit break points faced, combined with a weak return game. However, he only has a very small sample of matches, so he may well move to a different cluster once his sample size increases.
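Placing a player who was not part of the original sample comes down to finding the cluster centre closest to his numbers. The sketch below shows one way this could be done, assuming the cluster centres and the scaling values (means and standard deviations) from the original sample have been kept. The function names and inputs are hypothetical.

```python
import math


def scale(stats, means, stds):
    """Apply the original sample's means and standard deviations to a new player."""
    return [(x - m) / s for x, m, s in zip(stats, means, stds)]


def nearest_cluster(stats, centroids):
    """Return the index of the centroid closest to the player's scaled stats."""
    distances = [math.dist(stats, centre) for centre in centroids]
    return distances.index(min(distances))
```

The returned index counts from zero, so it would need one added to match the cluster numbering used in this post. A player with only a handful of matches will have noisy statistics, which is why his nearest cluster can change as more matches are added.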