This article briefly touches on three questions at a high level:
- What goes through a data-driven person's mind when they hear a bold claim?
- Which tool is more practical for data analysis: R, Python, Java, or MATLAB?
- Could sports be the next application area for data analysis and machine learning?
The following text does not answer these questions explicitly, but it offers ideas for anyone who is curious...
A few years ago, I saw a TV commercial claiming that the so-called "Fosbury Flop" had revolutionized the high jump.
So I decided that one day I would get hold of the high jump world record data and look at it more closely. R is an ideal tool for this: thanks to its compact syntax, I can quickly plot the high jump world records and take a closer look at the topic.
I used Wikipedia as my source, and after some manual text formatting in Notepad++ I saved the records, athletes, countries, and years in CSV files. I loaded the data for both men (m) and women (w) into R and could then carry out my project.
Data Analysis with R
Comments from My First Exploratory Analysis:
1. Breakthrough after the "Fosbury Flop"
One can indeed see that the high jump world records reached a kind of plateau in the mid-1960s. Only from the mid-1970s onwards did Dick Fosbury's (USA) new technique, together with thicker, softer landing mats, gradually become established and push the records higher. The trick is to keep the body's center of mass as far below the bar as possible during the jump. This is achieved by going over the bar backwards and landing on the upper back. But wait a second: would the continued use of sandpits for landing not have slowed the success of this new technique? Probably yes...
2. However, the Pattern Had Existed Before
The data also show an earlier breakthrough in the late 1950s and early 1960s, when the "Straddle" technique replaced the "Western Roll" technique. Keeping the body's center of mass below the bar was already the crucial point here. Interestingly, though, the records rose significantly more steeply with the Straddle technique than in the era of the "Fosbury Flop". Why?
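One way to put a number on "steeper" is to fit a straight line to the records of each period and compare the slopes. The following is only a minimal sketch: the helper `record_slope`, the period boundaries, and the toy values are assumptions made for illustration, not the real world records. It assumes a data frame with the columns `Year` and `Height`, as in the CSV files.

```r
# Toy data with the same columns as the CSV files (Height in m, Year).
# The values are made up for illustration; they are NOT real world records.
toy <- data.frame(
  Year   = c(1950, 1952, 1954, 1956, 1970, 1974, 1978, 1982),
  Height = c(2.00, 2.04, 2.08, 2.12, 2.20, 2.22, 2.24, 2.26)
)

# Hypothetical helper: slope of a least-squares line through the records
# of one period, expressed in centimetres per year.
record_slope <- function(records, from, to) {
  period <- records[records$Year >= from & records$Year <= to, ]
  fit <- lm(Height ~ Year, data = period)
  unname(coef(fit)["Year"]) * 100
}

# Period boundaries are assumptions chosen for the toy data.
slope_early <- record_slope(toy, 1950, 1956)
slope_late  <- record_slope(toy, 1970, 1982)

print(c(early = slope_early, late = slope_late))
```

With the real data one would call `record_slope(hjr_m, ...)` with boundaries read off the plot. A straight line is a crude model for record progressions, so the slopes should be read as a rough comparison only.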
Image: Scatterplots of the high jump world records (m/f) generated with R.
3. The 2.00 Meter Level
Men reached this level as early as 1914. For women it took longer: they did not clear it until 1977. So I thought it would be fairer to look at the records scaled by average body height (see the right-hand plot in the image). I assumed roughly 1.60 m for women and 1.74 m for men. However, these averages are very rough estimates, and other important factors, such as the percentage of muscle mass and the mitochondria in the muscle cells, may also have an effect. In this context, the comparison is therefore neither entirely precise nor complete.
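The question "when was a given level first reached?" can be asked both for an absolute height and for a height relative to body size. Here is a minimal sketch of that idea. The helper `first_year_at_least`, the relative level of 1.15 body heights, and the toy values are assumptions for illustration; only the average heights of 1.60 m and 1.74 m come from the text above.

```r
# Toy data with the same columns as the CSV files (Height in m, Year).
# The values are made up for illustration; they are NOT real world records.
toy_m <- data.frame(Year = c(1905, 1915, 1925), Height = c(1.95, 2.01, 2.05))
toy_w <- data.frame(Year = c(1960, 1975, 1985), Height = c(1.85, 2.00, 2.05))

# Hypothetical helper: first year in which a record of at least `level`
# metres appears; NA if the level was never reached.
first_year_at_least <- function(records, level) {
  reached <- records$Year[records$Height >= level]
  if (length(reached) == 0) NA_real_ else min(reached)
}

# Absolute level: 2.00 m for both groups.
abs_m <- first_year_at_least(toy_m, 2.00)
abs_w <- first_year_at_least(toy_w, 2.00)

# Relative level: 1.15 times the assumed average body height.
rel_m <- first_year_at_least(toy_m, 1.15 * 1.74)
rel_w <- first_year_at_least(toy_w, 1.15 * 1.60)

print(c(abs_m = abs_m, abs_w = abs_w, rel_m = rel_m, rel_w = rel_w))
```

With the real data, `toy_m` and `toy_w` would be replaced by `hjr_m` and `hjr_w`. The gap between the two groups can look quite different depending on whether the absolute or the relative level is used, which is the point of the scaling.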
4. Shared Records
In the history of the high jump, two men have shared the record only four times. Among women it happened ten times, with up to three athletes sharing a record level (see the stacked points at the upper right of the left plot). Overall, that is more than twice as often as for men, which I read as a sign of balanced competition. That makes it more exciting to watch, which, by the way, matters a great deal for the popularity of games and competitions nowadays.
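Shared record levels can also be counted directly rather than read off the plot. The sketch below treats every height that occurs more than once in the data as a shared level. The helper `count_shared` and the toy values are assumptions for illustration, and the simple count does not distinguish between two different athletes and one athlete equalling their own record.

```r
# Toy heights in m; made up for illustration, NOT real world records.
toy <- data.frame(
  Height = c(1.90, 1.91, 1.91, 1.92, 1.93, 1.93, 1.93, 1.94)
)

# Hypothetical helper: number of record levels that occur more than once,
# and the largest number of entries at a single level.
count_shared <- function(heights) {
  per_height <- table(heights)
  c(shared_levels = sum(per_height > 1),
    max_holders   = max(per_height))
}

res <- count_shared(toy$Height)
print(res)
```

With the real data this would be `count_shared(hjr_w$Height)` and `count_shared(hjr_m$Height)`.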
5. No New Records Since the 1990s
There have been no new world records for women since 1987 and for men since 1993. Why is that, despite all the advances in training methods, the professionalization of the sport, athlete nutrition, equipment, and so on? Do all these measures actually make a difference? Is it time to invent a new jumping technique? Have we reached a human limit? Or do we need to take a closer look at the records of the 1980s and 1990s?
All these thoughts came from a few plots of high jump world records made with R, after I had been dissatisfied with a TV commercial's claim about a revolution in this sport.
My R code is attached below ...
<pre>
# The data sets contain: height in m, first name, last name, country and year.
hjr_w <- read.csv(file="Hochsprung_Daten_W.csv", header=TRUE, sep=",")
hjr_m <- read.csv(file="Hochsprung_Daten_M.csv", header=TRUE, sep=",")
print(summary(hjr_w))
print(summary(hjr_m))

# Scale by average body height (w approx. 1.60 m, m approx. 1.74 m)
norm_hjr_w <- hjr_w$Height/1.60
norm_hjr_m <- hjr_m$Height/1.74

# Plotting: two panels side by side
par(mfrow=c(1,2))

# Left panel: record heights vs. year, women (col 3) and men (col 4)
plot(hjr_w$Year, hjr_w$Height,
     main="High jump records \nvs. year (m/w)",
     xlab="Year", ylab="Height",
     xlim=c(1910,2000), ylim=c(0.8,2.5), col=3, pch=2)
points(hjr_m$Year, hjr_m$Height, col=4, pch=3)
# Stacked points at x = 1998 show how often each record height occurs
stripchart(hjr_w$Height, add=TRUE, method="stack", at=1998, vertical=TRUE, col=3, offset=0.7, pch=1)
stripchart(hjr_m$Height, add=TRUE, method="stack", at=1998, vertical=TRUE, col=4, offset=0.7, pch=1)
grid(4,4)
legend(1980, 1.1, c("w","m"), col=c(3,4), pch=c(2,3))

# Right panel: record heights scaled by average body height
plot(hjr_w$Year, norm_hjr_w,
     main="Scaled high jump records \nvs. year (m/w)",
     xlab="Year", ylab="Height / avg. body height (m/w)",
     xlim=c(1910,2000), ylim=c(0.8,2.5), col=3, pch=2)
points(hjr_m$Year, norm_hjr_m, col=4, pch=3)
grid(4,4)
legend(1910, 2.4, c("w","m"), col=c(3,4), pch=c(2,3))
</pre>