Data Science for Social Good Berlin (DSSG Berlin) is an initiative of Berlin-based data scientists who support non-profit organizations pro bono in using their data effectively and gaining insights for their projects, since these organizations often lack the necessary expertise. Katharina Rasch, data scientist and member of the DSSG Berlin core team, explains in detail what preparatory work and know-how go into a project and what concrete results can look like.
Classifying Big Data precisely is not easy. First, it is not a stand-alone term but a combination of many aspects that only together form the whole picture. Second, Big Data is a buzz phrase that is used in various situations and is constantly developing. It is time to set things straight.
Buzz phrase? Collective term? Synonym?
All of the above. Fundamentally, Big Data refers to large volumes of digital data as well as their capture, analysis and evaluation. Big Data is therefore also the collective term for all digital technologies, architectures, methods and processes required for these tasks. Or as Hasso Plattner says: “Big Data is a synonym for large data volumes in a wide range of application areas as well as for the associated challenge of being able to process them.”
Large data volumes?
Very large. “By the year 2003, humans had created a total of 5 trillion gigabytes of data. In 2011 the same amount was created within 48 hours. Now, creating the same data volume requires just 7 minutes,” as RBB Radioeins put it in simple and effective terms. Driven by the internet, social networks, mobile devices and the Internet of Things, worldwide digital data volumes are expected to grow another tenfold by 2020. In Germany alone, the current figure of 230 billion GB is expected to rise to 1.1 trillion GB.
This is exactly where Big Data comes into play: the huge data volumes are checked for relationships using algorithms, and the whole process requires a combination of several disciplines. “It ranges from traditional informatics and data science to interface design. Machine learning, deep learning and artificial intelligence to mathematics, statistics and data interfaces,” explains Florian Dohmann, Senior Data Scientist at The unbelievable Machine Company. “A lot of this is nothing new, but combining them all creates the basis for new opportunities.”
So it is only about data volumes?
Fundamentally, yes. Big Data is firstly defined by data volumes that are “too large, too complex, change too quickly or are structured too weakly to be analyzed with manual and traditional data processing methods,” according to Wikipedia. But to define where Big Data begins – i.e. from which point the targeted use of data becomes a Big Data project – you need to take a close look at the details.
Four gold medals and one silver medal during the 2018 Winter Olympics are proof that Jac Orie is a successful speed skating coach. Why? It all has to do with data!
In the ice skating world, the name Jac Orie is well established. He is the man behind the biggest successes of many Dutch speed skaters. Gerard van Velde in 2002, Marianne Timmer in 2006, Mark Tuitert in 2010 and Stefan Groothuis in 2014: they all won Olympic gold working with Orie. Apart from a mountain of medals, these skaters have left behind something valuable: a huge amount of data. Advanced analytics on almost two decades’ worth of data has helped Orie to train his team even more smartly in the run-up to the 2018 Winter Olympic Games in Pyeongchang, South Korea.
The results of Orie’s big data project have been astounding so far. Millions of viewers all over the world saw Sven Kramer (men’s 5,000 metres), Carlijn Achtereekte (women’s 3,000 metres) and Kjeld Nuis (men’s 1,000 and 1,500 metres) skate to gold. And Patrick Roest (men’s 1,500 metres) won silver. Less visible is what exactly lies behind these successes. For many years, Orie has been using test data generated by skaters to calculate speed and stamina. For Pyeongchang, however, he went one step further and collaborated with Leiden-based data scientist Arno Knobbe.
The big data approach, whereby computing power is used to perform calculations on large volumes of data, has led to many useful insights. These include the relationship between the type of training and its timing, duration and intensity. A skater who has profited hugely from this is Kjeld Nuis. The data showed that stamina training in the morning was ineffective for him, which led to changes in his training programme and, ultimately, to two gold medals in Pyeongchang.
For Orie, Knobbe and the skating sport in general, the big data journey is just beginning. For example, the phenomenon of ‘supercompensation’ still needs to be figured out. Supercompensation is what happens when an athlete temporarily lowers the training intensity, leading to recovery of the body and an increase in racing performance. Obviously, this effect needs to be timed perfectly in the run-up to an important race. It’s a complex equation, with the results of training sessions sometimes showing up months later and with training types having different effects on performance for sprinting distances (especially the 500 and 1,000 metres), on the one hand, and longer distances (1,500 metres and above), on the other.
Golden opportunities – everywhere
It is certainly not an exaggeration to say that the 2018 Winter Olympics have become the first big data Olympics. As a best practice, the example set by the Dutch skaters will be followed by other athletes looking to optimize their performance. And it’s not just in sporting events that data thinking is making such an impact. Many companies are becoming more data-driven. At Basefarm, we work together with some of these companies to tap their wealth of unexplored data and find new use cases. In the manufacturing, service and maintenance industries, for instance, the use of predictive maintenance saves companies millions of euros every year. And this is only the beginning. Undoubtedly, big data will shape the next Olympic games as well as the business world of tomorrow. Our question to you: will you be a contender for gold?
About Ronald Tensen
Ronald Tensen is Marketing Manager at Basefarm in the Netherlands. He has broad experience in the internet and IT industry (B2B and B2C), a track record of developing and launching new consumer services and brands, and a strong customer focus. And of course, he is a great team player!