How to use Data Science workshops to get to a specific project

In order to advance machine learning and AI projects, a number of hurdles have to be overcome. Initial ideas have to be developed and tested for their suitability for these technologies, and the available data has to be evaluated. Ideation and scoping workshops help to define concrete proofs of concept (PoCs) and lay the groundwork for sustainable minimum viable products (MVPs).

“Often companies have numerous ideas about machine learning and AI, but little clarity as to whether this technology really fits and which method is suitable,” explains Dr. Olivia Lewis, Head of Data Science at The unbelievable Machine Company (*um). In many cases, AI and machine learning projects require highly specialized know-how from data scientists. The *um specialists have already developed solutions for leading companies in the automotive, chemical, finance and logistics industries. “Depending on the complexity, proof-of-concept projects take between three months and one year,” reports the data science expert. A step-by-step approach can be useful. In fraud detection, for example, a few subcategories could be targeted first and, if that succeeds, the entire dataset afterwards. Implementing the project with agile processes keeps the path transparent and comprehensible.

First of all, though, it is important to find a viable use case. In most cases, it is worthwhile to start with problems or weak points that are a constant concern. Depending on the challenge, the focus can be on different areas, such as predictive maintenance, natural language processing (NLP), image recognition or optical character recognition (OCR).

Ideation Workshop: Finding use cases and prioritizing them in a roadmap

If there is still some uncertainty about what can be achieved with new methods of data analysis, the two-day Ideation Workshop can provide guidance. Two data scientists from *um help participants better understand what can be done with the existing data. The key is to remain open to additional, new ideas. “In brainstorming, we go through various phases in order to break away from all preconceived assumptions and allow seemingly crazy ideas,” reports the data scientist. The next step is to bundle these ideas into use case packages. Generally, related ideas are grouped into clusters, whose feasibility, complexity and value creation potential are then evaluated from a business perspective. The workshop focuses on formulating a concrete question that can be tackled promptly.

The data experts analyse how difficult the implementation would be, which points could be problematic, which sensitive data is involved and which contact persons in the team should be included. The result is a matrix of use cases with initial information on expected inputs and outputs, data and variables. On this basis, a ranking is created to find the best use case for the first steps. To determine the best order, the experienced data scientists point out the dependencies between the application scenarios and the preliminary work each one requires.
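To make the ranking step more tangible, here is a minimal Python sketch of how a use case matrix might be turned into a prioritized order. It assumes each use case is scored from 1 to 5 on value, feasibility and complexity, with weights agreed on in the workshop; the class, field names, weights and example use cases are all hypothetical and not *um's actual method. A use case is only scheduled once its prerequisites are already on the roadmap, which reflects the dependencies between application scenarios.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    name: str
    value: int        # value creation potential, 1 (low) to 5 (high)
    feasibility: int  # feasibility with the existing data, 1 to 5
    complexity: int   # implementation complexity, 1 (simple) to 5 (very complex)
    requires: set[str] = field(default_factory=set)  # names of prerequisite use cases


# Hypothetical weights; in practice they would be agreed on in the workshop.
WEIGHTS = {"value": 0.5, "feasibility": 0.3, "complexity": 0.2}


def score(uc: UseCase) -> float:
    # Lower complexity should raise the score, so invert it on the 1..5 scale.
    return (WEIGHTS["value"] * uc.value
            + WEIGHTS["feasibility"] * uc.feasibility
            + WEIGHTS["complexity"] * (6 - uc.complexity))


def roadmap(use_cases: list[UseCase]) -> list[str]:
    """Pick the best-scoring use case whose prerequisites are all scheduled, repeatedly."""
    done: list[str] = []
    pending = {uc.name: uc for uc in use_cases}
    while pending:
        ready = [uc for uc in pending.values() if uc.requires <= set(done)]
        if not ready:
            raise ValueError("circular or unknown dependency among use cases")
        best = max(ready, key=score)
        done.append(best.name)
        del pending[best.name]
    return done


# Hypothetical example matrix.
cases = [
    UseCase("data cleansing pipeline", value=2, feasibility=5, complexity=2),
    UseCase("predictive maintenance", value=5, feasibility=3, complexity=4,
            requires={"data cleansing pipeline"}),
    UseCase("OCR for delivery notes", value=3, feasibility=4, complexity=4),
]
print(roadmap(cases))
```

With these example numbers, the data cleansing pipeline comes first because predictive maintenance depends on it, even though predictive maintenance has the highest individual score.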

The workshop is also about gaining a better understanding of how AI methods work. “In many cases, a company’s contact person imagines that we will develop a formula describing a specific output variable. But it’s more important to move away from formula-driven and rule-driven ideas and take a statistical approach,” explains Olivia Lewis. So instead of devising a formula, you would label the data and train an algorithm on it so that it can, for example, learn to distinguish whether a device is functional or broken.
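The difference between a hand-written formula and a statistical approach can be illustrated with a small classifier. The sketch below trains a logistic regression, one of many possible methods and not necessarily what *um would use, on labeled examples instead of hard-coding a rule. The two features (vibration and temperature, both scaled to 0..1) and all data points are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> str:
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "broken" if p >= 0.5 else "functional"


# Hypothetical labeled sensor readings: (vibration, temperature), scaled to 0..1.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2),
     (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.85, 0.95)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = functional, 1 = broken

w, b = train_logistic_regression(X, y)
print(predict(w, b, (0.2, 0.25)))
print(predict(w, b, (0.9, 0.85)))
```

The point is that the decision boundary is learned from the labeled data; nobody has to write down the rule that separates functional from broken devices.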

Scoping Workshop: Ready for a concrete PoC

To initiate a specific project or proof of concept, two *um data scientists work on site with the participants for one day to explore a use case. The first step is to understand the implementation potential based on the available data. Here too, it is important to keep an open mind to further new ideas. It makes sense to provide the relevant data to the experts in advance so they can examine it. Very important here: the data should not only show niche effects, but should also include examples of different states such as “good”, “medium” and “non-functional”. This requires a so-called “representative example data set”. If possible, it should contain all phenomena present in the data, in the same relative proportions as in the original data set. This prevents a situation in which only an error-free excerpt is provided as an example, even though the majority of the original data may be faulty. On such a distorted basis, the feasibility of the use cases could not be estimated correctly.
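A representative example data set can be drawn with stratified sampling: the records are grouped by state, and the same fraction is taken from each group, so the relative proportions of the original data survive. This is a minimal sketch assuming each record carries a state label; the `state` field, the 10 % fraction and the example counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def representative_sample(records, state_of, fraction, seed=0):
    """Stratified sample that keeps each state's share of the full data set."""
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for rec in records:
        by_state[state_of(rec)].append(rec)
    sample = []
    for group in by_state.values():
        # At least one record per state, so rare phenomena are not lost entirely.
        k = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical full data set in which most records are faulty.
states = ["non-functional"] * 600 + ["medium"] * 300 + ["good"] * 100
records = [{"id": i, "state": s} for i, s in enumerate(states)]

sample = representative_sample(records, lambda r: r["state"], fraction=0.1)
print(Counter(r["state"] for r in sample))
```

Keeping at least one record per state is a design choice so that rare phenomena still show up; for very small groups it slightly over-represents them.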

Practice shows that data is often missing or unevenly available: for anonymized users, for instance, sometimes only the click rate is available, while more data is available for other users. Such data cannot then be used in the same way in the use case. If necessary, the data scientists can also use an outlier analysis to investigate how strongly the data is dispersed and how reliable the underlying data basis is. During the workshop, an initial concept for the feasibility study (PoC) is then developed, together with a concrete procedure and an assessment of how complex the project will be and what time frame it requires.
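The two checks mentioned here, gaps in the data and dispersion, can be sketched with Python's standard library. The example assumes records are dictionaries with `None` marking a missing value, and it uses Tukey's fences (1.5 times the interquartile range) as one common outlier rule; the article does not say which outlier method *um uses, and the field names and values are hypothetical.

```python
import statistics


def missing_share(rows, fields):
    """Share of rows in which each field is missing (None or absent)."""
    return {f: sum(1 for r in rows if r.get(f) is None) / len(rows) for f in fields}


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical user data: the anonymized user only has a click rate.
rows = [
    {"click_rate": 0.12, "age": 34, "sessions": 10},
    {"click_rate": 0.10, "age": 29, "sessions": 12},
    {"click_rate": 0.08, "age": None, "sessions": None},  # anonymized user
    {"click_rate": 0.11, "age": 41, "sessions": 9},
]
print(missing_share(rows, ["click_rate", "age", "sessions"]))

# Hypothetical session counts with one suspicious value.
sessions = [10, 12, 9, 11, 10, 95, 13, 8]
print(iqr_outliers(sessions))
```

Here a quarter of the users lack age and session data, and the session count of 95 is flagged as an outlier relative to the other values.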

A workshop combination pays off for rough ideas

Companies frequently start out with a rough idea for an implementation. In this case, ideation and scoping can also be combined in a two-day workshop. Here too, however, the data should already be available for a preliminary review. In the hybrid workshop, the ideation phase does not start from scratch; instead, the existing idea is fleshed out after a short brainstorming phase. “Here, too, it is important to approach the idea-finding process openly once again. This often brings important ideas together after all,” says Olivia Lewis. With an existing idea, the use case can be identified and worked out more quickly, so the scoping phase can begin on the second day. The combined workshop also results in a roadmap and an estimate of how complex and time-consuming the proof of concept will be.

Dr. Olivia Lewis, Head of Data Science at The unbelievable Machine Company
