ICAI Interview with Sebastian Schelter: Tackling the data management questions for machine learning

One of the major problems in machine learning right now is managing real-world data. Sebastian Schelter: ‘In the real world you have new data every day, every minute or every second. A small mistake is quickly made. And then that small mistake can have devastating consequences for the model.’

Sebastian Schelter is Lab Manager of AIRLab Amsterdam and has a Joint Appointment position. He is funded by both Ahold Delhaize and the University of Amsterdam. The AI for Retail (AIR) Lab Amsterdam is a joint UvA-Ahold Delhaize industry lab.

At the next Lunch at ICAI meeting on June 17, AIRLab will talk about the challenges in machine learning. What is the biggest challenge?

Schelter: ‘There are some classical machine learning problems in retail and e-commerce that we address at AIRLab, such as forecasting and recommendation. But by far the biggest problem right now is the data management question for machine learning. A lot of these problems relate to handling the data and occur when you build real systems and real applications around machine learning algorithms. This is something that is often overlooked when people talk about AI. It’s one of the biggest issues in machine learning right now, but nobody likes to talk about it.’

Why don’t people talk about this?

‘Because it is a very difficult problem. It is hard to study in academia, because academics don’t have access to real systems. One of the advantages of AIRLab is that we can look at these problems because we collaborate with companies. That’s a unique situation. Another reason is that this problem lies at the intersection of the data management and the machine learning communities. Problems at the intersection of fields are always more difficult to study because you have to bring together people with different areas of expertise.’

How does AIRLab try to tackle this problem?

‘In the real world you have new data every day, every minute or every second. And the data might change because the world changes and systems change. So very often, before you can actually feed data into a machine learning model, you have to preprocess it, join together different datasets, clean it, filter it, and convert the format. A small mistake is quickly made. And then that small mistake can have devastating consequences for the model. So what we are doing is building tools to make it easier for data scientists to find these problems and fix them.’
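To make the kind of mistake described above concrete, here is a minimal sketch of checks a data scientist might run before joining two datasets. It is an illustration only and not one of AIRLab's tools: the function `validate_join`, the `Issue` record, and the field names `product_id` and `quantity` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One problem found in the input data (hypothetical record type)."""
    check: str
    detail: str


def validate_join(sales, products):
    """Return the issues found before joining sales rows to products on product_id."""
    issues = []
    known_ids = {p["product_id"] for p in products}

    # Repeated keys on the lookup side would silently multiply rows in the join.
    if len(known_ids) != len(products):
        issues.append(Issue("duplicate_key", "products contains repeated product_id values"))

    for i, row in enumerate(sales):
        quantity = row.get("quantity")
        if quantity is None:
            # A missing value that a later step might quietly turn into 0.
            issues.append(Issue("missing_value", f"sales row {i} has no quantity"))
        elif quantity < 0:
            issues.append(Issue("out_of_range", f"sales row {i} has a negative quantity"))

        # A row without a matching product would be dropped by an inner join.
        if row.get("product_id") not in known_ids:
            issues.append(Issue("unmatched_key", f"sales row {i} has an unknown product_id"))

    return issues


# Made-up example data, assumed purely for illustration.
products = [{"product_id": 1, "name": "milk"}, {"product_id": 2, "name": "bread"}]
sales = [
    {"product_id": 1, "quantity": 3},
    {"product_id": 2, "quantity": None},
    {"product_id": 9, "quantity": 1},
]

for issue in validate_join(sales, products):
    print(issue.check, "-", issue.detail)
```

In this made-up data the second sales row has no quantity and the third refers to a product that does not exist, so both are reported before any model sees the joined result.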

You have held a Joint Appointment position since February 2020. How is this working out?

‘I’m one day per week at Ahold and four days at UvA. With this setup I get the best of both worlds. I’ve been working in similar setups for a long time already, going back and forth between academia and industry. I like research, but I also like to build things that get used in the real world. I think it is very valuable for computer scientists to take a step out of the lab. Then you find a lot of interesting problems that you wouldn’t have found otherwise.’

What are the challenges that come with this position?

‘You need to bring together the business and the academic side. You have to look for problems that are academically important and interesting, but that also deliver business value within a certain amount of time.’

Where do you see AIRLab in three years?

‘The PhD students have already started to write great research papers. We’ve had some really good results recently, which I can’t tell you about yet. I’m convinced that by the end of the five-year period of AIRLab, we will have developed a set of technologies and solutions that have real-world impact and will be used by our partner companies.’

On June 17, 2021, AIRLab Amsterdam will address the data management challenges for machine learning during the Lunch at ICAI meeting.