ICAI Interview with Martijn Kleppe: gaining insights much quicker by combining AI and Humanities

Martijn Kleppe is a trained historian who collaborates with AI scientists. Kleppe: ‘When I was doing research as a historian, I could analyze twenty books in a month. For the computer this is a matter of seconds.’

Martijn Kleppe

Martijn Kleppe is one of the founding members of Cultural AI Lab and the head of the Research Department of the KB National Library of the Netherlands.

The Cultural AI Lab is a joint effort of the heritage institutions Rijksmuseum, Institute for Sound and Vision, the KB and knowledge institutions CWI, KNAW, UvA, VU and TNO.

What problems within the humanities can be solved with AI?

‘The biggest challenge that we face within the humanities is scale. There is a shift happening right now within historical research from close reading, which we have done for centuries, to distant reading, where we use a computer to detect patterns in all sorts of humanities data: books, newspapers, television programs, artworks, social media outlets, etcetera. This is the biggest opportunity that we have right now, but also the biggest challenge, because we have to rely on other competences to make this kind of research possible.’

How much interest in AI is there in the cultural world?

‘It is really gaining momentum right now. There is an ecosystem evolving with partners from the culture and media domain – like the Rijksmuseum, the National Archive – and the creative industry, that are interested in applying AI within their services or processes. And several members of our lab recently founded the working group ‘Culture & Media’ within the National AI Coalition with all sorts of cultural and media partners.’

Do the humanities also influence AI?

‘Yes, it creates new academic research questions. Most of the algorithms within the AI domain have been trained with new and high-quality data. But in the datasets of the heritage institutions there is 200-year-old data: very low-quality digitized newspapers from the end of the nineteenth century, for example, and newspapers from colonial Indonesia and Suriname with a completely different vocabulary. Bringing those kinds of datasets into the AI domain offers new perspectives and questions on polyvocal data. How can we handle these kinds of data and how can we improve them? My experience so far is that the computer scientists involved in our project love these new questions.’

At the ICAI meetup on July 8, the lab will speak about contentious words in cultural heritage collections. How does the lab approach this?

‘Handling issues like that is the essence of our research. We try to answer the question: How can you detect bias in the descriptions of artifacts in museum collections? And can you also help the museum by suggesting alternative words? Especially last year, with the Black Lives Matter movement, we saw how relevant these questions were. How do we deal with the past? It is a technical, but also really a societal and ethical challenge.’

Book from the KB collection

As a historian, what attracts you to AI?

‘I have to ask new types of questions. Instead of doing source criticism on books, newspapers or television programs, now I have to do source criticism on algorithms. At the KB we have the Delpher.nl platform, for example, where you can search millions of digitized texts from Dutch newspapers, books and magazines. In order to search efficiently, you have to understand the basics of the algorithm behind it. What I also really like about AI research is the teamwork. Traditionally, humanities scholars are more solitary researchers. But to be able to collaborate with other disciplines you have to be vulnerable and develop yourself.’

Do you ever get a new perspective on heritage collections because of the AI research?

‘Yes, for sure. During my PhD research I wrote an article about the first time a Dutch photograph was published in a Dutch newspaper. That research was based on manually going through a selection of newspapers. But then four years later I participated in a research project with a historian who was one of the first to start applying computer vision to historical newspapers. He ran an algorithm that went through all the newspapers and immediately said: ‘Martijn, you were completely wrong. The first photograph was published earlier than the moment you mention in your paper.’ That was fantastic. Science is always about gaining new insights. When I did my research, those techniques did not exist yet. We now gain insights much quicker than we did before.’

