
ICAI: Deep-Dive Open-Source Resources NL
ICAI Deep-Dive is a community meetup that explores topics of shared interest in depth, regardless of the differences between the primary research domains of the community members and labs. The aim is to promote knowledge production, transfer, and utilization through experience sharing and discussion, and to find solutions to common challenges and issues in the community. Join this Deep-Dive into open-source resources: register here.
Program
12:00 Opening by chair
12:05 Sebastian Schelter (AirLab Amsterdam, UvA, Apache Software Foundation) leads a discussion on “Licenses and Real-world experiences”
12:25 Clemencia Siro (UvA) leads a discussion on “NLP to Low-Resource Settings.”
12:40 James Meakin (AI for Health lab, RadboudUMC, grand-challenge.org) and André Dekker (Brightlands Smart Health Lab, UM) lead a discussion on “Community building and funding of infrastructure software development.”
13:00 End
Abstract
“NLP to Low-Resource Settings”
Recently, many NLP tasks have seen considerable improvement due to deep learning techniques, including transformer-based contextualized word embeddings pre-trained on language modelling. However, these models often require large amounts of labelled training data, which may not be available for many low-resource languages. Approaches such as transfer learning and distant supervision have been used to extend deep learning models to low-resource languages. There is growing evidence, however, that transferring approaches from high-resource to low-resource languages has limited effectiveness, both because of how morphologically rich low-resource languages are and because of token alignment errors. To tackle the lack of data for these languages, Masakhane, an African NLP group, has organized the creation of labelled data for different NLP tasks in several African languages. In this talk, I will discuss the initiatives Masakhane has taken to contribute open-source resources for African languages and to foster NLP research on low-resource languages.