MMLL webinar IV: From Sparse Modeling to Sparse Communication
The Mercury Machine Learning Lab (MMLL) would like to invite you to the MMLL online seminar series. In this series of four webinars, the lab will focus on causality, information retrieval, natural language processing, and reinforcement learning.
In this fourth and final webinar, Prof. dr. André Martins (Instituto Superior Técnico, Lisbon) will give a presentation on “From Sparse Modeling to Sparse Communication.”
Registration link: here
15.00: Opening by Dr. Wilker Aziz (ILLC, UvA)
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability.
In the first part of the talk, I will describe how sparse modeling techniques can be extended and adapted to facilitate sparse communication in neural models. The building block is a family of sparse transformations called alpha-entmax, a drop-in replacement for softmax, which contains sparsemax as a particular case. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which makes them useful for building interpretable attention mechanisms. Variants of these sparse transformations have been applied successfully to machine translation, natural language inference, visual question answering, and other tasks.
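To give a flavour of the key building block: sparsemax, the alpha=2 member of the entmax family, is the Euclidean projection of a score vector onto the probability simplex, and unlike softmax it can assign exact zeros. A minimal NumPy sketch (an illustration of the published algorithm, not the speaker's implementation):

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex
    (Martins & Astudillo, 2016). Unlike softmax, the result
    may contain exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                 # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted) - 1.0
    support = z_sorted - cssv / k > 0           # coordinates kept in the support
    tau = cssv[support][-1] / k[support][-1]    # threshold to subtract
    return np.maximum(z - tau, 0.0)

p = sparsemax([2.0, 1.0, -1.0])
# softmax would give every class nonzero mass; here the low-scoring
# classes are zeroed out entirely, e.g. p = [1., 0., 0.]
```

Because the zeros are exact, an attention mechanism built on sparsemax attends to a small, explicitly identifiable subset of inputs, which is what makes it attractive for interpretability.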
In the second part, I will introduce mixed random variables, which are in-between the discrete and continuous worlds. We build rigorous theoretical foundations for these hybrids, via a new “direct sum” base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic (“sample-and-project”) and an intrinsic one (based on face stratification).
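The extrinsic "sample-and-project" strategy can be illustrated schematically. In this sketch (my own simplification, not the paper's measure-theoretic construction), a continuous score vector is drawn from a Gaussian and projected onto the simplex with sparsemax; the projected samples land on faces of different dimensions, so the resulting variable mixes discrete mass (lower-dimensional faces, including vertices) with continuous density (the simplex interior):

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex;
    may return exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted) - 1.0
    support = z_sorted - cssv / k > 0
    tau = cssv[support][-1] / k[support][-1]
    return np.maximum(z - tau, 0.0)

# sample-and-project: continuous Gaussian scores -> points on the simplex
rng = np.random.default_rng(0)
samples = np.array([sparsemax(rng.normal(size=3)) for _ in range(2000)])

# fraction of samples with at least one exact zero, i.e. landing on a
# lower-dimensional face of the simplex rather than in its interior
on_face = float(np.mean(np.any(samples == 0.0, axis=1)))
```

A strictly positive `on_face` fraction alongside interior samples is exactly the "mixed" behaviour: neither a purely discrete nor a purely continuous distribution can produce both.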
In the third part, I will show how sparse transformations can also be used to design new loss functions that replace the cross-entropy loss. To this end, I will introduce the family of Fenchel-Young losses, revealing connections between generalized entropy regularizers and separation margins. I will illustrate with applications in natural language generation, morphology, and machine translation.
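As a sketch of the construction (under my own choice of regularizer, not the paper's full framework): a Fenchel-Young loss pairs a regularizer Omega with its convex conjugate, L(z; y) = Omega*(z) + Omega(y) - <z, y>. Taking Omega(p) = 0.5*(||p||^2 - 1), the negative Gini entropy, yields the sparsemax loss, whose prediction is sparsemax(z); the loss is nonnegative and vanishes exactly when sparsemax(z) = y, which is the margin-like behaviour alluded to above:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted) - 1.0
    support = z_sorted - cssv / k > 0
    tau = cssv[support][-1] / k[support][-1]
    return np.maximum(z - tau, 0.0)

def fy_sparsemax_loss(z, y):
    """Fenchel-Young loss induced by Omega(p) = 0.5*(||p||^2 - 1):
    L(z; y) = Omega*(z) + Omega(y) - <z, y>, where the conjugate is
    Omega*(z) = <z, p*> - Omega(p*) evaluated at p* = sparsemax(z)."""
    z, y = np.asarray(z, dtype=float), np.asarray(y, dtype=float)
    omega = lambda q: 0.5 * (np.dot(q, q) - 1.0)
    p = sparsemax(z)
    conjugate = np.dot(z, p) - omega(p)
    return conjugate + omega(y) - np.dot(z, y)

# the loss is zero exactly when the gold label is the sparsemax prediction
fy_sparsemax_loss([2.0, 1.0, -1.0], [1.0, 0.0, 0.0])  # -> 0.0
```

Swapping in a different Omega (e.g. Shannon entropy, which recovers cross-entropy) changes the loss without changing this general recipe, which is what makes the family useful for designing new training objectives.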
This work was funded by the DeepSPIN ERC project (https://deep-spin.github.io).
Bio of Prof. dr. André Martins:
André Martins (PhD 2012, Carnegie Mellon University and University of Lisbon) is an Associate Professor at Instituto Superior Técnico, University of Lisbon, a researcher at Instituto de Telecomunicações, and the VP of AI Research at Unbabel. His research, funded by an ERC Starting Grant (DeepSPIN) and other grants (P2020 project Unbabel4EU and CMU-Portugal project MAIA), includes machine translation, quality estimation, and structure and interpretability in deep learning systems for NLP. His work has received best paper awards at ACL 2009 (long paper) and ACL 2019 (system demonstration paper). He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS), and he is a Fellow of the ELLIS society.