Keynote: Linguistic Behaviour and the Realistic Testing of NLP Systems.

Janet Pierrehumbert

Live Presentation: Nov 18 (15:00-16:00 UTC)

The standard way to evaluate the performance of NLP systems is to use held-out test data. When the systems are deployed in real-world applications, they will only be successful if they perform well on examples that their architects never saw before. Many of these will be examples that nobody ever saw before; the central observation of generative linguistics, going back to von Humboldt, is that human language involves "the infinite use of finite means". Predicting the real-world success of NLP systems thus comes down to predicting future human linguistic behaviour. In this talk, I will discuss some general characteristics of human linguistic behaviour, and the extent to which they are, or are not, addressed in current NLP methodology. The topics I will address include: look-ahead and prediction; the role of categorization in building abstractions; effects of context; and variability across individuals.


Janet Pierrehumbert is the Professor of Language Modelling in the Department of Engineering Science at the University of Oxford. She received her BA in Linguistics and Mathematics at Harvard in 1975, and her Ph.D. in Linguistics from MIT in 1980. Much of her Ph.D. dissertation research on English prosody and intonation was carried out at AT&T Bell Laboratories, where she was also a Member of Technical Staff from 1982 to 1989. After she moved to Northwestern University in 1989, her research program used a wide variety of experimental and computational methods to explore how lexical systems emerge in speech communities. She showed that the mental representations of words are at once abstract and phonetically detailed, and that social factors interact with cognitive factors as lexical patterns are learned, remembered, and generalized. Pierrehumbert joined the faculty at the University of Oxford in 2015 as a member of the interdisciplinary Oxford e-Research Centre. Her current research uses machine-learning methods to model the dynamics of on-line language. Her latest project, funded by the UK EPSRC, seeks to develop new NLP methods to characterize exaggeration, cohesion, and fragmentation in on-line forums.

Pierrehumbert is a Fellow of the Linguistic Society of America, the Cognitive Science Society, and the American Academy of Arts and Sciences. She was elected to the National Academy of Sciences in 2019. She is the recipient of the 2020 Medal for Scientific Achievement from the International Speech Communication Association.
