The Chilean Waiting List Corpus: a new resource for clinical Named Entity Recognition in Spanish

Pablo Báez, Fabián Villena, Matías Rojas, Manuel Durán, Jocelyn Dunstan

3rd Clinical Natural Language Processing Workshop (Clinical NLP 2020) Workshop Paper


Abstract: In this work we describe the Waiting List Corpus, which consists of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 900 referrals was manually annotated with 9,029 entities, 385 attributes, and 284 pairs of relations with clinical relevance. A trained medical doctor annotated these referrals and then, together with three other researchers, consolidated each of the annotations. The annotated corpus has nested entities, with 32.2% of entities embedded in other entities. We use this annotated corpus to obtain preliminary results for Named Entity Recognition (NER). The best results were achieved with a biLSTM-CRF architecture that combines word embeddings trained on Spanish Wikipedia with clinical embeddings computed by the group. NER models applied to this corpus can help derive statistics on diseases and pending procedures within this waiting list. This work constitutes the first annotated corpus using clinical narratives from Chile, and one of the few for the Spanish language. The annotated corpus, the clinical word embeddings, and the annotation guidelines are freely released to the research community.
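To make the nesting figure concrete, here is a minimal sketch of how a share such as "32.2% of entities embedded in other entities" could be computed from span annotations. The abstract does not specify the corpus file format or label set, so the `(start, end, label)` tuples, the label names, and the function `nested_fraction` are assumptions made for illustration only, not the authors' tooling.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, exclusive end offset, label).
Span = Tuple[int, int, str]


def nested_fraction(spans: List[Span]) -> float:
    """Return the fraction of entities fully contained in a strictly longer entity."""
    if not spans:
        return 0.0
    nested = 0
    for i, (start, end, _) in enumerate(spans):
        for j, (o_start, o_end, _) in enumerate(spans):
            if i == j:
                continue
            contains = o_start <= start and end <= o_end
            strictly_longer = (o_end - o_start) > (end - start)
            if contains and strictly_longer:
                nested += 1
                break  # count each embedded entity once
    return nested / len(spans)


# Made-up annotations over the made-up text
# "fractura de cadera derecha ... paracetamol"; labels are illustrative.
example: List[Span] = [
    (0, 26, "Disease"),      # "fractura de cadera derecha"
    (12, 26, "Body_Part"),   # "cadera derecha", embedded in the disease span
    (31, 42, "Medication"),  # "paracetamol", not embedded
]

fraction = nested_fraction(example)
```

In this sketch two entities with identical offsets are not counted as nested in each other; whether the reported statistic treats such cases the same way is not stated in the abstract.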