Extracting Semantic Aspects for Structured Representation of Clinical Trial Eligibility Criteria

Tirthankar Dasgupta, Ishani Mondal, Abir Naskar, Lipika Dey

3rd Clinical Natural Language Processing Workshop (Clinical NLP 2020) Workshop Paper


Abstract: Eligibility criteria in clinical trials specify the characteristics that a patient must or must not possess in order to be treated according to a standard clinical care guideline. Because manual eligibility determination is time-consuming, there is a pressing need to automatically structure eligibility criteria into semantic categories, or aspects. Existing methods use hand-crafted rules and feature-based statistical machine learning to dynamically induce semantic aspects. To deal with the paucity of aspect-annotated clinical trial data, we propose a novel weakly supervised, co-training-based method that exploits a large pool of unlabeled criteria sentences to augment the limited supervised training data and thereby improve performance. Experiments with 0.2M criteria sentences show that the proposed approach outperforms competitive supervised baselines by 12% in micro-averaged F1 score across all aspects. Deeper analysis shows that domain-specific information boosts performance by a significant margin.
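The co-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two views (word tokens and character trigrams), the naive Bayes classifier, the aspect names "demographic" and "medication", the toy sentences, and the parameters rounds, per_round, and threshold are all assumptions chosen for illustration. The sketch only shows the general pattern in which two classifiers trained on different views pseudo-label confident unlabeled sentences to enlarge a small labeled set.

```python
from collections import Counter, defaultdict
import math

def word_view(s):
    # View 1 (assumed): lowercase word tokens.
    return s.lower().split()

def char_view(s):
    # View 2 (assumed): overlapping character trigrams.
    t = s.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""
    def __init__(self, view):
        self.view = view

    def fit(self, texts, labels):
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for t, y in zip(texts, labels):
            self.counts[y].update(self.view(t))
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict_proba(self, text):
        n = sum(self.prior.values())
        logp = {}
        for y in self.prior:
            total = sum(self.counts[y].values()) + len(self.vocab)
            lp = math.log(self.prior[y] / n)
            for f in self.view(text):
                lp += math.log((self.counts[y][f] + 1) / total)
            logp[y] = lp
        m = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - m) for v in logp.values())
        return {y: math.exp(v - m) / z for y, v in logp.items()}

def co_train(labeled, unlabeled, rounds=3, per_round=2, threshold=0.6):
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        models = [NaiveBayes(v).fit(texts, labels) for v in (word_view, char_view)]
        newly = {}
        for model in models:
            scored = []
            for t in pool:
                p = model.predict_proba(t)
                y = max(p, key=p.get)
                scored.append((p[y], t, y))
            scored.sort(reverse=True)
            # Each view contributes its most confident pseudo-labels.
            for conf, t, y in scored[:per_round]:
                if conf >= threshold and t not in newly:
                    newly[t] = y
        if not newly:
            break
        for t, y in newly.items():
            texts.append(t)
            labels.append(y)
        pool = [t for t in pool if t not in newly]
    # Final models are trained on the labeled plus pseudo-labeled sentences.
    models = [NaiveBayes(v).fit(texts, labels) for v in (word_view, char_view)]
    return models, texts, labels

# Hypothetical toy data, for illustration only.
labeled = [
    ("age between 18 and 65 years", "demographic"),
    ("female patients aged 40 years or older", "demographic"),
    ("currently taking warfarin or aspirin", "medication"),
    ("treatment with insulin within 30 days", "medication"),
]
unlabeled = [
    "male patients older than 50 years",
    "taking aspirin daily",
    "prior treatment with metformin",
    "age under 18 years",
]
models, texts, labels = co_train(labeled, unlabeled)
print(models[0].predict_proba("patients aged over 60 years"))
```

In this sketch both classifiers add pseudo-labels to a shared training set; variants in which each view teaches only the other are also common, and the abstract does not say which scheme the paper uses.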