SetConv: A New Approach for Learning from Imbalanced Data

Yang Gao, Yi-Fan Li, Yu Lin, Charu Aggarwal, Latifur Khan

Machine Learning for NLP Long Paper

Gather-1B: Nov 17 (02:00-04:00 UTC)

Abstract: For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant, i.e., its output does not depend on the order of the inputs, and experiments on multiple large-scale benchmark text datasets show the superiority of our proposed framework over other state-of-the-art (SOTA) methods.
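
The two ideas the abstract relies on, the Imbalance Ratio and a permutation-invariant per-class representative, can be illustrated with a minimal sketch. This is not the paper's SetConv operation or its episodic training: mean pooling is swapped in as a generic permutation-invariant aggregator, IR is assumed to mean majority-class count divided by minority-class count, and the names `imbalance_ratio` and `class_representative` as well as the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.

    Assumption: this is one common definition of IR; the abstract
    does not spell out the exact formula.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def class_representative(vectors):
    """Mean-pool a set of equal-length vectors into a single vector.

    Stand-in aggregator only (NOT SetConv): the mean does not depend
    on the order in which the vectors are listed.
    """
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


# Toy imbalanced dataset: 2 "neg" samples and 8 "pos" samples.
labels = ["neg"] * 2 + ["pos"] * 8
features = [[float(i), float(i % 3)] for i in range(10)]

ir = imbalance_ratio(labels)

# Group samples by class, then reduce each class to one representative,
# so a downstream classifier would see one vector per class.
by_class = {}
for x, y in zip(features, labels):
    by_class.setdefault(y, []).append(x)
reps = {y: class_representative(xs) for y, xs in by_class.items()}

# Permutation-invariance check: shuffling the set leaves the result unchanged
# (compared with a small tolerance, since float sums can differ in general).
shuffled = by_class["pos"][:]
random.Random(0).shuffle(shuffled)
rep_shuffled = class_representative(shuffled)
order_independent = all(
    abs(a - b) < 1e-9 for a, b in zip(rep_shuffled, reps["pos"])
)
```

In this sketch each class collapses to exactly one vector regardless of how many samples it has, which is the sense in which the downstream class distribution becomes balanced.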

Similar Papers

Public Sentiment Drift Analysis Based on Hierarchical Variational Auto-encoder
Wenyue Zhang, Xiaoli Li, Yang Li, Suge Wang, Deyu Li, Jian Liao, Jianxing Zheng
Textual Data Augmentation for Efficient Active Learning on Tiny Datasets
Husam Quteineh, Spyridon Samothrakis, Richard Sutcliffe
FIND: Human-in-the-Loop Debugging Deep Text Classifiers
Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni