TNT: Text Normalization based Pre-training of Transformers for Content Moderation
Fei Tan, Yifan Hu, Changwei Hu, Keqian Li, Kevin Yen
NLP Applications Short Paper
Abstract:
In this work, we present TNT (Text Normalization based pre-training of Transformers), a new language pre-training model for content moderation. Inspired by masked language modeling and text normalization, TNT learns language representations by training transformers to reconstruct text corrupted by four operation types typically seen in text manipulation: substitution, transposition, deletion, and insertion. Furthermore, normalization requires predicting both the operation type and the token label, so TNT learns from a more challenging task than standard masked-word recovery. Experiments demonstrate that TNT outperforms strong baselines on hate speech classification. Additional text normalization experiments and case studies show that TNT is also a promising approach to misspelling correction.
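To make the four corruption operations concrete, below is a minimal sketch of how such a noising step could produce training targets, namely an operation label and the clean token for each position. This is not the authors' implementation; the character-level edits, the label set, the noise_prob parameter, and the names corrupt_token / make_example are illustrative assumptions.

```python
# Minimal sketch (not the paper's released code) of a TNT-style corruption step:
# each token may be noised with one of four edit operations, and the training
# targets are the operation label plus the clean token to recover.
import random
import string

OPS = ["KEEP", "SUBSTITUTE", "TRANSPOSE", "DELETE", "INSERT"]

def corrupt_token(token, rng):
    """Apply one random character-level edit; return (noisy_token, op_label)."""
    if len(token) < 2:
        return token, "KEEP"
    op = rng.choice(OPS[1:])
    i = rng.randrange(len(token) - 1)
    if op == "SUBSTITUTE":   # replace one character ("hate" -> "hote")
        return token[:i] + rng.choice(string.ascii_lowercase) + token[i + 1:], op
    if op == "TRANSPOSE":    # swap two adjacent characters ("hate" -> "haet")
        return token[:i] + token[i + 1] + token[i] + token[i + 2:], op
    if op == "DELETE":       # drop one character ("hate" -> "hte")
        return token[:i] + token[i + 1:], op
    # INSERT: add one extra character ("hate" -> "haate")
    return token[:i] + rng.choice(string.ascii_lowercase) + token[i:], op

def make_example(text, noise_prob=0.15, seed=0):
    """Build (noisy_tokens, op_labels, clean_tokens) for one pre-training example."""
    rng = random.Random(seed)
    noisy, labels, targets = [], [], []
    for tok in text.split():
        if rng.random() < noise_prob:
            n, op = corrupt_token(tok, rng)
        else:
            n, op = tok, "KEEP"
        noisy.append(n)
        labels.append(op)
        targets.append(tok)
    return noisy, labels, targets

if __name__ == "__main__":
    print(make_example("this comment looks like obfuscated hate speech", noise_prob=0.5))
```

In this sketch the transformer would take the noisy tokens as input and be trained to predict both the operation labels and the clean tokens, which is the harder-than-masking objective described in the abstract.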
Similar Papers
Unsupervised Text Style Transfer with Padded Masked Language Models
Eric Malmi, Aliaksei Severyn, Sascha Rothe

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation
Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan

Improving Text Generation with Student-Forcing Optimal Transport
Jianqiao Li, Chunyuan Li, Guoyin Wang, Hao Fu, Yuhchen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang, Moonsu Han, Sung Ju Hwang
