CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

Tara Safavi, Danai Koutra

Information Retrieval and Text Mining Long Paper

Gather-5I: Nov 18, Gather-5I: Nov 18 (18:00-20:00 UTC) [Join Gather Meeting]

Abstract: We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at https://bit.ly/2EPbrJs.

Connected Papers in EMNLP2020

Similar Papers

Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia
Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto,
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen,
Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment
Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua,
LibKGE - A knowledge graph embedding library for reproducible research
Samuel Broscheit, Daniel Ruffinelli, Adrian Kochsiek, Patrick Betz, Rainer Gemulla,