DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray

Research output: Chapter in Book/Report/Conference proceeding; Conference paper; peer-review

25 Citations (Scopus)

Abstract

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses of this dataset.
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Place of publication: Online and Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Pages: 7079-7091
Number of pages: 13
Publication status: Published - 1 Nov 2021
