DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

Dominik Schlechtweg; Nina Tahmasebi; Simon Hengchen; Haim Dubossarsky; Barbara McGillivray

Digital Humanities

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

25 Citations (Scopus)

Abstract

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded, contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses for this dataset.
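The abstract mentions grouping word usages into senses from pairwise semantic proximity judgments. As a hedged illustration only, the sketch below treats usages as graph nodes, aggregates each pair's judgments by their median, subtracts a threshold (assumed here: a 1 to 4 relatedness scale with threshold 2.5) to get signed edge weights, and then picks the partition with the lowest correlation-clustering disagreement by exhaustive search. The helper names (`set_partitions`, `edge_weights`, `disagreement`, `correlation_cluster`), the scale, the threshold, and the toy data are all hypothetical; the paper's actual clustering choice, parameters, and search procedure may differ.

```python
from statistics import median


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (exponential; small inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition


def edge_weights(judgments, threshold=2.5):
    """Median of each pair's judgments, shifted so that 'related' pairs are positive (assumed scale)."""
    return {pair: median(scores) - threshold for pair, scores in judgments.items()}


def disagreement(partition, weights):
    """Correlation-clustering loss: positive edges cut apart plus negative edges kept together."""
    cluster_of = {u: i for i, block in enumerate(partition) for u in block}
    loss = 0.0
    for (u, v), w in weights.items():  # unjudged pairs have no edge and cost nothing
        same = cluster_of[u] == cluster_of[v]
        if same and w < 0:
            loss += -w  # dissimilar usages placed in one cluster
        elif not same and w > 0:
            loss += w   # similar usages split across clusters
    return loss


def correlation_cluster(usages, judgments, threshold=2.5):
    """Return the partition of `usages` with minimal disagreement (brute force)."""
    weights = edge_weights(judgments, threshold)
    return min(set_partitions(list(usages)), key=lambda p: disagreement(p, weights))


# Invented toy data: each usage pair maps to raw annotator scores on the assumed 1-4 scale.
judgments = {
    ("u1", "u2"): [4, 4, 3],
    ("u1", "u3"): [1, 2],
    ("u2", "u3"): [1, 1],
    ("u3", "u4"): [4, 3],
    ("u1", "u4"): [1],
    ("u2", "u4"): [2, 1],
}
clusters = correlation_cluster(["u1", "u2", "u3", "u4"], judgments)
print(sorted(sorted(block) for block in clusters))  # [['u1', 'u2'], ['u3', 'u4']]
```

Exhaustive search is exact but grows with the Bell numbers, so it only suits a handful of usages; real word usage graphs would need a heuristic or approximate solver.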

Original language	English
Title of host publication	Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Place of Publication	Online and Punta Cana, Dominican Republic
Publisher	Association for Computational Linguistics
Pages	7079-7091
Number of pages	13
Publication status	Published - 1 Nov 2021

Access to Document

https://aclanthology.org/2021.emnlp-main.567

Cite this

@inproceedings{23c3a6e8c79049c091325c6d723c61bd,
  title = "{DWUG}: A large Resource of Diachronic Word Usage Graphs in Four Languages",
  abstract = "Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible -- diachronic and synchronic -- uses for this dataset.",
  author = "Dominik Schlechtweg and Nina Tahmasebi and Simon Hengchen and Haim Dubossarsky and Barbara McGillivray",
  year = "2021",
  month = nov,
  day = "1",
  language = "English",
  pages = "7079--7091",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
  publisher = "Association for Computational Linguistics",
  address = "Online and Punta Cana, Dominican Republic",
  url = "https://aclanthology.org/2021.emnlp-main.567",
}

Schlechtweg, D, Tahmasebi, N, Hengchen, S, Dubossarsky, H & McGillivray, B 2021, DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp. 7079-7091. <https://aclanthology.org/2021.emnlp-main.567>

DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. / Schlechtweg, Dominik; Tahmasebi, Nina; Hengchen, Simon et al.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021. p. 7079-7091.

TY  - CHAP
T1  - DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages
AU  - Schlechtweg, Dominik
AU  - Tahmasebi, Nina
AU  - Hengchen, Simon
AU  - Dubossarsky, Haim
AU  - McGillivray, Barbara
PY  - 2021/11/1
Y1  - 2021/11/1
N2  - Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible -- diachronic and synchronic -- uses for this dataset.
AB  - Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We describe in detail the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible -- diachronic and synchronic -- uses for this dataset.
M3  - Conference paper
SP  - 7079
EP  - 7091
BT  - Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
PB  - Association for Computational Linguistics
CY  - Online and Punta Cana, Dominican Republic
ER  -