Lang Resour Eval. 2020;54(1):273-301.
doi: 10.1007/s10579-019-09480-6. Epub 2019 Nov 30.

NorthEuraLex: a wide-coverage lexical database of Northern Eurasia

Johannes Dellert et al. Lang Resour Eval. 2020.

Abstract

This article describes the first release version of a new lexicostatistical database of Northern Eurasia, a region that includes Europe, the most thoroughly researched linguistic area. In other areas of the world, databases must restrict themselves to a small number of concepts, covered as far as the often sparse documentation allows. In our target area, by contrast, good lexical resources with wide coverage of the lexicon are available even for many smaller languages. This makes it possible to attain near-completeness for a substantial number of concepts. The resulting database provides a basis for rich benchmarks for testing automated methods that aim to derive new knowledge about language history in under-researched areas.
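The idea of near-completeness for a concept can be made concrete as the share of sampled languages that have at least one recorded form for that concept. The Python sketch below computes this share over a tiny, hypothetical wordlist. The row layout, language codes, concept labels, example forms and the 0.95 cut-off are illustrative assumptions, not the database's actual schema, contents or selection criteria.

```python
from collections import defaultdict

# Hypothetical toy wordlist rows: (language code, concept label, form).
# These are illustrative only and are NOT taken from NorthEuraLex.
rows = [
    ("fin", "eye", "silmä"),
    ("deu", "eye", "Auge"),
    ("rus", "eye", "глаз"),
    ("fin", "rainbow", "sateenkaari"),
    ("deu", "rainbow", "Regenbogen"),
    # No Russian form for "rainbow" is listed, so its coverage is incomplete.
]


def concept_coverage(rows, languages):
    """Map each concept to the share of `languages` with at least one form."""
    attested = defaultdict(set)
    for lang, concept, _form in rows:
        if lang in languages:
            attested[concept].add(lang)
    return {concept: len(langs) / len(languages) for concept, langs in attested.items()}


languages = {"fin", "deu", "rus"}
cov = concept_coverage(rows, languages)

# Assumed cut-off for calling a concept "near-complete" (hypothetical value).
THRESHOLD = 0.95
near_complete = sorted(c for c, share in cov.items() if share >= THRESHOLD)

for concept, share in sorted(cov.items()):
    print(f"{concept}: {share:.2f}")
print("near-complete:", near_complete)
```

Running it prints `eye: 1.00` and `rainbow: 0.67`, so only "eye" passes the assumed cut-off. Counting distinct languages rather than forms keeps synonyms (several forms for one concept in one language) from inflating a concept's score.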

Keywords: Caucasian languages; Indo-European languages; Lexical database; Northern Eurasia; Siberian languages; Turkic languages; Uralic languages.

Figures

Fig. 1
Map of languages in NorthEuraLex 0.9 (except Eskimo-Aleut). Each language family is encoded by a different combination of color and shape. (Color figure online)
Fig. 2
Data collection and processing workflow for NorthEuraLex 0.9. Green nodes represent our sources of information, yellow nodes stand for informal auxiliary files, and orange nodes represent data files in machine-readable standardized formats. (Color figure online)
Fig. 3
Sample of words for ‘rainbow’ in the web interface
