Notice Board

Totalita corpus

23.2.2023 Notice Board

We are releasing Totalita, a diachronic corpus of written Czech from the period of the communist regime (1948–1989). The corpus served as the source material for the dictionary published in 2010.

Together with our colleagues from JÚĽŠ, we have launched the Czechoslovak Word of the Week series. A new word appears every Monday morning; the columns are then reprinted in the Friday editions of both the Czech Deník N and the Slovak Denník N.

Treq: new version

20.1.2023 Notice Board

An update of Treq, the online tool for looking up translation equivalents, is out! Its database has been updated to release 15 of the InterCorp parallel corpus.

ONLINE2

2.1.2023 Notice Board

The second generation of ONLINE corpora has been published. As a continuation of the first generation, and thanks to its daily updates, ONLINE2 is a perfect source of data for examining current trends in public discourse.

API

22.12.2022 Notice Board

We have made the APIs for querying KonText and Treq publicly available. The number of applications with an open API will grow in the future.

Corpus of contemporary Czech poetry

11.10.2022 Notice Board

In cooperation with ICL, we have created a new corpus of contemporary Czech poetry (KSP). It contains poems published between 1990 and 2020, either in print or on online literary forums. At 35 million words, KSP ranks among the largest corpora of its kind in the world.

Habemus professorem

29.6.2022 Notice Board

We are pleased to announce that our dear colleague Václav Cvrček has been appointed Professor of Czech Language. Congratulations!

SYN release 10

8.3.2022 Notice Board

SYN release 10 has been published as another update of the SYN corpus of contemporary written Czech. With the addition of journalistic texts from 2020, its size has reached almost 4.9 billion words.

InterCorp release 14

1.2.2022 Notice Board

Release 14 of the InterCorp parallel corpus was published at the end of January. An overview of all the enhancements can be found in the version history on the CNC wiki.

DIALEKT release 2 and Mapka

27.12.2021 Notice Board

The DIALEKT corpus has more than doubled in size to 223 thousand words in its newly published release 2. It is complemented by the Mapka application, which has gained new features, e.g. downloadable custom map layers with user-defined points.