Release 14 of the InterCorp parallel corpus was published at the end of January. An overview of all the enhancements can be found in the version history on the CNC wiki.
The DIALEKT corpus has more than doubled in size to 223 thousand words in its newly published release 2. It is complemented by the Mapka application, which has gained new features, e.g. downloadable custom map layers with user-defined points.
InterCorp release 13ud has been published. It contains the same texts as release 13; the only difference is its annotation, which conforms to the Universal Dependencies standard. The UD annotation is comparable across languages and also includes syntax.
SYN release 9 was published as another update of the SYN corpus of contemporary written Czech. With journalistic texts from 2019, its size exceeded 4.7 billion words. SYN release 10 with data from 2020 will be available at the beginning of next year.
The sixth edition of the Translation in Transition conference will take place in Prague in September 2022. Abstract submission is now open; the deadline is 14 Feb 2022. Details are available in the second call for papers on the conference website.
The EU-funded CLS INFRA project invites applications for fellowships intended to help researchers in computational literary studies gain access to, and support at, one of the participating institutions (including CNC).
We have released a new version of the Mapka application for working with spoken and dialectal corpora. The new features include an additional historical regional division and the option to save your own map layer with user-defined markers.
We are proud to release Parlcorp, a corpus of Czech parliamentary speeches from the period 1993-2021, compiled by M. & M. Berrocal.
We are proud to release CODIT, a hosted diachronic corpus of Italian compiled by Maria Silvia Micheli. CODIT maps six main text registers and covers the period from the 13th century until 1947.
A new release of the NET corpus of Czech semi-official internet communication has been published. With release 2, the corpus has more than tripled in size and now covers internet discussions and blogs from 120+ domains.