In cooperation with ICL, we created a new corpus of contemporary Czech poetry (KSP). It contains poems published between 1990 and 2020, either in print or on online literary forums. At 35 million words, KSP ranks among the largest corpora of its kind in the world.
The DIALEKT corpus has more than doubled in size, to 223 thousand words, in its newly published release 2. It is complemented by the Mapka application, which has gained new features, e.g. downloadable custom map layers with user-defined points.
InterCorp release 13ud has been published. It contains the same texts as release 13; the only difference is its annotation, which conforms to the Universal Dependencies standard. The UD annotation is comparable across languages and also includes syntactic annotation.