We are proud to release CODIT, a hosted diachronic corpus of Italian compiled by Maria Silvia Micheli. CODIT maps six main text registers and covers the period from the 13th century until 1947.
A new release of the NET corpus of Czech semi-official internet communication has been published. In release 2, the corpus has more than tripled in size and now covers internet discussions and blogs from 120+ domains.
CNC releases the Old Bailey Corpus (OBC) of trial proceedings held in London between 1720 and 1913. Alongside EEBO, OBC is another diachronic corpus of English available in KonText. A detailed eight-lesson course in English is available for both corpora.
A new representative corpus, SYN2020, has been released as the successor to SYN2015. The design of SYN2020 is fully compatible with that of SYN2015, but it features a number of enhancements in lemmatization and morphological tagging.
New releases of two spoken corpora are out now: ORTOFON and ORATOR. Compared to the original releases, the total amount of language material in the ORTOFON v2 and ORATOR v2 corpora has more than doubled.
We are proud to have published the ONLINE monitor corpora, which map the Czech web, i.e. internet news, discussions and social networks from 2017 to the present. The ONLINE corpora are compiled in cooperation with the Dataweps company, contain more than six billion tokens and are updated daily!
The Word at a Glance application has been enhanced with an entirely new mode of operation. It compares the word profiles of two or more words in a similar manner to SyD.
An update to Treq, the online tool for looking up translation equivalents, is out! Its database has been updated to release 12 of the InterCorp parallel corpus. Furthermore, you can now also search translations from and into Spanish (in addition to Czech and English).
Mapka is an interactive map-based application for working with spoken dialect corpora. Its functions include a presentation of the characteristic features of Czech dialect areas, illustrated by authentic speakers' utterances.
CNC released a web application for browsing and comparing frequency lists. The Lists app offers interactive filtering based on four types of frequency information for each unit (word form or lemma) in a selected (sub)corpus.