What is a Corpus?

A corpus is a collection of texts in electronic form (for spoken language, transcriptions of speech) used for linguistic research. A dedicated search engine makes the corpus easier to work with: it helps users find words and collocations in context and determine both their frequency in the corpus and the source text they come from. It also supports further processing of the results (alphabetical sorting, etc.). Some corpora can also be searched by part of speech.
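The kind of lookup described above can be illustrated with a minimal Python sketch. It is not the CNC's search engine or its API. The toy corpus, the function names (`tokenize`, `concordance`, `frequency`, `sort_by_right_context`) and the `width` parameter are all assumptions made for illustration. It shows a word found in context together with its source text, a frequency count, and alphabetical sorting of the hits.

```python
import re
from collections import Counter

# Hypothetical toy corpus (source name -> text); a real corpus is far larger.
TOY_CORPUS = {
    "doc_a": "The cat sat on the mat. The dog sat on the log.",
    "doc_b": "A cat may look at a king.",
}


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def concordance(corpus, word, width=2):
    """Return (source, left context, keyword, right context) for each hit."""
    word = word.lower()
    hits = []
    for source, text in corpus.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((source, left, tok, right))
    return hits


def frequency(corpus, word):
    """Count how often the word occurs across all texts in the corpus."""
    word = word.lower()
    return sum(Counter(tokenize(text))[word] for text in corpus.values())


def sort_by_right_context(hits):
    """Sort hits alphabetically by the words following the keyword."""
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    for source, left, keyword, right in sort_by_right_context(
        concordance(TOY_CORPUS, "cat")
    ):
        print(f"{source}: {left} [{keyword}] {right}")
    print("frequency of 'cat':", frequency(TOY_CORPUS, "cat"))
```

Real corpus tools typically add linguistic annotation such as lemmas and part-of-speech tags on top of this basic idea, which is what makes searching by part of speech possible.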

The Czech National Corpus (CNC) is an academic project focused on building a large electronic corpus of mainly written Czech. The Institute of the Czech National Corpus (ICNC) at the Faculty of Arts, Charles University in Prague, has been in charge of the CNC, its expansion and development, and other related activities, particularly those associated with teaching and advancing the field of corpus linguistics.
