Research Profile

Since its foundation in 1994, the Institute of the Czech National Corpus (ÚČNK) has focused mainly on the continuous development of the Czech National Corpus (CNC) project. Its goal is to systematically map the development of the Czech language (written and spoken) in its many forms and genres by creating and making available extensive collections of authentic texts – language corpora – that serve language-oriented research in the social sciences and humanities. When building corpora, CNC focuses on varied and representative composition, high-quality data processing and rich annotation.

An integral part of CNC’s activities is the development of specialized web applications for working with corpora, supplemented by user services such as consultations, workshops, corpus hosting, user data analysis and the provision of data packages for specific purposes such as natural language processing. The applications are concentrated on the web portal at http://www.korpus.cz together with online user support: extensive documentation and manuals, an advice centre, etc.

The CNC operations are divided into four sections:

linguistic section (research and teaching activities, user support)
data acquisition and processing section (collection of language data and their processing)
linguistic analysis and annotation section (morphological and syntactic annotation)
computational section (user application development, IT administration).

In addition to developing the CNC project, the ICNC is mainly engaged in research and publication in the field of corpus linguistics, the development of its methodology and, last but not least, the education of master’s and PhD students of the Faculty of Arts.

CNC is officially recognized as a CLARIN K-centre, actively cooperates with CLARIN ERIC and maintains lively contacts with many foreign research institutions with a similar focus.

