Maarten Janssen

On 12 March 2019 we were visited by Dr. Maarten Janssen, who works at the oldest university in Portugal. He presented TEITOK, a tool for working with corpora, or more precisely with data prepared for corpus processing.

TEITOK – a web-based platform for viewing, creating, and editing corpora

In this talk I will give a general overview of TEITOK, an online system for making corpora available and searchable while also allowing them to be edited, annotated, and corrected. In TEITOK, a corpus consists of a collection of heavily annotated XML files compliant with the Text Encoding Initiative (TEI) guidelines, each of which can be edited individually. Besides the corpus text itself, the files can contain a wide range of annotation data covering many aspects of the text, including its relation to sound files or facsimile images. This allows for coordinate-sensitive document descriptions, time-aligned audio transcriptions, or multi-layered transcriptions. I will show how this makes TEITOK a powerful tool in at least the three areas where it is most used: learner corpora, historical corpora, and spoken corpora.
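To give a rough idea of what a token-annotated, time-aligned XML transcription might look like, here is a minimal Python sketch. It is not TEITOK's actual file format. The element and attribute names (`<tok>` with `lemma`/`pos`, an utterance `<u>` carrying `start`/`end` times in seconds) are illustrative assumptions. The sample also omits the TEI namespace and, for simplicity, stores times directly in seconds, whereas full TEI normally points to a separate timeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style sample: one spoken utterance with per-token
# annotation. Names and the time encoding are assumptions for illustration.
TEI_SAMPLE = """<TEI>
  <text>
    <body>
      <u who="A" start="0.00" end="1.25">
        <tok lemma="the" pos="DET">The</tok>
        <tok lemma="cat" pos="NOUN">cats</tok>
        <tok lemma="sleep" pos="VERB">slept</tok>
      </u>
    </body>
  </text>
</TEI>"""


def tokens_with_timing(xml_text):
    """Return (form, lemma, pos, start, end) for every token, taking the
    time stamps from the enclosing utterance element."""
    root = ET.fromstring(xml_text)
    rows = []
    for utt in root.iter("u"):
        start = float(utt.get("start"))
        end = float(utt.get("end"))
        for tok in utt.iter("tok"):
            rows.append((tok.text, tok.get("lemma"), tok.get("pos"), start, end))
    return rows


for row in tokens_with_timing(TEI_SAMPLE):
    print(row)
```

Keeping the annotation as attributes on the token elements means the same file can serve as the readable transcription, the searchable annotated corpus, and the link to the audio, which is the kind of layering the abstract describes.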

