Maarten Janssen

On 12th March 2019, Dr. Maarten Janssen visited our Institute and gave lectures on

TEITOK – a web-based platform for viewing, creating, and editing corpora

In this talk I will give a general overview of TEITOK, an online system that makes corpora available and searchable while also allowing them to be edited, annotated, and corrected. In TEITOK, a corpus consists of a collection of heavily annotated, Text Encoding Initiative (TEI) compliant XML files, each of which can be edited individually. The files can contain not only the corpus text but also a wide range of annotation data concerning many aspects of the text, including its relation to sound files or facsimile images. This allows for coordinate-sensitive document descriptions, time-aligned audio transcriptions, and multi-layered transcriptions. I will show how this makes TEITOK a powerful tool for at least the three areas where it is most used: learner corpora, historical corpora, and spoken corpora.
