multimodal oral corpora administration

[moca] is an online system for the administration of spoken language corpora. It stores audio and/or video recordings together with the accompanying transcription files. The transcription files are aligned: in addition to the transcription itself, they provide speaker information and the temporal structure of the transcription. This makes it possible to access the media file at individual points in a transcription file directly through an internet browser. Beyond transcript administration, [moca] supports the structured administration of sociolinguistic metadata, including information about the setting of a communicative event and about the individual speakers within the event. Finally, manual tagging with so-called labels allows for the collection and detailed analysis of linguistic phenomena.

Detailed search routines enable fine-grained searches for individual events, speakers, transcript excerpts and labels. Searches can be limited, for example, to events and transcripts from certain regions or certain speaker age-groups. In addition, transcripts can be searched for intonation phrases that contain certain (combinations or parts of) word forms.
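The kind of search described above can be illustrated with a minimal sketch. It is not [moca]'s actual implementation or data model: the `IntonationPhrase` record, its fields, the `search_phrases` function and the sample data are all hypothetical, chosen only to show a search for parts of word forms limited to one region.

```python
import re
from dataclasses import dataclass


@dataclass
class IntonationPhrase:
    # Hypothetical record: one aligned intonation phrase with speaker and timing.
    speaker: str
    region: str
    start: float  # offset into the media file, in seconds
    end: float
    text: str


def search_phrases(phrases, pattern, region=None):
    """Return phrases containing a word form that matches `pattern`.

    The pattern may match only part of a word form. If `region` is given,
    phrases from other regions are skipped.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for phrase in phrases:
        if region is not None and phrase.region != region:
            continue
        if any(regex.search(word) for word in phrase.text.split()):
            hits.append(phrase)
    return hits


# Invented sample data, for illustration only.
corpus = [
    IntonationPhrase("A", "Freiburg", 0.0, 1.8, "das habe ich gesagt"),
    IntonationPhrase("B", "Basel", 1.8, 3.1, "ich sage nichts"),
    IntonationPhrase("A", "Freiburg", 3.1, 4.0, "na gut"),
]

matches = search_phrases(corpus, "sag", region="Freiburg")
for m in matches:
    print(f"{m.speaker} [{m.start:.1f}-{m.end:.1f}s]: {m.text}")
```

Because each hit keeps its start and end times, a result of this kind could in principle be linked back to the matching point in the recording.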

[moca] aims to provide intuitive, safe and personalized access to spoken language corpora. The system supports a theoretically unlimited number of users, whose access to the corpus can be restricted and/or adapted to their individual needs. [moca] can be used from any web-enabled computer and does not require any additional software or programming skills.

Data types

Structure of the information in moca3.


Currently offline. Please contact Daniel Alcón for more information.


Please write an email to Daniel Alcón with some details about your intended use of the system and your academic affiliation.


Available shortly.

Daniel Alcón López | Romanisches Seminar | Albert-Ludwigs-Universität Freiburg