Chatino Speech Corpus Archive Dataset
Date
2016-10-10
Abstract
The data is the result of experiments in building speech technologies to document a low-resourced or endangered language. The language chosen for creating speech corpora and training forced-alignment tools is Eastern Chatino, an unwritten, low-resourced language from Oaxaca, Mexico. As far as we can tell, this is the first such resource available under a free Creative Commons license.
Description
This zip file contains WAV audio files and annotations. The recordings were made with a ZOOM H6 digital audio recorder and can be played with any audio software that supports WAV files. The annotations can be viewed and edited with the ELAN software package. ELAN (https://tla.mpi.nl/tools/tla-tools/elan/) is a professional tool for creating complex annotations of video and audio resources. Download the dataset using the link below.
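For readers who prefer to inspect the archive programmatically rather than in ELAN, the following is a minimal Python sketch using only the standard library. It assumes that the annotations are stored as ELAN's XML files with an .eaf extension and that the audio is uncompressed PCM WAV; neither detail is stated in this record, and the function names are illustrative, not part of the dataset.

```python
import io
import wave
import xml.etree.ElementTree as ET
import zipfile


def wav_duration_seconds(wav_bytes):
    """Return the duration of a PCM WAV file given its raw bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def read_eaf_annotations(eaf_bytes):
    """Return (tier_id, start_ms, end_ms, text) for each time-aligned annotation."""
    root = ET.fromstring(eaf_bytes)
    # TIME_SLOT elements map slot ids to millisecond offsets.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


def summarize_archive(zip_source):
    """Summarize a zip archive given as a path or a binary file-like object.

    Assumption: audio files end in .wav and annotation files end in .eaf.
    """
    summary = {"audio": [], "annotations": []}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(".wav"):
                duration = wav_duration_seconds(archive.read(name))
                summary["audio"].append((name, duration))
            elif lower.endswith(".eaf"):
                rows = read_eaf_annotations(archive.read(name))
                summary["annotations"].append((name, len(rows)))
    return summary
```

Calling summarize_archive on the downloaded zip file returns the duration in seconds of each recording and the number of time-aligned annotations in each annotation file. Annotations on dependent tiers that refer to other annotations rather than to time slots are not collected by this sketch.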
Keywords
speech corpus, Chatino, GORILLA, ctp
Link(s) to data and video for this item
Type
Dataset