Chatino Speech Corpus Archive Dataset


Date

2016-10-10

Abstract

This dataset is the result of experiments in building speech technologies to document a low-resource or endangered language. The language chosen for creating the speech corpus and training forced-alignment tools is Eastern Chatino, an unwritten and low-resourced language from Oaxaca, Mexico. As far as we can tell, this is the first such resource available under a free Creative Commons license.

Description

This zip file contains WAV audio files and annotations. The recordings were made with a digital audio recorder (ZOOM H6) and can be played in any audio software that supports WAV files. The annotations can be viewed and edited with the ELAN software package. ELAN (https://tla.mpi.nl/tools/tla-tools/elan/) is a professional tool for creating complex annotations of video and audio resources. Download the dataset using the link below.
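For readers who want to work with the annotations outside ELAN, the following is a minimal sketch of reading time-aligned annotations with the Python standard library. It assumes the annotations are stored in ELAN's XML-based .eaf format; the record above does not state the file extension or tier names, so the function name read_eaf_annotations and any tier labels are illustrative, not part of the dataset.

```python
import xml.etree.ElementTree as ET


def read_eaf_annotations(eaf_source):
    """Return (tier_id, start_ms, end_ms, text) tuples from an ELAN .eaf file.

    Assumption: the file follows the standard EAF layout, where TIME_SLOT
    elements hold millisecond offsets and ALIGNABLE_ANNOTATION elements
    refer to them by id. eaf_source may be a path or a binary file object.
    """
    root = ET.parse(eaf_source).getroot()

    # Map each time-slot id to its offset in milliseconds.
    # Slots without a TIME_VALUE (unaligned) are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows
```

Called on a path such as read_eaf_annotations("recording.eaf") (a hypothetical file name), the function returns one tuple per time-aligned annotation. Annotations that only reference other annotations (REF_ANNOTATION elements) are not covered by this sketch.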

Keywords

speech corpus, Chatino, GORILLA, ctp

Type

Dataset
