Chatino Speech Corpus Archive Dataset

Abstract

This dataset is the result of experiments in building speech technologies to document a low-resourced or endangered language. The language chosen for creating the speech corpus and training forced-alignment tools is Eastern Chatino, an unwritten, low-resourced language from Oaxaca, Mexico. As far as we can tell, this is the first such resource available under a free Creative Commons license.

Description

This zip file contains WAV audio files and annotations. The recordings were made with a digital audio recorder (ZOOM H6) and can be played in any audio software that supports WAV files. The annotations can be viewed and edited with the ELAN software package. ELAN (https://tla.mpi.nl/tools/tla-tools/elan/) is a professional tool for creating complex annotations of video and audio resources. Download the dataset using the link below.
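For readers who want to work with the annotations programmatically rather than in the ELAN interface, the following is a minimal sketch in Python. It assumes the annotations are stored in ELAN's native XML format (`.eaf`), which this description does not state explicitly, and it reads only time-aligned annotations. The function name `read_eaf_annotations` is illustrative and not part of the dataset or of ELAN.

```python
import xml.etree.ElementTree as ET


def read_eaf_annotations(eaf_source):
    """Return (tier_id, start_ms, end_ms, text) tuples from an ELAN .eaf document.

    eaf_source may be a file path or an open file object.
    Assumption: the file follows ELAN's EAF XML layout (TIME_ORDER plus TIER elements).
    """
    root = ET.parse(eaf_source).getroot()

    # TIME_ORDER maps each time-slot id to an offset in milliseconds.
    # Slots without a TIME_VALUE are unaligned and are skipped here.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows
```

Calling `read_eaf_annotations("recording.eaf")` on a file extracted from the zip (the file name here is hypothetical) would return one tuple per aligned annotation, which can then be matched against the corresponding WAV file by start and end time.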

Keywords

speech corpus, Chatino, GORILLA, ctp

Rights

This data is licensed for reuse under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
