Chapter 2. Getting Started with Emu

Table of Contents
Collecting Speech Data
Writing a Database Template
Using the Database

The Emu speech database system provides a flexible set of interfaces for developing speech databases and extracting data from them. An Emu speech database consists of a collection of sampled data files, each of which has one or more associated label files. This tutorial works through the creation of a simple database to explain the major features of Emu.

Collecting Speech Data

The first stage in building a database is to collect some sampled speech data. In most cases this is a sampled acoustic waveform, but Emu can deal equally well with other data, such as recordings of jaw and tongue position, or with a combination of these. The normal way of organising this data is to store each utterance in a separate file, where an utterance could range from one isolated word to a whole passage. The important criterion in an Emu database is that a hierarchical description can be built for each utterance; hence an utterance should correspond to a whole language unit, be it a word, phrase, sentence or paragraph.

Emu collects together the various files that make up an utterance (for example, acoustic data, derived formant tracks, label files) by giving them the same basename and an extension associated with the file contents. For example, a sampled acoustic data file might be sp01a001.sd and the associated word label file would be sp01a001.wrd. Emu allows these files to be kept in different directories if this is appropriate.
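The naming convention above can be illustrated with a short Python sketch. This is not part of Emu or its API; it only shows how files sharing a basename can be gathered into one utterance even when they live in different directories. The function name group_by_basename and the directory names signals and labels are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_basename(paths):
    """Map each utterance basename to a {extension: path} dictionary."""
    utterances = defaultdict(dict)
    for p in paths:
        path = PurePosixPath(p)
        # stem is the file name without its final extension, e.g. "sp01a001"
        extension = path.suffix.lstrip(".")
        utterances[path.stem][extension] = str(path)
    return dict(utterances)


# Hypothetical layout: signal and label files kept in separate directories.
files = [
    "signals/sp01a001.sd",
    "labels/sp01a001.wrd",
    "signals/sp01a002.sd",
]
groups = group_by_basename(files)
```

Here groups["sp01a001"] holds both the .sd and the .wrd file, while groups["sp01a002"] holds only a .sd file, which would indicate an utterance that has not yet been labelled at the word level.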

This tutorial describes how to set up an Emu database using the small example database distributed with the Emu software. The example database consists of recordings of nine sentences from one male Australian speaker, together with phonetic label files for each utterance. We will later show how to label a database from scratch. If you have data of your own, whether labelled or not, you should be able to follow the procedure below to make it accessible to Emu.