Emu and ToBI

ToBI (Tones and Break Indices) is a system for transcribing the intonation patterns of spoken language. It defines a number of annotation levels and the criteria for placing labels on each. These labels include a segmentation into words, along with tone labels which mark prosodic events such as tones and phrase boundaries. ToBI annotations have largely been made using the Unix-based Waves+ toolkit from Entropic, and the example materials are distributed in the ESPS format, which is readable only by Waves+ (and by Emu when an ESPS licence is available, i.e. on a Unix platform). This page describes some tools for using ToBI annotations in the Emu system.

The ToBI training materials available from the Ohio State web site are in the ESPS format. We have converted these files to the SSFF format read by Emu on both Unix and Windows. These files are now available on our server:

The smaller files contain only 12 utterances from the database (those beginning with 'a').

These packages contain the original label files, augmented with a dummy label at the start of the word level so that Emu can treat the words as segments rather than events. Two Emu template files are provided: one mimics the traditional ToBI annotation scheme, which presents four independent tiers; the other adds domination relations and two additional levels for intonational and intermediate phrases. A script is provided to convert traditional flat ToBI annotations into hierarchical ones. An example hierarchical annotation is shown below.

In this scheme, the Tone level is preserved, non-phrasal tones are linked to the word in which they occur, and words are grouped into Intonational and Intermediate phrases based on the position of phrase boundary events.
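The grouping described above can be sketched in a few lines of Python. This is an illustration only, not the conversion script shipped in the packages: the `Word` class, the `build_hierarchy` function and the in-memory data layout are all hypothetical. It assumes the usual ToBI tone labels, where a label ending in `-` (e.g. `L-`) closes an intermediate phrase and a label ending in `%` (e.g. `L-L%`) closes an intonational phrase, and it assumes each tone event is timed no later than the end of the word it belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """A word segment; `tones` holds the non-phrasal tones linked to it."""
    start: float
    end: float
    label: str
    tones: list = field(default_factory=list)


def is_phrasal(tone):
    """Phrase accents (L-, H-) and boundary tones (%H, L-L%, ...)."""
    return "%" in tone or tone.endswith("-")


def build_hierarchy(words, tones):
    """Group words into intonational phrases made of intermediate phrases.

    words: list of Word in time order.
    tones: list of (time, label) events in time order.
    Returns a list of intonational phrases; each is a list of
    intermediate phrases; each of those is a list of Word objects.
    """
    intonational = []   # completed intonational phrases
    intermediate = []   # intermediate phrases of the open intonational phrase
    current = []        # words of the open intermediate phrase
    pending = list(tones)

    for word in words:
        current.append(word)
        close_intermediate = close_intonational = False

        # Consume every tone event that falls within (or at the end of) this word.
        while pending and pending[0][0] <= word.end:
            _, label = pending.pop(0)
            if not is_phrasal(label):
                word.tones.append(label)          # e.g. H*, L+H*
            elif label.endswith("%"):
                close_intermediate = close_intonational = True
            elif label.endswith("-"):
                close_intermediate = True
            # Initial boundary tones such as %H close nothing.

        if close_intermediate:
            intermediate.append(current)
            current = []
        if close_intonational:
            intonational.append(intermediate)
            intermediate = []

    # Flush any words left over when the final boundary tone is missing.
    if current:
        intermediate.append(current)
    if intermediate:
        intonational.append(intermediate)
    return intonational


if __name__ == "__main__":
    words = [Word(0.0, 0.5, "a"), Word(0.5, 1.0, "b"),
             Word(1.0, 1.5, "c"), Word(1.5, 2.0, "d")]
    tones = [(0.3, "H*"), (1.0, "L-"), (1.7, "L+H*"), (2.0, "L-L%")]
    for phrase in build_hierarchy(words, tones):
        print([[w.label for w in ip] for ip in phrase])
```

Note that the Tone level itself is left untouched here: the function only adds links from words to their tones and builds the two phrase levels on top of the word level.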

Pitch Tracking on Windows

Researchers investigating prosody have relied on ESPS/Waves+ for both labelling and pitch tracking. Emu can manage the labelling role but does not provide a pitch tracker. There are a number of possible pitch trackers that might be of use.

Fortunately, the ESPS codebase has been donated to the KTH speech group and is being integrated into the Snack toolkit. The most recent release of Snack includes the ESPS pitch and formant tracker code, and the most recent Emu release contains a simple tool that runs these trackers over speech data and produces SSFF-formatted data files for your corpus.


For more information, please send mail to Steve.Cassidy@mq.edu.au.

Copyright © 2001, Department of Linguistics, Macquarie University.