Chapter 3. The Emu Graphical Tools

Table of Contents

The Emu Labeller
The Emu Query Tool
The Emu Capture Tool
Speech Signal Analysis with Tkassp
Pitch and Formant Tool
The Emu Segmentation Tool

Emu provides a rich set of tools for constructing and using speech databases. In addition to the command line tools mentioned earlier, there is a family of tools with a graphical interface which can be used to view and label utterances and to query the database. This chapter gives a brief overview of the main features of these tools.

The Emu Labeller

The Emu labeller supports the display of the speech data and allows annotation of the data with segmental and event labels at any level of detail, and with hierarchical descriptions of the utterance as a whole.

The labeller can display any time waveform stored in a compatible file format, as well as spectrograms derived from the acoustic signal. Formant tracks can be overlaid on the spectrogram if they are available (there is no formant or pitch tracker included in Emu, so third-party software, such as ESPS from Entropic, must be used); formant tracks can be edited by drawing on the spectrogram. The labeller also displays one or more sets of labels time-aligned with the time waveforms and allows editing of these labels. Since Emu supports hierarchical annotation of utterances, a hierarchical display can also be generated, allowing simultaneous annotation at different levels of description. An automatic hierarchy building procedure can be supplied (e.g., one which syllabifies words and maps a phonetic level annotation to a phonemic level one), and it can be run from the toolbar while labelling is in progress.

The Emulabel Interface

New databases or utterances can be selected via the File menu. The database selection dialog presents a list of the database template files available to the system (those on the search path). When one of these is selected, the template file is loaded and the labeller is reset to its initial state. A new utterance can be loaded via the Open... item in the File menu. By default, this dialog displays a list of all utterances in the database. Double-clicking on an utterance name, or selecting a name and clicking on Edit, loads the utterance into the labeller. The utterance dialog stays open by default to make working through a sequence of utterances easier. You can remove the window by clicking on Dismiss.

The Signal View

Figure 3.1. The Emu Labeller

The figure above shows an annotated diagram of the signal view in the Emu labeller. This display consists of one or more time-series graphs of speech parameters (for example, acoustic signal, spectrogram, formants, jaw displacement). The choice of parameters displayed is specified by the LabelTracks variable in the database template. Above the signals, a labelling panel displays those levels which have been associated with external label files via labfile statements in the database template -- only these levels have segment/event times associated directly with them. Segment boundaries and events are marked with small triangles which can be moved with the mouse. Event labels appear above the triangle marker; segment labels appear midway between the start and end time markers. The signal displays, including the spectrogram, are overlaid with vertical marks corresponding to the segment boundaries or event times marked in the label window.

Marking New Segments or Events

New segments or events are created by clicking the mouse on the labelling strip background. This action places a mark at the selected location. The interpretation of the mark depends on the kind of level: segment or event. For event levels the new mark becomes a new event and a blank label is displayed as an asterisk above the marker. For segment levels the first mark placed in a labelling session becomes the start marker of the first segment while subsequent markers become segment end markers. Blank labels are displayed midway between start and end markers for each segment.
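The difference between the two kinds of level can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the labeller's own code: the function names are hypothetical, and the sketch assumes that marks are kept in time order and that each segment end marker also serves as the start of the following segment.

```python
from typing import List, Tuple


def marks_to_events(marks: List[float]) -> List[float]:
    """On an event level, every mark is an event in its own right."""
    return sorted(marks)


def marks_to_segments(marks: List[float]) -> List[Tuple[float, float]]:
    """On a segment level, the first mark opens the first segment and
    each later mark closes the segment that is currently open.

    Assumption: an end marker doubles as the start of the next segment,
    so segments are contiguous.
    """
    ordered = sorted(marks)
    # Pair each mark with its successor: (start, end) for every segment.
    return list(zip(ordered, ordered[1:]))
```

Under these assumptions, three marks on a segment level give two segments, while the same three marks on an event level give three events. A single mark on a segment level gives no complete segment yet.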

Manipulating the Signal Graphs

Signal displays are general time series graphs and can be used to display any parameter file readable by the Emu system. In addition, a spectrogram can be generated from the acoustic signal and drawn with formants overlaid if they are available. The template variable LabelTracks defines which tracks will be displayed for each database. This list can be modified during a labelling session via the Tracks... option in the Display menu.

The intention of the Emu signal displays is that as far as possible the different graphs remain time aligned with one another and with the labelling strip. Hence scrolling one graph will also scroll all others. However, it is possible to zoom in on one graph alone and in this case scroll-ganging attempts to align the start times of each graph.

A region of the signal can be selected by clicking, dragging and releasing the mouse[2]. Once the region is selected it can be played (via the play button at the left of the graph) or expanded to the full width of the display (via the zoom between marks button). If a spectrogram is being displayed, a display of the duration of the current region is maintained along with the current time and frequency readouts.

The scale of any graph can be changed via the zoom buttons at the side of the graph. The "Zoom x2" button doubles the time scale each time it is pressed. The "Zoom out" button returns the time scale to its original value -- if a spectrogram is present this should correspond to the spectrogram scale. The "Zoom between marks" button expands the current region to the full width of the display. All of these functions are also available on a pop-up menu accessed by clicking the right mouse button on any graph.
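The effect of the three zoom functions on the visible time window can be sketched as follows. This is a minimal illustration rather than the labeller's implementation: the View class and function names are hypothetical, and the sketch assumes that doubling the time scale halves the visible duration while keeping the left edge of the window fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Visible time window of a graph, in seconds (hypothetical type)."""
    start: float
    end: float


def zoom_x2(view: View) -> View:
    """Double the time scale: show half the duration from the same start.

    Assumption: the left edge stays fixed; the real tool may anchor
    the zoom elsewhere.
    """
    return View(view.start, view.start + (view.end - view.start) / 2)


def zoom_out(original: View) -> View:
    """Return to the original, unzoomed window."""
    return original


def zoom_between_marks(region_start: float, region_end: float) -> View:
    """Expand the selected region to the full width of the display."""
    return View(region_start, region_end)
```

For example, pressing "Zoom x2" twice on a four second utterance would leave a one second window in this model.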

Modifying the Spectrogram

The spectrogram display is calculated from the samples track and supports overlaid formant tracks which can be modified by drawing on the graph. The initial parameters of the spectrogram (bandwidth and frequency range) are specified in the database template (via the LabelTracks variable). The bandwidth and some display parameters can be modified during a labelling session.

The "Modify Spectrogram" button generates a dialog box which allows the user to specify a new bandwidth for the spectrogram. Bandwidths are specified in Hz. The default wideband spectrogram has a bandwidth of 300 Hz while the narrowband spectrogram has a bandwidth of 50 Hz. The new spectrogram is calculated and displayed in place of the previous version. In addition, this dialog box allows the user to modify two parameters affecting the display of the spectrogram: white level and scale factor. The white level is the dB level (although these are not calibrated decibels) which corresponds to white in the spectrogram. Anything below this level will not be displayed. The default value for white level is 35. The scale factor is a multiplier which affects the contrast in the spectrogram. Larger values produce 'darker' spectrograms. The default scale factor is 4.
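One way to picture how white level and scale factor interact is a simple linear mapping from level to darkness. The formula below is an assumption made for illustration only; this manual does not state the mapping Emu actually uses. The function name and the 0 to 255 output range are likewise hypothetical.

```python
def darkness(level_db: float,
             white_level: float = 35.0,
             scale_factor: float = 4.0) -> int:
    """Map an (uncalibrated) dB value to 0 (white) .. 255 (black).

    Assumed model: values at or below the white level are not drawn,
    and the scale factor multiplies the distance above the white level.
    """
    raw = (level_db - white_level) * scale_factor
    # Clamp to the displayable range.
    return int(max(0.0, min(255.0, raw)))
```

In this model a value 10 dB above the default white level maps to 40 with the default scale factor and to 80 if the scale factor is doubled, which matches the description that larger scale factors give darker spectrograms.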

At some future time it may be possible to interactively modify spectrogram contrast and white level. Technical problems in the current implementation prevent this at the moment.

If formants are present for this database (currently these must have the track name fm, which is the name used by the ESPS formant tracker) they will be displayed overlaid on the spectrogram. The user is also able to modify an existing formant track by first selecting a formant 'pen' by pressing the appropriately coloured button on the left of the spectrogram, and then drawing the new formant track in with the mouse. When the pen is put down, by pressing the same button again, or when the utterance is saved, the user is asked whether the new tracks should be saved. At present only formants in SSFF or ESPS format (with the Entropic ESPS libraries) can be rewritten in this way.

The Hierarchy View

Figure 3.2. The Hierarchical Label Display

This figure shows the hierarchy view in the Emu labeller. This view displays segments and events at different levels and the relationships between them, and allows creation and editing of segments and inter-segment relationships. Segment and event labels are laid out in a tree-like structure without reference to the times of the segments.

Each level in the hierarchy may have more than one kind of label (or labeltype, see the section called “Label Definitions”). The hierarchy display will show all labeltypes for a level. For example, if the Word level has additional labeltypes for PartofSpeech and Stress, three labels will be shown for each segment in the display.

The Hierarchy Display

An Emu utterance is a collection of nodes at various levels with links between them. The hierarchy display shows these levels and links in a graphical form and tries to lay the segments out in a natural way. Within a level, the segments are shown in sequence from left to right. Where possible, if a single node dominates a group of nodes at a lower level, it will be drawn centered above the lower nodes with lines representing the links. Since Emu allows fairly complex structures to be built (despite the name, they are not necessarily hierarchical since many-to-many links are allowed) it may not be possible to lay every utterance out in a pleasing way. The labeller attempts to at least draw every segment and the relationships between them even if this results in a very wide display. If you have problems with the labeller drawing a particular utterance you should report it to the Emu mailing list so that it can be addressed in a future version of the software.
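The centering rule can be illustrated with a small sketch. It is not the labeller's layout algorithm: the function name, the fixed spacing, and the dictionary representation of links are all assumptions, and the sketch handles only the simple case where each parent's children have already been placed.

```python
from typing import Dict, List


def layout(bottom: List[str],
           links: Dict[str, List[str]],
           spacing: float = 50.0) -> Dict[str, float]:
    """Return a horizontal position for every node.

    bottom: node names at the lowest level, in left-to-right order.
    links:  parent name -> child names; parents must be listed after
            any of their children that are themselves parents.
    """
    # Lay the lowest level out in sequence from left to right.
    xs = {name: i * spacing for i, name in enumerate(bottom)}
    for parent, children in links.items():
        child_xs = [xs[c] for c in children]
        # Center the parent above the span of the nodes it dominates.
        xs[parent] = (min(child_xs) + max(child_xs)) / 2
    return xs
```

With three phonemes under one syllable under one word, the syllable and the word both land above the middle phoneme. Many-to-many links would need extra handling that this sketch does not attempt.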

The levels shown in the hierarchy display are determined by the template variable HierarchyViewLevels. This should be an ordered list of the levels to be displayed from the top to the bottom of the display. Normally, neighbouring levels would be related in the template file but this is not a requirement. The levels displayed can be modified during a labelling session via the Hierarchy Levels... item in the Display menu.

Creating Segments

New segments can be created at any level in an Emu hierarchy by clicking the mouse on the background of the hierarchical display. The level at which the new segment is created depends upon the vertical position on the display -- a triangular cursor points to the name of the current level at the left side of the display. Segments are created in the appropriate sequential position at a given level: to insert a new segment between two existing segments, click the mouse between them; to append a segment to the end of a level, click after the last segment.
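How a click might be turned into a level and a sequence position can be sketched as below. The function names, the fixed row height, and the use of pixel coordinates are assumptions for illustration; the labeller's actual geometry is not documented here.

```python
from bisect import bisect_left
from typing import List


def level_at(click_y: float, levels: List[str],
             row_height: float = 40.0) -> str:
    """Pick the level whose row contains the vertical click position.

    levels is ordered from the top of the display to the bottom.
    """
    row = int(click_y // row_height)
    # Clamp so that clicks outside the rows still select a level.
    return levels[max(0, min(row, len(levels) - 1))]


def insertion_index(segment_xs: List[float], click_x: float) -> int:
    """Return where a new segment goes among existing ones.

    segment_xs holds the horizontal positions of the existing segments
    at the chosen level, in left-to-right order.
    """
    return bisect_left(segment_xs, click_x)
```

A click between the first and second segments yields index 1 (insert between them), and a click to the right of the last segment yields an index equal to the number of existing segments (append).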

When a new segment is created it has no label and is displayed as one or more asterisks (if there are a number of labeltypes for a level then each labeltype has one asterisk). A segment label can be entered either by clicking on the asterisk and typing or by clicking the right mouse button and selecting a label from a menu. The menu is derived from the legal label statements in the database template. No check is made that the label is legal according to the template file which means that arbitrary labels can be entered if desired. An existing label can also be modified either by clicking the mouse somewhere in the label text and adding/deleting text, or via the menu on the right mouse button.

The right mouse button menu also provides options for deleting a segment, playing a segment or editing the segment labels via a popup dialog box.

Edit/Delete Mode

The labeller toolbar contains two buttons labelled Edit and Delete. These toggle between edit and delete mode in the labeller. Edit mode is for creating and editing segments and links; delete mode is for deleting segments and links. In delete mode the segment or link under the mouse is highlighted in red and will be deleted if the left mouse button is pressed.

The emulabel-init File

Some customisations of the Emu labeller are matters of personal taste rather than relating to database design. The intention is that an individual user will be able to modify various interface options to make the labeller conform to their own expectations and preferences. At the moment this is achieved to a limited extent via a single configuration file called .emulabel-init (note the initial dot; this means that on Unix the file is 'invisible', i.e., it doesn't show up by default in directory listings). The Emu labeller looks for this file in the user's home directory and uses it if it is found. The file contains settings for various internal variables which modify the behavior of the labeller.

At present the only configurable item is the region selecting action -- whether to click and drag on a graph to select a region or to click and click again. To choose click-click the init file should contain the single line:

set emu(regionmode) "click-click"

In the future other parameters will be made configurable and a friendlier interface to user preferences will be provided.
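Until such an interface exists, the setting can be added by hand or with a small helper script. The Python sketch below is one possible helper, not part of Emu: the function name is hypothetical, and it simply appends the line shown above to the given file if it is not already there.

```python
from pathlib import Path

# The exact line given in this section.
REGION_LINE = 'set emu(regionmode) "click-click"'


def enable_click_click(init_file: Path) -> bool:
    """Append the regionmode setting unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    existing = init_file.read_text() if init_file.exists() else ""
    if REGION_LINE in existing.splitlines():
        return False
    # Make sure the new setting starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    init_file.write_text(existing + separator + REGION_LINE + "\n")
    return True
```

A typical call would be `enable_click_click(Path.home() / ".emulabel-init")`, since the labeller looks for the file in the user's home directory.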

[2] This is the default binding. If you prefer to click once for the left boundary and again for the right boundary, you need to set emu(regionmode) to "click-click" in your .emulabel-init file.