Future Emu Development Plans

This document sets out our plans for future development of the Emu Speech Database System.

Current Version 1.x

Release 1.6, July 2001

Double click template files to open database.
This has some consequences: Emu will now accept a template file name as a database name, as well as the basename of the template. Templates need not be in the standard directory and so can be distributed on CD-ROM with a database. This also requires the paths for AutoBuild and EmulabelModules to be relative to the template file; in fact, the labeller will now look in a few obvious places for these files if they are given as relative paths.

Template editor
A simple text editor built in to the Emu labeller. On Windows this can be invoked via the 'edit' option after right clicking on a template file on the desktop.

Dialogue Annotation and Transcriber
Some features have been added to Emu to make annotation of dialogue feasible. One such feature is an active link between Transcriber and Emu. Another is a plugin for Transcriber which allows export of annotations in Emu HLB format. A new module for the labeller has been added which shows, e.g., turn level labels in a list and loads speech data for only one turn at a time. In this way, long utterances which have already been segmented with Transcriber can be annotated in detail with the Emu labeller.

Future 1.x releases

There may be some future releases in this series if there is demand for a particular feature or if a serious bug is found. Currently the following items are possible contenders for features to be added here.

Macintosh port
We are very close to having a Macintosh version of Emu. A version 1.x release will be made for the Macintosh.

Cut & paste in labeller
The ability to cut and paste label text in the labeller.

Emu Version 2.x

The next version of Emu will have a radically different internal organisation.
The new design incorporates lessons we have learned in the deployment of Emu 1.x and integrates the work of a number of different annotation projects. Emu 2.x is part of an effort to establish standards by which annotation tools can inter-operate. It does this on two levels. Firstly, by making use of the Annotation Graph library from the LDC, we support a standard data model for annotations and make use of shared file input/output code for different kinds of annotations. Secondly, the new Emu labeller is based on a new modular architecture which has a public interface; the outcome of this is that Emu can use components of other annotation systems where appropriate.

Another major change in the architecture of Emu is the separation of data file handling and signal processing code from the Emu database core. Emu has never done much signal processing itself, but it has included code to read SSFF and WAV format data files. The new version of Emu will make use of the Edinburgh Speech Tools library (C++) and/or the Snack library (Tcl) to read, play and process speech data files.

Annotation Graph based internal representation.
Steven Bird's group at LDC has developed an Annotation Graph library written in C++ which can be linked in to other applications. The next version of Emu will use this library as the core internal representation of annotations. This library has a Tcl interface and so can be used immediately to build annotation tools as part of the Emu system.

Tasks
Modular Labeller Architecture
The replacement for the Emu labeller will be made from components, each responsible for displaying one kind of signal or one kind of view on the annotation. Features of the new labeller will include:
All views on an annotation should, if possible, allow in-place editing of the annotation.

Tasks
The following components need to be developed:
XML Based Database Templates
Emu version 2 uses a new extended database template format based on XML. This template duplicates all of the information stored in current Emu template files but is designed to be more easily extended and used in other tools. Important features of the new template file system are as follows:
Tasks
Database Query
The current query language in Emu is sufficient for many needs but doesn't have sufficient power to properly query annotation graphs. With the new AG core library we will lose the current query language implementation, and so a new query system will be needed. This is not a simple undertaking, as the design of a new QL involves many complex considerations. As a stopgap we might consider implementing a variation on the current QL on the annotation graph system. This would at least support a useful range of queries in the short term. In the longer term a new QL proposal needs to be made and implemented.

Tasks
Speech Data Handling
The current Emu system does very little signal processing and relies on third party packages to perform formant and pitch tracking etc. This is a significant problem since there is little integration between signal processing software and Emu database functions. Data file input has been handled by some Emu-only routines (SSFF and WAV formats) and by third party libraries (ESPS for the Entropic sd and fea formats, NIST Sphere for the NIST format and the Edinburgh Speech Tools library for a variety of other formats). Emu has never had a good interface for writing speech data files of any kind.

There are two areas where Emu makes use of data file input: display of signals in the labeller, and data extraction for analysis and signal processing. The requirements for both of these areas are discussed here.

Signal Display
Components of the Emu labeller need to be able to display all or part of a speech data file aligned with the annotation. These data files can be speech waveforms, physiological data, formants, pitch traces or any other time series data either derived from a speech waveform or captured separately. The data is stored in a number of file formats (wav, sd, au, Entropic FEA, SSFF etc.), so an important capability is being able to add support for a new file format relatively easily. For signal display, the requirement is to be able to read data from these many file formats into memory and then pass the data to the appropriate visualisation routines. We already have a good time series display widget for Tcl (padgraph) which offers all of the features needed by Emu and supports a C interface for adding data points to the display. The only new work needed here is to improve the file input/output interface.

Another option for signal display is to make use of the Snack toolkit. Snack is a Tcl extension package which handles input and output of speech signal data in various file formats and supports a number of signal processing operations.
Snack provides a very good platform-independent (Unix, Windows and Macintosh) audio recording and playback facility. One issue with Snack is that it only supports reading and writing of sampled speech data and not other data such as pitch or formant tracks. As such it is very useful for display of speech signals (including spectrograms, as it includes a very flexible spectrogram display system) and for writing waveform editing and manipulation programs. My current feeling is that we should make use of Snack where appropriate to write utility scripts and display modules for Emu but not make it a central part of the Emu system.

Signal Processing and Analysis
In the current Emu system, data can be extracted for each segment in a segment list (the result of a database query) and is generally written to a text file or imported into the Splus/R environment for further processing. In some cases C++ code has been written to perform some signal processing on raw speech data corresponding to segments. The idea of performing signal processing operations on the results of queries is very powerful and support for it should be extended in the new system. For example, one could query the database for vowels and then compute a series of spectral measures (e.g. ERB weighted spectra) on each vowel, storing the data for later analysis. Doing this will require a high level interface to a signal processing library.

The Edinburgh Speech Tools library provides most of the facilities that might be used in this kind of application. It supports input and output of data files in many formats, resampling of time series data, windowing of data and many signal processing operations. The library is well written and modular and would be easily extended if, for example, we wanted to add a formant tracker or other signal processing operations to the library. In order to make writing new signal processing programs easy, a scripting interface to the Edinburgh Speech Tools library is needed.
The obvious choice here is to interface to Tcl, and this should be a first step. This would enable Tcl scripts to be written to do complex signal processing operations, and GUIs to be constructed so that common operations (e.g. doing one of a set of DSP operations on each segment in a query result) could be presented to users in a relatively simple way. I envisage the Tcl interface enabling scripts such as:
set segments [$dbase query "Phonetic=vowel"]
set dft_data [emu_data -format <some kind of format specification>]
for {set i 0} {$i < [llength $segments]} {incr i} {
    ## retrieve sampled speech data for this segment
    set data [emu_get_data [lindex $segments $i] "samples"]
    ## calculate DFT
    set dft [$data process -operation dft -window hamming -ncoeffs 5]
    ## append dft data to a new file
    $dft_data append $dft
}
## write out the dft data
$dft_data write "newdata.dft"
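
To make the per-segment processing step in the script above more concrete, here is a minimal Python sketch of what a Hamming-windowed DFT over each segment's samples might compute. It is not the Emu or Edinburgh Speech Tools interface: the function names `hamming` and `dft_magnitudes`, the reading of `-ncoeffs` as "keep the first n coefficients", and the synthetic `segments` list standing in for a query result are all assumptions made for illustration.

```python
import cmath
import math


def hamming(n):
    """Return an n-point symmetric Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(samples, ncoeffs):
    """Window the samples, then return the magnitudes of the first ncoeffs DFT bins."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hamming(n))]
    mags = []
    for k in range(min(ncoeffs, n)):
        # Direct O(n^2) DFT; a real library would use an FFT.
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        mags.append(abs(total))
    return mags


# Hypothetical stand-in for a query result: one list of samples per segment.
segments = [
    [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)],  # 2 cycles in 16 samples
    [1.0] * 8,                                                # constant signal
]

# One row of DFT magnitudes per segment, analogous to appending to $dft_data.
dft_data = [dft_magnitudes(seg, ncoeffs=5) for seg in segments]
```

Each row of `dft_data` corresponds to one segment, so the result has the same shape as the data the Tcl script would append and write out for later analysis.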
Another option is to look at a direct interface between ESTools and Splus/R so that Splus scripts can read data directly from speech data files and perform DSP operations on the data before importing it into Splus/R for analysis and visualisation. We would need to study the low level interfaces between Splus/R and C/C++ to see if this was feasible and/or desirable.

Tasks
Splus/R Interface
Emu currently provides a library for Splus or R which includes an interface to database query and data extraction as well as a large collection of functions for data analysis and visualisation. The majority of this code will be retained in the next version, but we should look at better integration between Splus/R and the Emu core and at any extensions to the library that might be appropriate.

Tasks
New User Level Applications
In the current version of Emu the user level applications are the labeller (emulabel) and Splus or R. Some special purpose applications have been written, such as the segmentation tool and the speech capture tool, but these have not been well documented or made widely available to the user community. As the functionality in Emu is extended, we foresee a need for a new set of applications using the core Emu components. Some of these are outlined here.

Tasks
For more information, please send mail to Steve.Cassidy@mq.edu.au.
Copyright © 2001, Department of Linguistics, Macquarie University.