Emu and MySQL

I've recently carried out some experiments involving converting an Emu hierarchical annotation into tables for a relational database. The point of this exercise is twofold. Firstly to look for improvements in the speed of searching large databases for segments. Secondly to demonstrate the equivalence of the Emu hierarchical model to the relational model, which is much better understood. The results of the experiments are summarised in a paper to be presented at Eurospeech99 in Budapest.

Database Software: MySQL

Initial experiments were carried out with the PostgreSQL system which is included with RedHat Linux 5.2. This system proved easy to use but gave very slow response times to queries: around five minutes for the example domination query. Some research turned up some comments about PostgreSQL not being a very fast system and so MySQL was tried as an alternative. Initial results were very encouraging and so far MySQL has shown to be as fast or faster than Emu on all queries tested. MySQL is available for most Unix systems and for Windows systems (although a fee is payable for Windows users).

Scripts are given here which convert an Emu database to ASCII tables and a set of instructions to import the tables into MySQL. Although I have no experience with other database systems I imagine that the scripts could be adopted to import the data into any relational database system (eg. Oracle, Microsoft SQL Server).

Scripts

The conversion script emu2dbase (a downloadable version is here) is written in Tcl using the Emu extensions. The script takes two parameters, a template name and a pattern matching utterances to be translated into the relational tables. A directory is created named after the template and four files are created in the directory:

  • database contains MySQL instructions to create the database and populate the tables.
  • tokens.db the tokens table, one line for each token, each token has a unique id.
  • links.db the links table, one line for each link between a pair of tokens. Tokens are identified by a unique id taken from the tokens table.
  • levels.db the levels table, one line for each level.

The script can be given an flag -append which appends rows to the existing tables rather than deleting any existing tables. An example of the script usage is:

emu2dbase children 'sp1:*'

this will convert all utterances matching the pattern 'sp1:*' into tables for the relational model, with the results stored in a new directory children.

The database, with the same name as the template file, is created by running MySQL with the database file as input:

mysql < database

this will delete any existing database, create a new one and create the appropriate tables for tokens, links and levels. These tables are then populated from the text files. The database can then be queried as per the examples in the paper.

Results

The results with MySQL are impressive, simple queries are very fast and even sequence and domination queries run in less time than the standard Emu query engine. Since the Emu engine searches all utterances sequentially, it will not scale well to very large databases. The SQL engine uses indexes to improve efficiency and should scale well, although further experimentation is required to verify this.

Further developments will be posted here as we are able to do more experiments. Any comments are welcome.


For more information, please send mail to Steve.Cassidy@mq.edu.au.

Copyright © 2001, Department of Linguistics, Macquarie University.