Emu and MySQL
I've recently carried out some experiments involving converting an Emu hierarchical annotation into tables for a relational database. The point of this exercise is twofold. Firstly to look for improvements in the speed of searching large databases for segments. Secondly to demonstrate the equivalence of the Emu hierarchical model to the relational model, which is much better understood. The results of the experiments are summarised in a paper to be presented at Eurospeech99 in Budapest.
Database Software: MySQL
Initial experiments were carried out with the PostgreSQL system which is included with RedHat Linux 5.2. This system proved easy to use but gave very slow response times to queries: around five minutes for the example domination query. Some research turned up some comments about PostgreSQL not being a very fast system and so MySQL was tried as an alternative. Initial results were very encouraging and so far MySQL has shown to be as fast or faster than Emu on all queries tested. MySQL is available for most Unix systems and for Windows systems (although a fee is payable for Windows users).
Scripts are given here which convert an Emu database to ASCII tables and a set of instructions to import the tables into MySQL. Although I have no experience with other database systems I imagine that the scripts could be adopted to import the data into any relational database system (eg. Oracle, Microsoft SQL Server).
The conversion script emu2dbase (a downloadable version is here) is written in Tcl using the Emu extensions. The script takes two parameters, a template name and a pattern matching utterances to be translated into the relational tables. A directory is created named after the template and four files are created in the directory:
The script can be given an flag
emu2dbase children 'sp1:*'
this will convert all utterances matching the pattern
The database, with the same name as the template file, is created by running MySQL with the database file as input:
mysql < database
this will delete any existing database, create a new one and create the appropriate tables for tokens, links and levels. These tables are then populated from the text files. The database can then be queried as per the examples in the paper.
The results with MySQL are impressive, simple queries are very fast and even sequence and domination queries run in less time than the standard Emu query engine. Since the Emu engine searches all utterances sequentially, it will not scale well to very large databases. The SQL engine uses indexes to improve efficiency and should scale well, although further experimentation is required to verify this.
Further developments will be posted here as we are able to do more experiments. Any comments are welcome.
For more information, please send mail to Steve.Cassidy@mq.edu.au.
Copyright © 2001, Department of Linguistics, Macquarie University.