Chapter 8. Writing Scripts for Emu

Table of Contents
AutoBuild Scripts
The Emu Script Library

This chapter describes the programming interface for extending the Emu system to automate hierarchy construction and to provide special-purpose labelling facilities.

AutoBuild Scripts

Many elements of a hierarchical description of an utterance can be generated automatically. For example, words can be parsed into syllables using the maximum onset principle, vowel targets can be linked to the vowel segments in which they occur, and phonetic segments can be mapped by rule onto phonemic segments.

An Emu user is free to write any script to build a hierarchy using all of the facilities of Tcl and the Emu extensions (described in Chapter 7), or even to write one in C++ using the core Emu library. However, Emu provides a standard interface specification for these scripts, which enables them to be called from the labeller or via the emu_build command line utility. If an AutoBuild script is defined, a new button that runs the AutoBuild procedure is added to the toolbar in the labeller.

The build script for a database should be specified in the database template via the AutoBuild variable (see AutoBuild variable description). The script file should load any Tcl packages required and define two procedures, AutoBuildInit and AutoBuild; the details of these procedures are given below.

AutoBuildInit is a procedure with one argument, the name of the Tcl command corresponding to the database template. It initialises any data structures used by the AutoBuild script and is called only once, when the database template is loaded. The form of the procedure is as follows:
proc AutoBuildInit {template} {
     # commands to initialise data structures
}
Note that all arrays and other global variables used by the script should be initialised here to ensure proper initialisation when running the script inside the Emu labeller.
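
As a minimal sketch of this advice, the following AutoBuildInit clears and then refills its global arrays, so that calling it again leaves the same state as calling it once. The array names and their contents (onsets, spkrinfo) and the template name dummy_template are illustrative assumptions only; the sketch uses plain Tcl and no Emu commands.

```tcl
# Illustrative only: the array names and values below are assumptions,
# not part of any real Emu database.
proc AutoBuildInit {template} {
    global onsets spkrinfo

    # Discard any state left over from an earlier call so the arrays
    # always start from a known state.
    array unset onsets
    array unset spkrinfo

    set onsets(pairs)   {pl pr tr}
    set onsets(singles) {p t k}
    array set spkrinfo  {sp1 {f 9} sp2 {m 7}}
}

# Calling the procedure twice leaves the same state as calling it once.
AutoBuildInit dummy_template
AutoBuildInit dummy_template
puts [lsort [array names onsets]]
```

Keeping all of the initialisation inside the procedure, rather than at the top level of the script file, means the state does not depend on when or how often the file itself is sourced.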

AutoBuild is a procedure with two arguments, the database template command and the current hierarchy command. The procedure should perform any actions necessary to build the hierarchy, for example creating segments and linking segments on different levels in the hierarchy. The form of the procedure is as follows:
proc AutoBuild {template hierarchy} {
     # commands to add segments and links to the hierarchy 
}
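
Because both arguments are command names, the body of AutoBuild invokes them by variable substitution, for example "$hierarchy segments Set". The sketch below illustrates this calling convention with a stand-in command, stub_hier, which is a hypothetical stub written for this example and not the Emu hierarchy implementation; it supports only two subcommands and returns the label where a real hierarchy command would return a segment identifier.

```tcl
# Stand-in for a hierarchy command (NOT the Emu implementation): it keeps
# segment labels per level in a global array so the sketch runs anywhere.
proc stub_hier {subcmd level args} {
    global stub_segments
    switch -- $subcmd {
        segments {
            if {[info exists stub_segments($level)]} {
                return $stub_segments($level)
            }
            return {}
        }
        append {
            lappend stub_segments($level) [lindex $args 0]
            return [lindex $args 0]
        }
        default { error "unsupported subcommand: $subcmd" }
    }
}

proc AutoBuild {template hierarchy} {
    # $hierarchy holds a command name, so "$hierarchy segments Set"
    # invokes that command with the arguments "segments Set".
    if {[llength [$hierarchy segments Set]] == 0} {
        $hierarchy append Set "isolated"
    }
}

AutoBuild stub_template stub_hier
AutoBuild stub_template stub_hier   ;# the Set level already exists, so nothing is added
puts [stub_hier segments Set]
```

Guarding the body with a check on an existing level keeps the procedure safe to run more than once on the same utterance.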

These procedures will typically use the Tcl package emu-script (see the section called The Emu Script Library), which provides a number of routines for tasks such as syllabification and rule-based rewrites that can be applied to various hierarchy building tasks. The AutoBuild file can contain arbitrary Tcl code, such as procedure and data structure definitions. It may also load other files or packages as required for hierarchy generation for a particular database.

The following example (Script Listing) defines the AutoBuild procedure for a database of children's speech. The data is labelled at several levels, but hand labelling is limited to the Phonetic and Target levels. The script defines a procedure (levels_from_filename) that derives various labels from the utterance filename. File names in this database follow the common practice of encoding speaker and stimulus information; the procedure decodes this information and generates segment labels at the highest level of the hierarchy (the Set level). The AutoBuild procedure calls procedures from the emu-script package to construct segments at the Phoneme level from those at the Phonetic level using a set of rewrite rules (the section called MapLevels). It then generates syllable segments via the maximum onset principle (the section called Syllabify) and links target events to the phonetic segments in which they are temporally contained (the section called LinkFromTimes). All of these procedures are described in detail in the section called The Emu Script Library.

package require emu-script

# levels_from_filename --
#
#   generate the top and word levels by decoding the filename
#   -- filename is eg. sp1:sp1.irk or sp1:sp1.1234 for isolated words
#      or number strings. We want Sex (m/f), Age (numerical), 
#      Set (isolated/numbers/passage) and Speaker (sp1 here) labels at 
#      this level. 
#
# Arguments:
#   templ, a template
#   hier,  a hierarchy
# Results:
#   modifies the hierarchy
#
proc levels_from_filename {templ hier} {
    global spkrinfo

    if {[regexp {(sp.):sp.\.([a-z0-9]+)} [$hier basename] match speakerid word]} {

	if {[regexp {[0-9]+} $word]} {
	    set setlabel "numbers"
	} elseif {[regexp {h.+d} $word]} {
	    set setlabel "hVd"
	} else {
	    set setlabel "isolated"
	}

	set sexlabel [lindex $spkrinfo($speakerid) 0]
	set agelabel [lindex $spkrinfo($speakerid) 1]

        # add a Set level segment
	set newseg [$hier append Set $setlabel] 
	$hier seginfo $newseg label Sex $sexlabel
	$hier seginfo $newseg label Age $agelabel
	$hier seginfo $newseg label Speaker $speakerid

        ## add a word level segment
	set wrdseg [$hier append Word $word]

        ## relate the word level to the set level
	$hier seginfo $newseg children Word $wrdseg

        ## and relate the word level to all syllables
	$hier seginfo $wrdseg children Syllable [$hier segments Syllable]
    }
}

####################################
## AutoBuildInit initialises any rulesets needed
## by AutoBuild
####################################
set build(p2prulesfile) "p2prules.ae"

proc AutoBuildInit {templ} {
    global build spkrinfo bad

    set build(p2prules) [ReadLevelRules $build(p2prulesfile)]

    ## the bad array: consonants and consonant clusters which can't
    ## begin a syllable
    ## (these are for the ANDOSL labelling style)
    
    set bad(triples) { spl spr spj str stj skl skr skj skw smj }
    set bad(pairs) { 
          pl pr pj tr tj tw kl kr kj kw bl br bj dr dj
          dw gl gr gj gw mj nj lj fl fr fj vj Tr Tj Tw 
          sp st sk sm sn sf sl sj sw Sr hj }
    set bad(singles) { 
          p t k b d g tS dZ f D s S h v z Z m n l r w j T }


    ## sex and age info for each speaker
    array set spkrinfo {
    sp1 {f 9}        
    sp2 {f 9}        
    sp3 {f 9}        
    sp4 {f 11}       
    sp5 {m 7}        
    sp6 {m 10}       
    sp7 {m 7}        
    sp8 {m 7}        
    }    
}

####################################
## AutoBuild builds the hierarchy using whatever
## means are appropriate
####################################
proc AutoBuild {templ hier} {
    global build 
    
    ## only do things if this hierarchy isn't built
    if {[$hier segments Set] == {}} {

        ## map the phonetic level onto the phoneme level
	MapLevels $templ $hier Phoneme Phonetic $build(p2prules)
	
        ## build the syllable level via maximum onset principle
	Syllabify $templ $hier Phoneme Syllable bad
	
        ## link targets into Phonetic
	LinkFromTimes $hier Phonetic Target
	
        ## add higher levels and link everything up
	levels_from_filename $templ $hier

    } 
}
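
The filename decoding step in levels_from_filename can be tried out without an Emu hierarchy. The sketch below applies the same regular expressions to a plain string; decode_basename is a hypothetical helper introduced only for this illustration, and the basename sp2:sp2.heed is an invented example in the same format as those mentioned in the listing's comments.

```tcl
# decode_basename is a hypothetical helper, not part of Emu. It applies the
# same patterns as levels_from_filename to a plain string and returns a
# list {speakerid word setlabel}, or an empty list if the name does not match.
proc decode_basename {basename} {
    if {![regexp {(sp.):sp.\.([a-z0-9]+)} $basename match speakerid word]} {
        return {}
    }
    if {[regexp {[0-9]+} $word]} {
        set setlabel "numbers"
    } elseif {[regexp {h.+d} $word]} {
        set setlabel "hVd"
    } else {
        set setlabel "isolated"
    }
    return [list $speakerid $word $setlabel]
}

foreach name {sp1:sp1.irk sp1:sp1.1234 sp2:sp2.heed} {
    puts "$name -> [decode_basename $name]"
}
```

Separating the string decoding from the hierarchy updates in this way makes it possible to check the patterns against a list of file names before running the build over a whole database.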