The Splus system provides most of the statistical analysis software that you will need in speech analysis. However, Emu provides some specialist functions which are often used in speech research and are either not present in Splus or are not suited to the large amounts of data common in speech problems.

A common methodology in speech research for evaluating a set of features with respect to their discriminatory power is to carry out a Gaussian classification analysis. In a Gaussian model, a set of data is characterised by the mean and covariance of each class within the data along a number of dimensions. New data points can then be classified by measuring their distance from each class centroid and assigning the label of the closest centroid. Emu provides functions to build a Gaussian model from a set of multi-dimensional data and two types of distance measure for classification.

The function `train` takes a vector or matrix
of data representing a number of segments (one segment per row) such
as returned by `track` or
`muspec`, and a parallel label vector. It returns
the class centroid and covariance matrix for each unique label in
the label vector. Obviously, if the dimensionality of the input
data is high, this procedure can take some time. For such data it
is often useful to first carry out a data-reduction step, such as
principal components analysis or canonical discriminant analysis, and
then build the Gaussian model. This will be discussed further later
in this chapter.
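To make the model concrete, here is an illustrative sketch in Python (not Emu/Splus code; the function name `train_gaussian` is our own) of what a `train`-style function computes: a centroid and covariance matrix for each unique label.

```python
import numpy as np

def train_gaussian(data, labels):
    """Estimate a Gaussian model (centroid and covariance) per class.

    data   -- (n_samples, n_dims) array of feature vectors, one row per segment
    labels -- length-n_samples sequence of class labels, parallel to the rows
    """
    labels = np.asarray(labels)
    model = {}
    for lab in np.unique(labels):
        cls = data[labels == lab]
        model[lab] = {
            "mean": cls.mean(axis=0),
            # rowvar=False: rows are observations, columns are dimensions
            "cov": np.cov(cls, rowvar=False),
        }
    return model
```

The returned dictionary plays the role of the Gaussian model that the distance measures below consume.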

Once you have a Gaussian model, you can use one of two
procedures to classify new data points: Bayesian distance or
Mahalanobis distance. The Bayesian distance measure treats each
centroid and covariance matrix as the specification of a
probability distribution for that class. For each new data point we
calculate the probability that that point came from each class; the
data point is then assigned to the class which gave the highest
probability. To illustrate, consider the one-dimensional example
shown in Figure 13.1, “A one dimensional probability distribution for two classes
A and B.”. Here we have two
probability distributions: class A is centered at 5 and has a
narrow distribution while class B is centered at 10 and has a wider
distribution. The y-axis shows the probability density for each
distribution. The point P is intermediate between the two centroids
but we can see that the probability that it was derived from class
B is larger than that from class A. Consequently, this point would
be classified as B. On the other hand, point Q is closer to the
center of A and so has a higher probability in the A distribution
than in that of B; it would be classified as A. The Bayesian
distance measure is similar to the straight line (Euclidean)
distance measure but takes into account the shape of the
probability distribution for each class. This probability
distribution is estimated by the `train` function
which finds the centroid and covariance matrix of the training data
for each class.

The Bayesian distance measure is defined, up to an additive constant, as the negative log of the Gaussian likelihood:

$$d_B(x) = (x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) + \ln |\Sigma_c|$$

where $\mu_c$ and $\Sigma_c$ are the centroid and covariance matrix of class $c$; a data point $x$ is assigned to the class with the smallest distance (equivalently, the highest likelihood).
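As a rough Python sketch (not the Emu implementation; the name `bayes_classify` and the model layout, a dictionary of per-class means and covariances, are our own), classification by Bayesian distance computes the quadratic term plus the log-determinant of each class covariance and picks the smallest:

```python
import numpy as np

def bayes_classify(x, model):
    """Assign x to the class with the smallest Bayesian distance.

    model maps each label to {"mean": vector, "cov": matrix}.
    Constants common to all classes are dropped from the distance.
    """
    best, best_d = None, np.inf
    for lab, g in model.items():
        diff = x - g["mean"]
        # quadratic form plus log-determinant of the class covariance
        d = diff @ np.linalg.inv(g["cov"]) @ diff \
            + np.log(np.linalg.det(g["cov"]))
        if d < best_d:
            best, best_d = lab, d
    return best
```

Because each class keeps its own covariance, a point can be assigned to a more distant centroid whose distribution is wider, exactly as with point P in Figure 13.1.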

To classify a set of data using the Bayesian distance measure
use the `bayes.lab` function, which takes two
arguments: a Gaussian model as returned by `train`
and a matrix of data with the same dimensionality as that used to
generate the model. As an example we can attempt to distinguish
the vowels [A], [O], and [V] based on the first two formant values
at the midpoint. First we extract the data using
`track` and then we classify using
`train` and `bayes.lab`:

```
segs <- emu.query("demo", "*", "Phonetic=A|O|V")
data <- track(segs, "fm", cut=0.5)
labs <- label(segs)
model <- train(data[,1:2], labs)
blabs <- bayes.lab(data[,1:2], model)
confusion(labs, blabs)
   O  V  A
O 16  0  0
V  0 10  0
A  0  0 15
```
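The `confusion` function tabulates the true labels against the assigned labels, one row per true class. As an illustrative sketch (in Python rather than Splus; the name `confusion_matrix` is our own), such a table can be built like this:

```python
import numpy as np

def confusion_matrix(true_labs, pred_labs):
    """Tabulate true labels (rows) against predicted labels (columns)."""
    labs = sorted(set(true_labs) | set(pred_labs))
    idx = {lab: i for i, lab in enumerate(labs)}
    mat = np.zeros((len(labs), len(labs)), dtype=int)
    for t, p in zip(true_labs, pred_labs):
        mat[idx[t], idx[p]] += 1
    return labs, mat
```

A perfect classification, as in the example above, puts all counts on the diagonal.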

With two-dimensional data we can also visualise the
distribution of the data using `eplot`:

```
eplot(data, labs, dopoints=T, formant=T)
```

The result is shown in Figure 13.2, “The distribution of [A], [O], and [V] in the F1/F2 plane.”.

An alternative distance measure in common use is the Mahalanobis distance. This is similar to the Bayesian distance in that it takes into account the shape of the covariance matrix of the class model. However, the derivation of the Mahalanobis distance formula assumes that the covariance matrices of all classes are the same, which simplifies the calculations involved. Thus it is valid to use the Mahalanobis distance measure if the data for each class is similarly distributed; however, nothing prevents you from using it if they are not. The Mahalanobis distance is defined as:

$$d_M(x) = (x - \mu_c)^T \Sigma^{-1} (x - \mu_c)$$

where $\mu_c$ is the centroid of class $c$ and $\Sigma$ is the covariance matrix assumed common to all classes.
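A minimal Python sketch of this measure (again illustrative only; the name `mahal_classify` is our own, and we assume a single pooled covariance matrix is supplied) differs from the Bayesian version only in dropping the per-class covariance and log-determinant term:

```python
import numpy as np

def mahal_classify(x, means, pooled_cov):
    """Assign x to the class whose centroid is nearest in
    Mahalanobis distance, using one shared covariance matrix."""
    inv = np.linalg.inv(pooled_cov)
    dists = {lab: float((x - m) @ inv @ (x - m))
             for lab, m in means.items()}
    return min(dists, key=dists.get)
```

Since the inverse covariance is computed once and no determinant is needed, this is cheaper than the Bayesian distance when many points are classified.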

The `mahal` function takes a gaussian model
generated by `train` and a matrix of data with the
same dimensionality as that used to build the model, and assigns a
label to each data point. In the following example we classify the
data derived above using the Mahalanobis distance measure:

```
mlabs <- mahal(data[,1:2], model)
confusion(labs, mlabs)
   O  V  A
O 16  0  0
V  2  8  0
A  0  0 15
```

Compare these results with those given by
`bayes.lab` above. Although the Bayesian distance measure gave
better results in this case, this is not universally so. The
choice of distance measure for a given experiment should be based
on the shape of the class distributions: if they are similar, the
Mahalanobis distance is justified (and is significantly faster to
compute); if they are vastly different, the Bayesian distance is
more appropriate.

In the examples above the same data was used to build the
Gaussian model (using `train`) and to evaluate it
(using `mahal` or `bayes.lab`).
This is known as a *closed* test of the model
since the set of data being considered is closed. In a true
*open* test, the test data should be independent
of the training data, for example, from a different set of
speakers. To perform an open test with the functions described
above it is only necessary to derive two segment lists (for
training and testing segments) and the corresponding track data
from each. The model is then trained on the first set of data
and tested on the second, as in the following example:

```
train.segs <- emu.query("demo", "msajc*", "Phonetic=A|O|U")
train.data <- track(train.segs, "fm", cut=0.5)
test.segs <- emu.query("demo", "msadb*", "Phonetic=A|O|U")
test.data <- track(test.segs, "fm", cut=0.5)
model <- train(train.data[,1:2], label(train.segs))
blabs <- bayes.lab(test.data[,1:2], model)   # perform open test
confusion(label(test.segs), blabs)
   U  O  A
U 11  0  0
O  0 11  0
A 12  2  7
```