Testing VoiceXML Applications (Continued from Part 1)
Data Query Logic
The search algorithm itself was implemented programmatically in Perl as a CGI script, and the hierarchical database was extracted, or 'spidered', on a timer across the network. Because the user input was a flight number, and 12 data fields represented the information available for the response, only one of the fields needed a strict match algorithm. The sequence of the query logic is summarized as steps in Figure 3.
Create output audio templates
Read database
Search database
Enable holding array to parse previous and next navigation (1 per return)
Read template
Substitute results into template
Write next
Figure 3 - Data Query Logic
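
The "Search database" step in Figure 3 relies on a strict match against a single field. The following is a minimal Python sketch of that idea, not the original Perl CGI code. The record layout, the field names and the function name `search_by_flight` are all assumptions made for illustration, and only four of the 12 fields are shown.

```python
from typing import Dict, List

# Hypothetical records; the real application carried 12 fields per flight.
FLIGHTS: List[Dict[str, str]] = [
    {"flight": "135", "origin": "Atlanta", "destination": "Los Angeles", "status": "on time"},
    {"flight": "135", "origin": "Los Angeles", "destination": "Honolulu", "status": "delayed"},
    {"flight": "1350", "origin": "Boston", "destination": "Atlanta", "status": "on time"},
]


def search_by_flight(records: List[Dict[str, str]], flight_number: str) -> List[Dict[str, str]]:
    """Return every record whose flight field equals the query exactly."""
    wanted = flight_number.strip()
    # Strict equality: "135" must not match "1350" or "13".
    return [record for record in records if record["flight"] == wanted]


matches = search_by_flight(FLIGHTS, "135")
for record in matches:
    print(record["origin"], "to", record["destination"], "-", record["status"])
```

A substring or prefix test would wrongly return flight 1350 for the query 135, which is why the flight number field is the one place where the comparison has to be exact.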
A couple of features of this logic are worth noting. First, each voice query on a flight number retrieves a single database entry (at most two or three for continuation flights). The logic would differ only slightly when matching on cities or times, in which case a reply might have to include every flight to or from, say, Atlanta. But the need for surgical data extraction in most time-critical or wireless applications argues in favor of reporting less information by default.
Second, to simplify the data presentation, a template file holds the formatting unique to VoiceXML (e.g., <audio> tags). The template includes a commented section labeled <!--Results-->, which serves as the marker the script searches for; the results are substituted for it within the template for each fresh response, after each new flight number is spoken.
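
The marker substitution described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original template or script: the template text, the use of <prompt> elements for each spoken line, and the function name `fill_template` are hypothetical.

```python
from typing import List
from xml.sax.saxutils import escape

RESULTS_MARKER = "<!--Results-->"

# Hypothetical template; the original also carried <audio> tags and other markup.
TEMPLATE = """<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!--Results-->
    </block>
  </form>
</vxml>"""


def fill_template(template: str, spoken_lines: List[str]) -> str:
    """Replace the results marker with one <prompt> element per spoken line."""
    # Escape each line so characters such as '&' cannot break the XML.
    body = "\n      ".join(f"<prompt>{escape(line)}</prompt>" for line in spoken_lines)
    # Replace only the first marker so the output has a single results section.
    return template.replace(RESULTS_MARKER, body, 1)


page = fill_template(TEMPLATE, ["Flight 135 is on time", "Gates B & C"])
print(page)
```

Keeping the VoiceXML markup in a template means the script only ever generates the short results fragment, so changes to the dialog wording do not require touching the search code.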
A final simplification is the ability to hold a large number of matched results in temporary storage (e.g., a slice of the database) and then let the user navigate page to page. These response-reply cycles begin after only the first couple of results are spoken, so users can decide whether to continue based on whether their question has already been answered.
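
A holding array with previous and next navigation might look like the sketch below. The page size of two and the helper names `page_of`, `has_next` and `has_previous` are assumptions chosen for illustration; the original script may have organized this differently.

```python
from typing import List


def page_of(results: List[str], page: int, per_page: int = 2) -> List[str]:
    """Return the slice of held results that belongs to the given zero-based page."""
    start = page * per_page
    return results[start:start + per_page]


def has_next(results: List[str], page: int, per_page: int = 2) -> bool:
    """True when at least one result lies beyond the given page."""
    return (page + 1) * per_page < len(results)


def has_previous(page: int) -> bool:
    """True for every page after the first."""
    return page > 0


held = ["flight A", "flight B", "flight C", "flight D", "flight E"]
print(page_of(held, 0), has_next(held, 0))  # first page, more to come
print(page_of(held, 2), has_next(held, 2))  # last page, nothing further
```

Only the current page is spoken; the application can then offer "next" or "previous" depending on what `has_next` and `has_previous` report.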
Voice Experiences Unique to Language Recognition and Machine Grammars
While the machine grammars for recognizing natural numbers do provide reliable entries to pass as application variables to a CGI script, some challenges remain, depending on how the user decides to speak the flight number. For instance, to ease matching between the voice replies and what might appear on a typical airport status screen, the three-letter airport code is included with each city (e.g., Los Angeles = LAX). When the three-letter codes are encountered in a reply, most VoiceXML applications will attempt to pronounce them rather than spell them. This leads to garbled responses, particularly for codes that include no vowels (e.g., London = LGW), which get 'sounded out' rather than spelled.
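
One way to change how the field is entered is to store or emit the code with its letters separated, which many speech synthesizers will read out one letter at a time. The sketch below assumes that behavior, which varies by platform and should be checked on the target engine. The function names are hypothetical.

```python
def spell_code(code: str) -> str:
    """Separate the letters of an airport code so they are read one by one."""
    return " ".join(code.strip().upper())


def speakable_city(city: str, code: str) -> str:
    """Combine a city name with its spelled-out airport code for a voice reply."""
    return f"{city}, {spell_code(code)}"


print(speakable_city("London", "LGW"))
print(speakable_city("Los Angeles", "lax"))
```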
While that complication derives from the database field itself and can be handled easily by changing how the field is entered, a less predictable issue is the variety of ways a traveler might speak the flight number. Since flight numbers range from 10 to 2600, the grammar must recognize unusual entries, such as speaking a zero as the letter "O", or saying "26 hundred" or "2 thousand 6 hundred". In general, the natural number grammar style sheet provided for these options, including digit-by-digit entries in which the speaker simply says the individual digits in sequence.
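
To make the spoken variants concrete, the sketch below converts a transcribed utterance into a flight number. It is not how the platform's grammar works internally. It assumes the recognizer returns plain English words, and every name in it is hypothetical. Utterances containing "hundred" or "thousand" are read by magnitude; everything else is read as digits or digit pairs spoken in sequence.

```python
from typing import List, Optional

DIGITS = {"zero": 0, "oh": 0, "o": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def _by_magnitude(words: List[str]) -> int:
    """Read phrases such as 'twenty six hundred' or 'two thousand six hundred'."""
    total, current = 0, 0
    for word in words:
        if word in DIGITS:
            current += DIGITS[word]
        elif word in TEENS:
            current += TEENS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    return total + current


def _in_sequence(words: List[str]) -> int:
    """Read phrases such as 'two six zero zero' or 'one thirty five'."""
    parts: List[str] = []
    pending: Optional[int] = None  # a tens word waiting for its units digit
    for word in words:
        if word in TENS:
            if pending is not None:
                parts.append(str(pending))
            pending = TENS[word]
        elif word in DIGITS and pending is not None and DIGITS[word] != 0:
            parts.append(str(pending + DIGITS[word]))  # 'thirty' + 'five' -> '35'
            pending = None
        elif word in DIGITS or word in TEENS:
            if pending is not None:
                parts.append(str(pending))
                pending = None
            parts.append(str(DIGITS[word] if word in DIGITS else TEENS[word]))
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    if pending is not None:
        parts.append(str(pending))
    return int("".join(parts))


def spoken_to_flight_number(utterance: str) -> int:
    """Convert a transcribed utterance to an integer flight number."""
    words = [w for w in utterance.lower().replace("-", " ").split() if w != "and"]
    if not words:
        raise ValueError("empty utterance")
    if "hundred" in words or "thousand" in words:
        return _by_magnitude(words)
    return _in_sequence(words)


for phrase in ("two six zero zero", "twenty six hundred",
               "two thousand six hundred", "two six oh oh", "one thirty five"):
    print(phrase, "->", spoken_to_flight_number(phrase))
```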
Testing and validation
Since most VoiceXML developers put considerable forethought into 'human factoring' how a particular application may be used in practice, testing can include both code verification and user surveys. Once the server application is installed, the major verification tests concern what happens when the bounds on user entries are exceeded. Another way of building in 'bounded entries' is to run an initial client-side error check (JavaScript) that limits queries and responds appropriately before unbounded queries reach the main program.
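
The bounded-entry idea can be sketched as a small validation function. The article places this check client-side in JavaScript; the Python version below only illustrates the logic and could equally serve as the server-side checkpoint. The range of 10 to 2600 comes from the flight numbers mentioned earlier, while the function name and the reply wording are assumptions.

```python
import re
from typing import Tuple

MIN_FLIGHT = 10    # lowest flight number, per the range given in the article
MAX_FLIGHT = 2600  # highest flight number, per the range given in the article


def check_flight_number(raw: str) -> Tuple[bool, str]:
    """Return (ok, message); message is the clean number or a spoken-style error."""
    text = raw.strip()
    # Accept plain ASCII digits only, so odd input never reaches int().
    if not re.fullmatch(r"[0-9]+", text):
        return False, "Sorry, I did not catch a flight number."
    number = int(text)
    if not MIN_FLIGHT <= number <= MAX_FLIGHT:
        return False, f"Flight numbers run from {MIN_FLIGHT} to {MAX_FLIGHT}."
    return True, str(number)


for entry in ("135", "2601", "9", "abc", " 0135 "):
    print(repr(entry), check_flight_number(entry))
```

Verification tests can then concentrate on the edges of the range (9, 10, 2600, 2601) and on non-numeric input, which are the cases most likely to expose unhandled paths in the main program.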
Particular to voice applications, the rhythm of speech not only influences the accuracy of the variables passed to other programs but also shows up in the parsing grammar itself. A case in point is the different ways a rushed traveler may choose to speak large numbers: digit by digit or not, and with various groupings of tens, hundreds or thousands.
To validate these grammar style sheets and the basic voice recognition, testing with speakers of different dialects (American, English, Australian), varied demographics (educational backgrounds) or professions might become a standard method for validating voice applications. User feedback can prove most important for tracking bugs when the data is not stored permanently but deleted on a relatively frequent basis: by the time a report reaches the original developer, the corrupt data has been overwritten and the bug that once appeared has long since disappeared. In those cases, the main record to rely on would be user surveys.
Conclusions, strategies and experiences
Handling dynamic data while building a large application poses challenges in a voice-driven program. The usual errors (bad data, failed error checking and out-of-bounds queries) can form a large matrix of failure points when the code finally gets deployed. A combination of client and server checkpoints will reduce the load on the server, since simple errors get filtered out before consuming any server CPU. Even so, success still depends on user surveys, which become an important part of training both developers and end-users to achieve good searches.
Experience to date with the large airline application has shown that, outside of voice recognition itself, attention to keeping the data fresh, current and stored in the simplest accessible formats can increase reliability. Future work can enable more complicated questions to be asked of the application, particularly as more user survey data is compiled.
Now that you've read the article, call +1-800-555-8355 ext. 135802 and try the application yourself.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).