VoiceXML Review - Feature Articles

Volume 1, Issue 8 - August/September 2001

Testing VoiceXML Applications

By Dr. David Noever

(Continued from Part 1)

Data Query Logic

The search algorithm itself was implemented programmatically as a Perl CGI script, and the hierarchical database was extracted, or 'spidered', across the network on a timer.

Because the user input was a flight number, and 12 data fields held the information available for the reply, only one of the fields needed a strict match algorithm. The sequence of the query logic is summarized as steps in Figure 3.

Create output audio templates
Read database
Search database
Enable holding array to parse previous and next navigation
   (1 per return)
Read template
Substitute results into template
Write next

Figure 3 - Data Query Logic
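
The read, search and hold steps of Figure 3 can be sketched in a few lines. The original was a Perl CGI script whose source is not reproduced here, so the Python below is only an illustration: the pipe-delimited snapshot, the field names and the function names are all assumptions, and only four of the 12 fields are shown.

```python
import csv
import io

# Hypothetical pipe-delimited snapshot standing in for the spidered database.
# The real records carried 12 fields; four are enough for this sketch.
SNAPSHOT = """\
flight|origin|destination|status
10|Atlanta|Los Angeles|on time
2600|Los Angeles|Atlanta|delayed
2600|Atlanta|London|on time
"""

def read_database(text):
    """Parse the snapshot into one dict per flight record."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

def search(records, flight_number):
    """Strict match on the flight number field; no other field is compared."""
    return [r for r in records if r["flight"] == flight_number]

records = read_database(SNAPSHOT)
# Holding array: one entry per returned record (here, a flight and its continuation).
holding = search(records, "2600")
```

Matching on a single field keeps the reply short: a query for flight 2600 returns two legs, and a query for an unknown flight returns an empty list that the caller can turn into a "not found" prompt.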

A couple of features of this logic are worth noting. First, each voice query on a flight number calls up a unique database entry (two or three at most, in the case of continuation flights). The logic would differ only slightly if it matched cities or times instead, in which case a reply might include multiple flights, for example all the flights to or from Atlanta. But the need for surgical data extraction in most time-critical or wireless applications argues in favor of reporting less information by default.

Secondly, to simplify the data presentation, a template file holds the formats unique to VoiceXML (e.g., <audio> tags). The template includes a labeled, commented section that becomes part of the search match itself and is replaced within the template for each fresh response, after each new flight number is spoken.
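
A minimal sketch of that substitution follows, again in Python rather than the original Perl. The marker text, the template body and the sentence wording are hypothetical, since the label used in the original template is not known; the sketch uses a plain <prompt> element for the spoken text.

```python
from xml.sax.saxutils import escape

# Hypothetical marker; the label used in the original template is not known.
MARKER = "<!-- RESULTS -->"

# Hypothetical VoiceXML template containing the commented marker.
TEMPLATE = """<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- RESULTS -->
    </block>
  </form>
</vxml>
"""

def fill_template(template, record):
    """Replace the commented marker with a spoken prompt for one record."""
    sentence = "Flight {flight} from {origin} to {destination} is {status}.".format(**record)
    # Escape &, < and > so database text cannot break the XML.
    prompt = "<prompt>" + escape(sentence) + "</prompt>"
    return template.replace(MARKER, prompt)

page = fill_template(TEMPLATE, {
    "flight": "10", "origin": "Atlanta",
    "destination": "Los Angeles", "status": "on time",
})
```

Keeping the markup in a template means the script only ever produces the sentence, so a change to the voice dialog does not require a change to the search code.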

A final simplification is the ability to hold a large number of matched results in temporary storage (for example, a slice of the database) and then let the user navigate from page to page. These prompt-and-reply cycles trigger after only the first couple of results are spoken, so that users can decide whether their question has already been answered.
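
One way to picture the page-to-page navigation is a helper that slices the held results and reports whether "previous" and "next" are available. This is a sketch under assumptions: the function name and the page size of two are illustrative, chosen to match "the first couple of results".

```python
def page(results, index, size=2):
    """Return one page of held results plus previous/next availability."""
    start = index * size
    chunk = results[start:start + size]
    has_previous = index > 0
    has_next = start + size < len(results)
    return chunk, has_previous, has_next

held = ["flight A", "flight B", "flight C", "flight D", "flight E"]
first = page(held, 0)   # first two results, "next" offered
last = page(held, 2)    # final result, only "previous" offered
```

The two flags tell the dialog which navigation prompts to offer after each page is spoken.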

Voice Experiences Unique to Language Recognition and Machine Grammars

While the machine grammars that recognize natural numbers do provide reliable entries for passing application variables to a CGI script, some challenges remain, both in how replies are spoken and in how the user decides to speak the flight number. For instance, to ease matching between the voice replies and what might appear on a typical airport status screen, the three-letter airport code is included with each city (e.g., Los Angeles = LAX). When these codes are encountered in a reply, most VoiceXML applications will attempt to pronounce them rather than spell them. This leads to garbled responses, particularly for codes with no vowels (e.g., London = LGW), which get 'sounded out' rather than spelled.

While that complication derives from the database field itself, and can easily be handled by how the field is entered, a less predictable factor is the variety of ways a traveler might speak the flight number. Since flight numbers range from 10 to 2600, the grammar must recognize unusual entries, such as a zero spoken as the letter "O", or 2600 spoken as "26 hundred" or "2 thousand 6 hundred". In general, the natural number grammar style sheet provided for these options, including single-digit notation in which the caller simply speaks the lone digits in sequence.
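
One simple way to handle the field at entry time is to store or emit the code with its letters separated, so that the synthesizer sees three tokens instead of one unpronounceable word. The helper names below are hypothetical, and whether a given speech engine then reads the letters one by one is an assumption to confirm on the target platform.

```python
def spell_code(code):
    """Separate the letters of an airport code, e.g. 'LGW' -> 'L G W'."""
    return " ".join(code.strip().upper())

def speakable_city(city, code):
    """Combine a city name with its spelled-out airport code."""
    return "{}, {}".format(city, spell_code(code))

london = speakable_city("London", "LGW")
```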
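
To make the variations concrete, the sketch below converts a transcribed utterance into an integer. It is not the grammar style sheet used in the application, which ran inside the VoiceXML platform; it is a hypothetical Python illustration covering only the forms named above: digits spoken in sequence, a zero spoken as "oh", and "hundred" or "thousand" groupings.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
UNITS.update({"oh": 0, "o": 0})   # a zero spoken as the letter "O"
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

def spoken_to_number(utterance):
    """Convert a spoken flight number (as transcribed words) to an int."""
    words = [w for w in utterance.lower().replace("-", " ").split() if w != "and"]
    if not words:
        raise ValueError("empty utterance")
    # Lone digits spoken in sequence, e.g. "two six oh oh".
    if all(UNITS.get(w, 10) < 10 for w in words):
        return int("".join(str(UNITS[w]) for w in words))
    # Otherwise read the words as a natural number.
    total = current = 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        elif w == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError("unrecognized word: " + w)
    return total + current
```

All three ways of saying 2600 ("two six oh oh", "twenty six hundred", "two thousand six hundred") reduce to the same integer, which is what the search script needs. Mixed forms such as "twenty six oh five" are not handled by this sketch.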

Testing and validation

Since most VoiceXML developers put considerable forethought into 'human factoring' how a particular application may be used in practice, testing can include both code verification and user surveys. Once the server application is installed, the major verification tests examine what happens when the bounds on user entries are exceeded. Another way of building in 'bounded entries' is to limit queries with an initial client-side error check (JavaScript) that responds appropriately before unbounded queries reach the main program.

Particular to voice applications, the rhythm of speech not only influences the accuracy of passing variables to other programs but also shows up in the parsing grammar itself. A case in point is the different ways that a rushed traveler may choose to speak large numbers: with or without single-digit pronunciation, or with various groupings of tens, hundreds or thousands.

To validate these grammar style sheets and the basic voice recognition, testing with speakers of different dialects (American, English, Australian), varied demographics (educational backgrounds) or professions might become a standard method for validating voice applications. User feedback proves most important when the data itself is not stored permanently but deleted on a relatively frequent basis: a bug may appear and then vanish, because the corrupt data has been overwritten by the time the report reaches the original developer. In those cases, the main record to rely on is user surveys.
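
The bounded-entry check itself is small. The application performed it client-side in JavaScript; the Python sketch below shows the same logic for illustration only. The function name and the prompt wording are assumptions, and the limits are the 10 to 2600 flight-number range mentioned earlier.

```python
import re

MIN_FLIGHT, MAX_FLIGHT = 10, 2600   # flight-number range quoted in the article

def check_flight_number(raw):
    """Return (True, number) for a valid entry, else (False, message)."""
    text = raw.strip()
    if not re.fullmatch(r"[0-9]+", text):
        return False, "Please give the flight number as digits."
    value = int(text)
    if not MIN_FLIGHT <= value <= MAX_FLIGHT:
        return False, "Flight numbers run from 10 to 2600."
    return True, value
```

Verification tests can then walk the boundaries directly: 9, 10, 2600 and 2601, plus empty and non-numeric input.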

Conclusions, strategies and experiences

Handling dynamic data while building a large application poses challenges in a voice-driven program. The usual problems of bad data, error checking and out-of-bounds queries can create a large matrix of failure points when the code finally gets deployed. A combination of client and server checkpoints reduces the load on the server, since simple errors get filtered out before they consume any server CPU. Even so, user surveys remain an important part of training both developers and end users to achieve good searches.

Experience to date with the large airline application has shown that outside of voice recognition itself, attention to keeping the data fresh, current and stored in the simplest accessible formats can increase reliability. Future work can enable more complicated questions to be asked of the application, particularly as more user survey data is compiled.

Now that you've read the article, call ext. 135802 and try the application yourself.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).