VoiceXML Review - Feature Articles

This article describes a VoiceXML application for a commercial airline flight tracker that answers voice queries on the current status of 2000 in-air flights per day, using the TellMe Networks deployment environment, Perl, JavaScript and server-side search of dynamic data inputs. General principles of usability and programming are emphasized.

Performing search queries using voice recognition presents a host of design questions, ranging from optimal database design to interface and voice grammar choices and data sources for dynamic inputs. Because in-flight status reports change frequently, the challenges of maintenance and frequent updates argue for a simple state diagram to query airline status round the clock.

A typical large commercial airline may operate upwards of 2500 flights daily. The voice user needs to access that data before leaving for the airport, or en route using wireless toll-free inquiries. The flight connections may be single (non-stop) point-to-point routings, or more complex connections that span multiple cities, times, and departure gates. These intermediate steps bear greatly on the database design: an arrival gate at a holdover stop will likely be of little interest, but the arrival gate at the final stop is critical information if the flyer is being met there.

To simplify the large number of choices, the first test interface was limited to queries by flight number. A well-established voice recognition grammar for this purpose covers the natural numbers from zero to ten thousand, although numbers above 2600, the ceiling of available in-flight data, go unused. A typical user experience is shown in Figure 1, as the state diagram for the sequence that ends in a single application variable passed to a CGI Perl script, which handles the database query and returns the answers.
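The range check implied above can be sketched in a few lines. This is only an illustration in Python, not the article's Perl script; the function name `parse_flight_number` and the idea that the recognizer hands back a digit string are assumptions made for the example.

```python
import re
from typing import Optional

# Ceiling of available flight data, taken from the article (flights 0..2600).
MAX_FLIGHT_NUMBER = 2600


def parse_flight_number(recognized: str) -> Optional[int]:
    """Return a usable flight number, or None if the input is out of range.

    `recognized` is assumed to be the digit string produced by the
    number grammar (hypothetical interface, for illustration only).
    """
    text = recognized.strip()
    # Accept plain ASCII digits only; anything else is treated as a no-match.
    if not re.fullmatch(r"[0-9]+", text):
        return None
    number = int(text)
    # The grammar may recognize up to ten thousand, but data stops at 2600.
    if number > MAX_FLIGHT_NUMBER:
        return None
    return number
```

A `None` result would send the caller back to the prompt state rather than on to the database query.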

Of the two standard choices for database design, relational or hierarchical, the choice is influenced by the speed of data refreshes and the high 'throw-away' turnover. For fewer than roughly 3000 to 4000 text entries, a hierarchical or flat ASCII text database can be quickly swapped on any server and overwritten without any compilation or programming beyond the file transfer itself. As a low-maintenance choice, a hierarchical structure was selected initially, and the data listing includes 2600 flights with 12 fields each.

The fields were chosen to cover departure, in-flight status and arrival information. A reply to a voice query on flight number might mainly concern departure times and places for the traveler, but arrival time and place for those meeting the traveler, so all 12 fields are read upon a valid match. The metadata choices are shown in Figure 2.

Departure Time (date, time)
Flight Number (0..2600)
Departure Flag (0 = non-stop, 1 = continuation or changeover flight)
Departure City (city, 3 letter airport code)
Departure Status (delayed, on time, cancelled)
Arrival City
Arrival Time
Arrival Status (delayed, on schedule, cancelled)
Departure Gate
Arrival Gate
Aircraft Type (e.g. 707)
Final Status (delayed, on schedule, cancelled, at departure
   gate, at arrival gate, in flight, arrived)
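As a rough sketch of how one 12-field entry might be held in memory, the Python below maps the list above onto a record type and parses a single line of a flat text file. The pipe delimiter, the field names, and the sample values are all assumptions for illustration; the article does not specify the file layout.

```python
from dataclasses import dataclass


@dataclass
class FlightRecord:
    """One flight entry, with fields in the order listed in Figure 2."""
    departure_time: str
    flight_number: int
    departure_flag: int      # 0 = non-stop, 1 = continuation or changeover
    departure_city: str
    departure_status: str
    arrival_city: str
    arrival_time: str
    arrival_status: str
    departure_gate: str
    arrival_gate: str
    aircraft_type: str
    final_status: str


def parse_record(line: str, delimiter: str = "|") -> FlightRecord:
    """Parse one delimited line into a FlightRecord (delimiter is assumed)."""
    parts = [p.strip() for p in line.rstrip("\n").split(delimiter)]
    if len(parts) != 12:
        raise ValueError(f"expected 12 fields, got {len(parts)}")
    return FlightRecord(
        departure_time=parts[0],
        flight_number=int(parts[1]),
        departure_flag=int(parts[2]),
        departure_city=parts[3],
        departure_status=parts[4],
        arrival_city=parts[5],
        arrival_time=parts[6],
        arrival_status=parts[7],
        departure_gate=parts[8],
        arrival_gate=parts[9],
        aircraft_type=parts[10],
        final_status=parts[11],
    )


# Invented sample line, for illustration only.
SAMPLE = ("2001-03-01 08:15|1234|1|Huntsville, HSV|on time|Atlanta, ATL|"
          "2001-03-01 09:05|on schedule|B4|A12|737|in flight")
record = parse_record(SAMPLE)
```

Keeping the record flat, one line per flight, matches the goal of overwriting the whole file on each refresh without any schema work.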

From the traveler's perspective, whether the flight is delayed, on time or cancelled is the top priority, followed by the departure time and gate. When this information is relayed to another party, such as an office or a transportation connection, the arrival status, time and gate are an important convenience for making final arrangements.

Aircraft type (e.g., 747 or DC-10) can be relevant for some seating arrangements. The departure flag, either non-stop (0) or continuation or changeover flight (1), determines whether the search attempts to return the full routing information for that flight, from departure to final arrival. Where the continuation option is set, a two-part departure and arrival schedule should be read back, and this branch is read out as part of the final status summary, alongside the expected 'on schedule' responses.
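One possible reading of the departure-flag branch is sketched below in Python. The wording of the prompts, the dictionary keys, and the choice to omit the arrival leg for non-stop flights are assumptions for the example, not the application's actual playback logic.

```python
def build_readback(record: dict) -> str:
    """Assemble the text to be spoken for one matched flight.

    `record` is a plain dict with hypothetical keys named after the
    fields in Figure 2.
    """
    parts = [
        f"Flight {record['flight_number']} is {record['final_status']}.",
        f"Departing {record['departure_city']} at {record['departure_time']}, "
        f"gate {record['departure_gate']}, {record['departure_status']}.",
    ]
    # Flag 1 (continuation or changeover): add the second, arrival part.
    if record["departure_flag"] == 1:
        parts.append(
            f"Arriving {record['arrival_city']} at {record['arrival_time']}, "
            f"gate {record['arrival_gate']}, {record['arrival_status']}."
        )
    return " ".join(parts)


# Invented sample values, for illustration only.
base = {
    "flight_number": 1234, "final_status": "in flight",
    "departure_city": "Huntsville", "departure_time": "8:15 AM",
    "departure_gate": "B4", "departure_status": "on time",
    "arrival_city": "Atlanta", "arrival_time": "9:05 AM",
    "arrival_gate": "A12", "arrival_status": "on schedule",
}
nonstop_text = build_readback({**base, "departure_flag": 0})
continuation_text = build_readback({**base, "departure_flag": 1})
```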

Because identifying in-flight status demands rapid data turnover, the data was gathered by an automated batch reading of the various airline databases. Once every several hours, the database is replaced in its entirety by stepping through the flight numbers in sequence, and the resulting file is stored until the next refresh.
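The refresh cycle described here could look roughly like the following Python sketch. The callback `fetch_record` stands in for whatever reads an airline data source and is purely hypothetical; writing to a temporary file and renaming it over the old one is a swapped-in safeguard so that a query never sees a half-written file, not something the article states.

```python
import os
import tempfile


def refresh_snapshot(path, fetch_record, max_flight=2600):
    """Rebuild the flat file by stepping through flight numbers 0..max_flight.

    `fetch_record(number)` is a hypothetical callback returning one text
    line for that flight, or None if no data is available.
    Returns the number of lines written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        count = 0
        with os.fdopen(fd, "w", encoding="ascii") as out:
            for number in range(max_flight + 1):
                line = fetch_record(number)
                if line is None:
                    continue
                out.write(line + "\n")
                count += 1
        # Swap the new snapshot into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the previous snapshot untouched if the refresh fails.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return count
```

A scheduler such as cron could call this once every several hours, which fits the batch cadence described above.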

To carry less unused data in any given refresh cycle, an alternative is to extract data only as it is requested, by submitting the application variables to various meta-databases scattered around the network and then parsing the returned pages in an appropriate meta-language to play back audio. This option was rejected only because the database of delayed and cancelled flights was classified as the key specification for information returns, so that global queries could play back the current status of the whole airline schedule. Selective extraction would likely not deliver that summary in-flight report without iteratively aggregating all the current take-off and landing reports into a single snapshot.
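With a full snapshot on hand, the global query reduces to one pass over the records, as in this hedged Python sketch. The function name, the dict keys, and the sample data are assumptions for illustration.

```python
from collections import Counter


def summarize_status(records):
    """Summarize one snapshot of the whole schedule.

    `records` is an iterable of dicts with hypothetical keys
    'flight_number' and 'final_status'. Returns the count per status
    and the sorted flight numbers that are delayed or cancelled.
    """
    records = list(records)
    counts = Counter(r["final_status"] for r in records)
    problem_flights = sorted(
        r["flight_number"]
        for r in records
        if r["final_status"] in ("delayed", "cancelled")
    )
    return counts, problem_flights


# Invented sample snapshot, for illustration only.
snapshot = [
    {"flight_number": 310, "final_status": "delayed"},
    {"flight_number": 12, "final_status": "in flight"},
    {"flight_number": 45, "final_status": "cancelled"},
    {"flight_number": 7, "final_status": "in flight"},
]
counts, problem_flights = summarize_status(snapshot)
```

The same summary built from on-demand extraction would need one network request per flight, which is the aggregation cost the paragraph above describes.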

VoiceXML Experience Report: Flight Tracker Voice Application

By Dr. David Noever