A Case for Improved Dialog Traversal (Call Flow) Testing and Analysis
By Stuart Harding
Synopsis: Evaluating the functionality and performance
of an application's dialog traversal calls for a better
methodology than the current process of manual calling
and reporting. This article presents a case for the
comprehensive, consistent, repeatable, automated testing
developed by CoAssure, Inc. When used with VXML (and other
coding technologies) for speech and IVR telephony
applications, it delivers improved and accelerated test
results, presented in concise online reports. Cost
savings can be an added benefit for companies that
routinely use the service.
Testing is an integral part of the design, development, deployment
and support of self-service (voice recognition and
IVR) applications. So many voice telephony
applications are customer-facing that virtually all
can be considered mission critical and should,
therefore, be closely scrutinized at several points
in the development cycle. Prior to deployment, and any
time hardware or software upgrades are implemented,
a comprehensive analysis of functionality and performance
is a cost-effective procedure. After all, the purpose
of self-service applications is to generate cost savings
by reducing the load on call center personnel, receptionists,
etc., and to provide the caller with easy access to information
on a 24/7 basis. Without convenient access to information,
callers opt for live agents, undermining the advantages
and cost savings expected from the system. Even minor
issues with the application result in an unsuccessful
self-service call and/or customer dissatisfaction. Comprehensive
and consistent testing is the only way to uncover issues
with call flow.
Many companies are comfortable with using outside vendors
for usability testing on VUI design, and load testing to
validate system performance prior to deployment is commonly
outsourced. Today, however, nearly all companies that are
developing or deploying self-service applications use in-house
resources and a cumbersome manual process for validating
the functionality and performance of the dialog traversal.
There are numerous reasons for using in-house resources;
some are good, but others should be examined
more closely.
How is traversal testing conducted today, and exactly what
is tested and measured when conducting dialog traversal
tests? Every company surveyed that is developing self-service
telephony applications claims to conduct a test of the
call flow functionality. Most of these same companies
check prompts, grammars, error conditions, etc. for compliance
with the specification. A few companies have developed
rudimentary tools to ensure that every call path can
be exercised. Some companies pass responsibility for
testing the application to the company deploying the
application. Every company employs a methodology for
testing that relies on a person or a group of people
to phone the application and step through call flow
paths. The test caller must dial the number, follow
the call flow procedure, listen carefully to the responses,
and note variances from the specification. This process
can take hours, more often days, and sometimes weeks.
Even when executed well, it is tedious, error-prone work
and an inexact exercise.
Why is manual testing (of the dialog traversal) an imprecise
exercise? The reasons are numerous; anyone who has directed
traversal testing will likely have a far more extensive
list than the one provided below.
- The personnel selected to test applications
often vary from day to day or week to week. Although the individuals
may be qualified to conduct tests, a primary goal
of testing should be to eliminate variables and
evaluate only the effects of changes made to the application
code.
- The call flow paths are not always covered rigorously.
To be certain the application is performing in accordance
with the specification, the tests must exercise every
state and prompt. Where feasible, all in-grammar responses
should be tested. Additionally, testing all of the
exception conditions (silence, out-of-grammar, and
inappropriate utterances) to the limits of the spec
should confirm proper error handling throughout the
application. Global commands need to be checked in
every state.
- Beginning-of-speech delays and barge-in tests
cannot be executed precisely by a human. Applications
perform differently when delays and barge-ins vary;
to evaluate the differences precisely, a means of accurate
measurement is required.
- Interpretations of the specification may vary from
person to person. Judging problem areas consistently
is very difficult, especially if
the testing phase extends across several days
or even weeks.
- Errors detected and reported may not be repeatable.
This is a major problem for any test system. The best
solution is to have a methodology for recording calls
and playing back only those portions of a call where
discrepancies occur.
- The reporting system may be slow or imprecise.
Collecting, consolidating and distributing test results
can be a cumbersome task, especially when multiple
locations are involved.
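The coverage requirement in the list above (every state, every in-grammar response, every exception condition, and global commands in every state) lends itself to a mechanical check. The following is a minimal sketch in Python, not a description of any vendor's tool. The call flow, the `EXCEPTIONS` and `GLOBALS` lists, and the helper names `required_checks` and `missing_checks` are all hypothetical and chosen only for illustration.

```python
# Hypothetical call flow: each state maps to the in-grammar responses it accepts.
CALL_FLOW = {
    "main_menu": ["balance", "transfer"],
    "balance": ["checking", "savings"],
    "transfer": ["yes", "no"],
}
# Exception conditions named in the list above.
EXCEPTIONS = ["silence", "out-of-grammar", "inappropriate"]
# Assumed global commands; a real specification would define its own.
GLOBALS = ["help", "operator"]


def required_checks(call_flow):
    """Return every (state, kind, input) triple a comprehensive test must cover."""
    checks = set()
    for state, responses in call_flow.items():
        for response in responses:
            checks.add((state, "in-grammar", response))
        for condition in EXCEPTIONS:
            checks.add((state, "exception", condition))
        for command in GLOBALS:
            checks.add((state, "global", command))
    return checks


def missing_checks(call_flow, executed):
    """Return the required checks that no test call has exercised, sorted."""
    return sorted(required_checks(call_flow) - set(executed))


# Checks exercised so far by a made-up, partial test run.
executed = [
    ("main_menu", "in-grammar", "balance"),
    ("main_menu", "exception", "silence"),
]
total = len(required_checks(CALL_FLOW))
gaps = missing_checks(CALL_FLOW, executed)
print(f"{total - len(gaps)} of {total} checks covered")
```

A list of gaps like this makes it plain when a manual test pass has skipped states or error conditions, which is hard to see from hand-written call notes.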
The best time to use in-house resources to check functionality
is early in the application development process, when
there are many issues; using developers to debug new
code is probably most efficient and a good learning
exercise. As an application grows or more modules are
added, the test process becomes more time-consuming.
Using developers to place calls is an expensive
way to test; a more objective and cost-effective approach
is needed.
There needs to be an improved way to test self-service applications.
There should be a way to comprehensively and consistently
test the traversal. There needs to be an objective evaluation
and timely report of the progress made by code changes
to the application.
CoAssure has developed an automated testing methodology that
represents a vast improvement over manual calling. First,
CoAssure reviews the application specification provided
by the customer. Next, an XML representation of the application
is created for the purpose of ensuring comprehensive
test coverage. The XML code also allows for different
testing criteria to be used based on the desired goal
of the test. For example, early in the process a test
set of calls can be created which does not exercise
all of the error handling conditions. Such a test set
can be executed in a minimum amount of time and quickly
reported. Known deficiencies can be eliminated from
scrutiny (e.g., if not all global commands have been
implemented across the application, they can be excluded
from testing). Prior to code release, a comprehensive
test set can thoroughly evaluate the application: both
the basic functionality and the performance of the overall
system, with delays accurately measured. At a later
date, traversal testing can be conducted during load testing,
using the same test set, to determine system
performance under those conditions and compare it with the
application's performance in the unloaded situation.
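To make the idea of an XML representation with selectable testing criteria concrete, here is a minimal sketch in Python. The article does not describe CoAssure's actual schema, so the element names (`application`, `state`, `response`), the `kind` attribute, the `PROFILES` table and the `build_test_set` helper are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML model of an application; element and attribute names are
# illustrative and do not reflect CoAssure's actual schema.
APP_XML = """
<application name="demo">
  <state id="main_menu">
    <response kind="in-grammar">balance</response>
    <response kind="in-grammar">transfer</response>
    <response kind="exception">silence</response>
    <response kind="exception">out-of-grammar</response>
    <response kind="global">help</response>
  </state>
  <state id="balance">
    <response kind="in-grammar">checking</response>
    <response kind="exception">silence</response>
    <response kind="global">help</response>
  </state>
</application>
"""

# Assumed test profiles: which kinds of response each test set exercises.
PROFILES = {
    "quick": {"in-grammar"},
    "comprehensive": {"in-grammar", "exception", "global"},
}


def build_test_set(xml_text, profile, exclude_kinds=()):
    """Return the (state, kind, input) steps selected by a profile.

    exclude_kinds drops known deficiencies, for example global commands
    that have not been implemented yet.
    """
    kinds = PROFILES[profile] - set(exclude_kinds)
    root = ET.fromstring(xml_text)
    steps = []
    for state in root.iter("state"):
        for response in state.iter("response"):
            if response.get("kind") in kinds:
                steps.append((state.get("id"), response.get("kind"), response.text))
    return steps


quick = build_test_set(APP_XML, "quick")
full = build_test_set(APP_XML, "comprehensive")
no_globals = build_test_set(APP_XML, "comprehensive", exclude_kinds=["global"])
print(len(quick), len(full), len(no_globals))
```

Because every test set is derived from the same model, the quick early-cycle run and the comprehensive pre-release run stay consistent with each other and with the specification.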
The XML code will determine all of the in-grammar utterances
that must be pre-recorded for the purpose of the automated
calling. With the traversal test set and prerecorded
utterances, the automated execution of the test calls
is ready to begin. Automated calling progresses much
faster than manual calls. The system dials the application
phone number and steps through the prescribed test call
in the shortest time possible. Where feasible, the system
will enable barge-in to minimize test calling times.
Immediately upon completion of one call, the next call
is initiated; hundreds of calls can be completed in
a day using a single port.
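Two points in this paragraph can be sketched in a few lines of Python: deriving the list of utterances to pre-record from a test set, and estimating how many back-to-back calls one port can place in a day. The test calls, the helper names and the three-minute average call length are assumptions chosen only for illustration; the article gives no such figures.

```python
# Hypothetical test calls: each is a list of (state, caller input) steps.
TEST_CALLS = [
    [("main_menu", "balance"), ("balance", "checking")],
    [("main_menu", "balance"), ("balance", "savings")],
    [("main_menu", "transfer"), ("transfer", "yes")],
    [("main_menu", None), ("main_menu", "transfer"), ("transfer", "no")],
]


def utterances_to_record(test_calls):
    """Return the distinct spoken inputs the test set needs, sorted.

    A step whose input is None stands for silence, which needs no recording.
    """
    needed = set()
    for call in test_calls:
        for _state, utterance in call:
            if utterance is not None:
                needed.add(utterance)
    return sorted(needed)


def calls_per_day(avg_call_seconds):
    """Estimate back-to-back calls per port per day for an assumed call length."""
    return (24 * 60 * 60) // avg_call_seconds


recordings = utterances_to_record(TEST_CALLS)
print(recordings)
print(calls_per_day(180))  # assumed average of three minutes per call
```

Each utterance is recorded once and reused across every call that needs it, so the recording effort grows with the size of the grammar rather than with the number of test calls.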
As the test progresses, discrepancies are noted, assigned
a code number and catalogued (stored) in a database.
Every call is recorded and can later be played in its
entirety, or the user has the option of only listening
to the discrepant portions of a call.
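A minimal sketch of this kind of discrepancy catalogue, using Python's built-in `sqlite3` module, might look as follows. The table layout, the discrepancy codes and the helper names are hypothetical; the article does not describe the actual database design.

```python
import sqlite3

# Hypothetical discrepancy codes; a real system would define its own catalogue.
CODES = {
    101: "wrong prompt played",
    102: "in-grammar utterance rejected",
    103: "response delay over limit",
}

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(
    """CREATE TABLE discrepancy (
           call_id INTEGER,
           state TEXT,
           code INTEGER,
           start_sec REAL,  -- offset of the problem within the call recording
           end_sec REAL
       )"""
)


def log_discrepancy(call_id, state, code, start_sec, end_sec):
    """Store one discrepancy with the recording span that contains it."""
    if code not in CODES:
        raise ValueError(f"unknown discrepancy code: {code}")
    conn.execute(
        "INSERT INTO discrepancy VALUES (?, ?, ?, ?, ?)",
        (call_id, state, code, start_sec, end_sec),
    )


def discrepant_spans(call_id):
    """Return the (start, end) spans to play back for one call."""
    rows = conn.execute(
        "SELECT start_sec, end_sec FROM discrepancy "
        "WHERE call_id = ? ORDER BY start_sec",
        (call_id,),
    )
    return rows.fetchall()


log_discrepancy(7, "main_menu", 101, 2.5, 6.0)
log_discrepancy(7, "balance", 103, 14.0, 19.5)
log_discrepancy(8, "transfer", 102, 9.0, 11.0)
print(discrepant_spans(7))
```

Storing the time span alongside each code is what allows a reviewer to jump straight to the discrepant portion of a recording instead of replaying the whole call.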
In addition to addressing the issues of comprehensive,
consistent and repeatable testing, CoAssure has addressed
efficient reporting of the test results. Soon after a
test set run has been completed, the results are
available in a password-protected location on the Internet.
Many reports are available and have been designed to
meet the needs of developers, QA personnel and program
managers. Developers can go directly to the traversal
discrepancies that have been noted, where they can see
the expected text and listen to the recordings. Higher
level summary reports give a quick overview of application
performance. Still other reports highlight delays in
the application response, length of calls and other
indicators that may point to issues of user satisfaction.


Copyright
© 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization
(IEEE-ISTO).