Standardizing VoiceXML Generation Tools
By David L. Thomson,
Introduction
Development and runtime tools offer an opportunity to make
VoiceXML easier to use and more portable. Compared to previous
methods, VoiceXML provides two significant advantages
in authoring speech-enabled applications: it allows a developer
to build speech services with less effort, and it allows
applications written for one speech platform to run on
another. These advantages are diminished, however,
if the software tools used to create and support VoiceXML
code are inadequate or incompatible. The VoiceXML Tools
Committee, under the direction of the VoiceXML Forum,
has been working on methods for improving the quality
and uniformity of tools, as described below.
To define a process for improvement, we must first outline
an architecture that illustrates how tools are connected.
Companies currently building tools include application
developers, speech server suppliers, speech engine vendors,
speech hosting service bureaus, stand-alone tool developers,
and customers. An informal survey of commercial tools
suggests that the architecture illustrated in Figures
1-3 describes most VoiceXML toolsets currently available.
Although the interfaces are often proprietary and vary from
system to system, and not every vendor offers every module,
most products fit this general framework.
Figure 1 shows a complete VoiceXML development system divided
into three parts: an application development environment,
a VoiceXML page server, and a VoiceXML gateway. Development
tools may include a grammar compiler; a call flow editor
that could be table-based, GUI-based, wizard-based,
or script-based; a waveform editor; an expert system
that assists the developer in making wise service design
choices; error checking routines; and finite-state and
n-gram grammar generation software. The output of the
development environment may be VoiceXML code or a representation
of the service call flow in a form that is later converted
to VoiceXML pages by the VoiceXML page server.

During runtime, the VoiceXML page server provides VoiceXML
pages to the gateway in response to user input and other
events. The VoiceXML gateway executes VoiceXML code
and uses text-to-speech and speech recognition software
to communicate with callers. The VoiceXML gateway, together
with its speech recognition and synthesis software,
the VoiceXML interpreter, and related software, lies
largely outside the scope of the VoiceXML tools effort
and is treated only lightly in this paper.
Figure 2 shows a detailed view of the application development
tools block. The call flow designer helps the developer
write an application, either via a text editor or a
GUI (graphical user interface). It uses a grammar builder
that creates grammar structures for use by a speech
recognizer. In addition, it may support pre-built high-level
scripting objects that encode common user interactions.

Another feature of the application development tools
block is service analysis and testing. Data collected
from the server and the gateway during development,
trials, and live service is used to iteratively improve
the application. This information is an example of runtime
data created during operation for which few standards
currently exist.
An important characteristic of a service creation environment
is the form of its output. While simple applications
may be written directly in VoiceXML, many services (particularly
complex ones) are written in an intermediate form
such as Java, C++, ASP, proprietary scripts, or XML
and converted to VoiceXML by the VoiceXML page server.
For our purposes, we call the intermediate form meta
code, written in a given meta language. The meta code
created by the development tools specifies the call
flow (behavior of the system in response to a caller)
and is used by the VoiceXML page server.
Figure 3 is a detailed view of the VoiceXML page server.
It receives the service description represented in meta
code and generates VoiceXML pages (and accepts corresponding
signals from the VoiceXML gateway) during runtime. The
process is controlled by a conversation manager, which
may be a state machine or other similar software. The
conversation manager may have access to customer, service,
and other data. It may also access external systems
such as e-mail, instant messaging, web pages, and even
live agents when necessary.
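
To make the role of the conversation manager concrete, the following is a minimal sketch in Python, assuming the call flow is held as a simple table of states. The call flow contents, the class and method names, and the submit URL are all hypothetical; they illustrate the idea of a state machine that emits one VoiceXML page per dialog state and do not describe any vendor's page server.

```python
from xml.sax.saxutils import escape

# Hypothetical address the gateway would post the caller's choice to.
SUBMIT_URL = "http://pageserver.example/next"

# Hypothetical call flow: each state has a prompt and a map from the
# caller's recognized input to the next state.
CALL_FLOW = {
    "main": {"prompt": "Say balance or agent.",
             "next": {"balance": "balance", "agent": "agent"}},
    "balance": {"prompt": "Your balance is ten dollars.", "next": {}},
    "agent": {"prompt": "Transferring you to an agent.", "next": {}},
}


class ConversationManager:
    """Tiny state machine that tracks the current dialog state."""

    def __init__(self, flow, start="main"):
        self.flow = flow
        self.state = start

    def handle_input(self, utterance):
        """Advance on recognized input; stay in place if it is unexpected."""
        transitions = self.flow[self.state]["next"]
        self.state = transitions.get(utterance, self.state)
        return self.render_page()

    def render_page(self):
        """Render the current state as a VoiceXML page (a string)."""
        node = self.flow[self.state]
        prompt = f"<prompt>{escape(node['prompt'])}</prompt>"
        if node["next"]:
            # A field with one <option> per expected caller response.
            options = "".join(
                f"<option>{escape(word)}</option>" for word in node["next"]
            )
            body = (
                f'<field name="choice">{prompt}{options}'
                f'<filled><submit next="{SUBMIT_URL}" namelist="choice"/>'
                f"</filled></field>"
            )
        else:
            # Terminal state: play the prompt and end the session.
            body = f"<block>{prompt}<exit/></block>"
        return (
            '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
            f'<form id="{self.state}">{body}</form></vxml>'
        )
```

In this sketch the gateway would fetch `render_page()` at the start of a call and receive the result of `handle_input(...)` after each caller response. A real conversation manager would also consult customer data and external systems before choosing the next state.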

The page server may generate runtime data related to caller
actions, system parameters, or external information.
This data, plus additional data received from the speech
server, is stored in a logging database for use in billing,
OAM&P, service analysis, and other purposes.
There are many points in the tools domain that might
benefit from standardization. We might define standards
for all interfaces between tools. We might set up an
open source network for developing tools. With finite
resources, we must be realistic and choose those areas
where we expect to reap the greatest benefit. The Tools
Committee has identified two topics of particular interest:
runtime data and the meta language. We treat each separately
in the following two sections.
Runtime Data
In a live service, data is generated that is not represented in the
VoiceXML language. This data includes quality of service information,
OAM&P (operations, administration, maintenance, and provisioning) data,
billing information, and data related to individual and collective call
traffic. Since this data is not entirely covered by industry standards,
each technology vendor uses a different approach for formatting,
transporting, and storing the information.
Our approach to creating standards for the runtime data begins with an
attempt to list the data elements we wish to capture. We divide the
elements into six categories:
- Data generated by the VoiceXML Gateway
- Hardware and software processes
- VoiceXML application data
- ASR Performance and activity
- TTS Performance and activity
- Data processing
We estimate that there may be 100-200 data elements, a large but
tractable number. A few illustrative examples include:
- Conferencing 3rd party
- Resetting speech channel
- Telephony card failure
- Playback completed
- CPU idle percentage
- ASR version number
- TTS memory usage
- VoiceXML audio cache hits/misses
- Maximum call duration
- VoiceXML session ID
- ANI
- Database response time
Once we have a reasonably complete list of elements, refined with input
from industry participants, our next step is to define a transport and
storage format. A successful runtime data standard will enable service
providers to interchange VoiceXML gateways and page servers from
different vendors.
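
As an illustration of what a common transport and storage format might look like, the sketch below models one runtime data element as a record and serializes it as a line of JSON. This is only an assumption made for the example: the field names, the choice of JSON, and the assignment of "CPU idle percentage" to the hardware and software category are all hypothetical and are not a format proposed by the Tools Committee.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The six categories of runtime data listed above."""
    GATEWAY = "VoiceXML gateway"
    PROCESSES = "hardware and software processes"
    APPLICATION = "VoiceXML application data"
    ASR = "ASR performance and activity"
    TTS = "TTS performance and activity"
    DATA_PROCESSING = "data processing"


@dataclass
class RuntimeEvent:
    """One logged data element (hypothetical field names)."""
    session_id: str      # VoiceXML session ID
    category: Category
    element: str         # e.g. "CPU idle percentage"
    value: object        # number, string, or boolean
    timestamp: str       # ISO 8601 text, assumed to be UTC


def to_line(event):
    """Serialize an event as one line of JSON for transport or storage."""
    record = {
        "session_id": event.session_id,
        "category": event.category.value,
        "element": event.element,
        "value": event.value,
        "timestamp": event.timestamp,
    }
    return json.dumps(record, sort_keys=True)


def from_line(line):
    """Parse a line written by to_line back into a RuntimeEvent."""
    record = json.loads(line)
    record["category"] = Category(record["category"])
    return RuntimeEvent(**record)
```

If the gateway and the page server both wrote records in one agreed shape such as this, a logging database or analysis tool could read them without knowing which vendor produced them, which is the interchangeability the standard aims for.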
The Meta Language
Development tools and runtime software on the VoiceXML page server must
use the same meta language. Since the meta language is generally unique
to a given tool vendor, runtime software on the VoiceXML page server will
only work with development tools from the same vendor. One unfortunate
consequence of this restriction is that applications written with one toolset
will not necessarily run on a page server built by a different vendor.
This incompatibility threatens to thwart one of the purposes for which
VoiceXML was created: application portability between platforms. If the target
application code is written in VoiceXML, then the systems may be compatible,
but our observation is that many applications are represented in a
vendor-proprietary form (ASP, scripts, etc.) and then converted to VoiceXML at
runtime.
In an effort to solve this incompatibility, the VoiceXML Tools Committee
is studying ways to standardize the meta language. Vendors would then use
the standard meta language to represent parameters of the call flow, even
if vendor tools otherwise provide different features. Two proposals under
consideration are 1) the XForms standard under development by the W3C and 2)
an XML-based standard where style sheets convert between formats used by
different vendors. This rather ambitious goal will, if successful, improve
the interoperability of development and runtime tools and make applications
portable across vendors.
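
The second proposal can be pictured with a small sketch. The two vendor formats and the common representation below are invented for illustration, and plain Python functions stand in for the style sheets, since the Python standard library does not include an XSLT processor. The point is only that each vendor needs one mapping to and from a shared form, rather than one mapping per other vendor.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta code as written by "vendor A" development tools.
VENDOR_A = """<dialog id="main">
  <say>Say balance or agent.</say>
  <goto on="balance" target="balance"/>
  <goto on="agent" target="agent"/>
</dialog>"""


def vendor_a_to_common(xml_text):
    """Map vendor A's format onto a hypothetical common meta language."""
    src = ET.fromstring(xml_text)
    state = ET.Element("state", {"name": src.get("id")})
    ET.SubElement(state, "prompt").text = src.findtext("say")
    for goto in src.findall("goto"):
        ET.SubElement(
            state,
            "transition",
            {"input": goto.get("on"), "next": goto.get("target")},
        )
    return state


def common_to_vendor_b(state):
    """Map the common form onto a hypothetical "vendor B" format."""
    menu = ET.Element(
        "menu",
        {"label": state.get("name"), "text": state.findtext("prompt")},
    )
    for transition in state.findall("transition"):
        ET.SubElement(
            menu,
            "choice",
            {"word": transition.get("input"), "dest": transition.get("next")},
        )
    return menu


# Usage: convert vendor A's meta code for a vendor B page server.
common = vendor_a_to_common(VENDOR_A)
vendor_b_xml = ET.tostring(common_to_vendor_b(common), encoding="unicode")
```

Here `vendor_b_xml` holds the same call flow state expressed in the second vendor's vocabulary, with the prompt text and both transitions preserved.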
Conclusion
Tools for developing VoiceXML-based speech applications are a critical
factor in making VoiceXML easy to use. While VoiceXML itself may be
well-defined, industry software for generating VoiceXML code lacks
uniformity. We have launched an effort to define two standards that
will help VoiceXML systems interoperate across different vendors. The
effort will define how applications are represented and how runtime data
is transported and stored. We hope that this effort will foster the creation
of better tools and make developing VoiceXML services faster and easier.


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and
Technology Organization (IEEE-ISTO).