Let's Talk Office Supplies

One of the profound changes taking place in society is telecom
immersion. For those of us living in industrialized
nations, not only have telephones become our constant
companions, they are providing voice connectivity to
computers as well as to people. In fact, "virtual
telephone operators" and "virtual call center
agents" are increasingly prevalent as improvements
are made in automatic speech recognition for interpreting
telephone requests. At this juncture, and at the leading
edge of "speech recognition automation" in
customer relationship management, is VoiceXML.

VoiceXML looks a lot like HTML. In fact, its designers hope that
VoiceXML will mediate the creation of a "voice
web" just as HTML mediated the creation of the
visual Web. While this may very well happen, the more
immediate beneficiaries of VoiceXML are not Internet
users, but telephone companies, call center operators,
and telephone-based CRM operations. VoiceXML brings
the benefits of Web infrastructure and tools to serve
the telephone-using public.

VoiceXML benefits from being a member of the XML family that
is revolutionizing Internet communication. XML is being
put squarely in the middle of all Internet communication
through the efforts of Microsoft's ".NET"
initiative. Even if it weren't for Microsoft, XML would
be revolutionizing business databases, inventory control
systems and CRM systems,
making it easier to share
data between organizations, inside and outside of firewalls.
XML has volumes written about it so we won't presume
to explain it further here. However, it should be understood
that VoiceXML benefits from all the ongoing XML technology
development.

VoiceXML itself is a high-level language for authoring voice
applications. It allows developers to write voice applications
using simple markup tags and scripts rather than in
traditional and more complex programming languages.
This speeds up the development process enormously. VoiceXML
scripts work by orchestrating speech dialog systems
accessed on the telephone using TTS (text-to-speech),
recorded prompts and ASR (automatic speech recognition).
Extensive use is made of Internet and web development
technology to develop applications and deploy them,
but the telephone calls themselves need not involve
the Internet at all.

To develop "voice applications" in VoiceXML,
you create XML scripts that specify "audio prompts"
to be played to callers and "recognition grammars"
to tell the recognizer what words to look for in callers'
responses to prompts. VoiceXML scripts themselves may
be created using any text editor (although XML validating
editors offer advantages). To deploy VoiceXML applications,
VoiceXML scripts are hosted on a Web server. The Web
server is then accessed by "voice browsers,"
which are computers running VoiceXML interpreters and
speech recognizers and interfaces to the telephone network.
When a "voice browser" answers a telephone
call, it retrieves VoiceXML scripts from a Web server.
The scripts may be generated dynamically by the Web
server. The VoiceXML scripts instruct the voice browser
to play prompts and to start the speech recognizer with
the appropriate grammar(s). When the caller's utterance
is recognized, the voice browser selects and transitions
to another dialog, which may be within the same VoiceXML
script or may have to be fetched from the Web server.
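
To make the flow above concrete, here is a minimal sketch in Python of a Web server building a VoiceXML script dynamically, followed by a toy stand-in for the voice browser's recognition step. The element names (form, field, prompt, grammar, filled, goto) follow VoiceXML 2.0, but the markup is simplified. The function names, the form ids, and the list of office supplies are our own illustrative assumptions, and the "recognizer" is a plain string match, not real speech recognition.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_order_dialog(items):
    """Build a small VoiceXML script, as a Web server might do per request.

    Hypothetical helper: the ids "order" and "confirm" are made up for this sketch.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})

    # First dialog: one prompt plus the grammar of words to listen for.
    form = ET.SubElement(vxml, "form", {"id": "order"})
    field = ET.SubElement(form, "field", {"name": "item"})
    ET.SubElement(field, "prompt").text = "Which office supply would you like to order?"
    grammar = ET.SubElement(
        field, "grammar", {"version": "1.0", "root": "item", "mode": "voice"}
    )
    rule = ET.SubElement(grammar, "rule", {"id": "item"})
    one_of = ET.SubElement(rule, "one-of")
    for word in items:
        ET.SubElement(one_of, "item").text = word

    # Once the field is filled, transition to another dialog in the same script.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "goto", {"next": "#confirm"})

    # Second dialog: the transition target.
    confirm = ET.SubElement(vxml, "form", {"id": "confirm"})
    block = ET.SubElement(confirm, "block")
    ET.SubElement(block, "prompt").text = "Thank you. Your order has been placed."

    return ET.tostring(vxml, encoding="unicode")


def recognize(vxml_text, utterance):
    """Toy voice-browser step: return (prompt, next dialog or None).

    A real voice browser would play the prompt and run a speech recognizer;
    here the caller's utterance is already text and is matched literally.
    """
    ns = {"v": VXML_NS}
    root = ET.fromstring(vxml_text)
    field = root.find("v:form/v:field", ns)
    prompt = field.find("v:prompt", ns).text
    words = [item.text for item in field.iter("{%s}item" % VXML_NS)]
    if utterance.strip().lower() in words:
        return prompt, field.find("v:filled/v:goto", ns).get("next")
    return prompt, None  # a real browser would raise a "nomatch" event here


if __name__ == "__main__":
    script = build_order_dialog(["pens", "paper", "staplers"])
    print(script)
    print(recognize(script, "paper"))
```

Because the script is ordinary XML produced by ordinary server-side code, the list of items could just as easily come from an inventory database, which is the practical benefit of generating scripts dynamically.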

Note that "browsing" is something of a misnomer
in this context. People typically do not browse on the
telephone. The name was chosen to point out the relationship
of VoiceXML to the Web, not to suggest that people follow
hypertext links on voice-enabled Web pages.

As we mentioned above, VoiceXML was conceived with the
vision of a "Voice Web" in mind, modeled after
the visual/graphic Web that HTML helped create. However,
there is a huge hurdle to explosive growth of the Voice
Web. The "visual Web" was created by downloading
HTML interpreters, i.e., Web browsers such as Internet
Explorer, that could run on any of the millions of computers
connected to the Internet. The "Voice Web,"
on the other hand, cannot be created by simply
downloading VoiceXML interpreters to run on today's
"typical computers" because "typical
computers" lack voice interfaces to telephones.
This will not always be the case. But for the time being,
voice browsers are not being set up by average computer
users. Rather, they are being set up and maintained
by Voice Portal providers such as Tellme, BeVocal, HeyAnita,
and Voxeo. In addition, and more importantly, they are
being set up and maintained by Voice Applications Service
Providers such as NetByTel and General Magic to serve
the "automated customer service representative"
needs of business and government.


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).