Building the VoiceXML Forum Certification Program
(Continued from Part 1)

The important thing to notice here is that the user interfaces
change, but the rest of the application stays the same.
At this abstract level, we can think of this not as three
applications, but as a single application with three user
interfaces (see Figure 3).

Figure 3 - Integrated Application Architecture

At this point it should be clear that the fundamental difference
between the applications is the user interface, so we will now
delve into some of the details of user interface design for
v-commerce.

Audio data is sequential; the user must remember everything
that was said, as it's difficult at best to go back and hear
previous data. Any menu presented must therefore be kept short,
as in an m-commerce application. However, a voice application
can increase usability by making use of grammars: natural
language constructs that allow users to select menu options
without remembering an exact keyword. In addition, grammars can
enable a mixed-initiative dialog, allowing users who are familiar
with the application to navigate globally and skip the menu
system. For example, a user could say "black top hats" at the
first prompt and skip to the ordering section, rather than
having to say "top hats" at the style menu, wait for the color
menu, and then say "black."
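
The "black top hats" shortcut can be sketched in a few lines of
Python. This is only an illustration of the slot-filling idea
behind a mixed-initiative dialog, not VoiceXML and not how any
particular voice platform implements grammars. The vocabulary
lists and the function names fill_slots and next_prompt are
assumptions made up for this example.

```python
# Hypothetical hat-store vocabulary; not taken from the article.
STYLES = ("top hat", "bowler", "fedora")
COLORS = ("black", "gray", "brown")


def fill_slots(utterance):
    """Return whichever of the 'style' and 'color' slots the utterance fills."""
    text = utterance.lower()
    slots = {}
    for style in STYLES:
        if style in text:  # substring match, so "top hats" matches "top hat"
            slots["style"] = style
            break
    for color in COLORS:
        if color in text.split():  # whole-word match for colors
            slots["color"] = color
            break
    return slots


def next_prompt(slots):
    """Choose the next dialog step from the slots that are still empty."""
    if "style" not in slots:
        return "style menu"
    if "color" not in slots:
        return "color menu"
    return "ordering section"
```

With these assumptions, next_prompt(fill_slots("black top hats"))
goes straight to the ordering section, while a caller who says
only "top hats" is still sent to the color menu.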

As with m-commerce applications, the biggest challenge in
v-commerce is data input. ASR (Automatic Speech Recognition)
does not yet provide accurate transcription of free-form speech,
so another method is required. Text could be entered one letter
at a time using an alphabet grammar, but this is far too
cumbersome to be practical, and it is also limited by ASR's
ability to distinguish similar-sounding letters (such as all
the letters that rhyme with 'E'). Where VoiceXML excels is in
creating menus with a fixed number of options and providing
intuitive grammars for those options. Therefore, we must find a
way to let the user input arbitrary text information via set
menus and patterns.

There are two primary methods of handling arbitrary text
information in current enterprises. One is keyboard entry, and
the other is tried-and-true human conversation. For a small
enterprise (like our hat store), or for an enterprise that can
afford powerful transcription software, speech input may be a
reality. VoiceXML lets the user record information, such as a
shipping address. That information is then entered into the
shipping database, either manually or by the transcription
software. This method can be used, but it is important to
understand that the VoiceXML system cannot interact with this
information; it can only record it. Furthermore, the input field
must be very specialized (for example, labeled as the shipping
address) for transcription software to make effective use of it.

Perhaps the best option (until multi-modal technology becomes
widely available) is to make use of a connected e-commerce
application to enter arbitrary text. A user enters arbitrary
text "profiles" through the web interface, and these profiles
are saved and made available to the audio interface as menu
options. For example, a user can enter two shipping addresses,
and then have them available from the audio interface as a
two-option menu.
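
As a rough sketch of the profile idea, the Python function below
turns a set of saved shipping addresses into a VoiceXML menu
using only the standard library. The menu, prompt, enumerate and
choice elements are standard VoiceXML 2.0; the function name
address_menu, the menu id, the spoken labels and the target URLs
are all hypothetical, and a real application would read the
profiles from its own database.

```python
import xml.etree.ElementTree as ET


def address_menu(addresses):
    """Build a VoiceXML <menu> with one <choice> per saved address.

    `addresses` maps a short spoken label to a target URL. Both the
    labels and the URLs are hypothetical placeholders.
    """
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    menu = ET.SubElement(vxml, "menu", id="shipping_address")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Ship to which address? "
    ET.SubElement(prompt, "enumerate")  # reads the choices back to the caller
    for label, url in addresses.items():
        choice = ET.SubElement(menu, "choice", next=url)
        choice.text = label
    return ET.tostring(vxml, encoding="unicode")
```

Calling address_menu({"home": "order.vxml?addr=1", "office":
"order.vxml?addr=2"}) yields a two-option menu, so the caller
only has to say "home" or "office" instead of dictating an
address.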

So in the end, what is needed is an application architecture
in which the database and application logic support multiple
user interfaces, and the user interfaces communicate with each
other (in practice, they communicate through the lower
application layers).

Figure 4 - Complete Architecture for an e-m-v Commerce Application
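
A minimal Python sketch of this layering follows, assuming a
single shared application class and two thin interface
functions. The names HatStore, web_view and voice_view are
invented for illustration; the point is only that data saved
through one interface becomes available to the other through
the shared lower layer.

```python
class HatStore:
    """Shared application layer: logic and data used by every interface."""

    def __init__(self):
        self._addresses = {}  # user name -> list of saved shipping addresses

    def save_address(self, user, address):
        self._addresses.setdefault(user, []).append(address)

    def addresses(self, user):
        return list(self._addresses.get(user, []))


def web_view(store, user):
    """Web interface: typing is easy, so it lists the full addresses."""
    return "\n".join(store.addresses(user))


def voice_view(store, user):
    """Voice interface: offers the saved addresses as a short spoken menu."""
    options = store.addresses(user)
    return [f"Say {i} for {addr}" for i, addr in enumerate(options, 1)]
```

Neither view talks to the other directly. An address saved
through the web side with store.save_address(...) simply shows
up the next time voice_view asks the same store for that user's
addresses.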

Constructing an application to meet this architecture at first
seems to require a large development effort, especially if
there is an e-commerce and/or m-commerce application already in
place. However, as explained above, v-commerce is really another
aspect of the same application, not a new application to be
created independently of the others. Unfortunately, many
existing e-commerce applications were not designed with multiple
user interfaces in mind, and certainly not with interaction
between the different interfaces in mind. A modern application
server makes this sort of functionality possible, but requires
rebuilding the entire existing e-commerce application in order
to add mobile or audio functionality.

The Clickmarks Platform is a tool that enables rapid extension
of existing applications to new user interfaces, such as
VoiceXML, and allows communication between those interfaces.
The platform re-uses the existing components (website, database,
etc.) and provides a mechanism for converting the web or mobile
interface into an audio-only voice interface. It also provides a
way to alter the web application as needed to add support for
audio/mobile input, along with tools for recognizing pieces of
web content and translating them into a voice application.

In addition, the Clickmarks Platform supports VoiceLet
technology. VoiceLets are small standalone VoiceXML applications
that provide functions such as POP3 or IMAP e-mail access, web
access, and LDAP directory access. They can be inserted into any
existing VoiceXML application to add functionality without
programming.

In summary, v-commerce, e-commerce, and m-commerce applications
should be viewed as different user interfaces to the same
application. These interfaces should communicate with each
other, as each has different strengths and weaknesses, and all
can be used to enhance the whole.


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards
and Technology Organization (IEEE-ISTO).