Some Thoughts on Speech Grammar
In this monthly column, an industry expert will answer
common questions about VoiceXML and related technologies.
Readers are encouraged to submit questions about VoiceXML,
including development, voice-user interface design,
and speech technology in general, or how VoiceXML is
being used commercially in the marketplace. If you have
a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org
and be sure to read future issues of VoiceXML Review
for the answer.
Q:
I'm writing a VoiceXML application that I want to be
as portable as possible. I realize W3C speech grammar
markup will be the ultimate choice eventually, but it
seems that a lot of platforms don't support it yet,
or support it only partially. Any thoughts as to what
a good interim strategy is with respect to grammars? Are
there areas of VoiceXML other than grammars where there
are portability issues I should be aware of?
A:
Great question. In working with VoiceXML (and any healthy
standard), you will always face the choice of how you
want to prioritize cross-platform compatibility vs.
the newest/most advanced features. As any healthy standard
evolves, two things happen.
First,
leading vendors continually introduce new features in
response to client demand. At first, these are of
course "proprietary" extensions to the standard.
Vendors who are committed to the standards process always
take these innovations and actively evangelize them
to the public standards process with the hope of getting
them folded in over time. Along the way, many other
vendors may even adopt these extensions as a "de
facto" standard before final "true" standardization
is completed. Exact syntax and features may shift some,
of course, during the standardization process, and the
original vendors are then on the hook to update their
platforms to be compliant with the final version.
Second,
platform vendors can't all turn on a dime, and some
will always lag somewhat in adopting complete implementations
of the standard -- especially as it grows and evolves
over time (W3C speech grammar markup is a great example
of this). Once again, leading vendors will get back
in sync within a reasonable time frame -- that's what
makes them leading vendors.
Given this healthy and innovative (but imperfect) environment,
you always have the choice of which features to take
advantage of when building applications. What's the
right strategy? The answer is, like it or not, "it
depends". You need to take a look at the following
things:
1) Examine all platform vendors you're interested in,
and see which features (including grammar formats) they
currently support, and talk to them to find out their
existing track record and ongoing philosophy regarding
keeping in step with the standard as it evolves.
2) Think hard about where your priorities are -- do
you really intend to deploy on multiple platforms? How
many different ones do you *really* care about?
3) Specifically for grammar formats, do the vendors
you care about support them well enough today to get
*your* applications done? Some features may not be
supported, but what matters is whether the features
you really need are supported.
4) If there are features that you want to use that aren't
quite perfectly cross-platform compatible today, what
will it really cost you in development time to make
the necessary changes should you choose to switch?
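One way to work through questions 1 and 3 is to write the answers down
as a small support matrix and check your own requirements against it.
The sketch below is only an illustration: the platform names and
feature labels are made-up placeholders, not real vendor data, and you
would fill the table in from each vendor's documentation.

```python
# Hypothetical support matrix: platform -> features it supports today.
# Platform names and feature labels are placeholders, not real vendor data.
SUPPORT = {
    "platform_a": {"w3c_xml_grammar", "dtmf_grammar", "builtin_digits"},
    "platform_b": {"vendor_grammar", "dtmf_grammar", "builtin_digits"},
    "platform_c": {"w3c_xml_grammar", "vendor_grammar", "dtmf_grammar"},
}


def missing_features(required, support=SUPPORT):
    """Map each platform to the sorted list of required features it lacks."""
    needed = set(required)
    return {name: sorted(needed - feats) for name, feats in support.items()}


def portable_platforms(required, support=SUPPORT):
    """Return the platforms that support every required feature."""
    gaps = missing_features(required, support)
    return sorted(name for name, lacking in gaps.items() if not lacking)


if __name__ == "__main__":
    # Features *your* application needs (question 3), not the full standard.
    required = {"dtmf_grammar", "builtin_digits"}
    print("Deployable as-is on:", portable_platforms(required))
    for name, lacking in missing_features(required).items():
        if lacking:
            print(f"{name} is missing: {', '.join(lacking)}")
```

The point of the exercise is that the required set is usually much
smaller than the full feature list, so the gaps that matter are often
fewer than a feature-by-feature comparison suggests.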
Remember,
millions of people made the decision to write slightly
different versions of their Web sites for IE vs. Navigator
to optimize performance on both. In my opinion, VoiceXML
is already far superior to HTML in this regard (e.g.
fewer inconsistencies across implementations) -- but you
have to make your own decision specific to your business
objectives and needs.
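If you do end up maintaining more than one grammar format, one way to
keep the cost of switching low is to store each phrase list once and
generate the platform-specific files from it. The sketch below is an
assumption on my part rather than a recommendation from any vendor: it
renders a flat list of phrases in the XML and ABNF forms described in
the W3C Speech Recognition Grammar Specification, and the function
names and the "account" rule are invented for illustration. A platform
that only accepts its own format would need its own renderer, and the
exact syntax a given platform accepts may differ.

```python
from xml.sax.saxutils import escape


def to_srgs_xml(rule_id, phrases, lang="en-US"):
    """Render a flat list of alternative phrases as a W3C XML-form grammar."""
    items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        f'         xml:lang="{lang}" root="{rule_id}">\n'
        f'  <rule id="{rule_id}" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


def to_srgs_abnf(rule_id, phrases, lang="en-US"):
    """Render the same phrases in the ABNF form of the W3C grammar format."""
    alternatives = " | ".join(phrases)
    return (
        "#ABNF 1.0 UTF-8;\n"
        f"language {lang};\n"
        f"root ${rule_id};\n"
        f"public ${rule_id} = {alternatives};\n"
    )


if __name__ == "__main__":
    # Single source of truth for the phrases (hypothetical example data).
    account_types = ["checking", "savings", "money market"]
    print(to_srgs_xml("account", account_types))
    print(to_srgs_abnf("account", account_types))
```

With this arrangement, adding or removing a phrase is a one-line change
to the list, and only the renderers know about format differences.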
Q:
Given the current state of speech recognition technology,
when writing speech grammars for VoiceXML apps, is it
best to write small, compact grammars with a very narrow
set of possible utterances, or is it better to write
larger, wide-open grammars?
A:
It's most important to write your grammars
to closely match what your callers are actually saying
-- having too much coverage (too many phrases in the
grammar, especially ones that are confusable with one
another) is just as bad as having too little (missing
many things that your callers regularly say).
Optimizing this balance through a combination of great
grammar design and great UI design that carefully guides
callers to "say the right things" without
frustrating them is the fine art of voice application
design.
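A rough way to see where a grammar sits on this balance is to compare
its phrase list against transcriptions of what callers actually said.
The sketch below assumes you have such transcriptions and that the
grammar can be flattened into a list of phrases; the function names and
sample data are hypothetical, and exact string matching is a
simplification of what a real recognizer does.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def coverage_report(grammar_phrases, utterances):
    """Compare grammar phrases with transcribed caller utterances.

    in_grammar_rate: fraction of utterances the grammar covers
    missing:         things callers said that the grammar lacks (too little)
    unused:          grammar phrases nobody said (candidates for too much)
    """
    grammar = {normalize(p) for p in grammar_phrases}
    counts = Counter(normalize(u) for u in utterances)
    total = sum(counts.values())
    covered = sum(n for phrase, n in counts.items() if phrase in grammar)
    return {
        "in_grammar_rate": covered / total if total else 0.0,
        "missing": sorted(p for p in counts if p not in grammar),
        "unused": sorted(grammar - set(counts)),
    }


if __name__ == "__main__":
    # Hypothetical grammar and call transcriptions.
    grammar = ["checking", "savings", "money market", "brokerage"]
    heard = ["Checking", "savings", "checking", "my checking account"]
    report = coverage_report(grammar, heard)
    print(report["in_grammar_rate"])  # share of utterances covered
    print(report["missing"])          # consider adding, or re-word the prompt
    print(report["unused"])           # consider removing if never spoken
```

Phrases in the "missing" list can be fixed either by extending the
grammar or by changing the prompt so callers stop saying them; which one
is right is a UI design decision, not something the report can decide.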

Copyright
© 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization (IEEE-ISTO).