VoiceXML Events
Welcome to "First Words", the VoiceXML Review's column
that teaches you about VoiceXML and how you can use it.
We hope you enjoy the lesson.
Handling Complex Recognition Results
One of the changes in the April release of the VoiceXML
2.0 working draft was the formalization of how recognition
results are made available at the VoiceXML level.
Over the next few articles, we're going to spend some time
looking at how this affects your VoiceXML application.
We've
written pages using simple recognition results in the
past. These include samples like this:
|
<field name="command">
  <!-- No <tag> elements: the raw utterance is returned -->
  <grammar xml:lang="en-US" version="1.0" root="Help">
    <rule id="Help" scope="public">
      <one-of>
        <item> help </item>
        <item> save me </item>
        <item> succour </item>
      </one-of>
    </rule>
  </grammar>
  <filled>
    You said <value expr="command"/>
  </filled>
</field>
|
In
this example, the VoiceXML variable 'command' will receive
the raw utterance value - one of the three acceptable
utterances in this case, 'help', 'save me' or 'succour'.
This is a straightforward example.
More
advanced grammars can explicitly fill 'slots' (variables)
to return one or more values from a grammar. Here is
the previous example modified to return an interpretation
of the user utterance in a slot with the name 'returnvalue'.
|
<field name="command">
  <grammar xml:lang="en-US" version="1.0" root="Help">
    <rule id="Help" scope="public">
      <one-of>
        <!-- Every alternative fills the same slot with the same value -->
        <item> <tag> returnvalue="help" </tag> help </item>
        <item> <tag> returnvalue="help" </tag> save me </item>
        <item> <tag> returnvalue="help" </tag> succour </item>
      </one-of>
    </rule>
  </grammar>
  <filled>
    You said <value expr="command"/>
  </filled>
</field>
|
In
this example, the ECMAScript variable 'returnvalue'
receives the value 'help' in all three cases, regardless
of which of the three legal user utterances is recognized.
Techniques like this can simplify your application, because
it only has to handle one value per meaning, and will in
general make your grammars more usable and extensible.
The
exact format of what is inside the <tag> element
will vary from platform to platform right now (the Semantic
Interpretation specification is still being developed,
as are techniques for mapping these results into VoiceXML),
but the contents will likely be a variant or subset
of ECMAScript. Consult your ASR or platform vendor for
details.
Before we move into complex combinations, we need to establish
how the simple cases work. From the VoiceXML specification:
- If the interpretation is a simple result, this is
  assigned to the input item variable.
- If the interpretation is a structure and the slot name matches a property,
  this property is assigned to the input item variable.
- Otherwise, the full semantic result is assigned
  to the input item variable.
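The three rules above can be sketched in ECMAScript. This is only an illustration of the decision order, not code from any VoiceXML platform: the function name `assignToField` and its parameters are hypothetical, and a real interpreter performs this step internally.

```javascript
// Hypothetical helper (not part of VoiceXML): picks the value that would
// be assigned to an input item variable, following the three quoted rules.
function assignToField(interpretation, slotName) {
  // Rule 1: a simple (non-object) result is assigned as-is.
  if (interpretation === null || typeof interpretation !== "object") {
    return interpretation;
  }
  // Rule 2: a structure with a property matching the slot name
  // contributes just that property.
  if (Object.prototype.hasOwnProperty.call(interpretation, slotName)) {
    return interpretation[slotName];
  }
  // Rule 3: otherwise the full semantic result is assigned.
  return interpretation;
}

const simple = assignToField("save me", "command");
const matched = assignToField({ returnvalue: "help" }, "returnvalue");
const whole = assignToField({ returnvalue: "help" }, "command");
console.log(simple, matched, whole);
```

In the last call the slot name 'command' matches no property, so the whole object comes back.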
The
'interpretation' is returned to the VoiceXML interpreter
from the recognizer. The interpretation allows the recognizer
to assign some meaning to the results rather than simply
providing the raw utterance to the user of the recognizer.
The interpretation is actually provided to the VoiceXML
application as an ECMAScript object. This has some implications,
as we'll see later.
In our first example, where we don't fill a slot, the
interpretation will be a simple result - just a string
representing the utterance. This means that the actual
user utterance will be assigned to the field variable
'command'.
In
the second example, since we actually return a slot,
the interpretation will take the form of an ECMAScript
object, something like:
{ returnvalue: "help" }
whichever of the three utterances was actually spoken, since every
<tag> in the grammar assigns the same value.
According to the second and third rules above, however, we wouldn't
get the result we might expect at the VoiceXML level. The slot name
defaults to the field name, 'command', which does not match the
property 'returnvalue', so the entire interpretation would be assigned
(as an object) to the field variable. (A reasonable alternative would be
to assign the value directly when only a single slot is returned, and
this was commonly done before the specification defined this behavior.)
In order to access the value of interest, we need to reference a
property of the object, using the ECMAScript convention for referencing
an object property:
|
<filled>
You said <value expr="command.returnvalue"/>
</filled>
|
Each
slot returned by a grammar would be available in this
manner.
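As a rough illustration in plain ECMAScript, outside any VoiceXML platform, the snippet below treats the field variable as an ordinary object. The second slot, 'urgency', is invented for this example and does not appear in the grammar above. How a platform renders a whole object in a prompt varies, so the string conversion shown is only what standard ECMAScript does.

```javascript
// Hypothetical interpretation held in the field variable 'command'.
// 'urgency' is an invented second slot, used only for illustration.
const command = { returnvalue: "help", urgency: "high" };

// Each slot is reached with ordinary property access.
const spoken = "You said " + command.returnvalue;

// Converting the whole object to a string loses the slot values,
// which is why expr="command" alone is not useful here.
const asString = String(command);

// The slot names returned by the grammar.
const slotNames = Object.keys(command);

console.log(spoken);
console.log(asString);
console.log(slotNames);
```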
As an alternative, we can specify the slot of interest
in the field tag:
|
<!-- The slot attribute names the interpretation property to use -->
<field name="command" slot="returnvalue">
  <grammar xml:lang="en-US" version="1.0" root="Help">
    <rule id="Help" scope="public">
      <one-of>
        <item> <tag> returnvalue="help" </tag> help </item>
        <item> <tag> returnvalue="help" </tag> save me </item>
        <item> <tag> returnvalue="help" </tag> succour </item>
      </one-of>
    </rule>
  </grammar>
  <filled>
    You said <value expr="command"/>
  </filled>
</field>
|
In
this case, the value of the interpretation property
'returnvalue' is assigned to the field variable 'command',
because the slot name matches the property name.
We could have achieved the same result by changing the name of the
field to match the single slot being returned by the grammar
(returnvalue). This can only be done once with the same grammar in the
same scope, however.
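The equivalence of the two approaches follows from the slot attribute defaulting to the field name when it is omitted. A minimal ECMAScript sketch of that default is shown here. The helper names `effectiveSlot` and `fieldValue` and the object shape used to model a field are assumptions made for illustration, not platform APIs.

```javascript
// Hypothetical model of a <field>: 'slot' is optional and, when absent,
// falls back to the field's 'name'.
function effectiveSlot(field) {
  return field.slot !== undefined ? field.slot : field.name;
}

// Returns the matching property if there is one, else the whole object.
function fieldValue(field, interpretation) {
  const slot = effectiveSlot(field);
  return Object.prototype.hasOwnProperty.call(interpretation, slot)
    ? interpretation[slot]
    : interpretation;
}

const interpretation = { returnvalue: "help" };

// Explicit slot attribute on a field named 'command'.
const viaSlot = fieldValue({ name: "command", slot: "returnvalue" }, interpretation);

// No slot attribute, but the field itself is named 'returnvalue'.
const viaName = fieldValue({ name: "returnvalue" }, interpretation);

console.log(viaSlot, viaName);
```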
Summary
We've
had a quick look at how recognition results are passed
back to your VoiceXML application in different situations,
and how you can access them.
Suppose,
however, that we have a form-level grammar, which can
possibly fill more than one slot (and hence populate
multiple fields from a single utterance). How do we
map the results from the grammar into VoiceXML variables?
There are a number of issues that arise when considering
this case, and the authors of the most recent versions
of the VoiceXML specification have carefully specified
how the results returned from grammars will be used.
Next month, we're going to look at more complex results
and how they can be used in your application.
Best wishes for a safe and happy holiday season from
everyone here at the VoiceXML Review.
VoiceXML Users Group Call for Participation
The VoiceXML Forum is beginning to prepare for the Spring Users Group
Meeting, to be held in conjunction with the
AVIOS Speech Developers
Conference and Expo, from March 31st to April 3rd, 2003, at the Fairmont
Hotel in San Jose, California. The VoiceXML Users Group Meeting will be
held on April 3rd.
In the past, the VoiceXML User Group has provided tutorials,
technology overviews, and other such features to allow
technology leaders to become familiar with speech technologies.
VoiceXML is clearly now in the mainstream of speech
application development. So for the Spring Meeting,
the VoiceXML Forum is looking to the VoiceXML user community
to share its experience to date by providing live demos
of its VoiceXML technologies and/or sharing practical
feedback on its experiences with VoiceXML.
Some possible topics would include live demos and/or
experience reports in the areas of:
· Writing Portable VoiceXML Applications;
· Speech Application Development;
· VoiceXML Platforms;
· Speech Application Tuning;
· Deployment concerns;
· Grammar development;
· Systems Integration Issues.
Or pick another topic related to VoiceXML and the real world.
Take this opportunity to pass along your successes (and failures!)
and help the industry to evolve.
If you would like to participate in this UGM by presenting a demo of
your VoiceXML application or related technology, or by sharing your
VoiceXML experiences, please submit a short abstract on the topic you
would like to present to
brett.mcdowell@ieee-isto.org by February 21, 2003.

Copyright
© 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization (IEEE-ISTO).