First Words
Welcome to “First Words” – the VoiceXML
Review’s column to teach you about VoiceXML and
how you can use it. We hope you enjoy the lesson.
VoiceXML 2.1
In this lesson, we’re going to continue investigating
VoiceXML 2.1.
You may recall that as VoiceXML platform vendors and
application developers began to widely deploy VoiceXML
applications, they began to identify potential future
extensions to the language. The result of this experience
is a collection of field-proven features that are candidates
for addition to the VoiceXML language. These features
are being proposed as part of VoiceXML 2.1.
Just as a reminder, VoiceXML 2.1 has been released
as a Last Call Working Draft. Here is a pointer:
http://www.w3.org/TR/2004/WD-voicexml21-20040728/
Note: if
you’re reading this article after VoiceXML
2.1 has been finalized and published, you should spend
a few minutes tracking down the final specification
rather than this link, as the specification may have
undergone minor changes.
The new
features proposed for VoiceXML 2.1 are based on feedback
from application developers and VoiceXML
platform developers. The features we’ve covered
already include:
- Referencing Grammars Dynamically – Generation of a grammar URI reference with an expression;
- Referencing Scripts Dynamically – Generation of a script URI reference with an expression;
- Recording user utterances while attempting recognition – Provides access to the actual caller utterance, for use in the user interface, or for submission to the application server;
- Adding namelist to <disconnect> – The ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to the call).
Here are the links to the previous articles in this
series:
http://www.voicexmlreview.org/Sep2004/columns/sep2004_first_words.html
http://www.voicexmlreview.org/Nov2004/columns/nov2004_first_words.html
This issue, we’re going to look at:
- Using <mark> to detect barge-in during prompt playback – Placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred.
The <mark> Tag
As the reader may know, VoiceXML is designed to work
alongside other standards developed within
the W3C Voice Browser Working Group, in particular
the Speech Recognition Grammar Specification (SRGS,
see http://www.w3.org/TR/2004/REC-speech-grammar-20040316/)
and the Speech Synthesis Markup Language Specification
(SSML, see http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/).
The SSML
specification defines the ‘mark’ tag,
which allows the placement of a bookmark or marker
into an SSML fragment that is going to be rendered
by the SSML processor. When the SSML processor encounters
such a mark tag in the SSML, it is required to inform
the VoiceXML ‘interpreter context’ that
it has done so.
VoiceXML
2.1 allows <mark> to be easily used
within a VoiceXML application. In particular, the following
additions to VoiceXML 2.0 are specified:
‘nameexpr’ attribute on <mark> -
SSML defines a ‘name’ attribute for <mark> which
allows the bookmark to be identified by the application.
Each SSML fragment might therefore have multiple markers,
each identified by a name. VoiceXML 2.1 extends this
by adding the ‘nameexpr’ attribute, which
allows the specification of an ECMAScript expression
defining the name. This provides for more flexibility
on the client-side, as well as providing consistency
with the rest of the language (where most elements
accept static and expression versions of particular
attributes). As is usual for this attribute pairing,
exactly one of the two must be given: if the application
specifies both ‘name’ and ‘nameexpr’,
an error.badfetch event will be thrown.
Application
Level access to bookmark information – Although
VoiceXML 2.0 allows the use of <mark>, there
is no defined mechanism to access the information returned
to the interpreter context from the application level.
That is, when a <mark> is processed, the relevant
information is not available to the VoiceXML application
itself. VoiceXML 2.1 specifies two properties on the
application.lastresult$ object: markname and marktime.
These hold the name of the last <mark> processed in an
SSML fragment and the time elapsed, in milliseconds,
since it was processed.
Note that processing of the SSML will end when the
fragment has been completed, or when a barge-in event
occurs. This allows the application to determine where
the barge-in occurred, using either time, or the bookmark
(or both, as shown below).
In addition to the properties on the application.lastresult$
object, if a successful recognition occurs as part
of form-filling, the markname and marktime shadow variables
for the form item are set to the same values
as those in application.lastresult$.
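To make the relationship between the two values concrete, here is a toy model written in ECMAScript. It is not platform code: the `lastMarkAt` function, the `offsetMs` field, the mark names, and the timings are all made up for illustration, and the choice to leave both values undefined when no mark has been reached is an assumption of the sketch. Given the offsets at which each mark is reached and the moment playback stopped, it returns the pair of values an application might expect to see.

```javascript
// Toy model only: a real platform tracks these values internally.
// Each mark has a name and an assumed offset (ms) from the start of the
// prompt; the list is expected to be sorted by offset.
function lastMarkAt(marks, stopMs) {
  // Assumption: both values stay undefined if no mark was reached.
  let result = { markname: undefined, marktime: undefined };
  for (const mark of marks) {
    if (mark.offsetMs <= stopMs) {
      // marktime counts from the most recently reached mark,
      // not from the start of the prompt.
      result = { markname: mark.name, marktime: stopMs - mark.offsetMs };
    }
  }
  return result;
}

// Hypothetical prompt: an introduction followed by a menu.
const sampleMarks = [
  { name: "intro", offsetMs: 0 },
  { name: "menu", offsetMs: 9000 },
];

console.log(lastMarkAt(sampleMarks, 6200)); // stopped during the introduction
console.log(lastMarkAt(sampleMarks, 9400)); // stopped shortly after the menu began
```

With these invented timings, stopping 6.2 seconds in yields the ‘intro’ mark with a time of 6200 milliseconds, while stopping at 9.4 seconds yields the ‘menu’ mark with a time of 400 milliseconds, because the clock restarts at each mark.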
Here is an example from the VoiceXML 2.1 Last Call
Working Draft.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- Tracks whether the sponsor's ad counts as played. -->
  <var name="played_ad" expr="false"/>
  <form>
    <field name="team">
      <prompt>
        <mark name="ad_start"/>
        Hockey scores brought to you by Elephant Peanuts.
        There's nothing like the taste of fresh roasted peanuts.
        Elephant Peanuts. Ask for them by name.
        <mark name="ad_end"/>
        <break time="500ms"/>
        Say the name of a team. For example, say Toronto Maple Leafs.
      </prompt>
      <grammar type="application/srgs+xml" src="teams.grxml"/>
      <filled>
        <prompt>
          Sorry, there is no hockey this year. Boo hoo.
        </prompt>
        <!-- Operators are escaped so the attribute value is well-formed XML. -->
        <if cond="typeof(team$.markname) == 'string' &amp;&amp;
                  (team$.markname=='ad_end' ||
                   (team$.markname=='ad_start' &amp;&amp;
                    team$.marktime &gt;= 5000))">
          <assign name="played_ad" expr="true"/>
        <else/>
          <assign name="played_ad" expr="false"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
Now, beyond the fact that this has
been shamelessly converted to a hockey-based example, and that the advertisement
is perhaps interesting only to elephants,
it demonstrates an interesting use-case for the <mark> tag. We want to
be sure that the listener has in fact heard the advertisement, and that we
can bill the sponsor for having played their ad to another caller. To do this,
we check that we have either completed the entire ad (the ‘ad_end’ mark
has been processed), or that we have started and played at least five seconds
of audio. This length of time will be the time since the last mark (ad_start)
was encountered. If either of these conditions is true, then the ECMAScript
snippet in the <filled> block will assume the ad has been played to the
listener. Ka-ching.
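For readers who find the nested condition hard to scan, the same test can be written out as a standalone ECMAScript function. This is only a restatement of the cond expression in the example; the function name `adWasPlayed` and the constant `MIN_AD_MS` are our own labels, not part of the specification.

```javascript
// Restates the cond expression from the <filled> block in the example.
// MIN_AD_MS is the 5000 ms threshold used there.
const MIN_AD_MS = 5000;

function adWasPlayed(markname, marktime) {
  if (typeof markname !== "string") {
    return false; // no mark was reached, so the ad never started
  }
  if (markname === "ad_end") {
    return true; // the whole ad was rendered
  }
  // Otherwise the ad counts only if enough of it played after ad_start.
  return markname === "ad_start" && marktime >= MIN_AD_MS;
}

console.log(adWasPlayed("ad_end", 300));        // caller heard the whole ad
console.log(adWasPlayed("ad_start", 7200));     // barged in 7.2 s into the ad
console.log(adWasPlayed("ad_start", 1800));     // barged in too early
console.log(adWasPlayed(undefined, undefined)); // no mark information at all
```

Only the first two calls count as a billable play; the last two do not.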
Another use-case might be allowing the application
to restart prompt or SSML playback partway through
a long fragment, by keeping track of how far along
in the original playback we were interrupted.
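A rough sketch of that second idea, again in ECMAScript: suppose a long prompt is split into segments, each preceded by a mark whose name encodes its position (‘seg_0’, ‘seg_1’, and so on). The naming scheme and the helpers `resumeIndex` and `remainingSegments` are assumptions made for this illustration; names of this kind could be generated with an expression such as 'seg_' + i in the ‘nameexpr’ attribute. After a barge-in, the application maps the reported mark name back to the segment it should resume from.

```javascript
// Hypothetical helper: mark names follow an assumed "seg_<index>" convention.
function resumeIndex(markname) {
  if (typeof markname !== "string") {
    return 0; // no mark was reached: start from the beginning
  }
  const match = /^seg_(\d+)$/.exec(markname);
  // Unrecognized names also restart from the top.
  return match ? Number(match[1]) : 0;
}

// Returns the segments still to be played, starting with the interrupted one.
function remainingSegments(segments, markname) {
  return segments.slice(resumeIndex(markname));
}

// Made-up content for a three-part announcement.
const story = ["Part one.", "Part two.", "Part three."];

console.log(remainingSegments(story, "seg_1"));   // resumes at "Part two."
console.log(remainingSegments(story, undefined)); // replays everything
```

Replaying the interrupted segment from its start is a deliberate simplification here; an application that wanted finer control could also take the elapsed time since the mark into account.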
Summary
Here is the direct link to the ‘mark’ tag feature:
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-mark
In future issues, we’re going to look at these:
- Using <data> to fetch XML without requiring a dialog transition – Retrieval of XML data, and construction of a related DOM object, without requiring a transition to another VoiceXML page;
- Concatenating prompts dynamically using <foreach> – Building of prompt sequences dynamically using ECMAScript;
- Adding type to <transfer> – Support for additional transfer flexibility (in particular, a supervised transfer), among other capabilities.
These
are features that will likely get a full article
each, as they are powerful, and can provide the VoiceXML
developer with new ways to build applications.
VoiceXML 2.1 proposes some useful additional features for
VoiceXML 2.0, based on real-world deployment experience.
We’re going to continue drilling down into these features
in forthcoming issues.
As always, if you have questions or topic suggestions for VoiceXML
2.0 or 2.1, drop us a line!


Copyright
© 2001-2005 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization (IEEE-ISTO).