In this monthly column, an industry expert will answer
common questions about VoiceXML and related technologies.
Readers are encouraged to submit questions about VoiceXML,
including development, voice-user interface design,
and speech technology in general, or how VoiceXML is
being used commercially in the marketplace. If you have
a question about VoiceXML, e-mail it to speak.and.listen@voicexmlreview.org
and be sure to read future issues of VoiceXML Review
for the answer.
Q. I notice that VoiceXML 2.1 specifies support for the
Document Object Model (DOM) via the new <data> tag.
As a VoiceXML programmer, the DOM is completely new to
me. What's the best way for me to ramp up?
A.
Since all DOM activity is managed by the W3C,
the primary resource for the DOM
is http://www.w3.org/DOM/.
To learn the DOM API, the "source of truth"
is the DOM specification, also located on the W3C site:
http://www.w3.org/DOM/DOMTR.
You'll notice that the DOM has gone through several
iterations since its inception:
Level 1, W3C Recommendation, October 1998:
http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
Level 2, W3C Recommendation, November 2000:
http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
Level 3, W3C Recommendation, April 2004:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
For the purposes of VoiceXML, you'll want to focus on DOM
Level 2, which contains a number of improvements over Level
1, such as XML namespace support (http://www.w3.org/TR/REC-xml-names/).
Further, you can limit your study to the read-only subset
of methods and properties enumerated in Appendix D of
VoiceXML 2.1.
Fortunately for you (and me), the DOM has been around
since 1998, shortly after XML itself became a full W3C
recommendation, so if the DOM Level 2 specification
seems a little intimidating, there's a wealth of resources
including tutorials, tools, and sample code available
all over the Web to get you started programming the
DOM. After all, the best way to learn the DOM is to
start writing real code
that exercises the DOM.
You'll find DOM implementations for most popular programming
languages and platforms, including Java, C/C++,
C#, Perl, and JavaScript. Since VoiceXML and ECMAScript
are tightly integrated, VoiceXML programmers will get
the most use out of implementations of the DOM exposed
through JavaScript. Here are two:
Microsoft MSXML
Microsoft implements a set of XML services including the
DOM via the Component Object Model (COM). If you're running
a reasonably recent version of Microsoft Windows, and
you have Internet Explorer 5 or later installed, you've
already got MSXML. If not, you can download it from the
following URL:
http://msdn.microsoft.com/XML/XMLDownloads/default.aspx.
Once installed, you can create an instance of the DOM
within an HTML page rendered in IE 5 or later using the
<xml> tag (http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/xml.asp).
You can also create an instance of the DOM independent
of IE using the Microsoft Windows Scripting Host (WSH).
I'll show you how below.
Mozilla.org
Mozilla.org is a software foundation that fosters the
open source development of a number of Internet-powered
products and technologies including the Mozilla Web
Browser, Firefox (http://www.mozilla.org/products/firefox/).
Firefox integrates a DOM implementation which can be used
not only to manipulate the current HTML document from JavaScript
but also to manipulate XML documents you create
on the fly or load via a URI. I'll provide you with a
quick and dirty example below.
If you want to jump right into DOM programming in a
VoiceXML environment, here are a couple of voice browser
implementations that support the DOM via the <data>
tag as extensions to their VoiceXML 2.0 implementations:
Tellme Networks (http://studio.tellme.com/dom/howto/using_data.html)
BeVocal (http://cafe.bevocal.com/docs/vxml/data.html)
You can sign up for a developer account on Tellme Studio
or BeVocal Cafe to test your voice applications. All
you'll need is a Web server with a public interface
on which to host your VoiceXML content and XML data
and access to a telephone.
Using MSXML from WSH
If your desktop machine is running a reasonably recent
version of Microsoft Windows, you probably already have
MSXML on your machine. If not, download and install
it from the link above.
Copy and paste the following code into a text file
and call it msft_dom.js:
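// the URI to load is the first command-line argument
if (WScript.Arguments.length == 0) {
Log("usage: cscript msft_dom.js <uri>");
WScript.Quit(1);
}
var oDOM = BootStrap(WScript.Arguments(0));
if (!oDOM) {
WScript.Quit(1);
}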
// now that we have a DOM, let's start coding...
// get the root document element
var oRoot = oDOM.documentElement;
Log("root nodename=" + oRoot.nodeName);
Log("version=" + oRoot.getAttribute("version"));
try {
Log(GetChannelTitle(oDOM));
}
catch(e) {
Log(e.description);
}
function GetChannelTitle(dom) {
// retrieve "/rss/channel/title" (see XPath spec for this notation)
var sTitle = "";
var oRoot = dom.documentElement;
if (oRoot.nodeName == "rss") {
var oChan = GetFirstChildNamed(oRoot, "channel");
if (oChan != null) {
var oTitle = GetFirstChildNamed(oChan, "title");
if (oTitle != null) {
sTitle = oTitle.firstChild.data;
}
}
}
return sTitle;
}
// do a shallow traversal of oParent to find the first node named sName
function GetFirstChildNamed(oParent, sName) {
var oNode = null;
if (oParent != null) {
for (var i = 0; i < oParent.childNodes.length; i++) {
var oChild = oParent.childNodes.item(i);
if (oChild.nodeName == sName) {
oNode = oChild;
break;
}
}
}
return oNode;
}
// create a DOM instance, load the specified URI,
// and return the DOM if successful
function BootStrap(uri) {
var oDOM = WScript.CreateObject("MSXML.DOMDocument");
oDOM.async = false; // let's keep it simple, shall we
oDOM.validateOnParse = false; // disable validation for speed
oDOM.load(uri);
if (oDOM.parseError.errorCode != 0) {
Log(oDOM.parseError.line + ", " + oDOM.parseError.reason);
return null;
}
else {
return oDOM;
}
}
function Log(s) {
WScript.Echo(s);
}
The BootStrap function encapsulates the Microsoft COM-based
approach to creating an XML parser, using it to load
an XML document from a URI, and returning a reference
to the DOM exposed by the parser. The URI passed to
the function is taken from the command line. WSH implements
the WScript object and exposes the command-line arguments
you pass in via the Arguments collection.
Once the XML document is loaded, the rest of the code
is generic DOM manipulation. The code above shows you
how to do the following:
1) Retrieve a DOM node representing the root document
element
2) Retrieve the name of a DOM node
3) Retrieve the value of an attribute
4) Traverse the child elements of a DOM node
5) Retrieve the text contained within a DOM node
Here's how to invoke the code:
cscript msft_dom.js http://news.com.com/2547-1_3-0-5.xml
Here's the expected output:
root nodename=rss
version=2.0
CNET News.com
When
the script executes, it fetches a public RSS (Really Simple
Syndication) feed from the
news.com site which contains
the five most recent news stories. Lots of other XML data
sources are freely available on the Web, and you can certainly
create your own on your own hard disk or Web server.
To
learn more about the WSH environment, see http://msdn.microsoft.com/library/en-us/script56/html/wsoriWindowsScriptHost.asp.
To learn the particulars of Microsoft's implementation
of the DOM, download the MSXML SDK.
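One fragile spot in the script above is step 5: reading text with
firstChild.data assumes the element has a first child and that it is
a text node. The sketch below shows a slightly more defensive
approach that concatenates every text (nodeType 3) and CDATA
(nodeType 4) child. GetNodeText, makeList, makeText, and makeElement
are hypothetical names introduced for illustration; the make*
stand-ins only imitate the few DOM properties used here so that the
sketch runs without a parser. Under WSH you would pass a real node
instead.

```javascript
// Hypothetical helper: concatenate the text and CDATA children of a node.
function GetNodeText(oNode) {
  var sText = "";
  if (oNode == null) {
    return sText;
  }
  for (var i = 0; i < oNode.childNodes.length; i++) {
    var oChild = oNode.childNodes.item(i);
    // 3 = TEXT_NODE, 4 = CDATA_SECTION_NODE in the DOM specification
    if (oChild.nodeType == 3 || oChild.nodeType == 4) {
      sText += oChild.data;
    }
  }
  return sText;
}

// Minimal stand-ins for DOM nodes (assumptions for this demo only).
function makeList(aNodes) {
  return {
    length: aNodes.length,
    item: function (i) { return aNodes[i] || null; }
  };
}
function makeText(sData, nType) {
  var bCData = (nType == 4);
  return {
    nodeType: bCData ? 4 : 3,
    nodeName: bCData ? "#cdata-section" : "#text",
    data: sData,
    childNodes: makeList([])
  };
}
function makeElement(sName, aChildren) {
  return { nodeType: 1, nodeName: sName, childNodes: makeList(aChildren) };
}

var oTitle = makeElement("title", [makeText("CNET "), makeText("News.com", 4)]);
var oEmpty = makeElement("title", []);
var sTitleText = GetNodeText(oTitle); // both pieces joined together
var sEmptyText = GetNodeText(oEmpty); // empty string instead of an exception
```

In GetChannelTitle, the assignment to sTitle could then call
GetNodeText(oTitle) rather than dereferencing firstChild directly.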
Using Mozilla
The Mozilla Browser supports numerous ways to parse an
XML document and expose it via the DOM. The following
HTML document demonstrates three of these mechanisms:
<html>
<head>
<title>XML Test</title>
<style>
body {font-size: 9pt; font-family: verdana;}
</style>
<script>
var doc = null;
// xml http error callback
function handle_error() {
alert("error");
}
// return the uri in the textbox for parsing
function GetURI() {
var uri = "";
try {
uri = document.getElementById("txt1").value;
}
catch(e) {
Log("Couldn't get URI to load");
}
return uri;
}
// return the content contained in the textarea for parsing
function GetAreaContent() {
return document.getElementById("area1").value;
}
function test_load() {
doc = document.implementation.createDocument("","",null);
doc.async = false;
try {
if (doc.load(GetURI())) {
UseDOM(doc);
}
else {
Log("Unable to load " + GetURI());
}
}
catch(e) {
Log("Unable to load " + GetURI());
}
}
function test_xmlhttp() {
var oReq = new XMLHttpRequest();
oReq.onerror = handle_error;
try {
oReq.open("GET", GetURI(), false, "", "");
oReq.send("");
doc = oReq.responseXML;
UseDOM(doc);
}
catch(e) {
Log("Unable to load " + GetURI());
}
}
function test_domparser() {
var parser = new DOMParser();
doc = parser.parseFromString(GetAreaContent(), "text/xml");
UseDOM(doc);
}
var xmp = null;
function init() {
xmp = document.getElementById("xmp1");
}
// customize this function to play with the DOM
function UseDOM(dom) {
if (dom != null) {
Log(dom.documentElement.nodeName);
}
else {
Log("Don't have a DOM");
}
}
function Log(s) {
if (xmp) {
xmp.innerHTML += "<br />" + s;
}
}
</script>
</head>
<body onload="init()">
URI: <input id="txt1" type="text" value="file:///c:/junk/fruit.xml"/>
<br />
<button onclick="test_load()">createDocument</button>
<button onclick="test_xmlhttp()">XMLHTTP</button>
<br/>
<textarea id="area1" rows="20" cols="50">
<items><item>banana</item></items>
</textarea>
<br/>
<button onclick="test_domparser()">DOMParser</button>
<fieldset>
<legend>Log</legend>
<div id="xmp1"></div>
</fieldset>
</body>
</html>
The test_load function creates a new DOM and then uses
the proprietary load method to fetch an existing XML document
from a URI. You specify the URI in the textbox (txt1).
The test_xmlhttp function uses the XMLHttpRequest object
to make an HTTP request for the same URI used by the test_load
function described above. Note that, due to cross-domain
security restrictions, the external XML document must
reside in the same domain as the HTML page. If you load
the HTML page from your local hard disk, the XML documents
you can load are limited to URLs accessed via the "file"
protocol.
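To make the restriction concrete, here is a simplified sketch of a
same-origin comparison. It is only an approximation of what a browser
does, not Mozilla's actual algorithm: sameOrigin is a hypothetical
helper, the example.com and example.org hosts are placeholders, and
the code assumes a runtime that provides the standard URL constructor.

```javascript
// Simplified same-origin test: two URLs match when scheme and host:port agree.
// A relative data URL is resolved against the page URL first.
function sameOrigin(sPageUrl, sDataUrl) {
  var oPage = new URL(sPageUrl);
  var oData = new URL(sDataUrl, sPageUrl);
  return oPage.protocol === oData.protocol && oPage.host === oData.host;
}

// XML next to the page on the same server
var bRelative = sameOrigin("http://example.com/app/page.html", "fruit.xml");
// XML on a different host
var bOtherHost = sameOrigin("http://example.com/app/page.html",
                            "http://feeds.example.org/rss.xml");
// page and XML both loaded from the local disk
var bLocalFiles = sameOrigin("file:///c:/junk/page.html",
                             "file:///c:/junk/fruit.xml");
// page on the local disk, XML on a Web server
var bFileToHttp = sameOrigin("file:///c:/junk/page.html",
                             "http://example.com/fruit.xml");
```

Under this simplified rule, the first and third cases are allowed
and the second and fourth are refused.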
The test_domparser function uses the DOMParser object to
load an XML document from a string. The string is extracted
from the textarea (area1).
You can learn more about the DOMParser and createDocument
interfaces at the following URL:
http://www.xulplanet.com/tutorials/mozsdk/xmlparse.php
You can learn more about the XMLHttpRequest interface
at
http://www.xulplanet.com/references/elemref/ref_XMLHttpRequest.html.
Using a VoiceXML interpreter that supports <data>
If you have access to a Web server with a public interface,
you can use the Tellme and BeVocal implementations of the
DOM by creating an XML document such as the following:
<?xml version="1.0"?>
<?access-control allow="*"?>
<list>
<item>apples</item>
<item>oranges</item>
<item>bananas</item>
</list>
Publish it to your Web server as "fruit.xml".
Next, author a VoiceXML document including a <data>
tag that references the XML document:
<vxml version="2.0"
xmlns="http://www.w3.org/2001/vxml">
<catch event="">
<log>catch-all caught
<value expr="_event"/>
</log>
</catch>
<form>
<block>
<data name="dom1" src="fruit.xml"/>
<!-- now that the DOM is loaded, exercise it -->
<prompt>
<value expr="dom1.documentElement.nodeName"/>
</prompt>
</block>
</form>
</vxml>
Publish
this document to your Web server in the same directory
as the XML data document, or adjust the value of the <data>
tag's src attribute accordingly. Configure your Tellme
Studio or BeVocal Cafe account to point to the URL corresponding
to the VoiceXML document, and call the access number provided
by Tellme or BeVocal to run your application.
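Once the DOM is loaded, you will usually want more than the name of
the root element. The sketch below collects the text of each item
child of the root using only read-only properties (childNodes,
nodeType, nodeName, firstChild, data). GetItemTexts is a hypothetical
helper, and the makeList, makeText, and makeElement stand-ins exist
only so the function can be tried outside a voice browser; whether a
given platform accepts the function unchanged is something to verify
against its documentation.

```javascript
// Hypothetical helper: return the text of every <item> child of the root.
function GetItemTexts(dom) {
  var aTexts = [];
  var oRoot = dom.documentElement;
  for (var i = 0; i < oRoot.childNodes.length; i++) {
    var oChild = oRoot.childNodes.item(i);
    // nodeType 1 is an element; this skips whitespace text between items
    if (oChild.nodeType == 1 && oChild.nodeName == "item" &&
        oChild.firstChild != null) {
      aTexts.push(oChild.firstChild.data);
    }
  }
  return aTexts;
}

// Minimal stand-ins for DOM nodes (assumptions for this demo only).
function makeList(aNodes) {
  return {
    length: aNodes.length,
    item: function (i) { return aNodes[i] || null; }
  };
}
function makeText(sData) {
  return { nodeType: 3, nodeName: "#text", data: sData,
           childNodes: makeList([]), firstChild: null };
}
function makeElement(sName, aChildren) {
  return { nodeType: 1, nodeName: sName, childNodes: makeList(aChildren),
           firstChild: aChildren.length > 0 ? aChildren[0] : null };
}

// Same shape as fruit.xml, including whitespace between the elements.
var fruitDom = {
  documentElement: makeElement("list", [
    makeText("\n"), makeElement("item", [makeText("apples")]),
    makeText("\n"), makeElement("item", [makeText("oranges")]),
    makeText("\n"), makeElement("item", [makeText("bananas")]),
    makeText("\n")
  ])
};
var aFruit = GetItemTexts(fruitDom);
```

In the VoiceXML document, the function could be defined in a script
and the prompt could speak an expression such as
GetItemTexts(dom1).join(', ').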
Microsoft, Mozilla.org, Tellme Networks, and BeVocal provide four
of the numerous implementations of the DOM. The fidelity
of each implementation to the official W3C specification
varies.
One of the goals of VoiceXML 2.1 is to standardize the
DOM implementation supported by all voice browsers so
that you can easily port your voice applications from
one VoiceXML platform to another.
Q. RSS
(http://blogs.law.harvard.edu/tech/rss)
is all the rage, and I'd like to expose RSS feeds via
a voice application. RSS is expressed in XML, so use
of the <data> tag to retrieve an RSS feed seems
like a natural fit, but when I attempt to use the data
tag to fetch an RSS feed directly, the interpreter throws
an "error.noauthorization" event to my application.
A.
According to section 5 of the VoiceXML 2.1 Working
Draft, an XML document retrieved by the interpreter via
the <data> tag must contain an "access-control"
processing instruction (PI) indicating the hosts and/or
domains that are allowed to access the data. This mechanism
is in place to protect data providers from having their
data exposed by an interpreter they trust to an untrusted
application. I'll go into that in more detail in another
column.
In
Appendix E of VoiceXML 2.1, the "access-control"
PI is described in detail. The last example demonstrates
how to indicate to an interpreter that any application
retrieved from any host or domain should be allowed to
access the data:
<?access-control allow="*"?>
But how do you get that PI into an RSS feed that you
don't publish? That's going to require a little server-side
magic to proxy requests from your voice application
to the desired RSS feed. Fortunately, you only have
to write the proxy once, and you'll be able to use it
again for any public data feed - not just RSS. Here's
a sample CGI implementation in Perl that uses the LWP::UserAgent
module:
#!/usr/local/bin/perl -w
use strict;
use LWP::UserAgent;
use CGI qw(param);

sub write_access;
sub Log;

# example feed: http://news.com.com/2547-1_3-0-5.xml
my $url = param("url");

# don't compromise your file system
# up to you to support other protocols (e.g. HTTPS)
if (!defined($url) || $url !~ /^http:\/\//) {
    print "Status: 400 Bad Request\n\n";
    exit;
}

# enable autoflush
my $old = select STDOUT; $| = 1; select $old;

my $ua = LWP::UserAgent->new;
$ua->agent('My-RSS-Proxy/0.1');
$ua->timeout(10);
#BUGBUG: If you use a proxy to access the Internet, set this
#$ua->proxy("http", "");

my $req = HTTP::Request->new(GET => $url);
my $resp = $ua->request($req);

# Content-Length is dropped because appending the PI makes the
# body longer than the original
$resp->headers->remove_header('Content-Length');

# pass the upstream status to the Web server, then just forward
# the HTTP response headers followed by the blank line that ends them
my $headers = $resp->headers_as_string;
$headers =~ s/\n+$//;
print "Status: ", $resp->status_line, "\n";
print "$headers\n" if length $headers;
print "\n";

if ($resp->is_success) {
    print $resp->content;
    write_access;
}
else {
    Log("Badness: " . $resp->status_line);
}

# append the PI that grants every application access to the data
sub write_access {
    print qq{<?access-control allow="*"?>\n};
}

sub Log
{
    my($s) = @_;
    print STDERR "$s\n";
}
If you're not familiar with Perl, here's the basic
idea: the CGI takes a single request parameter, "url",
which is the URL to the RSS feed or any other publicly
available data you want to retrieve. The script performs
some basic sanity checking on the value of this parameter
for the safety and security of the server that's hosting
the CGI, and then uses the LWP::UserAgent module to perform
a simple HTTP GET request for that URL. If the request
is successful, the script prints the HTTP response headers
and the content followed by the "access-control"
PI. Otherwise, the script just prints the headers which
will include the HTTP status code indicating why the
request wasn't successful. I leave it as an exercise
to the reader to write the equivalent code in your server-side
language of choice.
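As a starting point for that exercise, here is a sketch in
JavaScript of the two decisions the Perl script makes: whether to
accept the "url" parameter, and what body to send back. It
deliberately leaves out the HTTP fetch and the response headers.
isProxyableUrl and buildProxyBody are hypothetical names introduced
for illustration, and the example.com URL is a placeholder.

```javascript
var ACCESS_CONTROL_PI = '<?access-control allow="*"?>\n';

// Accept only absolute http:// URLs, mirroring the check in the Perl
// script, so that file: and other schemes never reach the server's disk.
function isProxyableUrl(sUrl) {
  return typeof sUrl == "string" && /^http:\/\//.test(sUrl);
}

// On success, return the upstream XML followed by the PI;
// on failure, send no body at all.
function buildProxyBody(bSuccess, sContent) {
  return bSuccess ? sContent + ACCESS_CONTROL_PI : "";
}

var bHttpOk = isProxyableUrl("http://example.com/rss.xml");
var bFileOk = isProxyableUrl("file:///etc/passwd");
var bMissingOk = isProxyableUrl(undefined);
var sBody = buildProxyBody(true, '<rss version="2.0"></rss>\n');
var sFailedBody = buildProxyBody(false, "<html>Not Found</html>");
```

Keeping these two checks separate from the network code makes them
easy to test before the proxy is deployed.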
Why put the PI at the end of the XML document?
Processing instructions are discussed in section 2.6 of the
XML specification
(http://www.w3.org/TR/2004/REC-xml-20040204/#sec-pi),
and there's nothing in the spec that forbids one from
putting the PI at the end of the XML document.
Furthermore, there's nothing in the VoiceXML 2.1 spec
that forbids that either. We can't simply put it at the beginning
of the document because section 2.8 of the XML specification is
explicit about where the XML declaration must occur if
it is present: at the very start of the document, ahead of
everything else, processing instructions included.
Violation of this rule will cause most XML parsers to
throw an exception or return an error, which translates
into an error.badfetch thrown by a VoiceXML interpreter.
Since we don't control the RSS feeds and whether or not
they actually include an XML declaration, it's simplest
and safest to stick the PI at the end of
the document.
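The placement argument can be checked with a few lines of string
handling. The sketch below is not an XML parser; it only tests
whether an XML declaration, if one is present, is still the first
thing in the text after the PI has been added. declarationMisplaced
is a hypothetical helper name.

```javascript
var ACCESS_PI = '<?access-control allow="*"?>';

// True when an XML declaration exists but is not at the very start.
// The \s keeps targets such as <?xml-stylesheet from matching.
function declarationMisplaced(sXml) {
  return sXml.search(/<\?xml\s/) > 0;
}

var sWithDecl = '<?xml version="1.0"?>\n<rss version="2.0"></rss>\n';
var sWithoutDecl = '<rss version="2.0"></rss>\n';

// Prepending pushes the declaration off the first position.
var bPrependBreaks = declarationMisplaced(ACCESS_PI + "\n" + sWithDecl);
// Appending is safe whether or not the feed has a declaration.
var bAppendWithDecl = declarationMisplaced(sWithDecl + ACCESS_PI + "\n");
var bAppendWithoutDecl = declarationMisplaced(sWithoutDecl + ACCESS_PI + "\n");
```

Only the prepended variant reports a misplaced declaration, which is
why the proxy writes the PI after the feed content.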
Here's what a request through our proxy from a VoiceXML
application might look like. It fetches the Apple iTunes
Music Store's RSS feed for the five newest releases.
<vxml version="2.1"
xmlns="http://www.w3.org/2001/vxml">
<var name="feed_proxy" expr="'data_proxy.cgi'"/>
<script src="rsshelpers.js"/>
<form>
<catch event="error">
<log>feed fetch or access caused <value expr="_event"/></log>
Sorry. The requested information is unavailable. Please try again later.
</catch>
<block>
<var name="feed"
expr="'http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=5/rss.xml'"/>
<data name="oFeed" expr="feed_proxy + '?url=' + feed"/>
<prompt><value expr="GetChannelTitle(oFeed)"/></prompt>
<exit/>
</block>
</form>
</vxml>
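The document above concatenates the feed URL onto the proxy URL as
is, which works for this feed because its URL has no query string. A
feed URL containing "?" or "&" would be split apart when the CGI
parses its own query string. A small helper, sketched here under the
assumption that your voice browser's ECMAScript engine provides the
standard encodeURIComponent function, escapes the value first.
buildProxyUrl is a hypothetical name, and the example.com URLs are
placeholders.

```javascript
// Build the proxy request URL, escaping the feed URL so that its own
// "?", "&", and "=" characters survive as part of the "url" parameter.
function buildProxyUrl(sProxy, sFeed) {
  return sProxy + "?url=" + encodeURIComponent(sFeed);
}

var sPlain = buildProxyUrl("data_proxy.cgi", "http://example.com/rss.xml");
var sQuery = buildProxyUrl("data_proxy.cgi",
                           "http://example.com/rss?limit=5&lang=en");
```

If the helper were added to rsshelpers.js, the expr attribute of the
<data> tag could become buildProxyUrl(feed_proxy, feed). The CGI
module decodes the parameter on the server side, so the Perl script
needs no changes.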
Here's the content of rsshelpers.js:
function GetChannelTitle(dom) {
// retrieve "/rss/channel/title" (see XPath spec for this notation)
var sTitle = "";
var oRoot = dom.documentElement;
if (oRoot.nodeName == "rss") {
var oChan = GetFirstChildNamed(oRoot, "channel");
if (oChan != null) {
var oTitle = GetFirstChildNamed(oChan, "title");
if (oTitle != null) {
sTitle = oTitle.firstChild.data;
}
}
}
return sTitle;
}
// do a shallow traversal of oParent to find the first node named sName
function GetFirstChildNamed(oParent, sName) {
var oNode = null;
if (oParent != null) {
for (var i = 0; i < oParent.childNodes.length; i++) {
var oChild = oParent.childNodes.item(i);
if (oChild.nodeName == sName) {
oNode = oChild;
break;
}
}
}
return oNode;
}
Although the DOM Document and Element objects expose
a getElementsByTagName method,
the GetFirstChildNamed function is more efficient since
it only does a shallow traversal of the nodes in the DOM
tree. It's sufficient given the structure of the XML document
and the elements we're trying to retrieve. The GetChannelTitle
function leverages GetFirstChildNamed to dig the RSS channel
title out of the DOM corresponding to the RSS feed.
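The difference between the two approaches is easy to see on a small
tree. In the sketch below, GetDescendantsNamed is a hypothetical
stand-in that imitates the deep search getElementsByTagName performs
(it is not the real method), and makeList, makeText, makeElement, and
makeTitle merely imitate the DOM properties the helpers touch so the
comparison can run without a parser.

```javascript
// shallow: looks only at the direct children (same as rsshelpers.js)
function GetFirstChildNamed(oParent, sName) {
  var oNode = null;
  if (oParent != null) {
    for (var i = 0; i < oParent.childNodes.length; i++) {
      var oChild = oParent.childNodes.item(i);
      if (oChild.nodeName == sName) {
        oNode = oChild;
        break;
      }
    }
  }
  return oNode;
}

// deep: visits every descendant in document order
function GetDescendantsNamed(oNode, sName, aFound) {
  aFound = aFound || [];
  for (var i = 0; i < oNode.childNodes.length; i++) {
    var oChild = oNode.childNodes.item(i);
    if (oChild.nodeName == sName) {
      aFound.push(oChild);
    }
    GetDescendantsNamed(oChild, sName, aFound);
  }
  return aFound;
}

// Minimal stand-ins for DOM nodes (assumptions for this demo only).
function makeList(aNodes) {
  return {
    length: aNodes.length,
    item: function (i) { return aNodes[i] || null; }
  };
}
function makeText(sData) {
  return { nodeType: 3, nodeName: "#text", data: sData,
           childNodes: makeList([]), firstChild: null };
}
function makeElement(sName, aChildren) {
  return { nodeType: 1, nodeName: sName, childNodes: makeList(aChildren),
           firstChild: aChildren.length > 0 ? aChildren[0] : null };
}
function makeTitle(sText) {
  return makeElement("title", [makeText(sText)]);
}

var oChannel = makeElement("channel", [
  makeTitle("Feed"),
  makeElement("item", [makeTitle("Story 1")]),
  makeElement("item", [makeTitle("Story 2")])
]);

var oShallow = GetFirstChildNamed(oChannel, "title");
var aDeep = GetDescendantsNamed(oChannel, "title");
```

The shallow search stops at the channel's own title, while the deep
search also returns the title of every item, so the caller would
still have to work out which one belongs to the channel.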


Copyright
© 2001-2004 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization (IEEE-ISTO).