Enhancing VoiceXML Application Performance By Caching
By Dave Burke
Introduction
The VoiceXML architectural model specifies a partitioning of application hosting and application rendering (figure 1). Specifically, the application is served from a Web Server and is typically created dynamically within the framework of an Application Server or equivalent. The VoiceXML Interpreter renders the resultant VoiceXML document, transmitted across a network by HTTP, into a series of instructions interpreted by the Implementation Platform. Implied in this model is a geographical separation between the application hosting environment and the VoiceXML platform, and therefore the introduction of network latency. An application might make many requests for new VoiceXML documents during its lifetime, so these latencies can have a considerable adverse effect on performance. In this article we discuss how caching can be used to enhance the performance of VoiceXML applications. Caching is a strategy for storing temporary 'objects' (e.g. VoiceXML resources) local to the VoiceXML Interpreter, and the application developer can employ it to mitigate these latencies. In what follows we use the phrase 'origin server' to denote the application hosting environment, and 'user agent' to refer to the VoiceXML Interpreter and Implementation Platform.

Figure 1: The VoiceXML architecture model
Why Bother with Caching?
Probably the most pertinent reason for caching is to maximise customer satisfaction through improved performance. Since VoiceXML applications conduct audio dialogues with humans, they should endeavour to respond within the timing that callers expect of a conversation. There are also considerable technical advantages to employing a good caching strategy. Load on web servers (and hence on the corresponding application servers and databases) is reduced, which lowers the cost of scaling. Network load is also reduced, and since most Internet hosting companies charge for different levels of IP connectivity, it makes financial sense to conserve bandwidth.
HTTP Caching Mechanisms
The purpose of HTTP caching is twofold:
i. to actually avoid the need to make requests to the
origin server in many cases, and
ii. to eliminate the need to send full responses in
many other cases.
These two goals correspond to the concepts of expiration and validation, respectively. A local copy of a document that has not expired may be used without a costly fetch from the origin server. An expired document can be validated against the origin server: if it has not changed, the server simply confirms that the cached copy is still usable, and the document does not have to be retransmitted in full to the platform. Specifying the expiration times is the responsibility of the application developer, and choosing them well is the key to creating high-performance applications.
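To make the two mechanisms concrete, here is a minimal Python sketch of how a user agent's cache might choose between them. It assumes a simplified cache keyed by URL, a freshness lifetime already known for each entry, and a hypothetical `origin` callable standing in for the origin server (returning a status code, body, ETag and max-age). None of these names come from a real VoiceXML platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin-server interface: (url, etag or None) -> (status, body, etag, max_age)
Origin = Callable[[str, Optional[str]], Tuple[int, bytes, str, float]]


@dataclass
class CacheEntry:
    body: bytes
    stored_at: float  # time (seconds) at which the response was cached or last validated
    max_age: float    # freshness lifetime in seconds
    etag: Optional[str] = None

    def is_fresh(self, now: float) -> bool:
        # Expiration: the entry is fresh while its age is below its lifetime.
        return now - self.stored_at < self.max_age


def get(url: str, cache: Dict[str, CacheEntry], now: float,
        origin: Origin) -> Tuple[bytes, str]:
    entry = cache.get(url)
    if entry is not None and entry.is_fresh(now):
        # Expiration avoids the request to the origin server altogether.
        return entry.body, "cache hit"
    # Missing or expired: ask the origin server, sending our validator if we have one.
    status, body, etag, max_age = origin(url, entry.etag if entry else None)
    if status == 304 and entry is not None:
        # Validation: "Not Modified", so reuse the cached body and renew its freshness.
        entry.stored_at, entry.max_age = now, max_age
        return entry.body, "revalidated"
    cache[url] = CacheEntry(body, now, max_age, etag)
    return body, "full fetch"


# Hypothetical origin serving one unchanging document with a 60-second lifetime.
def demo_origin(url: str, etag: Optional[str]) -> Tuple[int, bytes, str, float]:
    current = '"v1"'
    if etag == current:
        return 304, b"", current, 60
    return 200, b'<vxml version="2.0"/>', current, 60
```

Calling `get` with `demo_origin` at times 0, 30 and 100 gives a full fetch, then a cache hit with no request at all, then a revalidation in which only the 304 status (no body) crosses the network.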
A VoiceXML platform's caching mechanism is usually similar to that of traditional (visual) browser environments, which implement multi-tier strategies. An instructive way of understanding how caching works for a caller of a VoiceXML application is by analogy with a person using a computer in an Internet café: the chosen computer has a local cache that has been populated by previous users and may or may not already contain the information required by the current user. Since there is a reasonable likelihood that another person, albeit at a different computer, has fetched the same resource before, a network-level proxy cache additionally stores resources for all users. A local cache gives a better response time than a proxy cache, which in turn performs better than requiring the user agent to request every resource from the origin server. Figure 2 illustrates a standard multi-tier cache architecture.

Figure 2: Multi-tier cache architecture
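The lookup order in figure 2 can be sketched as follows. This is a simplified Python model, not any platform's actual implementation: each tier is a plain dictionary, freshness is ignored, and a miss in every tier falls through to a hypothetical `fetch_origin` function. Tiers that missed are populated on the way back, which is how the shared proxy comes to serve users at other computers in the Internet café analogy.

```python
from typing import Callable, Dict, List, Tuple


class Tier:
    """One cache level, e.g. a platform-local cache or a shared proxy cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def lookup(url: str, tiers: List[Tier],
           fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Search tiers nearest-first; return the body and the name of whoever served it."""
    missed: List[Tier] = []
    for tier in tiers:
        body = tier.store.get(url)
        if body is not None:
            for t in missed:  # copy into the nearer tiers that missed
                t.store[url] = body
            return body, tier.name
        missed.append(tier)
    body = fetch_origin(url)  # every tier missed: go to the origin server
    for t in missed:
        t.store[url] = body
    return body, "origin"
```

With a shared `proxy = Tier("proxy")` and two local caches, the first lookup through `[local_1, proxy]` reaches the origin, a repeat through `[local_1, proxy]` is answered locally, and a lookup through `[local_2, proxy]` is answered by the proxy without contacting the origin server.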
The architecture in figure 2 is easily extended to a hierarchy of caches as the platform scales. Happily, from the perspective of the application developer, the way the platform implements its caching architecture is largely transparent: the methods for controlling caching are the same regardless, and we discuss these next.
Controlling the HTTP Caching Policy
The caching policy can be controlled by the application developer by specifying attributes in the VoiceXML document [2], [3] and/or by setting HTTP header values [1] on the origin server. Generally it is preferable to use the HTTP headers to control the caching policy, but this may not always be possible (for instance, if the web server is not under the control of the application developer). The VoiceXML attributes also give the developer finer-grained control over how the user agent uses its cache, e.g. forcing a refresh of content or allowing stale content to be used for an extended period of time.
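In VoiceXML 2.0 this finer-grained control is exposed through the `maxage` and `maxstale` fetch attributes (for example on `<goto>`, `<submit>`, `<audio>` and `<grammar>`), whose meaning follows the HTTP/1.1 `max-age` and `max-stale` request directives. The Python sketch below, with a hypothetical function name, shows the decision a user agent might make about a cached copy given those two values: `maxage="0"` effectively forces a refresh, while a generous `maxstale` permits stale content.

```python
from typing import Optional


def can_use_cached(age: float, lifetime: float,
                   maxage: Optional[float] = None,
                   maxstale: Optional[float] = None) -> bool:
    """Decide whether a cached copy may be used without contacting the server.

    age      -- seconds since the copy was obtained (or last validated)
    lifetime -- freshness lifetime from the server's headers, in seconds
    maxage   -- client limit on age (VoiceXML maxage); 0 effectively forces a refresh
    maxstale -- seconds past expiry the client tolerates (VoiceXML maxstale)
    """
    if maxage is not None and age > maxage:
        return False  # too old for this request, whatever the server said
    if age < lifetime:
        return True   # still fresh
    staleness = age - lifetime
    return maxstale is not None and staleness <= maxstale
```

When this returns `False`, the user agent would go back to the origin server, typically with a conditional request so that an unchanged document is revalidated rather than retransmitted.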
HTTP is a request/response protocol. A set of header fields is sent with each request and received with each response. For example, an HTTP request of:
GET /index.html HTTP/1.1
Host: www.voxpilot.com
Listing 1: An example HTTP request
might give a response of:
HTTP/1.1 200 OK
Date: Thu, 08 Aug 2002 09:02:39 GMT
Server: Apache/1.3.26 (Unix)
Cache-Control: max-age=86400
Expires: Fri, 09 Aug 2002 09:02:39 GMT
Last-Modified: Thu, 01 Aug 2002 14:52:43 GMT
ETag: "b7129-5913-3d494b3b"
Content-Length: 22803
Content-Type: text/html
Listing 2: An example HTTP response
followed by the actual content (HTML in this case, but it can be anything, including binary octet streams). This is easily verified with a telnet session: typing
telnet www.voxpilot.com 80
followed by the HTTP request in listing 1 and an empty line (press Enter twice after the Host line), which marks the end of the request headers, should trigger a response similar to listing 2.
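The same experiment can be scripted. The Python sketch below does what the telnet session does: it sends the request from listing 1, terminated by an empty line (CRLF CRLF), and splits the response head into a status code and header fields. The function names are illustrative, the parser ignores folded (multi-line) headers, and `www.voxpilot.com` is simply the host from the example.

```python
import socket
from typing import Dict, Tuple


def build_request(host: str, path: str = "/") -> bytes:
    # Request line and Host header as in listing 1; the empty line ends the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response_head(raw: bytes) -> Tuple[int, Dict[str, str]]:
    # Keep only the part before the blank line that separates headers from content.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *lines = head.split("\r\n")
    status = int(status_line.split()[1])  # "HTTP/1.1 200 OK" -> 200
    headers: Dict[str, str] = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status, headers


def fetch_head(host: str, path: str = "/", port: int = 80) -> Tuple[int, Dict[str, str]]:
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        raw = b""
        while b"\r\n\r\n" not in raw:  # read until the end of the response headers
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_response_head(raw)
```

For example, `fetch_head("www.voxpilot.com", "/index.html")` would return the status and header fields of a response like listing 2, assuming the host is reachable.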
The header fields Cache-Control, Expires, Last-Modified and ETag in the HTTP response example above control the caching policy for the requested object (index.html). We explain the meaning of these and similar fields next.
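As a preview of how a cache uses these four fields, the following Python sketch derives two things from a stored response: its freshness lifetime (from `Cache-Control: max-age`, which takes precedence, otherwise `Expires` minus `Date`) and the conditional request headers (`If-None-Match`, `If-Modified-Since`) that revalidate it using the `ETag` and `Last-Modified` validators. The function names are hypothetical, and heuristic expiration for responses with no explicit lifetime is deliberately left out.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def _lower(headers: Dict[str, str]) -> Dict[str, str]:
    # HTTP header names are case-insensitive.
    return {k.lower(): v for k, v in headers.items()}


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Seconds the response stays fresh, or None if no explicit expiry is given."""
    h = _lower(headers)
    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)  # max-age overrides Expires
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0  # unparsable or inconsistent dates: treat as already expired
        return max(0, int(delta.total_seconds()))
    return None


def conditional_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Request headers that ask the origin server to revalidate a cached copy."""
    h = _lower(headers)
    cond: Dict[str, str] = {}
    if "etag" in h:
        cond["If-None-Match"] = h["etag"]
    if "last-modified" in h:
        cond["If-Modified-Since"] = h["last-modified"]
    return cond
```

For the response in listing 2 the lifetime is 86400 seconds (one day) whether it is read from `max-age` or computed as `Expires` minus `Date`, and a revalidation would send `If-None-Match: "b7129-5913-3d494b3b"`.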

Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).