W3C
The World Wide Web Consortium's Activities in Multimodal Interaction
By Deborah Dahl,
Unisys Corporation, Chair,
W3C Multimodal Interaction Working Group
[NOTE: This article is published with the express permission
of Unisys Corporation. Unisys
Corporation retains ownership of the copyright of this
article.]
Currently
interaction with the web takes place primarily with standard
web pages on desktop browsers. Newer forms of interaction, such as
voice interaction and mobile handset
applications, are also starting to become more widespread.
Multimodal interaction adds new dimensions to the experience of
interacting with web applications. It goes beyond GUI or voice-only
inputs by allowing users to interact with applications in multiple
ways, combining several modes. Eventually, input modes could include
speech, keyboard, pointing devices, and handwriting, as well as other
modes that might become popular in the future, such as gestures.
Perhaps
the most obvious reason to use multimodal technology is to
enrich the web experience. Another important benefit of multimodal
interaction is that it can play to the strengths of each modality and
avoid its weaknesses, making the user's
experience much more natural and efficient. For example, in a travel
planning application, it's very natural to be able to speak the name
of a destination rather than scrolling through a long list to select
one. However, on the output side, it's much more efficient to view a
list of flights on a display than to listen to someone reading the
items in the list one by one.
Another
significant advantage to multimodal applications is that they
can adapt to different situations and different users by taking
advantage of the characteristics of the different modes. For example,
voice interaction with an application might be inappropriate in a
crowded meeting, while GUI interaction would be unsafe for someone
who's driving. Ideally, the same application could support both voice
and GUI interaction, depending on the user's environment. The
advantages and disadvantages of different modalities also vary
depending on the device that's being used. For example, voice is a
very natural input modality on small devices such as cell phones with
awkward keypads.
Finally,
multimodality can play an important role in making the web
accessible to users with disabilities, who
will be able to use alternate input modalities when a
disability prevents them from using the standard one.
What is the W3C's role in multimodal interaction?
The W3C has
recently started a Working Group to define multimodal
specifications for the web. Although some earlier work was done on
multimodal requirements in the W3C's
Voice Browser Working Group, the Multimodal Interaction Working Group
is itself very new, having been chartered in February of 2002. Its
charter is effective for two years, until February of 2004. The group's
work is done during weekly teleconferences as well as periodic
face-to-face meetings. The first face-to-face meeting was held on
February 28 and March 1 during the 2002 W3C Technical Plenary meeting
in Cannes, France. That meeting defined the immediate directions for
the group's activities. A second face-to-face meeting took place in
Boston on June 20-21.
Currently the
group's primary activities include compiling multimodal
use cases and requirements for a multimodal specification. There are
also several individual teams working on specific exploratory efforts
in the areas of events, architectures, natural language, and
ink. Events, for example, are particularly important to multimodal
applications because of the need to synchronize and coordinate inputs
from different modalities. As the group completes the compilation of
use cases and requirements, it will begin working on the
specifications that define standards for multimodal markup.
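To make the synchronization problem concrete, here is a minimal Python sketch of one way inputs from two modalities might be coordinated: a spoken command is paired with the pointing gesture closest to it in time. The `InputEvent` type, the `pair_events` function, the modality labels, and the 1.5-second window are all assumptions made up for this illustration. They do not come from any W3C specification or from the Working Group's event proposals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InputEvent:
    """One user input, stamped with the time it occurred (hypothetical type)."""
    modality: str     # illustrative labels: "speech" or "pointer"
    value: str        # recognized words, or the id of the item pointed at
    timestamp: float  # seconds since the start of the session


def pair_events(events: List[InputEvent],
                window: float = 1.5) -> List[Tuple[InputEvent, InputEvent]]:
    """Pair each speech event with the nearest pointer event in time.

    A speech event with no pointer event within `window` seconds is left
    unpaired. The window size is an arbitrary value chosen for this sketch.
    """
    speech = [e for e in events if e.modality == "speech"]
    pointer = [e for e in events if e.modality == "pointer"]
    pairs = []
    for s in speech:
        candidates = [p for p in pointer
                      if abs(p.timestamp - s.timestamp) <= window]
        if candidates:
            nearest = min(candidates,
                          key=lambda p: abs(p.timestamp - s.timestamp))
            pairs.append((s, nearest))
    return pairs


if __name__ == "__main__":
    events = [
        InputEvent("speech", "fly here", 10.0),
        InputEvent("pointer", "Boston", 10.4),
        InputEvent("pointer", "Paris", 20.0),
    ]
    for spoken, pointed in pair_events(events):
        print(spoken.value, "+", pointed.value)
```

A real application would need a more careful policy, for example to handle a gesture shared by two utterances, but even this simple version shows why events from every modality need comparable timestamps.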
Ideally,
these standards will both accommodate current technologies
and be extensible to future input modalities as
they become available. In addition to a multimodal specification,
the group is also chartered to define an ancillary specification for
representing user input in a normalized form, so that inputs from
different modalities will have compatible representations. This work
builds on earlier work that was done in the Voice Browser group on the
Natural Language Semantics Markup Language (NLSML).
For a description of the NLSML specification, please refer to my earlier
VoiceXML Review article.
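The idea of a normalized form can be sketched in a few lines of Python. In this hypothetical example, a spoken destination and a destination chosen from an on-screen list are both reduced to the same modality-independent structure, so the application logic does not need to know how the input arrived. The `Interpretation` type, its field names, and the two helper functions are assumptions made up for this illustration. They are not NLSML syntax and are not taken from any W3C specification.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Interpretation:
    """A modality-independent record of what the user meant (hypothetical)."""
    modality: str          # where the input came from, e.g. "speech" or "gui"
    confidence: float      # 1.0 for unambiguous input such as a list selection
    slots: Dict[str, str]  # the meaning itself, as name/value pairs


def from_speech(recognized_city: str, confidence: float) -> Interpretation:
    """Normalize a recognized spoken city name."""
    city = recognized_city.strip().title()
    return Interpretation("speech", confidence, {"destination": city})


def from_list_selection(selected_city: str) -> Interpretation:
    """Normalize a city picked from an on-screen list."""
    city = selected_city.strip().title()
    return Interpretation("gui", 1.0, {"destination": city})


if __name__ == "__main__":
    spoken = from_speech("boston", 0.82)
    clicked = from_list_selection("Boston")
    # The two inputs differ in origin and confidence but carry the same meaning.
    print(spoken.slots == clicked.slots)
```

Keeping the modality and confidence alongside the meaning, rather than discarding them, lets an application still treat a low-confidence spoken input differently from a definite selection.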
What the Multimodal Interaction Working Group isn't doing
While the
Multimodal Interaction Working Group and the W3C will play
an important role in standardization efforts, there is a great deal of
very important work to be done in multimodal application development
which falls outside of the standardization process. For example,
although we understand a lot about what makes a GUI usable,
and we're starting to acquire some of the same knowledge about voice
interfaces, there is still a tremendous amount to be learned about how
to design easy-to-use multimodal interfaces. These are results that
will begin to emerge from the multimodal research community,
independent developers, usability researchers and commercial
deployments as applications are developed and used in real situations.
Another area of important work that's outside the scope of
standardization is the development of tools for application
development.
What other activities are relevant to Multimodal Interaction?
Because
multimodal interaction includes so many other components, the
number of other activities and standards, both inside and
outside of the W3C, that are potentially relevant is quite
large. It's important for the multimodal work to leverage
these other standards, both to
fit into the web and wireless environments and to avoid
reinventing the wheel.
Here are just a few examples of some standards and
activities that the multimodal group needs to be familiar with.
- W3C standards relevant to web documents in general, such
as XHTML, XForms, and XML Events, are clearly important.
- The W3C Speech Interface Framework is producing several important
speech-related standards, in particular VoiceXML,
SSML (the Speech Synthesis Markup Language), and
SRGS (the Speech Recognition Grammar Specification).
- Closely related to multimodal input is multimedia output such as
audio, video and animations. Fortunately, there is a very
comprehensive W3C Recommendation,
SMIL 2.0,
which defines a standard for coordinated multimedia output.
- Because multimodal applications are of tremendous interest in
wireless and telephony environments, it's also important for the
Working Group to be informed about telephony standards. Organizations
such as the
European Telecommunications
Standards Institute (ETSI) and 3GPP also do work that
is complementary to the multimodal group's. A newly announced
initiative, Open Mobile Alliance (OMA), has the goal of
creating an interoperable global market for future mobile
services by taking advantage of
open standards.
- Because wireless devices are much less capable than desktop
systems, issues such as distribution of processing across client and
server devices become of interest. For example, standards for
distributed speech recognition, such as the Aurora standard being
developed by ETSI, are extremely relevant.
- Collaboration among companies on developing new approaches
prior to standardization is also important. For example, the
SALT Forum has developed
a specification for tags that can be embedded in HTML or XHTML
documents to support the development of GUI applications that
include speech interaction. Similarly, another industry group
has developed a multimodal specification, based
on integrating XHTML and VoiceXML, which has been
submitted to and acknowledged by the W3C.
These specifications are clearly of great interest to the multimodal
interaction group.
How can I find out more?
The
W3C Multimodal
Interaction Working Group maintains a public page
describing its activities as part of the
main W3C web site.
You can find links to the group's charter, the public email archive,
and many other related documents on that site. Employees of
W3C member organizations can also access the group's internal web pages
and email archive. The W3C Multimodal Interaction Working Group
clearly has some exciting challenges ahead of it. The final result
will move us closer to the vision of transparent access to the web by
anyone, anytime and anywhere.


Copyright
© 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE
Industry Standards and Technology Organization
(IEEE-ISTO).