An Introduction to SCXML
By Jim Barnett, Aspect Communications
Abstract: SCXML is a flexible state machine language that combines concepts from CCXML and
Harel State Tables. It enhances the basic concept of state machines with such powerful
concepts as conditions on transitions and nested and parallel states. As a result, it
provides a compact and intelligible representation of complex systems. SCXML is being
developed in conjunction with VoiceXML 3, but it will be useful in a wide variety of
applications involving the control and synchronization of asynchronous resources.
Introduction
SCXML (State Chart Extensible Markup Language) is a flexible state machine language that
is specifically designed for the construction of voice and multimodal interfaces. SCXML is being
developed by the W3C's Voice Browser Working Group as part of a larger effort to make VoiceXML more modular. The motivation behind
this effort is the observation that VoiceXML 2.0 applications are often difficult to maintain
or reuse because they don't separate flow of control from presentation logic. Thus a single
form will often contain both prompts and grammars to interact with the caller and <submit> or
<goto> tags to move to the next form. Anyone who wants to reuse the presentation logic
(the interaction with the caller) has to weed out all the <goto>s (the flow logic). Similarly,
anyone wishing to modify the control flow logic has to wade through the presentation logic to
find it. In response, the Voice Browser Working Group is focusing on an architecture for
VoiceXML 3.0 that cleanly separates data, presentation logic, and flow control. This work
is far from complete, but it may be useful to think of it as a refactoring of the existing
<form> element so that the presentation logic (the user interaction) is kept separate
from the flow control (<goto> or <submit>) and the global application data. This
factoring results in a programming model that is similar to the model/view/controller paradigm that
has proved valuable in web applications.
Although this new architecture may seem like a radical change, it is important to note that
it is already possible to program in this model by using CCXML
in conjunction with VoiceXML.
Although CCXML is often described as a call control language, its call control functionality is
embedded in a general-purpose state machine language that can be used for a wide variety of
purposes, including in particular the representation of the flow control logic of complex
VoiceXML programs. In developing SCXML our goal has been to refactor CCXML to separate out
the call control primitives from the state machine framework and to augment the latter with
powerful concepts from Harel State Tables, a state formalism that is the foundation of
the state machine notation in UML. As part of UML,
Harel State Tables have been widely used for years to model reactive systems, namely those
that must respond to asynchronous inputs. Since human-computer interfaces, including both
voice interfaces and multimodal interfaces, are textbook examples of reactive systems, we
think that the Harel concepts will prove a valuable extension to CCXML.
The first working draft of the SCXML specification has been published by the W3C.
The next section of this paper offers a quick introduction to its main constructs.
The example presented, a voice interface to an email reader, is chosen solely to illustrate
the state machine notation, and you should not assume that the ASR and TTS functionality that it
contains bears any resemblance to VoiceXML 3.0 markup.
2. Overview of SCXML
As in all state machine notations, the basic concepts in SCXML are states and transitions.
Example 1 shows a simple representation of a speech recognition system that has three
states: Listening, Recognizing, and AnalyzeResult. The system moves from Listening to
Recognizing when it gets the SpeechDetected event and thence to AnalyzeResult on the
RecoDone event. For the sake of illustration, we include the UML diagram as well as the
SCXML markup that corresponds to it. Note that SCXML is fully
asynchronous, so events can occur at any time, and the system will simply ignore them if it has no
transitions defined for them in its current state. In the example here, suppose that
the underlying platform generated a
SpeechDetected event while the system was in the Recognizing state. The SCXML interpreter
would not consider this to be an error, but the event would effectively be
dropped since the Recognizing state responds only to the RecoDone event. (In a practical
implementation we might want to modify the Recognizing state to raise an alarm in this case, since
two consecutive SpeechDetected events would indicate some sort of problem.)
Example 1

<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Listening">
    <transition event="SpeechDetected">
      <target next="Recognizing"/>
    </transition>
  </state>
  <state id="Recognizing">
    <transition event="RecoDone">
      <target next="AnalyzeResult"/>
    </transition>
  </state>
  <state id="AnalyzeResult"/>
</scxml>
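The parenthetical suggestion above, having Recognizing raise an alarm when a second SpeechDetected arrives, might be sketched as follows in the same draft syntax. The platform target and the Alarm event name are hypothetical, and we assume that an empty <target/> element is how the self-transition described near the end of this article is written, so that the system stays in Recognizing.

```xml
<state id="Recognizing">
  <!-- Normal path: recognition has finished -->
  <transition event="RecoDone">
    <target next="AnalyzeResult"/>
  </transition>
  <!-- Hypothetical: a second SpeechDetected while still recognizing
       indicates a problem, so report it instead of silently dropping it.
       The empty target (an assumption about the syntax) keeps the system
       in Recognizing. -->
  <transition event="SpeechDetected">
    <target/>
    <send target="platform" event="Alarm"/>
  </transition>
</state>
```

Because SpeechDetected now has a matching transition in Recognizing, it is no longer discarded there; any other unexpected event is still ignored.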
SCXML also allows conditions to be placed on transitions. In Example 2, we see a simple
email reader, which responds differently to the ReadDone event based on the value of
the variable email.next. If email.next is not null, the reader loops back to the ReadEmail
state to read another email, while it goes to the Done state if the variable is null. Note
that Done has a 'final' attribute set to true, which means that it is a final state. The significance
of final states will become clear in Example 4. (In the UML notation, final states are indicated
by small circles with a filled center, as in this example.)
Example 2

<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ReadEmail">
    <!-- More mail to read: loop back and read the next message -->
    <transition event="ReadDone" cond="email.next!=0">
      <target next="ReadEmail"/>
    </transition>
    <!-- No more mail: go to the final state -->
    <transition event="ReadDone" cond="email.next==0">
      <target next="Done"/>
    </transition>
  </state>
  <state id="Done" final="true"/>
</scxml>
In Example 3 we temporarily omit the Done state and elaborate the reading logic by
adding <onentry> and <onexit> operations to the ReadEmail state. These are actions that
are executed whenever the state is entered or left. In this example, we start reading a
new email whenever we enter the state and update our counter variables whenever we leave
it. The details of reading the email would be application-specific, but in this case
we use the <send> tag, similar to the one in CCXML, to send an event/command/message
to a TTS system. The <send> tag gives SCXML a flexible way of communicating
with external resources and is the primary means of integrating SCXML into a larger
system. As in CCXML, we assume that the implementation also provides a means for
external entities to deliver events to the SCXML session. The ReadDone event would be
such an external event, generated by the external TTS system when the play was complete.
In the <onexit> handlers, we use the <assign> tag, again similar to CCXML, to
update the value of the variables email.current and email.next. Throughout these examples, we
wave our hands at the question of how these variables, which are
internal to the SCXML session, are kept in sync with the state of the underlying email system.
The details of the integration with the underlying email system would be highly
implementation-dependent so we omit them throughout these examples for the sake of simplicity.
Example 3
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ReadEmail">
    <onentry>
      <!-- Ask the TTS system to start reading the current email -->
      <send target="TTSSystem" type="TTS" event="queue" namelist="email.current"/>
    </onentry>
    <onexit>
      <!-- Advance the counters whenever the state is left -->
      <assign name="email.current" expr="email.next"/>
      <assign name="email.next" expr="email.next + 1"/>
    </onexit>
    <transition event="ReadDone" cond="email.next!=0">
      <target next="ReadEmail"/>
    </transition>
  </state>
</scxml>
One of the most powerful features of Harel State Tables that SCXML borrows is the notion
of nested states. Nesting facilitates the modelling of complex tasks by allowing
a parent state to be decomposed into substates. In Example 4, we have embedded the ReadEmail
and Done states in a surrounding ProcessEmail state, which also contains a Preprocess state
and an Initial pseudo-state. In the UML notation, the child states are drawn inside the parent
state. In SCXML, the <state> tags for the children are immediate children of the parent
tag. The nesting is fully recursive in both cases, so that child states may have their own
children nested inside of them, though we do not show examples of this in this article.
The semantics of nested states requires that whenever the system
is in the ProcessEmail state, it is in one, and only one, of its substates (i.e., Preprocess,
ReadEmail or Done). Initial is a pseudo-state because it is not really a state and the system
is never 'in' it. Instead, Initial indicates the substate that the system should transition to
if a transition specifies the parent state ProcessEmail as its target. (Transitions may also
go directly to substates.)
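To make the role of the Initial pseudo-state concrete, the fragment below sketches two ways a WaitForCommand state could re-enter the email reader of Example 4. The CheckMail and ResumeReading events are hypothetical. Targeting ProcessEmail lets its <initial> child choose Preprocess, while targeting ReadEmail skips Preprocess; in both cases the system still enters ProcessEmail on the way in, since it cannot be in a substate without also being in the parent.

```xml
<state id="WaitForCommand">
  <!-- Target the parent: its initial pseudo-state selects Preprocess -->
  <transition event="CheckMail">
    <target next="ProcessEmail"/>
  </transition>
  <!-- Target a substate directly: Preprocess is skipped, but ProcessEmail
       is still entered (and its onentry handlers still run) first -->
  <transition event="ResumeReading">
    <target next="ReadEmail"/>
  </transition>
</state>
```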
Much of the power of nested states comes from their interaction with transitions and
<onentry> and <onexit> handlers. Suppose the system transitions to the ProcessEmail state as
shown in Example 4. Given the value of the <initial> tag, the system will also simultaneously
move to the Preprocess state. The complex transition is atomic, in the sense that there is no
time during which the system is in ProcessEmail but not Preprocess. However, the <onentry>
handlers for ProcessEmail will be executed before those for Preprocess,
'from the outside in', so to speak. Now suppose that the platform generates the Ready event.
The system will transition to ReadEmail and execute its <onentry> handlers. If Preprocess
had any <onexit> handlers, they would execute before ReadEmail's <onentry> handlers.
However, no handlers defined at the ProcessEmail level fire during this transition because
the system has not left the parent ProcessEmail state. Now suppose that while the system is
in the ReadEmail state, the platform generates the AbortRead event. ReadEmail does not have
a transition defined for this event, but the parent ProcessEmail does. This parent transition
is triggered, sending the system to the WaitForCommand state. This transition causes the
system to exit both ReadEmail and ProcessEmail, and their <onexit> handlers are invoked
in that order ('from the inside out'). Thus the execution order of the <onentry> and
<onexit> handlers matches the nesting structure of the states and offers us a guarantee
that certain operations will be carried out no matter which transition or transitions the system
takes to enter and leave the states in question.
The selection of transitions also follows the nesting structure of states in
that the most tightly nested transition wins. In other words, if ReadEmail
had a transition defined for the AbortRead event, it would have been selected
instead of the one at the ProcessEmail level. The logic behind this choice
becomes clear when we realize that the child state represents a refinement of
the parent state and therefore 'knows more' about the situation. It would also
be possible for multiple transitions within a single state to match an event.
For example, ProcessEmail might define two transitions for AbortRead with
different cond attributes, both of which might evaluate to true in some
circumstances. In this case (a 'tie' between transitions defined
at the same level), SCXML will select the first transition in document order.
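Both selection rules can be seen in one fragment. It is a sketch rather than a complete document: ConfirmAbort is a hypothetical sibling state, and the remaining children of ProcessEmail from Example 4 are omitted.

```xml
<state id="ProcessEmail">
  <!-- Same event, same level: if both are enabled, the first in
       document order is selected -->
  <transition event="AbortRead" cond="email.next==0">
    <target next="WaitForCommand"/>
  </transition>
  <transition event="AbortRead">
    <target next="ConfirmAbort"/>
  </transition>
  <state id="ReadEmail">
    <!-- More tightly nested, so it wins over both parent transitions
         while the system is in ReadEmail -->
    <transition event="AbortRead">
      <target next="Done"/>
    </transition>
  </state>
  <state id="Done" final="true"/>
</state>
<!-- Hypothetical state that would ask the user to confirm the abort -->
<state id="ConfirmAbort"/>
```

In ReadEmail, AbortRead is handled by ReadEmail's own transition. In another substate, the two parent-level transitions compete: when email.next==0 both are enabled and the first, to WaitForCommand, is taken; otherwise only the unconditioned one matches.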
Finally, Example 4 shows the significance of final states. Since Done is a final state and an
immediate child of ProcessEmail, we know that ProcessEmail has finished when the system
reaches Done. In SCXML, this causes a ProcessEmail.done event to be raised, which
can be used to trigger transitions like any other event. In this case, ProcessEmail
transitions to WaitForCommand on the ProcessEmail.done event. (In the UML diagram,
the ProcessEmail.done event is implicit and the same transition is shown as a line
from ProcessEmail to WaitForCommand without any indication of the event.) Thus
the system will move from ProcessEmail to WaitForCommand under two conditions, the first
being the occurrence of the AbortRead event when the system is anywhere in ProcessEmail, and the
second being the system's arrival at the Done state via normal processing inside ProcessEmail.
Example 4

<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ProcessEmail">
    <onentry>
      <var name="email.current"/>
      <var name="email.next"/>
      <var name="mail" expr="initMailStruct()"/>
      <send target="emailSystem" type="email" event="fetch" namelist="mail"/>
    </onentry>
    <onexit>
      <send target="emailSystem" type="email" event="CloseMailBox" namelist="mail"/>
    </onexit>
    <initial>
      <transition>
        <target next="Preprocess"/>
      </transition>
    </initial>
    <!-- Applies in every substate unless that substate handles AbortRead itself -->
    <transition event="AbortRead">
      <target next="WaitForCommand"/>
    </transition>
    <!-- Raised when the final substate Done is reached -->
    <transition event="ProcessEmail.done">
      <target next="WaitForCommand"/>
    </transition>
    <state id="Preprocess">
      <onentry>
        <assign name="email.current" expr="first(mail)"/>
        <assign name="email.next" expr="second(mail)"/>
      </onentry>
      <transition event="Ready">
        <target next="ReadEmail"/>
      </transition>
    </state> <!-- Preprocess -->
    <state id="ReadEmail">
      <onentry>
        <send target="emailSystem" type="email" event="queue" namelist="email.current"/>
      </onentry>
      <onexit>
        <assign name="email.current" expr="email.next"/>
        <assign name="email.next" expr="email.next + 1"/>
      </onexit>
      <transition event="ReadDone" cond="email.next!=0">
        <target next="ReadEmail"/>
      </transition>
      <transition event="ReadDone" cond="email.next==0">
        <target next="Done"/>
      </transition>
    </state> <!-- ReadEmail -->
    <state id="Done" final="true"/>
  </state> <!-- ProcessEmail -->
  <state id="WaitForCommand"/>
</scxml>
Example 5 shows the use of parallel states in SCXML. Here we have added a set of
VCR control states to our email reader. These states run in parallel to the email reader, meaning
that at any given time the system is simultaneously in ProcessEmail (and one of
its substates) and in one of the VCR control states (VCRControl, Volume or Speed). Parallel
states thus represent a kind of fork and join logic, allowing control to be split into
concurrent threads. In this case, the parallelism is useful because the VCR states behave
the same way no matter where we are in the email reader.
In this example the UML diagrams deviate somewhat from the SCXML markup because the structure
is more explicit in the latter. In the SCXML there is a single top-level state Main,
with a <parallel> child. The semantics of the <parallel> tag require that entering
Main entail simultaneously entering each of the <parallel> tag's children.
In this case, we have created a child state ControlState containing the VCR Control, Volume, and
Speed states. (In the UML diagram, the Main and ControlState states are implicit.) The logic
of ControlState and its children is straightforward. The system waits in VCRControl for
either the IncreaseVolume or IncreaseSpeed command, in which case it transitions to the
Volume or Speed state, respectively, uses the <send> tag to trigger the appropriate platform
action, and then transitions back to VCRControl when the platform generates the Platform.Done
event. To flesh out the example, we would want to allow for the speed and volume
to be decreased as well. We could do this either by adding DecreaseSpeed and DecreaseVolume
events, or by having ChangeSpeed and ChangeVolume events with a parameter for the amount of
the change (positive for increase, negative for decrease). The <onentry> actions could
then pass the value of the parameter to the platform rather than using hardcoded defaults.
Finally, note that in Example 5 the ProcessEmail state is not included in-line but is
instead loaded from a separate file, ProcessEmail.scxml. This inclusion mechanism allows
for the reuse of markup and can also be used to break up complex state machines into
more manageable chunks. (The UML diagram contains a graphical equivalent to SCXML's inclusion
by reference in that only the top-level ProcessEmail state is shown, even though all its
substates are implicitly present).
Example 5

<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Main">
    <!-- Entering Main enters every child of <parallel> at once -->
    <parallel id="Par">
      <!-- The email reader, loaded from a separate file -->
      <state id="ProcessEmail" src="ProcessEmail.scxml"/>
      <state id="ControlState">
        <initial>
          <transition>
            <target next="VCRControl"/>
          </transition>
        </initial>
        <state id="VCRControl">
          <transition event="IncreaseVolume">
            <target next="Volume"/>
          </transition>
          <transition event="IncreaseSpeed">
            <target next="Speed"/>
          </transition>
        </state>
        <state id="Volume">
          <onentry>
            <send target="platform" event="IncreaseVolume" namelist="incr=5"/>
          </onentry>
          <transition event="Platform.Done">
            <target next="VCRControl"/>
          </transition>
        </state>
        <state id="Speed">
          <onentry>
            <send target="platform" event="IncreaseSpeed" namelist="incr=10"/>
          </onentry>
          <transition event="Platform.Done">
            <target next="VCRControl"/>
          </transition>
        </state>
      </state> <!-- ControlState -->
    </parallel>
  </state> <!-- Main -->
</scxml>
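The ChangeVolume variant suggested before Example 5 might be sketched as a replacement for the volume half of ControlState; the Speed half would follow the same pattern. Several names here are assumptions rather than anything defined in this article: the delta variable, the amount field, and _eventdata as the name for the data carried by the triggering event, which should be checked against the draft being targeted.

```xml
<state id="ControlState">
  <onentry>
    <!-- Hypothetical variable holding the signed size of the change -->
    <var name="delta" expr="0"/>
  </onentry>
  <initial>
    <transition>
      <target next="VCRControl"/>
    </transition>
  </initial>
  <state id="VCRControl">
    <!-- ChangeVolume carries a signed amount (positive raises, negative
         lowers). _eventdata and amount are assumed names. -->
    <transition event="ChangeVolume">
      <target next="Volume"/>
      <assign name="delta" expr="_eventdata.amount"/>
    </transition>
  </state>
  <state id="Volume">
    <onentry>
      <!-- Pass the requested amount on instead of a hardcoded default -->
      <send target="platform" event="ChangeVolume" namelist="delta"/>
    </onentry>
    <transition event="Platform.Done">
      <target next="VCRControl"/>
    </transition>
  </state>
</state> <!-- ControlState -->
```

Putting the assignment on the transition keeps Volume's <onentry> independent of how the state was entered. Under the usual statechart ordering, a transition's own actions run after VCRControl is exited and before Volume is entered, so delta is already set when the <send> fires.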
Example 6 completes the picture by adding the ASR states in parallel with ControlState
and ProcessEmail, both of which are included from external files this time. We have wrapped
a parent ASRState around Listening, Recognizing and AnalyzeResult, and added transitions
from the latter state back to Listening. These transitions are conditioned upon the value
of the Result variable and use the <send> tag with target of 'scxml' to raise events
that are internal to the state machine and thus may trigger transitions in other parallel states.
(Note again that this is similar to the <send> tag in CCXML.)
The full example shows the power of the interaction between parallel and nested states.
There are three separate threads of control: one reading email, another listening for
user input and a third handling VCR control events. In the current example, the handling
of control events could be directly incorporated into the speech states, but in a multimodal
system, the control
events could be generated by GUI input as well as speech so it makes sense to keep them separate.
The three parallel sets of states are independent of each other in that the speech recognizer doesn't
care what state the email reader is in and vice-versa, but they communicate by raising events. If
the speech recognition system detects a command to raise the volume, it generates the appropriate
event, which is caught by the VCR Control state, which then issues the appropriate platform command,
while the email reader continues uninterrupted. On the other hand, if the speech recognition
states detect an abort command, they generate an AbortRead event, which is caught by the
ProcessEmail state no matter what substate it is in. Since the VCR Control states don't care
about the AbortRead event, they ignore it. We have thus succeeded in factoring a complex
user interface into three compact state machines which interact in a flexible but
strictly defined manner. The result is a simple representation of a complex system.
Example 6

<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Main">
    <parallel id="Par">
      <state id="ProcessEmail" src="ProcessEmail.scxml"/>
      <state id="ControlState" src="ControlState.scxml"/>
      <state id="ASRState">
        <initial>
          <transition>
            <target next="Listening"/>
          </transition>
        </initial>
        <state id="Listening">
          <transition event="SpeechDetected">
            <target next="Recognizing"/>
          </transition>
        </state>
        <state id="Recognizing">
          <transition event="RecoDone">
            <target next="AnalyzeResult"/>
          </transition>
        </state>
        <state id="AnalyzeResult">
          <onentry>
            <var name="Result" expr="ProcessASRResult()"/>
          </onentry>
          <!-- No event attribute: these transitions are chosen by the value of Result -->
          <transition cond="Result=='Louder'">
            <target next="Listening"/>
            <!-- Raise an internal event that the parallel ControlState reacts to -->
            <send target="scxml" event="IncreaseVolume"/>
          </transition>
          <transition cond="Result=='Stop'">
            <target next="Listening"/>
            <send target="scxml" event="AbortRead"/>
          </transition>
        </state> <!-- AnalyzeResult -->
      </state> <!-- ASRState -->
    </parallel>
  </state> <!-- Main -->
</scxml>
To conclude the example, it may be useful to compare the SCXML state model with CCXML's. The construct
in CCXML that corresponds most closely to SCXML's <state> is <eventprocessor> since
it holds the transitions and executable content to handle events. CCXML's transitions, however,
do not move to a different <eventprocessor>, so they correspond to a special-case transition
in SCXML, namely a self-transition, which is one with an empty <target>. Such transitions
cause the system to remain in the same state, without executing <onentry> or <onexit>
elements, and thus amount to event handlers. CCXML's
<goto>, which switches to a separate document, does cause the system to switch to a
new <eventprocessor> and is thus most similar to a SCXML <transition> in the general
case. Finally, it is worth pointing out that CCXML's 'statevar' construct, which is used to
condition transitions, is really just another piece of data in the SCXML model, and one that
has no particular connection to the <state> construct. Despite these significant differences
in syntax, CCXML markup can be converted to SCXML automatically, and an XSLT script for this
purpose is included in the SCXML specification.
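As a rough illustration of this correspondence, the fragment below puts a CCXML-style event handler, a statevar-like piece of data, and an ordinary state change side by side in one state. The mode variable and the Pause and CheckMail events are hypothetical, and we again assume that an empty <target/> element is how the self-transition is written.

```xml
<state id="WaitForCommand">
  <onentry>
    <!-- Ordinary data, playing the role a CCXML statevar would -->
    <var name="mode" expr="'idle'"/>
  </onentry>
  <!-- Event handler: the empty target makes this a self-transition, so
       WaitForCommand is not exited and its onentry/onexit do not run.
       This corresponds to a CCXML transition inside an eventprocessor. -->
  <transition event="Pause">
    <target/>
    <assign name="mode" expr="'paused'"/>
  </transition>
  <!-- General transition: conditioned on the data and moving to a new
       state, closer in spirit to CCXML's goto -->
  <transition event="CheckMail" cond="mode=='idle'">
    <target next="ProcessEmail"/>
  </transition>
</state>
```

Because the self-transition does not re-run <onentry>, the value 'paused' survives the Pause event instead of being reset to 'idle'.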
Conclusion
The full email reader example shows how SCXML can provide a compact and perspicuous
representation of a complex interactive system. It is worth highlighting how naturally
nested states capture task decomposition, while parallel states easily handle interactions
that cross modalities. The <onentry> and <onexit> tags make it easy to ensure
that setup and cleanup happen properly, while the transition selection logic enables us to place
default transitions in parent states that can be overridden by their children. As a result,
SCXML can be used for a variety of purposes. It can be used:
- as a representation of the application-level flow control in VoiceXML (i.e. using
the state machine logic to replace the <goto> between forms).
- as a cross-modality synchronization mechanism in a multimodal interface (the email
reader example could be extended to cover this if we added
a set of GUI input states in parallel to the ASR states).
- as a dialog control mechanism in a language with low-level SALT-style primitives
(the ASR states in the email reader are an example of this).
- as a call control language (i.e., as part of CCXML narrowly defined).
- as a higher-level process control language (the ProcessEmail states are an example
of this).
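The multimodal extension mentioned in the list might add a fourth parallel child to Main from Example 6. The sketch is hypothetical throughout: the GUIState region, its gui.VolumeUp and gui.Stop events, and the ASRState.scxml file name are invented for illustration. Like the ASR states, it raises internal events with <send target="scxml">, so ControlState and ProcessEmail would need no changes.

```xml
<state id="Main">
  <parallel id="Par">
    <state id="ProcessEmail" src="ProcessEmail.scxml"/>
    <state id="ControlState" src="ControlState.scxml"/>
    <state id="ASRState" src="ASRState.scxml"/>
    <!-- Hypothetical GUI input region, running alongside the ASR states -->
    <state id="GUIState">
      <initial>
        <transition>
          <target next="GUIIdle"/>
        </transition>
      </initial>
      <state id="GUIIdle">
        <!-- Button presses become the same internal events the ASR
             states raise, so the other regions cannot tell them apart -->
        <transition event="gui.VolumeUp">
          <target/>
          <send target="scxml" event="IncreaseVolume"/>
        </transition>
        <transition event="gui.Stop">
          <target/>
          <send target="scxml" event="AbortRead"/>
        </transition>
      </state>
    </state> <!-- GUIState -->
  </parallel>
</state> <!-- Main -->
```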
The interesting thing about this list is that SCXML can be used for both high-level and low-level
tasks and can provide either tight or loose synchronization. It is for this reason that we
have kept the definition of SCXML fairly general, without reference to specific tasks. Much of the
power of state languages lies in the fact that they are a clean mathematical abstraction that
is capable of representing a wide variety of concrete systems. We therefore hope that SCXML will
prove useful in other areas beyond its specific application to CCXML and VoiceXML 3.0.