SCXML: Comments on the new draft [LONG] from Torbjörn Lager on 2007-03-06 (www-voice@w3.org from January to March 2007)

From: Torbjörn Lager <torbjorn.lager@gmail.com>
Date: Tue, 6 Mar 2007 21:16:05 +0100
To: www-voice@w3.org
Message-ID: <6ad4bdae0703061216h776b99eo66153441f76500d0@mail.gmail.com>
Dear All,

Based on my thoughts reading the new SCXML draft, and on my experiences
implementing most of it (see http://www.ling.gu.se/~lager/Labs/SCXML-Lab/ -
now in a new version!), I have put together a fairly long list of
comments/questions/recommendations that I would like to share with you.

Most of my comments/recommendations are motivated by a desire for
expressivity, simplicity, brevity and efficiency. I have no idea what is
required for backwards compatibility with, for example, CCXML - I leave that
for others to work on.

Best regards,
Torbjörn Lager


1. The algorithm
----------------

1.1 The algorithm seems to work (I have implemented most of it and found
just one error so far, which I have reported), and let me begin by
expressing my admiration for the people who are able to design something
so complex that actually works. An algorithm that works is certainly a
major step forward.

1.2 However, the complexity of the algorithm may also be its weakness, since
it is supposed to provide the semantics of SCXML: "Implementations are free
to implement SCXML intepreters in any way they choose, but they must behave
as *if* they were using the algorithm defined here." (Section C, first
paragraph). Well, I believe that implementing something that behaves as
*if* it were using that algorithm, while actually using *another* algorithm,
would be almost impossible. Of course, there are bits and pieces of the
algorithm that can (and should) be replaced with more efficient alternatives
in actual implementations (e.g. for computing the LCA), but more radical
changes, such as running each child of <parallel> in a separate thread,
won't work (or at least won't make sense, since one would have to
synchronize the threads so much that the point of using concurrency would
be lost). It would be nice if it were possible to come up with a precise yet
user-friendly specification of the semantics of (in particular) <parallel>
that really did allow the use of different algorithms: executing the
transitions in "lockstep" (as in the present algorithm), one at a time (cf.
section B.2), or in separate threads (where the order of execution would be
arbitrary and beyond the control of the user). But I realize that a formal
specification of this kind may not be possible, and I know too little about
UML to say whether it has already been attempted.
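As an illustration of the kind of local replacement meant above, here is a minimal Python sketch that computes the LCA from a precomputed child-to-parent map. It assumes the LCA is the nearest proper ancestor of the first state that is also a proper ancestor of all the other states; the `PARENT` map and the state names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical document tree: each state maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None, "p": "root",
    "a": "p", "a1": "a", "a2": "a",
    "b": "p",
}

def proper_ancestors(state: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Proper ancestors of `state`, nearest first."""
    chain = []
    p = parent[state]
    while p is not None:
        chain.append(p)
        p = parent[p]
    return chain

def find_lca(states: List[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Nearest proper ancestor of states[0] that is a proper ancestor of every other state."""
    head, *tail = states
    tail_sets = [set(proper_ancestors(s, parent)) for s in tail]
    for anc in proper_ancestors(head, parent):
        if all(anc in s for s in tail_sets):
            return anc
    return None

print(find_lca(["a1", "b"], PARENT))  # p
```

Each state's ancestor chain is walked once, so the cost is proportional to the depth of the tree rather than its size.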

1.3 Would it be possible to settle for a more 'indeterminate' semantics, to
tolerate certain subtle differences between different implementations, and
instead educate developers to never rely on such differences?

1.4 What can be said about the computational complexity of the algorithm?
Efficiency may not be that important in the majority of SCXML use cases, but
in others it might be. (One example might be the use of SCXML as a game
engine component.) I get the impression that the Remove Conflicting
Transitions Phase, in particular, is slow. (Could that phase perhaps be
preprocessed offline, or cached online?)
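One way part of that work could move offline is sketched below in Python. It rests on an assumption the draft does not state: that two transitions conflict exactly when the sets of states they exit overlap (the rule later SCXML drafts adopt), and that a transition with targets exits the active proper descendants of its "domain", the LCA of its source and targets. The domain depends only on the document, so it can be cached per transition. At run time the conflict test then reduces to an ancestor check between two cached domains. The tree, the `Transition` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, Optional, Tuple

# Hypothetical document tree: child -> parent; "p" is a <parallel> with regions a and b.
PARENT: Dict[str, Optional[str]] = {
    "root": None, "p": "root",
    "a": "p", "a1": "a", "a2": "a",
    "b": "p", "b1": "b", "b2": "b",
}

@lru_cache(maxsize=None)
def proper_ancestors(state: str) -> Tuple[str, ...]:
    chain = []
    p = PARENT[state]
    while p is not None:
        chain.append(p)
        p = PARENT[p]
    return tuple(chain)

@dataclass(frozen=True)
class Transition:
    source: str
    targets: Tuple[str, ...] = ()  # empty tuple means a targetless transition

@lru_cache(maxsize=None)
def domain(t: Transition) -> Optional[str]:
    """LCA of source and targets; depends only on the document, so it is cached."""
    if not t.targets:
        return None  # a targetless transition exits no states
    for anc in proper_ancestors(t.source):
        if all(anc in proper_ancestors(s) for s in t.targets):
            return anc
    return None

def ancestor_or_self(x: str, y: str) -> bool:
    return x == y or x in proper_ancestors(y)

def conflict(t1: Transition, t2: Transition) -> bool:
    """Under the assumed rule, exit sets overlap iff one domain contains the other."""
    d1, d2 = domain(t1), domain(t2)
    if d1 is None or d2 is None:
        return False
    return ancestor_or_self(d1, d2) or ancestor_or_self(d2, d1)
```

Transitions inside different regions of the parallel state do not conflict, while one that leaves its region conflicts with everything inside the parallel state.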


2. States and Pseudostates
--------------------------

2.1 I don't think it makes sense to allow transitions into <initial>! (In
other words, <initial> should not be considered a 'transition target'.)
Reasons: 1) as far as I know (but I may be wrong), UML diagrams do not allow
that, 2) it does not seem to increase expressivity, and 3) if we don't allow
it, there is no need for <initial> to have an ID, which is a good thing.

2.2 Allow <transition> as a child of <parallel>. I see no reason not to
allow that.

2.3 Since two <final> states can differ only in their IDs (they have no
children), it doesn't make much sense to allow more than one <final> as a
child of <scxml>.

2.4 I don't think a <final> as a child of <scxml> should be required. Some
systems just don't need to ever terminate!

2.5 I think <state id="s" final="true"/> is better than <final id="s"/>.
(See next point for one reason).

2.6 I (still!) think you should consider making <scxml> basically equivalent
to a <state> except that it doesn't have an ID (the ID is provided by the
'caller'). Thus <scxml> would allow the same children that a state does
(<onentry>, <onexit>, <transition>, <invoke> etc.). I see no harm in that,
and there are advantages. For example, the "Hello World" example would be
very short:

    <scxml final="true">
        <onentry>
            <log expr="'Hello world!'"/>
        </onentry>
    </scxml>

As things now stand, it has to be written as follows:

    <scxml initialstate="s0">
        <state id="s0">
            <onentry>
                <log expr="'Hello world!'"/>
            </onentry>
            <transition target="s"/>
        </state>
        <final id="s"/>
    </scxml>

2.7 I would (still!) prefer that you got rid of the <initial> element in
favor of a new (optional) attribute for <state> (named 'initial',
'initialstate' or 'target', with a value of type namelist), mutually
exclusive with 'final'. The same attribute could be used for <history> as
well. (This is related to point 2.1 above.) As a consequence, we would be
able to write

    <state id="s0" initial="s1">
        <state id="s1" ... />
        <state id="s2" ... />
    </state>

instead of the more verbose

    <state id="s0">
        <initial>
            <transition target="s1"/>
        </initial>
        <state id="s1" ... />
        <state id="s2" ... />
    </state>

I appreciate that being able to put actions in the <transition> of
<initial> may be useful - I just don't feel it's worth it!
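To suggest that the attribute-based proposal is also simple to process, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It reads the proposed `initial` attribute as a space-separated list of IDs (by analogy with a namelist). When the attribute is absent, it falls back to the first child state in document order. That fallback, the `default_targets` function and the example document are assumptions of the sketch, not part of the draft.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document using the proposed 'initial' attribute on <state>.
DOC = """
<scxml>
  <state id="s0" initial="s2">
    <state id="s1"/>
    <state id="s2"/>
  </state>
  <state id="t0">
    <state id="t1"/>
    <state id="t2"/>
  </state>
</scxml>
"""

def default_targets(state: ET.Element) -> List[str]:
    """IDs to enter when `state` is entered without an explicit deeper target."""
    children = [c for c in state if c.tag == "state"]
    if not children:
        return []  # atomic state: nothing further to enter
    initial = state.get("initial")
    if initial is not None:
        return initial.split()  # proposed attribute, read as a namelist of IDs
    return [children[0].get("id")]  # assumed fallback: first child in document order

root = ET.fromstring(DOC)
print(default_targets(root.find("state[@id='s0']")))  # ['s2']
```

The same function applied to the <scxml> root yields its first child state, which is all the 'initialstate' attribute or an <initial> child would otherwise have to say.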

2.8 If you decide to stick to the <initial> element, would it not (in the
same spirit) make sense to require that the <scxml> element too has an
<initial> child, rather than using the 'initialstate' attribute? The "Hello
World" example would end up like so:

    <scxml>
        <initial>
            <transition target="s0"/>
        </initial>
        <state id="s0">
            <onentry>
                <log expr="'Hello world!'"/>
            </onentry>
            <transition target="s"/>
        </state>
        <final id="s"/>
    </scxml>

Again, contrast that with

    <scxml final="true">
        <onentry>
            <log expr="'Hello world!'"/>
        </onentry>
    </scxml>

I think of this as an argument AGAINST <initial>, and FOR the use of
attributes!


3. Events, eventdata etc.
-------------------------

3.1 Since the event attribute of <transition> allows the use of the '*'
wildcard, there seems to be a need for an _eventname variable in addition to
the _eventdata variable, i.e. a variable instantiated to the name of the
event that was matched. (Alternatively, there could be just one variable
_event with two fields 'name' and 'data', allowing uses of _event.name and
_event.data. I think I would prefer that solution...)
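A minimal Python sketch of the single-variable alternative follows. A hypothetical `_event` record with `name` and `data` fields is bound before a transition's actions run, so a wildcard transition can still tell which event it caught. Matching treats '*' as matching everything, and a trailing '.*' as matching zero or more trailing tokens. The `Event` class, `fire` and `log_action` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class Event:
    """Hypothetical single _event record carrying both name and data."""
    name: str
    data: Dict[str, Any] = field(default_factory=dict)

def matches(pattern: str, name: str) -> bool:
    """'*' matches anything; a trailing '.*' matches zero or more trailing tokens."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        prefix = pattern[:-2].split(".")
        return name.split(".")[:len(prefix)] == prefix
    return pattern == name

Action = Callable[[Dict[str, Any]], str]

def fire(pattern: str, event: Event, action: Action) -> Optional[str]:
    """Run a transition's action with _event bound, if its event pattern matches."""
    if not matches(pattern, event.name):
        return None
    return action({"_event": event})

def log_action(env: Dict[str, Any]) -> str:
    ev = env["_event"]  # the wildcard matched, yet the concrete name is still available
    return f"caught {ev.name}: {ev.data.get('reason', '?')}"

print(fire("error.*", Event("error.send.failed", {"reason": "timeout"}), log_action))
# caught error.send.failed: timeout
```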

3.2 Perhaps there is a need for more auto-generated internal events, in
addition to the "*.Done" events? What if every transition generated an event
"transition" with data fields indicating (at least) its source and target?
This would make it very simple to have one state machine S1 monitor another
state machine S2 (where S1 and S2 are running in parallel), e.g. for
debugging purposes. In the simplest case, S1 could be

    <state id="monitor">
        <transition event="transition"
                    cond="_eventdata.source != 'monitor'"
                    target="monitor">
            <log expr="'Transitioned from ' + _eventdata.source +
                       ' to ' + _eventdata.target"/>
        </transition>
    </state>

Such a monitoring mechanism could prove very useful. Apart from debugging,
it would also make it possible for S1 to modify the datamodel of S2, and
thus influence the way in which S2 is running. Perhaps this could even be a
workable approach to run-time adaptation of system behaviour?
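A minimal Python sketch of the proposed mechanism follows. It assumes the interpreter itself places an internal "transition" event, with `source` and `target` in its data, on a queue shared by the machines running in parallel. The `Interpreter` class, the queue and the field names are hypothetical; the draft defines no such event.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Event = Tuple[str, Dict[str, str]]  # (name, data); hypothetical representation

class Interpreter:
    """Toy stand-in for S2: announces every transition it takes on a shared queue."""
    def __init__(self, queue: Deque[Event]) -> None:
        self.queue = queue

    def take_transition(self, source: str, target: str) -> None:
        # A real interpreter would exit and enter states here; this sketch only
        # raises the proposed "transition" event.
        self.queue.append(("transition", {"source": source, "target": target}))

def run_monitor(queue: Deque[Event]) -> List[str]:
    """Counterpart of the 'monitor' state: log transitions not made by the monitor itself."""
    log: List[str] = []
    while queue:
        name, data = queue.popleft()
        if name == "transition" and data["source"] != "monitor":
            log.append("Transitioned from " + data["source"] + " to " + data["target"])
    return log

shared: Deque[Event] = deque()
s2 = Interpreter(shared)
s2.take_transition("idle", "busy")
shared.append(("transition", {"source": "monitor", "target": "monitor"}))  # monitor's own
s2.take_transition("busy", "idle")
print(run_monitor(shared))
# ['Transitioned from idle to busy', 'Transitioned from busy to idle']
```

The condition on the source mirrors the `cond` in the SCXML above: without it, the monitor's own self-transition would keep generating events for it to log.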

3.3 Section 10 seems to suggest that whereas "error.*" is a permissible value
of the <transition> 'event' attribute, "*.done" is not. (I quote: "The
"event" attribute of a transition may end with the wildcard '.*', which will
match zero or more tokens at the end of the processed event's name.") We
want to allow "*.done" too - don't we?
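For comparison, here is a minimal Python sketch of token-based matching that accepts both forms. A trailing '.*' behaves as the quoted Section 10 text describes. A leading '*.' is read, by symmetry, as matching zero or more tokens at the start of the event name. That reading of "*.done" is an assumption; the draft does not define it.

```python
def matches(pattern: str, event: str) -> bool:
    """Token-wise match supporting 'error.*' (trailing) and the proposed '*.done' (leading)."""
    if pattern == "*":
        return True
    p, e = pattern.split("."), event.split(".")
    if p[-1] == "*":  # 'error.*': fixed leading tokens, any (possibly empty) tail
        head = p[:-1]
        return e[:len(head)] == head
    if p[0] == "*":  # '*.done': any (possibly empty) head, fixed trailing tokens
        tail = p[1:]
        return len(e) >= len(tail) and e[len(e) - len(tail):] == tail
    return p == e
```

Matching on whole tokens rather than raw string prefixes keeps "error.*" from matching an unrelated event such as "errors".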


4. IDs - Mandatory or optional?
-------------------------------

4.1 IDs for <initial> should not be mandatory since these IDs are seldom
useful. (See also point 2.1 and point 2.7.)

4.2 On the other hand, IDs for <transition> could sometimes be useful, e.g.
for debugging purposes, in particular for targetless transitions. My
recommendation would be to make them optional.

4.3 Perhaps *all* IDs could be made optional, with a machine-generated
default? After all, if a state is never transitioned to, or if the developer
is not interested in its *.Done event, then an explicit (user-defined) ID
does not appear to be required for it.
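A minimal Python sketch of machine-generated default IDs, using `xml.etree.ElementTree`, follows. Every ID-bearing element that lacks an `id` receives one, and values already used by the author are skipped. The set of tags and the `_gen_N` naming scheme are hypothetical choices.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical choice of elements that may carry an ID.
ID_TAGS = {"state", "parallel", "final", "history", "initial", "transition"}

def assign_default_ids(root: ET.Element) -> None:
    """Give every ID-bearing element without an explicit id a unique generated one."""
    used = {el.get("id") for el in root.iter() if el.get("id") is not None}
    counter = itertools.count(1)
    for el in root.iter():  # document order
        if el.tag in ID_TAGS and el.get("id") is None:
            new_id = f"_gen_{next(counter)}"
            while new_id in used:  # never collide with author-defined IDs
                new_id = f"_gen_{next(counter)}"
            el.set("id", new_id)
            used.add(new_id)
```

Generating IDs in document order keeps them stable across runs as long as the document does not change.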


5. Executable content and datamodel
-----------------------------------

5.1 Should this be possible? (I think yes!):

    <datamodel>
        <data name="a" expr="2"/>
        <data name="b" expr="/data[@name='a']+2"/>
    </datamodel>

Or this, where 'b' refers to 'a' before 'a' is declared?

    <datamodel>
        <data name="b" expr="/data[@name='a']+2"/>
        <data name="a" expr="2"/>
    </datamodel>

If the second form is allowed, then we also have to know what to do about
circular definitions such as this:

    <datamodel>
        <data name="b" expr="/data[@name='a']"/>
        <data name="a" expr="/data[@name='b']"/>
    </datamodel>
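If forward references are allowed, one option is to evaluate the <data> declarations in dependency order rather than document order, and to reject cycles such as the last example. Below is a minimal Python sketch using the standard `graphlib` module (Python 3.9+). Representing each `expr` as a Python function plus an explicit list of the names it reads is a simplification of this sketch, standing in for real XPath or ECMAScript evaluation.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Tuple

# name -> (names the expression reads, function computing the value from them)
Decl = Tuple[List[str], Callable[..., object]]

def init_datamodel(decls: Dict[str, Decl]) -> Dict[str, object]:
    """Evaluate declarations so that every value is computed after the values it uses."""
    graph = {name: deps for name, (deps, _) in decls.items()}
    values: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        deps, fn = decls[name]
        values[name] = fn(*(values[d] for d in deps))
    return values

print(init_datamodel({"b": (["a"], lambda a: a + 2), "a": ([], lambda: 2)}))
# {'a': 2, 'b': 4}
```

The circular case is then reported as an error at load time instead of yielding an undefined value.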

5.2 Realizing that setting the value of a simple variable 'a' would need
syntax like the following

    <assign location="/data[@name='a']" expr="3"/>

makes me want to reintroduce the 'name' attribute of <assign>, making the
following alternative possible:

    <assign name="a" expr="3"/>
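To illustrate that `name` would be pure shorthand, here is a minimal Python sketch that rewrites `<assign name="a" .../>` into the equivalent `location` form before applying it to a datamodel held in `xml.etree.ElementTree`. The sketch stores an already-evaluated value as the text of the <data> element and does not evaluate `expr`; both are simplifications. ElementTree's `find` does not accept absolute paths on an element, so the leading '/' is dropped. The `desugar` and `assign` names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def desugar(name: Optional[str], location: Optional[str]) -> str:
    """<assign name="a"/> is shorthand for <assign location="/data[@name='a']"/>."""
    if (name is None) == (location is None):
        raise ValueError("exactly one of 'name' or 'location' is required")
    return location if location is not None else f"/data[@name='{name}']"

def assign(datamodel: ET.Element, value: str, name: Optional[str] = None,
           location: Optional[str] = None) -> None:
    path = desugar(name, location).lstrip("/")  # ElementTree paths are relative
    node = datamodel.find(path)
    if node is None:
        raise KeyError(path)
    node.text = value  # simplification: the value is already evaluated
```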

5.3 Since "expression" and "condition" are abbreviated as "expr" and "cond"
respectively, perhaps "location" could be abbreviated as "loc"?

5.4 There are currently two elements, <event> and <send>, devoted to the task
of sending events. The only difference is in which queue the event is placed
- the internal queue or the external. Perhaps <event name="e"/> could
instead be expressed as e.g. <send event="e" kind="internal"/>? (The default
value of 'kind' could be "external".) In any case, a noun such as "event" is
not a good name for an element that is executable content!

5.5 Shouldn't it be possible to send an internal event with a certain data
payload? Currently, <event> does not allow that. Why not? (Hmm, maybe you
think that the datamodel could be used as intermediate storage when
communicating data between parallel states. That may be true given the
current algorithm, but if parallel states ran in separate threads, that
could lead to tricky synchronization problems. Using the event queue for
that purpose would be safer. Hmm, or perhaps you would argue that <send>
could be used for that purpose?)

5.6 We probably don't want to allow <event> or <send> to send events of the
form "*.Done".
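Points 5.4 to 5.6 taken together can be sketched in Python as one hypothetical `send` operation. Its `kind` parameter (defaulting to "external") selects the queue, it carries an optional data payload to either queue, and it refuses names of the reserved "*.Done" form, which only the interpreter should generate. The `Queues` class is illustrative. The reserved-name test is case-insensitive because the capitalization of "Done" varies; that detail is an assumption.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

Event = Tuple[str, Dict[str, Any]]  # (name, data payload)

class Queues:
    """Hypothetical interpreter queues behind a single <send kind="..."> element."""
    def __init__(self) -> None:
        self.internal: Deque[Event] = deque()
        self.external: Deque[Event] = deque()

    def send(self, event: str, kind: str = "external",
             data: Optional[Dict[str, Any]] = None) -> None:
        # Reserved form: '*.Done' events are generated by the interpreter only.
        if event.lower().endswith(".done"):
            raise ValueError(f"{event!r} is reserved for the interpreter")
        if kind not in ("internal", "external"):
            raise ValueError(f"unknown kind {kind!r}")
        queue = self.internal if kind == "internal" else self.external
        queue.append((event, dict(data or {})))  # payload allowed for both kinds
```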

5.7 What happened to the <cancel> element? Such an element might be useful,
and it may be difficult to introduce as a custom action element.
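One plausible reason <cancel> is hard to add as a custom action is that it needs access to the interpreter's table of pending delayed sends, indexed by send ID. The minimal Python sketch below uses the standard `sched` module as that timer queue, with a simulated clock so the example runs instantly. The `Clock` and `DelayedSender` classes and their method names are hypothetical.

```python
import sched
from typing import Dict, List

class Clock:
    """Simulated time, so delayed sends can be tested without real waiting."""
    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance simulated time instead of blocking

class DelayedSender:
    def __init__(self) -> None:
        self.clock = Clock()
        self.scheduler = sched.scheduler(self.clock.time, self.clock.sleep)
        self.pending: Dict[str, sched.Event] = {}  # sendid -> scheduled delivery
        self.delivered: List[str] = []

    def send(self, sendid: str, event: str, delay: float) -> None:
        self.pending[sendid] = self.scheduler.enter(delay, 0, self._deliver, (sendid, event))

    def _deliver(self, sendid: str, event: str) -> None:
        self.pending.pop(sendid, None)
        self.delivered.append(event)

    def cancel(self, sendid: str) -> bool:
        """Counterpart of <cancel sendid="..."/>; False if already delivered or unknown."""
        scheduled = self.pending.pop(sendid, None)
        if scheduled is None:
            return False
        self.scheduler.cancel(scheduled)
        return True
```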
Received on Tuesday, 6 March 2007 20:16:14 UTC