Re: DOM in _event.data and KVPs from Stefan Radomski on 2013-04-01 (www-voice@w3.org from April to June 2013)

From: Stefan Radomski <radomski@tk.informatik.tu-darmstadt.de>
Date: Mon, 1 Apr 2013 10:40:49 +0000
To: Jim Barnett <Jim.Barnett@genesyslab.com>
CC: "www-voice@w3.org" <www-voice@w3.org>
Message-ID: <A2717553-6814-4AF7-BB49-6D791E9D0796@tk.informatik.tu-darmstadt.de>
Hey,

On Apr 1, 2013, at 1:20 AM, Jim Barnett <Jim.Barnett@genesyslab.com> wrote:

> Stefan,
>   For test 561, your assumption is correct.  The relevant section of the spec is in 5.6.2  "If the 'expr' attribute is not present [on content], the Processor MUST use the children of <content> as the output."  So in this case, the output of <content> is the <book> node, and you would build the DOM for that and put it under _event.data.  If content had multiple children, you would put multiple such nodes under _event.data.  If that's not clear from the spec, could you suggest wording that would make it so?  Would it suffice to say "relevant DOM structure(s)" or do we need something more specific?

It's just that (to my understanding) the following is not a well-formed XML document, as it has two top-level elements:

<?xml version="1.0"?>
<books>
  <book title="title1"/>
  <book title="title2"/>
</books>
<authors />

and you do assume _event.data to be an XML document, as test561 calls getElementsByTagName on it (which the W3C DOM IDL defines on Document and Element, but not on a DocumentFragment or a plain list of nodes). This implies that there ought to be a single topmost container element like <content>. I am in favor of this approach, as it leaves no ambiguity about which node type _event.data is when XML is given.
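
To illustrate the point, here is a minimal sketch using Python's xml.dom.minidom as a stand-in for the interpreter's DOM (this is not our implementation; the names MULTI_ROOT and parses are made up for the example). It shows that the two-root text is rejected by the parser, while the same text wrapped in a <content> container parses and supports the lookup from test561:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two top-level elements, as in the example above (hypothetical test data).
MULTI_ROOT = (
    '<books><book title="title1"/><book title="title2"/></books>'
    '<authors/>'
)

def parses(xml_text):
    """Return True if xml_text is a well-formed XML document."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False

# Without a container there are two root elements, so parsing fails.
bare_ok = parses(MULTI_ROOT)

# With a topmost <content> container the same children form one document.
wrapped_text = "<content>" + MULTI_ROOT + "</content>"
wrapped_ok = parses(wrapped_text)

# The lookup used in test561 then works on the resulting document.
wrapped = minidom.parseString(wrapped_text)
second_title = wrapped.getElementsByTagName("book")[1].getAttribute("title")
```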

> In 562, you are right that KVPs are key-value pairs, and there are three cases: XML, KVPs or a space-normalized string.  Your implicit question is whether there could be a <content> element whose child/children specified KVPs as opposed to a string (or XML).  I would think that would be possible if the content was unambiguously JSON.  However we say that support for JSON is optional, so if you don't support JSON, the children of <content> are either XML or a string (to be space normalized).  The language making JSON optional is left over from the days when we thought we might make this datamodel mandatory to implement, (and we didn't want to constrain implementations excessively.)  Perhaps it would be better to get rid of the fuzziness.  We could either remove JSON from the datamodel altogether, or make it a mandatory part.  Which would you prefer?  (Of course, whichever one we choose, implementations are free to define their own datamodel that makes the opposite choice. ) 
> 

I am somewhat torn on this. In our interpreter we do, in fact, support JSON as part of content:

  <content>
    ({
      "doubleType": 1.0,
      "int64Type": -4,
      "uint32Type": 5,
      "boolType": false,
      "stringType": "string"
    })
  </content>

Note the enclosing round brackets around the JSON structure: they are necessary for it to be evaluated as an object literal rather than as a block statement in ECMAScript (at least with Google V8).

As the expr attribute of content is already subject to evaluation by the datamodel, and it is fairly natural for an expr to return nested ECMAScript data structures (i.e. JSON), I'd argue that one should be able to provide JSON as a text node in the content element as well. This would supersede the KVPs, at least in the ECMAScript datamodel, I guess. As to whether or not to make JSON mandatory, I'd say yes when we are talking about the ECMAScript datamodel. We do feature a Prolog datamodel as well, and it would be awkward to map JSON onto something Prolog can digest.

With regard to the comments of David:

> I propose this:
> - add an optional 'type' attribute to <content>, which should overrule the following heuristics and facilitate the integration of additional, platform-specific <content> parsers
> 
> - if the XML in <content> is a full Document (i.e. it has exactly one root element, which is the only Element child of <content>), return a Document as per the DOMParser algorithm (use the 'type' value from <content>)
> - else if there are multiple Element children of <content>, return a DocumentFragment with all the children nodes, including text nodes
> - else try interpreting as JSON or whatever
> - if nothing works, return the text content as a string
> 

I am with you on the heuristics but I would not use DocumentFragments in _event.data - if the content is XML, _event.data ought to be a Document with a topmost <content> container element. This would unify all variations for XML in <content> and the DOM itself is a valid <scxml:content> element again. 

What we are doing at the moment is the following:
1. There is a content element as part of invoke or send
1.1 If it has at least one child of type element, import the content node into a new document and present it as a DOM in _event.data
1.2 If there is no element child, take the first child of type text and try to evaluate it via the datamodel
  1.2.1 If it can be successfully evaluated by the datamodel, use its representation as a nested structure in _event.data
  1.2.2 If it cannot be evaluated by the datamodel, space-normalize it and represent it as a string in _event.data
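
The steps above can be sketched roughly as follows. This is only an illustration in Python with xml.dom.minidom, not our interpreter's code: event_data_from_content and content_of are hypothetical helpers, and json.loads is an assumed stand-in for "evaluate via the datamodel" (a real ECMAScript datamodel would accept more than strict JSON).

```python
import json
from xml.dom import minidom

def event_data_from_content(content):
    """Map a <content> element to an _event.data value (hypothetical helper)."""
    children = content.childNodes
    # 1.1: at least one element child -> import <content> into a new document
    if any(c.nodeType == c.ELEMENT_NODE for c in children):
        doc = minidom.Document()
        doc.appendChild(doc.importNode(content, True))
        return doc
    # 1.2: no element child -> take the first text child
    text = next((c.data for c in children if c.nodeType == c.TEXT_NODE), "")
    try:
        # 1.2.1: the datamodel can evaluate it -> nested structure
        return json.loads(text)  # assumption: JSON parsing stands in for the datamodel
    except ValueError:
        # 1.2.2: otherwise -> space-normalized string
        return " ".join(text.split())

def content_of(xml_text):
    """Parse a snippet and return its <content> root element (example helper)."""
    return minidom.parseString(xml_text).documentElement

xml_case = event_data_from_content(
    content_of('<content><book title="t1"/><book title="t2"/></content>'))
json_case = event_data_from_content(
    content_of('<content>{"boolType": false, "uint32Type": 5}</content>'))
text_case = event_data_from_content(
    content_of('<content>  some \n  plain   text </content>'))
```

Here xml_case is a document whose root is <content>, json_case is a nested structure, and text_case is a space-normalized string. No DocumentFragment is ever produced.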

This is pretty much in line with your heuristics above, with the sole exception of never using DocumentFragments. I am somewhat indifferent to a type attribute on content; the only use I can see is to force a text representation of the XML or JSON.

> In any case, do not return something which points to (part of) the SCXML DOM, which means the nodes have to be cloned or re-parsed or removed from their original parent before being put into the new Document or DocumentFragment.

There is no way to tell the XML parser to stop at <content>, so you will necessarily have to import nodes from the original DOM into the new document at _event.data. We do not reparse, but just import and append them into a new document. Having DOM nodes in _event.data that actually point to those from the original SCXML ought to be impossible (at least when _event.data is a document) as there can only be a single ownerDocument.
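
As a small sketch of the import-and-append approach (again Python's xml.dom.minidom standing in for the interpreter's DOM; the variable names are made up for the example): importNode produces a deep copy owned by the new document, so nothing in _event.data points back into the SCXML DOM.

```python
from xml.dom import minidom

# A cut-down SCXML document containing the <send> from test561.
scxml = minidom.parseString(
    '<scxml><send event="foo"><content>'
    '<books><book title="title1"/><book title="title2"/></books>'
    '</content></send></scxml>'
)
content = scxml.getElementsByTagName("content")[0]

# Import <content> into a fresh document that will become _event.data.
event_doc = minidom.Document()
copy = event_doc.importNode(content, True)  # deep copy owned by event_doc
event_doc.appendChild(copy)

# Mutating the copy leaves the original SCXML DOM untouched.
copy.getElementsByTagName("book")[0].setAttribute("title", "changed")
original_title = content.getElementsByTagName("book")[0].getAttribute("title")
copied_title = copy.getElementsByTagName("book")[0].getAttribute("title")
```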

Best regards
Stefan

> - Jim
> 
> -----Original Message-----
> From: Stefan Radomski [mailto:radomski@tk.informatik.tu-darmstadt.de] 
> Sent: Saturday, March 30, 2013 12:44 PM
> To: www-voice@w3.org
> Subject: DOM in _event.data and KVPs
> 
> Hi there,
> 
> as I am working my way through the tests, I came across test561, where the content of send is XML. The test assumes that _event.data is a DOM and operates on it:
> 
> <send event="foo">
> <content>
>   <books xmlns="">
>     <book title="title1"/>
>     <book title="title2"/>
>   </books>
> </content>
> </send>
> [...]
> <transition event="foo" cond="_event.data.getElementsByTagName('book')[1].getAttribute('title') == 'title2'" target="pass"/>
> 
> My question is: How is the document in _event.data supposed to look? At the moment I have:
> 
> <?xml version="1.0"?>
> <books>
> <book title="title1"/>
> <book title="title2"/>
> </books>
> 
> Which assumes that there is only a single element as the child of <content>. The alternative is to introduce a top-level element (e.g. <content>) and have the send contents as its child nodes - which is more generic but unspecified as far as I can tell.
> 
> Also in test562, it is written that "in the ECMA data model, test that processor creates space normalized string in _event.data when receiving anything other than KVPs or XML in an event". The test references C2, which talks about space-normalizing string literals from data elements, but not as part of content. Also, am I to assume that KVPs are key/value pairs? That is, I would have to distinguish three cases with the content in send:
> 
> 1. content has at least one node of type element  -> use the DOM representation
> 2. content is only a node of type text  -> space-normalize
> 3. content is a set of key/value pairs  -> provide a hash in _event.data
> 
> I guess the last case only applies with an empty <content> and the namelist attribute or param elements inside <send>?
> 
> Best regards
> Stefan
> 
> 
>
Received on Monday, 1 April 2013 10:41:20 UTC