RE: XML Syntax Strawman (ACTION-309)

Please find my comments inlined.

-- Harold


> -----Original Message-----
> From: public-rif-wg-request@w3.org [mailto:public-rif-wg-request@w3.org] On Behalf Of Sandro Hawke
> Sent: Friday, July 13, 2007 1:00 AM
> To: public-rif-wg@w3.org
> Subject: XML Syntax Strawman (ACTION-309)
>
>
>
> This is an attempt to progress on ACTION-309 ("Work on unified
> strawman proposal for asn->xml system").
>
> The guiding principle behind the strawman is to have the syntax be
> both:
>    1.  a basic XML object-serialization syntax and 
>    2.  a subset of RDF/XML.   
> (I've taken to calling this approach "Semantic XML".)

Even the Abstract Syntax is a syntax. A lot of confusion can arise
when associating XML with semantics in that way.

>
> Part 1 means that XML tools will work on it fairly well,

Since XML is most widely accepted in industry, and the charter says
"The primary normative syntax of the language must be an XML syntax."
<http://www.w3.org/2005/rules/wg/charter.html#xml-syntax>, I suggest
that we ensure XML tools will work on it perfectly well, not just
fairly well.

> and the
> format will feel unsurprising to people comfortable with XML. Part 2
> means that RDF-reading tools will work on it, the logical data model
> of the syntax will be well-defined (good for writing rules about RIF
> documents), and we get off-the-shelf solutions to some of the
> confusing "coin-flip" issues.

Most of these are not issues in the XML we actually use in WD1.

> It's also more self-describing than is
> typical for XML -- it can be de-serialized into frame (generic object)
> structures without knowing the schema.

This is also a property of the fully striped XML that we already have.
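
To make that concrete, here is a minimal Python sketch of schema-free
de-serialization of fully striped XML into generic frames. The helper
name to_frame and the sample document (a Uniterm with one arg element
per argument) are illustrative assumptions, not text taken from WD1.

```python
import xml.etree.ElementTree as ET

def to_frame(class_elem):
    """Turn a class element into a generic frame without consulting a schema."""
    frame = {"class": class_elem.tag, "slots": {}}
    text = (class_elem.text or "").strip()
    if len(class_elem) == 0 and text:
        # Leaf class such as <Const>purchase</Const>: keep its text content.
        frame["value"] = text
    for prop in class_elem:                  # property stripe
        children = list(prop)
        if children:
            value = to_frame(children[0])    # nested object (class stripe)
        else:
            value = (prop.text or "").strip()  # plain text value
        # Repeated property tags accumulate as multiple values.
        frame["slots"].setdefault(prop.tag, []).append(value)
    return frame

doc = """
<Uniterm>
  <op><Const>purchase</Const></op>
  <arg><Var>Buyer</Var></arg>
  <arg><Var>Seller</Var></arg>
</Uniterm>
"""
frame = to_frame(ET.fromstring(doc))
print(frame["class"], sorted(frame["slots"]))
```

The only knowledge the reader needs is the alternation itself: children
of a class are properties, and the child of a property is a class or text.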

>
> The cost of Part 1 is that RDF/XML output tools won't work unless
> modified; the cost of Part 2 is that the XML document has a few bits
> of RDF syntax in it, making it a little bigger and a little
> odd-looking.

And that XML tools will not work on it perfectly well (see above),
and that we would always need two namespaces, rif and rdf. To avoid
all of that, the WG has converged on the current fully striped XML.

>
> In informal XML terms, here are the details:
>
>   1. It's fully-striped object serialization (as Gary and Harold have
>      shown already).   The XML elements alternate (as you go deeper into
>      the tree) between being the name of a class and the name of a
>      property. 
>
>   2. We wrap it all in an rdf:RDF element, which mostly serves to allow
>      multiple rulesets (or other top-level objects) to be serialized in
>      the same XML file (since XML only allows one top-level element).

rif:RIF can do this for us.
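
A minimal sketch of what I mean, in Python. The namespace URI below is
a hypothetical placeholder (no RIF namespace is fixed in this thread),
and the empty Ruleset elements are only there to show the wrapping.

```python
import xml.etree.ElementTree as ET

RIF_NS = "http://example.org/rif#"  # hypothetical placeholder namespace

# A single rif:RIF root element holding two top-level rulesets.
doc = f"""
<rif:RIF xmlns:rif="{RIF_NS}">
  <rif:Ruleset/>
  <rif:Ruleset/>
</rif:RIF>
"""

root = ET.fromstring(doc)
rulesets = root.findall(f"{{{RIF_NS}}}Ruleset")
print(root.tag, len(rulesets))
```

Only one namespace is needed for the wrapper in this form.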

>
>   3. When serializing a data value (except text strings), we use
>      the rdf:datatype attribute to provide the datatype, like this:
>         <Animal>
>            <age rdf:datatype="&xsd;int">12</age>
>            <born rdf:datatype="&xsd;date">1995-05-28</born>
>         </Animal>
>
>      (In this example, I'm using a defined XML entity for "xsd" to
>      make the string more readable.)

Still quite hard to read. RIF should refer to XSD datatypes directly.
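
For reference, this is roughly what a reader of item 3 has to do with
the attribute. It is a minimal Python sketch under assumptions: the
entity is written out as the full XSD namespace URI, and the converter
table and the helper read_value are illustrative, covering only xsd:int.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Illustrative converter table; only the type used in this example.
CONVERTERS = {XSD_NS + "int": int}

doc = f"""
<Animal xmlns:rdf="{RDF_NS}">
  <name>Taiko</name>
  <age rdf:datatype="{XSD_NS}int">12</age>
</Animal>
"""

def read_value(prop):
    """Return the property's value, converted if a datatype is declared."""
    datatype = prop.get(f"{{{RDF_NS}}}datatype")
    if datatype is None:
        return prop.text              # untyped text string (item 4)
    return CONVERTERS[datatype](prop.text)

animal = ET.fromstring(doc)
values = {prop.tag: read_value(prop) for prop in animal}
print(values)
```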

>
>   4. For text strings, we just give the value, with an optional
>      xml:lang 
>         <Animal>
>            <name>Taiko</name>
>            <name xml:lang="ja">ющ</name>
>         </Animal>

Again, quite hard to read. More in a separate email.
 
>
>   5. If a property has multiple unordered values, just repeat the tag as
>      often as needed (as immediately above, with two values for the
>      "name" property)
>
>   6. If the value of a property is a sequence (if the order matters),
>      then we have to tell the reader software this, using a special xml
>      attribute, like this: [ Harold, note this difference from the Core
>      draft [1] ]
>
>          <Uniterm>
>             <op><Const>purchase</Const></op>
>             <arg rdf:parseType="Collection">
>               <Var>Buyer</Var>
>               <Var>Seller</Var>
>               <Uniterm>
>                 <op><Const>book</Const></op>
>                 <arg rdf:parseType="Collection">
>                    <Var>Author</Var>
>                    <Const>LeRif</Const>
>                 </arg>
>               </Uniterm>
>               <Const>$49</Const>
>             </arg>
>          </Uniterm>

With rdf:parseType="Collection" and a sequence within arg, it's no
longer fully striped. Alluding to parsers (as "parseType" does) in
order to express the high-level distinction "ordered vs. unordered"
is not a great idea, and it is again quite hard to read.

>
>   7. If an object being serialized has a URI, specify it with the
>      "rdf:about" attribute, like this:
>         <Ruleset rdf:about="http://example.com/myrules#set1">
>         ...
>         </Ruleset>
>
> And I think that's it.
>
> For people familiar with RDF/XML, the subset I'm proposing is obviously
> very small.  It's just what you see above.  If the RIF abstract syntax
> tree ends up being really a lattice or graph, then we'll add in
> rdf:resource and rdf:nodeID.  Also, I'm constraining objects to be
> serialized in one place in a document -- the value of rdf:about is not
> allowed to occur twice in a file.  (This makes de-serializing and other
> kinds of XML processing easier and more efficient, I believe.)  In
> general, I'm pretty sure this style will allow schema validation of the
> document and processing via XSLT and XQuery.
>
> So, that's the basic idea.  I've been playing with an implementation,
> and a more precise specification, but my deadline for this action has
> arrived, and this level of detail is probably sufficient to see how
> we're doing.  (Or is it?  Does this make sense?  What kind of text or
> examples or software would be helpful?  It's been a long time since
> we left off in Innsbruck, Gary and Hassan, and I don't remember exactly
> where we were on all the issues.)
>
>      -- Sandro
>
> [1] http://www.w3.org/2005/rules/wg/wiki/Core/Positive_Conditions?action=recall&rev=205

Received on Friday, 13 July 2007 17:04:39 UTC