XML Syntax Strawman (ACTION-309)

This is an attempt to progress on ACTION-309 ("Work on unified
strawman proposal for asn->xml system").

The guiding principle behind the strawman is to have the syntax be
both:
   1.  a basic XML object-serialization syntax and 
   2.  a subset of RDF/XML.   
(I've taken to calling this approach "Semantic XML".)

Part 1 means that XML tools will work on it fairly well, and the
format will feel unsurprising to people comfortable with XML.  Part 2
means that RDF-reading tools will work on it, the logical data model
of the syntax will be well-defined (good for writing rules about RIF
documents), and we get off-the-shelf solutions to some of the
confusing "coin-flip" issues.  It's also more self-describing than is
typical for XML -- it can be de-serialized into frame (generic object)
structures without knowing the schema.
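
A minimal sketch of that schema-free reading, assuming Python's standard
xml.etree.ElementTree module.  The helper name to_frame and the sample
names (owner, Person, Alice) are made up for illustration; this is not
the implementation mentioned later in this message.

```python
import xml.etree.ElementTree as ET

def to_frame(node):
    """Turn a class element into a frame: its class name plus slots.

    Relies only on the striping convention (class element, then
    property elements, then class elements again), not on a schema.
    """
    frame = {"class": node.tag, "slots": {}}
    for prop in node:                       # children of a class element are properties
        values = frame["slots"].setdefault(prop.tag, [])
        children = list(prop)
        if children:                        # the property holds nested object(s)
            values.extend(to_frame(child) for child in children)
        else:                               # the property holds a plain text value
            values.append(prop.text or "")
    return frame

# Hypothetical sample document, unqualified names for brevity.
doc = """
<Animal>
  <name>Taiko</name>
  <owner>
    <Person>
      <name>Alice</name>
    </Person>
  </owner>
</Animal>
"""

frame = to_frame(ET.fromstring(doc))
print(frame)
```

The reader never consults a schema: whether a tag names a class or a
property follows from its depth in the tree alone.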

The cost of Part 1 is that RDF/XML output tools won't work unless
modified; the cost of Part 2 is that the XML document has a few bits
of RDF syntax in it, making it a little bigger and a little
odd-looking.

In informal XML terms, here are the details:

  1. It's fully-striped object serialization (as Gary and Harold have
     shown already).   The XML elements alternate (as you go deeper into
     the tree) between being the name of a class and the name of a
     property. 

  2. We wrap it all in an rdf:RDF element, which mostly serves to allow
     multiple rulesets (or other top-level objects) to be serialized in
     the same XML file (since XML only allows one top-level element).

  3. When serializing a data value (except text strings), we use
     the rdf:datatype attribute to provide the datatype, like this:
        <Animal>
           <age rdf:datatype="&xsd;int">12</age>
           <born rdf:datatype="&xsd;date">1995-05-28</born>
        </Animal>

     (In this example, I'm using an XML entity, "xsd", defined to
     expand to the XML Schema namespace URI, to make the string more
     readable.)

  4. For text strings, we just give the value, with an optional
     xml:lang attribute, like this:
        <Animal>
           <name>Taiko</name>
           <name xml:lang="ja">太鼓</name>
        </Animal>

  5. If a property has multiple unordered values, just repeat the tag as
     often as needed (as immediately above, with two values for the
     "name" property).

  6. If the value of a property is a sequence (if the order matters),
     then we have to tell the reader software this, using a special XML
     attribute, like this: [ Harold, note this difference from the Core
     draft [1] ]

         <Uniterm>
            <op><Const>purchase</Const></op>
            <arg rdf:parseType="Collection">
              <Var>Buyer</Var>
              <Var>Seller</Var>
              <Uniterm>
                <op><Const>book</Const></op>
                <arg rdf:parseType="Collection">
                   <Var>Author</Var>
                   <Const>LeRif</Const>
                </arg>
              </Uniterm>
              <Const>$49</Const>
            </arg>
         </Uniterm>

  7. If an object being serialized has a URI, specify it with the
     "rdf:about" attribute, like this:
        <Ruleset rdf:about="http://example.com/myrules#set1">
        ...
        </Ruleset>
  
And I think that's it.
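
To see how the seven points fit together from the reader's side, here
is a rough sketch using Python's standard xml.etree.ElementTree.  It is
an illustration under assumptions, not a reference implementation: the
helper names (rdf, read_object, read_value) and the animal URI are
hypothetical, only xsd:int is decoded, class and property names are
left unqualified as in the examples above, and namespace URIs are
spelled out instead of using an entity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def rdf(name):
    """Build the ElementTree key for an attribute in the RDF namespace."""
    return "{" + RDF + "}" + name

def read_object(node):
    """Read a class element (points 1 and 7)."""
    obj = {"class": node.tag, "about": node.get(rdf("about")), "slots": {}}
    if len(node) == 0 and node.text and node.text.strip():
        obj["text"] = node.text.strip()      # e.g. <Const>purchase</Const>
    for prop in node:                        # repeated tags accumulate (point 5)
        obj["slots"].setdefault(prop.tag, []).append(read_value(prop))
    return obj

def read_value(prop):
    """Read the value of one property element (points 3, 4 and 6)."""
    if prop.get(rdf("parseType")) == "Collection":
        return [read_object(child) for child in prop]   # ordered sequence
    children = list(prop)
    if children:
        return read_object(children[0])      # nested object
    text = prop.text or ""
    datatype = prop.get(rdf("datatype"))
    if datatype == XSD + "int":
        return int(text)
    if datatype is not None:
        return (text, datatype)              # unknown type: keep the lexical form
    lang = prop.get(XML_LANG)
    return (text, lang) if lang else text    # text string, optional language

doc = f"""
<rdf:RDF xmlns:rdf="{RDF}">
  <Animal rdf:about="http://example.com/animals#taiko">
    <name>Taiko</name>
    <name xml:lang="ja">太鼓</name>
    <age rdf:datatype="{XSD}int">12</age>
  </Animal>
  <Uniterm>
    <op><Const>purchase</Const></op>
    <arg rdf:parseType="Collection">
      <Var>Buyer</Var>
      <Var>Seller</Var>
    </arg>
  </Uniterm>
</rdf:RDF>
"""

# Point 2: rdf:RDF is only a wrapper around the top-level objects.
objects = [read_object(top) for top in ET.fromstring(doc)]
```

Here the unordered "name" values end up as separate entries in the
slot, while the "arg" value is a single ordered list.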

For people familiar with RDF/XML, the subset I'm proposing is obviously
very small.  It's just what you see above.  If the RIF abstract syntax
tree ends up being really a lattice or graph, then we'll add in
rdf:resource and rdf:nodeID.  Also, I'm constraining objects to be
serialized in one place in a document -- the value of rdf:about is not
allowed to occur twice in a file.  (This makes de-serializing and other
kinds of XML processing easier and more efficient, I believe.)  In
general, I'm pretty sure this style will allow schema validation of the
document and processing via XSLT and XQuery.
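
The one-place-per-object constraint is easy to check mechanically.  A
small sketch, again assuming Python's standard library; the function
name duplicate_abouts and the "#set2" URI are invented for the example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = "{" + RDF + "}about"

def duplicate_abouts(xml_text):
    """Return, sorted, every rdf:about value that occurs more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(RDF_ABOUT)
        for el in root.iter()                # every element, at any depth
        if el.get(RDF_ABOUT) is not None
    )
    return sorted(uri for uri, n in counts.items() if n > 1)

ok = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <Ruleset rdf:about="http://example.com/myrules#set1"/>
  <Ruleset rdf:about="http://example.com/myrules#set2"/>
</rdf:RDF>"""

# Same document, but with the second ruleset reusing the first URI.
bad = ok.replace("#set2", "#set1")

print(duplicate_abouts(ok))
print(duplicate_abouts(bad))
```

A document that passes this check can be de-serialized in a single
pass, since no object's properties are split across two places.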

So, that's the basic idea.  I've been playing with an implementation,
and a more precise specification, but my deadline for this action has
arrived, and this level of detail is probably sufficient to see how
we're doing.  (Or is it?  Does this make sense?  What kind of text or
examples or software would be helpful?  It's been a long time since
we left off in Innsbruck, Gary and Hassan, and I don't remember exactly
where we were on all the issues.)

     -- Sandro

[1] http://www.w3.org/2005/rules/wg/wiki/Core/Positive_Conditions?action=recall&rev=205

Received on Friday, 13 July 2007 04:01:14 UTC