RE: XML Syntax Strawman (ACTION-309) from Sandro Hawke on 2007-07-13 (public-rif-wg@w3.org from July 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 13 Jul 2007 14:14:15 -0400
To: "Boley, Harold" <Harold.Boley@nrc-cnrc.gc.ca>
Cc: "Sandro Hawke" <sandro@w3.org>, public-rif-wg@w3.org
Message-ID: <18238.1184350455@ubuhebe>
"Boley, Harold" <Harold.Boley@nrc-cnrc.gc.ca> writes:
>
> [mailto:public-rif-wg-request@w3.org] On Behalf Of Sandro Hawke
> >
> > This is an attempt to progress on ACTION-309 ("Work on unified
> > strawman proposal for asn->xml system").
> >
> > The guiding principle behind the strawman is to have the syntax be
> > both:
> >    1.  a basic XML object-serialization syntax and
> >    2.  a subset of RDF/XML.
> > (I've taken to calling this approach "Semantic XML".)
> 
> Even the Abstract Syntax is a syntax. A lot of confusion can arise
> when associating XML with semantics in that way.

In what way?   Is this a complaint about the name "Semantic XML" or
about the idea of using a subset of RDF/XML that works with standard XML
tools?

> > Part 1 means that XML tools will work on it fairly well,
> 
> Since XML is most widely accepted in industry, and the charter says
> "The primary normative syntax of the language must be an XML syntax."
> <http://www.w3.org/2005/rules/wg/charter.html#xml-syntax>, I suggest
> to ensure that XML tools will work on it perfectly well.
> 
> > and the
> > format will feel unsurprising to people comfortable with XML. Part 2
> > means that RDF-reading tools will work on it, the logical data model
> > of the syntax will be well-defined (good for writing rules about RIF
> > documents), and we get off-the-shelf solutions to some of the
> > confusing "coin-flip" issues.
> 
> Most of these are not issues in the XML we actually use in WD1.
> 
> > It's also more self-describing than is
> > typical for XML -- it can be de-serialized into frame (generic object)
> > structures without knowing the schema.
> 
> This is also a property of the fully striped XML that we already have.

How do you propose to have the deserializer distinguish between lists
and repeated values?    You can't de-serialize to frames without knowing
the difference.   (also datatypes, asked in more detail below.)

> > The cost of Part 1 is that RDF/XML output tools won't work unless
> > modified; the cost of Part 2 is that the XML document has a few bits
> > of RDF syntax in it, making it a little bigger and a little
> > odd-looking.
> 
> And that XML tools will not work on it perfectly well (see above),

I don't see where above.  What XML tools will not work on this?  I said
"fairly well" because I can't know what all XML tools might do, and how
well they work is highly subjective.  All plain XML tools will work in
some sense -- it's well-formed XML -- but particular application tools
like JAXB make interesting assumptions that might or might not hold.  In
some cases the assumptions might be tunable.
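
A minimal sketch of the "well-formed XML" point, assuming Python's
standard xml.etree.ElementTree as the plain XML tool. The RIF namespace
URI and the second animal ("Rex") are hypothetical placeholders, not
taken from any draft. It only shows that a generic parser reads an
rdf:RDF-wrapped document with several top-level objects; it says nothing
about binding tools like JAXB.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RIF_NS = "http://example.org/rif#"  # hypothetical placeholder namespace

# Two top-level objects wrapped in a single rdf:RDF root element.
doc = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RIF_NS}">
  <Animal>
    <name>Taiko</name>
  </Animal>
  <Animal>
    <name>Rex</name>
  </Animal>
</rdf:RDF>
"""

root = ET.fromstring(doc)

# ElementTree reports tags in {namespace}localname form.
top_level = [child.tag for child in root]
names = [n.text for n in root.iter(f"{{{RIF_NS}}}name")]

print(root.tag)
print(top_level)
print(names)
```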

> and that we would always need two namespaces rif and rdf... 

We'll need at least rif, xs (XML Schema Datatypes), fn (XPath
Functions), and whatever the applications need.  I don't see the
addition of one more namespace as a problem.

> To avoid
> all of that, the WG has converged on the current fully striped XML.

Nothing I'm saying disagrees with fully-striped XML.  It just makes some
refinements around the edges. 

> > In informal XML terms, here are the details:
> >
> >   1. It's fully-striped object serialization (as Gary and Harold have
> >      shown already).   The XML elements alternate (as you go deeper into
> >      the tree) between being the name of a class and the name of a
> >      property.
> >
> >   2. We wrap it all in an rdf:RDF element, which mostly serves to allow
> >      multiple rulesets (or other top-level objects) to be serialized in
> >      the same XML file (since XML only allows one top-level element).
> 
> rif:RIF can do this for us.

I understood the sense of the WG to be that when there was a perfectly
good term in the rdf namespace, we should use it, rather than copying it
into a rif namespace.

> >
> >   3. When serializing a data value (except text strings), we use
> >      the rdf:datatype attribute to provide the datatype, like this:
> >         <Animal>
> >            <age rdf:datatype="&xsd;int">12</age>
> >            <born rdf:datatype="&xsd;datetime">1995-05-28</born>
> >         </Animal>
> >
> >      (In this example, I'm using a defined XML entity for "xsd" to
> >      make the string more readable.)
> 
> Still quite hard to read. RIF should refer to XSD datatypes directly.

What do you mean "directly"?   What alternative do you suggest?   What
syntax would you use for the above example?
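
For reference, a rough sketch of what the rdf:datatype attribute gives a
reader that has no schema: the attribute value alone selects the
conversion. This assumes Python's xml.etree.ElementTree; the CONVERTERS
table and read_value helper are hypothetical names for illustration. The
datatype URIs are written out in full instead of using an entity, and
xsd:date is used for the birth date because 1995-05-28 is a date
lexical form.

```python
import xml.etree.ElementTree as ET
from datetime import date

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical converter table: datatype URI -> Python constructor.
CONVERTERS = {
    XSD_NS + "int": int,
    XSD_NS + "date": date.fromisoformat,
}


def read_value(elem):
    """Convert a property element's text using only its rdf:datatype."""
    datatype = elem.get(f"{{{RDF_NS}}}datatype")
    text = elem.text or ""
    if datatype is None:
        return text  # untyped value: treat as a plain text string
    convert = CONVERTERS.get(datatype)
    # Unknown datatypes are kept as a (datatype, lexical form) pair.
    return convert(text) if convert else (datatype, text)


doc = f"""
<Animal xmlns:rdf="{RDF_NS}">
  <age rdf:datatype="{XSD_NS}int">12</age>
  <born rdf:datatype="{XSD_NS}date">1995-05-28</born>
  <name>Taiko</name>
</Animal>
"""

animal = {child.tag: read_value(child) for child in ET.fromstring(doc)}
print(animal)
```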

> >   4. For text strings, we just give the value, with an optional
> >      xml:lang
> >         <Animal>
> >            <name>Taiko</name>
> >            <name xml:lang="jp">=C0=DD</name>
> >         </Animal>
> 
> Again, quite hard to read. More in a separate email.

(waiting)

> >   5. If a property has multiple unordered values, just repeat the tag as
> >      often as needed (as immediately above, with two values for the
> >      "name" property)
> >
> >   6. If the value of a property is a sequence (if the order matters),
> >      then we have to tell the reader software this, using a special xml
> >      attribute, like this: [ Harold, note this difference from the Core
> >      draft [1] ]
> >
> >          <Uniterm>
> >             <op><Const>purchase</Const></op>
> >             <arg rdf:parsetype="Collection">
> >               <Var>Buyer</Var>
> >               <Var>Seller</Var>
> >               <Uniterm>
> >                 <op><Const>book</Const></op>
> >                 <arg rdf:parsetype="Collection">
> >                    <Var>Author</Var>
> >                    <Const>LeRif</Const>
> >                 </arg>
> >               </Uniterm>
> >               <Const>$49</Const>
> >             </arg>
> >           </Uniterm>
> 
> With parsetype="Collection" and a sequence within arg, it's no longer
> fully striped. Alluding to any parsers to express the high-level
> distinction "ordered vs. unordered" is not a great idea, besides
> again being quite hard to read.

Again, I don't see the alternative, if we want deserializers to be able
to work without a schema.  Without some flag like this, they can't know
whether to read the data into a hash table or a list.  There are about
six different ways to do this flag, but I see no real advantage to one
over the other, except that this happens to be the same as RDF uses, so
it should be the default unless one of the others has a significant
advantage.
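
To make the list-versus-repeated-values point concrete, here is a
minimal sketch of a schema-less deserializer, assuming Python's
xml.etree.ElementTree. The Bag class and read_object function are
hypothetical names for illustration, and the input is a shortened form of
the Uniterm example. Note that RDF/XML itself spells the attribute
rdf:parseType, which is what the sketch checks for.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARSE_TYPE = f"{{{RDF_NS}}}parseType"  # RDF/XML spelling of the flag


class Bag(list):
    """Repeated property values; stored in a list, but order has no meaning."""


def read_object(elem):
    """Turn a class element into a generic frame without consulting a schema."""
    frame = {"class": elem.tag}
    if len(elem) == 0:
        # Leaf such as <Var>Buyer</Var>: keep the text content.
        frame["text"] = (elem.text or "").strip()
        return frame
    for prop in elem:
        if prop.get(PARSE_TYPE) == "Collection":
            # Flagged: child order matters, so read into an ordered list.
            frame[prop.tag] = [read_object(child) for child in prop]
        else:
            # Unflagged: each occurrence adds one more unordered value.
            value = read_object(prop[0]) if len(prop) else (prop.text or "").strip()
            frame.setdefault(prop.tag, Bag()).append(value)
    return frame


doc = f"""
<Uniterm xmlns:rdf="{RDF_NS}">
  <op><Const>purchase</Const></op>
  <arg rdf:parseType="Collection">
    <Var>Buyer</Var>
    <Var>Seller</Var>
    <Const>$49</Const>
  </arg>
</Uniterm>
"""

frame = read_object(ET.fromstring(doc))
args = [item["text"] for item in frame["arg"]]
print(args)
```

Without the flag on <arg>, the same reader would have no basis for
choosing the ordered branch, which is the gap the attribute fills.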

     -- Sandro
Received on Friday, 13 July 2007 18:15:33 UTC