RE: XML Syntax Strawman (ACTION-309) from Boley, Harold on 2007-07-13 (public-rif-wg@w3.org from July 2007)

From: Boley, Harold <Harold.Boley@nrc-cnrc.gc.ca>
Date: Fri, 13 Jul 2007 17:15:53 -0400
To: "Sandro Hawke" <sandro@w3.org>
Cc: <public-rif-wg@w3.org>
Message-ID: <E4D07AB09F5F044299333C8D0FEB45E903BA1516@nrccenexb1.nrc.ca>
Please find my answers inlined.

-- Harold


> > [mailto:public-rif-wg-request@w3.org] On Behalf Of Sandro Hawke
> > >
> > > This is an attempt to progress on ACTION-309 ("Work on unified
> > > strawman proposal for asn->xml system").
> > >
> > > The guiding principle behind the strawman is to have the syntax be
> > > both:
> > >    1.  a basic XML object-serialization syntax and
> > >    2.  a subset of RDF/XML.
> > > (I've taken to calling this approach "Semantic XML".)
> > 
> > Even the Abstract Syntax is a syntax. A lot of confusion can arise
> > when associating XML with semantics in that way.
>
> In what way?   Is this a complaint about the name "Semantic XML" or
> about the idea of using a subset of RDF/XML that works with standard XML
> tools?

The above was an observation about syntax vs. semantics: using that name
can lead to lots of confusion.

>
> > > Part 1 means that XML tools will work on it fairly well,
> > 
> > Since XML is most widely accepted in industry, and the charter says
> > "The primary normative syntax of the language must be an XML syntax."
> > <http://www.w3.org/2005/rules/wg/charter.html#xml-syntax>, I suggest
> > to ensure that XML tools will work on it perfectly well.
> > 
> > > and the
> > > format will feel unsurprising to people comfortable with XML. Part 2
> > > means that RDF-reading tools will work on it, the logical data model
> > > of the syntax will be well-defined (good for writing rules about RIF
> > > documents), and we get off-the-shelf solutions to some of the
> > > confusing "coin-flip" issues.
> > 
> > Most of these are not issues in the XML we actually use in WD1.
> > 
> > > It's also more self-describing than is
> > > typical for XML -- it can be de-serialized into frame (generic object)
> > > structures without knowing the schema.
> > 
> > This is also a property of the fully striped XML that we already have.
>
> How do you propose to have the deserializer distinguish between lists
> and repeated values?    You can't de-serialize to frames without knowing
> the difference.   (also datatypes, asked in more detail below.)

Your strawman doesn't mention lists, and we don't have repeated values
such as {ÀÝ, Taiko} in the abstract syntax. A proposed extension for
repeated values or finite domains should be discussed separately.
If found valuable, we can incorporate it into the abstract syntax
and the fully striped XML.

>
> > > The cost of Part 1 is that RDF/XML output tools wont work unless
> > > modified; the cost of Part 2 is that the XML document has a few bits
> > > of RDF syntax in it, making it a little bigger and a little
> > > odd-looking.
> > 
> > And that XML tools will not work on it perfectly well (see above),
>
> I don't see where above.  What XML tools will not work on this?  I said
> "fairly well" because I can't know what all XML tools might do, and how
> well they work is highly subjective.  All plain XML tools will work in
> some sense -- it's well-formed XML -- but particular application tools
> like JAXB make interesting assumptions that might or might not hold.  In
> some cases the assumptions might be tunable.

(XSD) validation and (XSL) transformation would not be supported
by using such a mixture of RIF/XML and RDF/XML:
a problem for rule interchange.

>
> > and that we would always need two namespaces rif and rdf... 
>
> We'll need at least rif, xs (XML Schema Datatypes), fn (XPath
> Functions), and whatever the applications need.  I don't see the
> addition of one more namespace as a problem.

For pure Horn we only need rif.

>
> > To avoid
> > all of that, the WG has converged on the current fully striped XML.
>
> Nothing I'm saying disagrees with fully-striped XML.  It just makes some
> refinements around the edges. 

Good.

>
> > > In informal XML terms, here are the details:
> > >
> > >   1. It's fully-striped object serialization (as Gary and Harold have
> > >      shown already).   The XML elements alternate (as you go deeper
> > >      into the tree) between being the name of a class and the name of a
> > >      property.
> > >
> > >   2. We wrap it all in an rdf:RDF element, which mostly serves to allow
> > >      multiple rulesets (or other top-level objects) to be serialized in
> > >      the same XML file (since XML only allows one top-level element).
> > 
> > rif:RIF can do this for us.
>
> I understood the sense of the WG to be that when there was a perfectly
> good term in the rdf namespace, we should use it, rather than copying it
> into a rif namespace.

On a case by case basis, but RIF should have its own root, rif:RIF,
e.g. as in:

<rif:RIF>
  <top><Ruleset>...</Ruleset></top>
  . . .
  <top><Ruleset>...</Ruleset></top>
  . . .
  <top>further top-level RIF object</top>
  . . .
  <top>further top-level RIF object</top>
</rif:RIF>
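
A minimal sketch of how a reader might collect the top-level objects
under such a root, assuming Python's standard xml.etree.ElementTree.
The namespace URI and the "Other" element are hypothetical placeholders,
not names taken from any RIF draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
RIF_NS = "http://example.org/rif#"

SAMPLE = f"""
<rif:RIF xmlns:rif="{RIF_NS}">
  <top><Ruleset/></top>
  <top><Ruleset/></top>
  <top><Other/></top>
</rif:RIF>
"""


def top_level_objects(root):
    """Return the object inside each <top> role, in document order."""
    if root.tag != f"{{{RIF_NS}}}RIF":
        raise ValueError("expected a rif:RIF root element")
    return [top[0] for top in root.findall("top")]


objects = top_level_objects(ET.fromstring(SAMPLE))
kinds = [obj.tag for obj in objects]
```

Each top role wraps exactly one class element here, so the striping
(class, role, class) is preserved at the root as well.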

>
> > >
> > >   3. When serializing a data value (except text strings), we use
> > >      the rdf:datatype attribute to provide the datatype, like this:
> > >         <Animal>
> > >            <age rdf:datatype="&xsd;int">12</age>
> > >            <born rdf:datatype="&xsd;datetime">1995-05-28</born>
> > >         </Animal>
> > >
> > >      (In this example, I'm using a defined XML entity for "xsd" to
> > >      make the string more readable.)
> > 
> > Still quite hard to read. RIF should refer to XSD datatypes directly.
>
> What do you mean "directly"?   What alternative do you suggest?   What
> syntax would you use for the above example?

By "directly" I mean directly based on XSD, as in
xsi:type="&xsd;int" and xsi:type="&xsd;dateTime"
<http://www.w3.org/TR/xmlschema-1/#xsi_type>.
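
A rough sketch of how a schema-less reader could pick a converter from
an xsi:type attribute, using Python's standard xml.etree.ElementTree.
It assumes the &xsd; entity expands to the XML Schema namespace URI
followed by "#", and it types the sample birth value as a date so that
the lexical form parses; both choices are assumptions made for this
illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import date

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed expansion of the &xsd; entity.
XSD = "http://www.w3.org/2001/XMLSchema#"

SAMPLE = f"""
<Animal xmlns:xsi="{XSI}">
  <age xsi:type="{XSD}int">12</age>
  <born xsi:type="{XSD}date">1995-05-28</born>
  <name>Taiko</name>
</Animal>
"""

# Map datatype URIs to Python converters; untyped values stay strings.
CONVERTERS = {
    XSD + "int": int,
    XSD + "date": date.fromisoformat,
}


def read_value(elem):
    """Convert the element text according to its xsi:type, if any."""
    type_uri = elem.get(f"{{{XSI}}}type")
    convert = CONVERTERS.get(type_uri, str)
    return convert(elem.text)


animal = ET.fromstring(SAMPLE)
values = [(child.tag, read_value(child)) for child in animal]
```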

>
> > >   4. For text strings, we just give the value, with an optional
> > >      xml:lang
> > >         <Animal>
> > >            <name>Taiko</name>
> > >            <name xml:lang="jp">ÀÝ</name>
> > >         </Animal>
> > 
> > Again, quite hard to read. More in a separate email.
>
> (waiting)
>
> > >   5. If a property has multiple unordered values, just repeat the tag as
> > >      often as needed (as immediately above, with two values for the
> > >      "name" property)
> > >
> > >   6. If the value of a property is a sequence (if the order matters),
> > >      then we have to tell the reader software this, using a special XML
> > >      attribute, like this: [ Harold, note this difference from the Core
> > >      draft [1] ]
> > >
> > >          <Uniterm>
> > >             <op><Const>purchase</Const></op>
> > >             <arg rdf:parsetype="Collection">
> > >               <Var>Buyer</Var>
> > >               <Var>Seller</Var>
> > >               <Uniterm>
> > >                 <op><Const>book</Const></op>
> > >                 <arg rdf:parsetype="Collection">
> > >                    <Var>Author</Var>
> > >                    <Const>LeRif</Const>
> > >                 </arg>
> > >               </Uniterm>
> > >               <Const>$49</Const>
> > >             </arg>
> > >           </Uniterm>
> > 
> > With parsetype="Collection" and a sequence within arg, it's no longer
> > fully striped. Alluding to any parsers to express the high-level
> > distinction "ordered vs. unordered" is not a great idea, besides
> > again being quite hard to read.
>
> Again, I don't see the alternative, if we want deserializers to be able
> to work without a schema.  Without some flag like this, they can't know
> whether to read the data into a hash table or a list.  There are about
> six different ways to do this flag, but I see no real advantage to one
> over the other, except that this happens to be the same as RDF uses, so
> it should be the default unless one of the others has a significant
> advantage.

The deserializer exploits the natural child order of repeated roles,
such as the arg role. The example

          <Uniterm>
            <op><Const>purchase</Const></op>
            <arg><Var>Buyer</Var></arg>
            <arg><Var>Seller</Var></arg>
            <arg>
              <Uniterm>
                <op><Const>book</Const></op>
                <arg><Var>Author</Var></arg>
                <arg><Const>LeRif</Const></arg>
              </Uniterm>
            </arg>
            <arg><Const>$49</Const></arg>
          </Uniterm>

is read as

          <Uniterm>
            <op><Const>purchase</Const></op>
            <arg index="1"><Var>Buyer</Var></arg>
            <arg index="2"><Var>Seller</Var></arg>
            <arg index="3">
              <Uniterm>
                <op><Const>book</Const></op>
                <arg index="1"><Var>Author</Var></arg>
                <arg index="2"><Const>LeRif</Const></arg>
              </Uniterm>
            </arg>
            <arg index="4"><Const>$49</Const></arg>
          </Uniterm>
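
The reading above can be sketched in a few lines, assuming Python's
standard xml.etree.ElementTree. The function name add_indexes is
hypothetical, and the sketch simply treats any role that occurs more
than once under a class element as ordered by document position.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Uniterm>
  <op><Const>purchase</Const></op>
  <arg><Var>Buyer</Var></arg>
  <arg><Var>Seller</Var></arg>
  <arg>
    <Uniterm>
      <op><Const>book</Const></op>
      <arg><Var>Author</Var></arg>
      <arg><Const>LeRif</Const></arg>
    </Uniterm>
  </arg>
  <arg><Const>$49</Const></arg>
</Uniterm>
"""


def add_indexes(class_elem):
    """Give each repeated role child a 1-based index in document order."""
    # First pass: count how often each role name occurs under this class.
    counts = {}
    for role in class_elem:
        counts[role.tag] = counts.get(role.tag, 0) + 1
    # Second pass: number the repeated roles and recurse into nested classes.
    seen = {}
    for role in class_elem:
        if counts[role.tag] > 1:
            seen[role.tag] = seen.get(role.tag, 0) + 1
            role.set("index", str(seen[role.tag]))
        for nested in role:
            add_indexes(nested)
    return class_elem


root = add_indexes(ET.fromstring(SAMPLE))
top_args = [(a.get("index"), a[0].tag) for a in root.findall("arg")]
```

No extra attribute is needed in the serialized document: the position
is recovered from the XML child order alone, and a single-valued role
such as op is left unindexed.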
Received on Friday, 13 July 2007 21:16:04 UTC