- From: Boley, Harold <Harold.Boley@nrc-cnrc.gc.ca>
- Date: Fri, 13 Jul 2007 17:15:53 -0400
- To: "Sandro Hawke" <sandro@w3.org>
- Cc: <public-rif-wg@w3.org>
Please find my answers inlined. -- Harold

> > [mailto:public-rif-wg-request@w3.org] On Behalf Of Sandro Hawke
> > >
> > > This is an attempt to progress on ACTION-309 ("Work on unified
> > > strawman proposal for asn->xml system").
> > >
> > > The guiding principle behind the strawman is to have the syntax be
> > > both:
> > >    1. a basic XML object-serialization syntax and
> > >    2. a subset of RDF/XML.
> > > (I've taken to calling this approach "Semantic XML".)
> >
> > Even the Abstract Syntax is a syntax. A lot of confusion can arise
> > when associating XML with semantics in that way.
>
> In what way?  Is this a complaint about the name "Semantic XML" or
> about the idea of using a subset of RDF/XML that works with standard XML
> tools?

The above was an observation about syntax vs. semantics:
using that name can lead to lots of confusion.

>
> > > Part 1 means that XML tools will work on it fairly well,
> >
> > Since XML is most widely accepted in industry, and the charter says
> > "The primary normative syntax of the language must be an XML syntax."
> > <http://www.w3.org/2005/rules/wg/charter.html#xml-syntax>, I suggest
> > to ensure that XML tools will work on it perfectly well.
> >
> > > and the
> > > format will feel unsurprising to people comfortable with XML.  Part 2
> > > means that RDF-reading tools will work on it, the logical data model
> > > of the syntax will be well-defined (good for writing rules about RIF
> > > documents), and we get off-the-shelf solutions to some of the
> > > confusing "coin-flip" issues.
> >
> > Most of these are not issues in the XML we actually use in WD1.
> >
> > > It's also more self-describing than is
> > > typical for XML -- it can be de-serialized into frame (generic object)
> > > structures without knowing the schema.
> >
> > This is also a property of the fully striped XML that we already have.
>
> How do you propose to have the deserializer distinguish between lists
> and repeated values?  You can't de-serialize to frames without knowing
> the difference.  (also datatypes, asked in more detail below.)

Your strawman doesn't mention lists, and we don't have repeated values
such as {ющ, Taiko} in the abstract syntax. A proposed extension for
repeated values or finite domains should be discussed separately.
If found valuable, we can incorporate it into the abstract syntax and
the fully striped XML.

>
> > > The cost of Part 1 is that RDF/XML output tools won't work unless
> > > modified; the cost of Part 2 is that the XML document has a few bits
> > > of RDF syntax in it, making it a little bigger and a little
> > > odd-looking.
> >
> > And that XML tools will not work on it perfectly well (see above),
>
> I don't see where above.  What XML tools will not work on this?  I said
> "fairly well" because I can't know what all XML tools might do, and how
> well they work is highly subjective.  All plain XML tools will work in
> some sense -- it's well-formed XML -- but particular application tools
> like JAXB make interesting assumptions that might or might not hold.  In
> some cases the assumptions might be tunable.

(XSD) validation and (XSL) transformation would not be supported by
using such a mixture of RIF/XML and RDF/XML: a problem for rule
interchange.

>
> > and that we would always need two namespaces rif and rdf...
>
> We'll need at least rif, xs (XML Schema Datatypes), fn (XPath
> Functions), and whatever the applications need.  I don't see the
> addition of one more namespace as a problem.

For pure Horn we only need rif.

>
> > To avoid
> > all of that, the WG has converged on the current fully striped XML.
>
> Nothing I'm saying disagrees with fully-striped XML.  It just makes some
> refinements around the edges.

Good.

>
> > > In informal XML terms, here are the details:
> > >
> > > 1. It's fully-striped object serialization (as Gary and Harold have
> > >    shown already).  The XML elements alternate (as you go deeper into
> > >    the tree) between being the name of a class and the name of a
> > >    property.
> > >
> > > 2. We wrap it all in an rdf:RDF element, which mostly serves to allow
> > >    multiple rulesets (or other top-level objects) to be serialized in
> > >    the same XML file (since XML only allows one top-level element).
> >
> > rif:RIF can do this for us.
>
> I understood the sense of the WG to be that when there was a perfectly
> good term in the rdf namespace, we should use it, rather than copying it
> into a rif namespace.

On a case by case basis, but RIF should have its own root, rif:RIF,
e.g. as in:

<rif:RIF>
  <top><Ruleset>...</Ruleset></top>
  . . .
  <top><Ruleset>...</Ruleset></top>
  . . .
  <top>further top-level RIF object</top>
  . . .
  <top>further top-level RIF object</top>
</rif:RIF>

>
> > >
> > > 3. When serializing a data value (except text strings), we use
> > >    the rdf:datatype attribute to provide the datatype, like this:
> > >       <Animal>
> > >          <age rdf:datatype="&xsd;int">12</age>
> > >          <born rdf:datatype="&xsd;datetime">1995-05-28</born>
> > >       </Animal>
> > >
> > >    (In this example, I'm using a defined XML entity for "xsd" to
> > >    make the string more readable.)
> >
> > Still quite hard to read. RIF should refer to XSD datatypes directly.
>
> What do you mean "directly"?  What alternative do you suggest?  What
> syntax would you use for the above example?

With "directly" I mean directly based on XSD, as in
xsi:type="&xsd;int" and xsi:type="&xsd;dateTime"
<http://www.w3.org/TR/xmlschema-1/#xsi_type>.

>
> > > 4. For text strings, we just give the value, with an optional
> > >    xml:lang
> > >       <Animal>
> > >          <name>Taiko</name>
> > >          <name xml:lang="jp">=C0=DD</name>
> > >       </Animal>
> >
> > Again, quite hard to read. More in a separate email.
>
> (waiting)
>
> > > 5. If a property has multiple unordered values, just repeat the tag as
> > >    often as needed (as immediately above, with two values for the
> > >    "name" property)
> > >
> > > 6. If the value of a property is a sequence (if the order matters),
> > >    then we have to tell the reader software this, using a special xml
> > >    attribute, like this: [ Harold, note this difference from the Core
> > >    draft [1] ]
> > >
> > >    <Uniterm>
> > >      <op><Const>purchase</Const></op>
> > >      <arg rdf:parsetype="Collection">
> > >        <Var>Buyer</Var>
> > >        <Var>Seller</Var>
> > >        <Uniterm>
> > >          <op><Const>book</Const></op>
> > >          <arg rdf:parsetype="Collection">
> > >            <Var>Author</Var>
> > >            <Const>LeRif</Const>
> > >          </arg>
> > >        </Uniterm>
> > >        <Const>$49</Const>
> > >      </arg>
> > >    </Uniterm>
> >
> > With parsetype="Collection" and a sequence within arg, it's no longer
> > fully striped. Alluding to any parsers to express the high-level
> > distinction "ordered vs. unordered" is not a great idea, besides
> > again being quite hard to read.
>
> Again, I don't see the alternative, if we want deserializers to be able
> to work without a schema.  Without some flag like this, they can't know
> whether to read the data into a hash table or a list.  There are about
> six different ways to do this flag, but I see no real advantage to one
> over the other, except that this happens to be the same as RDF uses, so
> it should be the default unless one of the others has a significant
> advantage.

The deserializer exploits the natural child order of repeated roles
such as the arg role. The example,

<Uniterm>
  <op><Const>purchase</Const></op>
  <arg><Var>Buyer</Var></arg>
  <arg><Var>Seller</Var></arg>
  <arg>
    <Uniterm>
      <op><Const>book</Const></op>
      <arg><Var>Author</Var></arg>
      <arg><Const>LeRif</Const></arg>
    </Uniterm>
  </arg>
  <arg><Const>$49</Const></arg>
</Uniterm>

is read as

<Uniterm>
  <op><Const>purchase</Const></op>
  <arg index="1"><Var>Buyer</Var></arg>
  <arg index="2"><Var>Seller</Var></arg>
  <arg index="3">
    <Uniterm>
      <op><Const>book</Const></op>
      <arg index="1"><Var>Author</Var></arg>
      <arg index="2"><Const>LeRif</Const></arg>
    </Uniterm>
  </arg>
  <arg index="4"><Const>$49</Const></arg>
</Uniterm>
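
The positional reading of repeated arg roles described above can be
sketched in Python. This is only an illustration under stated
assumptions, not code from this thread or from any RIF draft: the
function name index_repeated_roles is hypothetical, the element names
are used without namespaces as in the example, and the explicit index
attribute is written out purely to make the implicit document order
visible.

```python
import xml.etree.ElementTree as ET

# The fully striped example from the message, without any index attributes.
SOURCE = """
<Uniterm>
  <op><Const>purchase</Const></op>
  <arg><Var>Buyer</Var></arg>
  <arg><Var>Seller</Var></arg>
  <arg>
    <Uniterm>
      <op><Const>book</Const></op>
      <arg><Var>Author</Var></arg>
      <arg><Const>LeRif</Const></arg>
    </Uniterm>
  </arg>
  <arg><Const>$49</Const></arg>
</Uniterm>
"""


def index_repeated_roles(elem, role="arg"):
    """Hypothetical helper: number each `role` child by its position
    among the same-named children of its parent (1-based, document order),
    then do the same for every nested element."""
    position = 0
    for child in elem:  # iteration follows document order
        if child.tag == role:
            position += 1
            child.set("index", str(position))
        index_repeated_roles(child, role)
    return elem


root = index_repeated_roles(ET.fromstring(SOURCE))

# Collect the positions assigned at the outer and the nested level.
outer = [arg.get("index") for arg in root.findall("arg")]
inner = [arg.get("index") for arg in root.findall("arg/Uniterm/arg")]
print(outer, inner)
```

The counter is restarted for every parent element, so the nested
Uniterm gets its own numbering, matching the indexed form shown above.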
Received on Friday, 13 July 2007 21:16:04 UTC