Re: XML-Data (mapping object-style XML to frames) from Christian De Sainte Marie on 2010-09-08 (public-rif-wg@w3.org from September 2010)

From: Christian De Sainte Marie <csma@fr.ibm.com>
Date: Wed, 8 Sep 2010 14:09:33 +0200
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Dave Reynolds <dave.e.reynolds@gmail.com>, "Eric Prud'hommeaux" <ericw3c@gmail.com>, public-rif-wg <public-rif-wg@w3.org>, Sandro Hawke <sandro@w3.org>
Message-ID: <OF7C40B89C.C16DBD0D-ONC1257798.0035F33E-C1257798.0042CBC5@fr.ibm.com>
Hi Sandro, Eric and Dave,

Thanks for the input.

See my comments inline, below.

"Eric Prud'hommeaux" <ericw3c@gmail.com> wrote on 07/09/2010 23:41:28:
> 
> Dave Reynolds, Christian De Sainte Marie, public-rif-wg
> 
> * Sandro Hawke <sandro@w3.org> [2010-09-07 17:36-0400]
> > On Tue, 2010-09-07 at 22:11 +0100, Dave Reynolds wrote:
> > > On Tue, 2010-09-07 at 16:51 -0400, Sandro Hawke wrote: 
> > > > [...]
> > > > 
> > > > 1. Attribute and elements are mapped to IRIs in the same
> > > >    way, so you can't distinguish between them.  We suggest
> > > >    there are very few practical cases where you need to
> > > >    distinguish.  (And things are much nicer this way.)
> > > >    (If you need to translate to some construct (eg in jrules) that
> > > >    does distinguish, you can turn it into an OR of the two
> > > >    forms.)

I assume that the issue you address, here, is not the distinction between 
attribute and sub-element, but the awkwardness of the 'attribute(...)' 
form in the syntax proposed for attributes in the current version of 
RIF+XML data, that is:

<namespace#attribute(localname)>,

e.g., ?x[<http://www.w3.org/ns/none#attribute(tel)>->"x531"], in the 
no-namespace case, in your example.

We could easily build different IRIs to denote sub-elements and 
attributes, without using that form, e.g.
- <namespace#localname> for sub-elements, and
- <namespace@localname> for attributes (or any other separator character 
that cannot occur in an XML Name or NCName but is allowed in IRIs; I 
would have to check whether '@' qualifies).

And we can extend the system to handle the two other cases, namely lists 
and types, for which the currently proposed syntax is:
- <namespace#type(localname)> for types, and
- <namespace#list(localname)> for lists.
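To make the two naming schemes concrete, here is a minimal Python sketch. 
It is an illustration only, not text from the draft: the function names 
draft_iri and alt_iri are made up for this example, and '@' as the 
attribute separator is still the unchecked assumption mentioned above. The 
alternative scheme is only shown for sub-elements and attributes, since no 
separators have been proposed yet for types and lists.

```python
# Kinds for which this message quotes the current draft's
# <namespace#kind(localname)> form.
DRAFT_KINDS = ("attribute", "type", "list")

# Alternative scheme: one separator character per kind.
# '@' is tentative; whether it is acceptable still has to be checked.
ALT_SEPARATORS = {"element": "#", "attribute": "@"}


def draft_iri(namespace: str, localname: str, kind: str) -> str:
    """Build an IRI in the function-style form of the current draft."""
    if kind not in DRAFT_KINDS:
        raise ValueError(f"no draft form quoted here for kind {kind!r}")
    return f"{namespace}#{kind}({localname})"


def alt_iri(namespace: str, localname: str, kind: str) -> str:
    """Build an IRI in the alternative, separator-based form."""
    if kind not in ALT_SEPARATORS:
        raise ValueError(f"no separator proposed yet for kind {kind!r}")
    return f"{namespace}{ALT_SEPARATORS[kind]}{localname}"


# The no-namespace attribute from the example above.
print(draft_iri("http://www.w3.org/ns/none", "tel", "attribute"))
# The same attribute in the alternative form.
print(alt_iri("http://www.w3.org/ns/none", "tel", "attribute"))
```

Either way, the kind of information item is recoverable from the IRI 
alone; the alternative only changes how that is spelled.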

But the limitation is, really, that a rif:iri has to be absolute: 
otherwise, we could, probably, use XPath or component designator 
syntaxes... And, now that I think of it, xs:anyURIs _can_ be relative, and 
xs:anyURI _is_ a RIF builtin data type... Hmmm, could it be that simple? I 
will check.

If the above is beside the point, and it is really the distinction 
between attributes and sub-elements that you wanted to address, can you 
elaborate, please? I do not see what the problem is. Of course, same-name 
attributes and sub-elements are a corner case, and (to add to Eric's 
statistics) none of the people I asked has ever seen one.

But is that really the question? As far as I can see, the question is more 
one of ease of implementation: if I (the producer of the RIF document) know 
whether the information my rule is about is in an attribute or in a 
sub-element, why should I force the consumer to look in both places? Plus, 
from a specification point of view, making the distinction is easier, since 
sub-elements and attributes are in different subsets of the infoset.
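
As a rough illustration of the implementation point, here is a small 
Python sketch using the standard xml.etree.ElementTree module. The sample 
document and the helper names are hypothetical (only the tel="x531" value 
comes from the example earlier in this thread). When the IRI says where the 
value lives, the consumer does one lookup; when it does not, the consumer 
has to do both and merge the results.

```python
import xml.etree.ElementTree as ET


def attribute_value(elem, name):
    """Look only at the element's attributes (None if absent)."""
    return elem.get(name)


def subelement_values(elem, name):
    """Look only at the element's direct children with that name."""
    return [child.text for child in elem.findall(name)]


def either_values(elem, name):
    """What a consumer must do when the frame slot does not say
    whether the value is in an attribute or in a sub-element."""
    values = subelement_values(elem, name)
    attr = attribute_value(elem, name)
    if attr is not None:
        values.append(attr)
    return values


# Hypothetical sample data.
person = ET.fromstring('<person tel="x531"><name>Ann</name></person>')

print(attribute_value(person, "tel"))    # single, targeted lookup
print(either_values(person, "tel"))      # two lookups, merged
```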

Or did I miss something?

> > > Is there any evidence this is not a practical issue?
> > > 
> > > I thought XML folk were quite concerned about attribute/element
> > > distinction.  Certainly one use case for XML-data in RIF is to be able
> > > to do "lift" of XML to RDF using rules and the existing XML->RDF lift
> > > systems I know take some care to avoid overloading element/attribute
> > > namespaces.

Dave, can you point me to some of these, please?

As a side comment, there is also the relation with RDF/XML, which I wanted 
to explore (but I am unlikely to have the time before the end of the 
month).

> > I understood this spec to be addressing OO-style XML, and as such it is
> > not trying to preserve (nearly) all the infoset.   Most remarkably, it
> > seems (on a very cursory inspection) to not care about the order of the
> > children.

Well, actually, the spec attempts to address any kind of XML, OO-style or 
otherwise.

And it takes care of the order of the children, too (using the 
<...#list(...)> syntax).
 
> > But I don't know the actual use cases here at all; it seems to me this
> > is being driven by Christian's use cases, which I don't know.

That is really the oddest part to me; I mean, that the use case can 
seem mysterious to anyone!

Rules are written to be applied to data, right? So, except for the corner 
case where the data is exchanged as RIF facts, RIF producers and consumers 
need to agree on the binding to the target data: I do not see how the rule 
can be meaningfully interchanged, otherwise.

We did that for RDF and OWL data, but a lot of the data available to 
applications and exchanged between applications is in XML, or can be 
serialized in XML based on DTDs or XSDs, that can, also, be exchanged 
using standard formats; not to mention the ubiquitous XML tooling.

So, if we want to specify a standard binding for RIF formulas to 
(non-RDF/OWL) data, choosing to specify a binding to XML, arguably 
currently the most universal reference format available, seems quite 
natural (to me, at least).

Now, of course, we have our own use cases, but they are not any different 
from that (except that we are mostly interested in the OO case, of 
course). If you publish or send me rules, e.g. dealing with insurance 
claims, and you tell me that their RIF representation is meant for data 
represented in XML according to the ACORD standard format, and there is a 
standard binding for RIF formulas to XML data, then I know what to do with 
the rules: I can apply them to the data you tell me to, or to my own, or 
to whatever else, depending on the application, the context of the 
interchange, etc. And you do not have to know or care about how I 
represent the data in my application; and reciprocally.

On the other hand, if you send me the rules without that kind of 
information, of course, I can do all kinds of creative things with them, 
binding them as I want to whatever data I choose. But I cannot know 
unambiguously what you intended the rules for. Notice, by the way, that 
the existence of a standard binding for RIF formulas to XML will not stop 
me from doing all kinds of creative things with your rules. But that will 
not be your, nor RIF's, fault :-)

All the above seems extremely obvious to me, which is why statements like 
Sandro's, above, about not understanding the use case, scare me a little 
bit: could it be that I am just plain wrong?

> > I'm motivated by the idea of making it simple enough that I wouldn't
> > mind using it, for those XML cases where it applies.   I expect for the
> > rest of XML to use something that explicitly covers the whole infoset.

Can you elaborate on that, please? I do not understand your point.

The current draft is explicitly using the infoset (and the 
post-schema-validation infoset, in the case where the combination includes 
a schema). Of course, it does not cover the whole infoset, e.g. it does not 
cover the processing instruction or comment information items, or the 
parent property, because that did not seem a priority to me; but it can be 
very easily extended to cover them.

My motivation for leaving things out is, indeed, to keep it simple enough. 
But what is the best compromise between simplicity and coverage remains an 
open question, of course.

> > > > 2. The IRI is constructed by simply concatenating the namespace and
> > > >    the local part of the name, so you get the slightly odd looking
> > > >    http://example.org/2bday and http://example/1tel.  Since people
> > > >    won't usually see these, it should not be a problem.

What is the benefit of not including a separator character? I would think 
that it only makes implementation more difficult.
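
To show what I mean, here is a minimal Python sketch; the function names 
are made up, and I am assuming that http://example.org/2bday comes from the 
namespace http://example.org/2 and the local name bday. With plain 
concatenation, two different (namespace, local name) pairs can produce the 
same IRI, so the consumer cannot split it back without extra knowledge. 
With a separator that cannot occur in an NCName (here '#'), splitting at 
the last occurrence recovers the pair.

```python
def concat_iri(namespace: str, localname: str) -> str:
    """Proposal 2 as quoted: namespace and local name, no separator."""
    return namespace + localname


def separated_iri(namespace: str, localname: str, sep: str = "#") -> str:
    """Same, but with a separator that cannot occur in an NCName."""
    return namespace + sep + localname


def split_separated(iri: str, sep: str = "#"):
    """Recover (namespace, localname) by splitting at the LAST separator,
    which is safe because the local name cannot contain it."""
    namespace, _, localname = iri.rpartition(sep)
    return namespace, localname


# Two different names collapse to the same IRI without a separator...
a = concat_iri("http://example.org/2", "bday")
b = concat_iri("http://example.org/2b", "day")
print(a == b)

# ...but stay distinct, and reversible, with one.
c = separated_iri("http://example.org/2", "bday")
d = separated_iri("http://example.org/2b", "day")
print(c != d, split_separated(c))
```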

> > > > 3. Attributes with no namespace are treated as if they had the
> > > >    namespace of their element.  (Again, we lose some ability to
> > > >    distinguish between certain XML documents, but it should be fine.)

Yep. That is one of the options. Some specs do that. But, hopefully, we 
will not need to, if we can use xs:anyURI instead of rif:iri.

> > > > 4. Elements with no namespace are treated as if they had the namespace
> > > >    "http://www.w3.org/ns/none#".
> > > > 
> > > > This seems to us to be pretty easy to use, about the same implementation
> > > > difficulty, and only excluding a few XML documents that would otherwise
> > > > be processable.   (You were already excluding the ones where order
> > > > matters, right?  If not, we'd need to bring in lists.)

No, cases where order matters are already taken care of in the current 
draft (using lists, indeed).
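
For reference, here is how I read points 2, 3 and 4 of the quoted proposal 
when taken together, as a minimal Python sketch. The function names are 
mine, and the behaviour for a no-namespace attribute on a no-namespace 
element is my own inference from combining points 3 and 4, not something 
stated in the proposal.

```python
# Placeholder namespace from point 4 of the quoted proposal.
NONE_NS = "http://www.w3.org/ns/none#"


def element_iri(namespace, localname):
    """Point 4: an element with no namespace gets the placeholder one.
    Point 2: namespace and local name are simply concatenated."""
    return (namespace or NONE_NS) + localname


def attribute_iri(attr_namespace, localname, element_namespace):
    """Point 3: an attribute with no namespace is treated as if it had
    the namespace of its element. If the element has none either, fall
    back to the placeholder (an inference, see the note above)."""
    namespace = attr_namespace or element_namespace or NONE_NS
    return namespace + localname


print(element_iri(None, "tel"))
print(attribute_iri(None, "bday", "http://example.org/2"))
```

Note that, under this reading, a no-namespace attribute and a same-name 
sub-element of the same element end up with the same IRI.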

Now, I will go look into whether and how we could use XPath, XML 
component designators, or similar syntaxes as xs:anyURIs... Wow! How could 
I miss that?

Cheers,

Christian

IBM
9 rue de Verdun
94253 - Gentilly cedex - FRANCE
Tel./Fax: +33 1 49 08 29 81


Received on Wednesday, 8 September 2010 12:10:10 UTC