
Philosophical question about ID and bagID... (should xml:base be mandatory?)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 19 Sep 2001 16:34:49 +0300
Message-ID: <2BF0AD29BC31FE46B78877321144043114BFD0@trebe003.NOE.Nokia.com>
To: www-rdf-comments@w3.org, www-rdf-interest@w3.org

I have a philosophical question about the existence of
the ID and bagID attributes in RDF serializations which
have no xml:base URI defined.

As I understand it, RDF statements are supposed to be
independent of any particular XML serialization, and the
common perception (or at least mine) is that a knowledge
base made up of RDF statements may be the result of
syndicating multiple sources of knowledge together. If so,
does the local scope of XML IDs not conflict with the
"global", trans-instance view of knowledge at the heart
of RDF?

Should statements made within any instance not remain
valid and unambiguous irrespective of serialization, and
irrespective of whether that knowledge is embodied in one
or many individual serializations?

E.g. consider the following two RDF instances:

<rdf:RDF ...>
   <rdf:Description rdf:about="urn:foo:bar" rdf:bagID="bag">
      <dc:title>The Tao of Foo</dc:title>
   </rdf:Description>
   <rdf:Description rdf:about="#bag">
      <x:authority rdf:resource="mailto:stan.the.man@booga.net"/>
   </rdf:Description>
</rdf:RDF>

<rdf:RDF ...>
   <rdf:Description rdf:about="urn:foo:bar" rdf:bagID="bag">
      <dc:title>The Tao of Foo, the Te of Bar</dc:title>
   </rdf:Description>
   <rdf:Description rdf:about="#bag">
      <x:authority rdf:resource="mailto:zorro@swords-r-us.com"/>
   </rdf:Description>
</rdf:RDF>

Now, if we syndicate the knowledge from these two separate
instances, we have a problem: both instances use (in a
perfectly valid way) the ID 'bag' to reify and group their
statements and then attach additional information about the
source of authority for those statements. Note that the two
instances in fact make conflicting statements about the DC
title of the same resource. Unfortunately, in the resulting
merged graph, *both* sources are recorded as the authority
for the statements of both instances, because the IDs from
the two separate XML instances collide. The graph is thus
both ambiguous and erroneous: it says that both sources made
both statements, and we are unable to tell which source made
which.
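To make the collision concrete, here is a small Python sketch.
It does not parse the RDF/XML; it models each instance as a
hand-built set of triples and treats syndication as a plain set
union. It assumes that, with no xml:base, a parser maps
bagID="bag" to the same placeholder 'gen:#bag' in both
instances (mirroring the W3C validator's 'gen:#id' output).
The helper names and the x:authority stand-in are hypothetical.

```python
# Sketch only: triples are hand-built, not produced by a real RDF parser.
# Assumption: with no xml:base, bagID="bag" becomes "gen:#bag" in BOTH instances.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
X_AUTHORITY = "x:authority"  # hypothetical stand-in for the x:authority property
BAG = "gen:#bag"             # assumed placeholder for the unresolved bagID


def instance_triples(doc_no, title, authority):
    """Triples one instance yields: the title, its reification, and the bag."""
    stmt = f"_:stmt{doc_no}"  # reified-statement blank nodes stay distinct per instance
    return {
        ("urn:foo:bar", DC_TITLE, title),
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", "urn:foo:bar"),
        (stmt, RDF + "predicate", DC_TITLE),
        (stmt, RDF + "object", title),
        (BAG, RDF + "type", RDF + "Bag"),
        (BAG, RDF + "_1", stmt),
        (BAG, X_AUTHORITY, authority),
    }


def authorities(graph, bag):
    """Every x:authority value attached to the bag node."""
    return {o for s, p, o in graph if s == bag and p == X_AUTHORITY}


def bag_titles(graph, bag):
    """Titles of the reified statements that are members (rdf:_n) of the bag."""
    members = {o for s, p, o in graph if s == bag and p.startswith(RDF + "_")}
    return {o for s, p, o in graph if s in members and p == RDF + "object"}


doc1 = instance_triples(1, "The Tao of Foo", "mailto:stan.the.man@booga.net")
doc2 = instance_triples(2, "The Tao of Foo, the Te of Bar",
                        "mailto:zorro@swords-r-us.com")
merged = doc1 | doc2  # syndication modeled as a plain set union of triples

# The single bag node now carries both authorities and both titles.
print(sorted(authorities(merged, BAG)))
print(sorted(bag_titles(merged, BAG)))
```

Both mailto: authorities and both titles come back for the one
'gen:#bag' node, so nothing in the merged graph says which
source asserted which title. The merged bag also ends up with
two rdf:_1 members, which is itself a malformed container.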

Clearly, that's not a good thing.

Granted, one could use aboutEach to partially avoid the problem,
so that the authority statements are correctly separated between
the two bags of statements, but the two (serialized) bag nodes
still become a single node in the graph, since they have the same ID.

And one cannot argue that the content creators should simply
have been more careful in selecting their IDs: because an ID
cannot be a URI, one cannot in fact ensure that there will
never be a collision. Though it is perhaps fair to presume
that folks using IDs on rdf:Description elements would not
expect that knowledge to be safely syndicated, it may be far
less apparent that bagID is similarly unsafe.

It seems to me that, given the trans-instance nature of
RDF, allowing ID and bagID values in the absence of an
xml:base URI is a bad idea and should be disallowed.

The RDF spec seems to state (someone correct me if I'm wrong)
that all relative URIs in the XML serialization must be resolved
to absolute URIs as part of the parsing process, and that all
URIs in the graph should be absolute. Since RDF doesn't explicitly
require an xml:base URI for every instance that defines local
IDs and bagIDs, an instance without one leaves nothing to resolve
those IDs against, and the graph ends up with non-URI system
identifiers (e.g. the W3C validator produces 'gen:#id'). Unlike
the autogenerated identifiers for anonymous nodes, such
identifiers are not safe from collision.
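The resolution step itself is simple once a base exists. The
Python sketch below (the function name and both base URIs are
hypothetical) treats an ID as the same-document reference
"#id" and resolves it with the standard library's urljoin: with
distinct bases the two 'bag' IDs become distinct absolute URIs,
and with no base the reference stays relative, which is where a
parser has to fall back on something like 'gen:#bag'.

```python
from urllib.parse import urljoin, urlparse


def resolve_id(id_value, xml_base=None):
    """Resolve an rdf:ID/bagID value to an absolute URI, or None if impossible.

    An ID is a same-document reference "#id", so it can only become
    absolute when there is a base URI to resolve it against.
    """
    ref = "#" + id_value
    resolved = urljoin(xml_base or "", ref)  # urljoin returns ref unchanged for an empty base
    return resolved if urlparse(resolved).scheme else None


# Hypothetical base URIs for the two instances in the example above.
print(resolve_id("bag", "http://example.org/tao.rdf"))
print(resolve_id("bag", "http://example.com/tao-te.rdf"))
print(resolve_id("bag"))  # no xml:base: stays relative, so None
```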

So, should xml:base be made mandatory for all instances defining
ID or bagID values? Or, at the very least, should the specs state
more clearly that using ID and bagID values without an xml:base
URI is unsafe in a multi-source syndication context?
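If xml:base were made mandatory, a parser or validator could
enforce the rule with a check along the lines of this Python
sketch. The function is hypothetical and not part of any spec
or tool; it only looks at the rdf:-qualified ID and bagID
attributes used in the examples, and it treats xml:base as
inherited from ancestor elements.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
LOCAL_ID_ATTRS = ("{%s}ID" % RDF_NS, "{%s}bagID" % RDF_NS)


def unbased_ids(xml_text):
    """Return rdf:ID/rdf:bagID values with no xml:base in scope (hypothetical rule)."""
    found = []

    def walk(elem, base_in_scope):
        # xml:base is inherited, so a base on any ancestor covers this element.
        base_in_scope = base_in_scope or XML_BASE in elem.attrib
        for attr in LOCAL_ID_ATTRS:
            if attr in elem.attrib and not base_in_scope:
                found.append(elem.attrib[attr])
        for child in elem:
            walk(child, base_in_scope)

    walk(ET.fromstring(xml_text), False)
    return found


DOC = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"{base}>'
       '<rdf:Description rdf:about="urn:foo:bar" rdf:bagID="bag"/></rdf:RDF>')

print(unbased_ids(DOC.format(base="")))
print(unbased_ids(DOC.format(base=' xml:base="http://example.org/tao.rdf"')))
```

The first call flags the 'bag' ID; the second, where a
(hypothetical) xml:base sits on rdf:RDF, reports nothing.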

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
 
Received on Wednesday, 19 September 2001 09:35:07 GMT
