RE: QName URI Scheme Re-Visited, Revised, and Revealing from Patrick.Stickler@nokia.com on 2001-08-24 (www-rdf-interest@w3.org from August 2001)

From: <Patrick.Stickler@nokia.com>
Date: Fri, 24 Aug 2001 11:02:24 +0300
To: sean@mysterylights.com
Cc: www-rdf-interest@w3.org, www-rdf-comments@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114BFAD@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Sean B. Palmer [mailto:sean@mysterylights.com]
> Sent: 23 August, 2001 17:31
> To: Stickler Patrick (NRC/Tampere)
> Cc: www-rdf-interest@w3.org; www-rdf-comments@w3.org
> Subject: Re: QName URI Scheme Re-Visited, Revised, and Revealing
> 
> 
> > > Of course it doesn't say that two names are equivalent; it
> > > simply uses the QNames to form URI references.
> >
> > But if the two QNames are mapped to the same URI, then
> > RDF *is* saying that they are equivalent -- or rather that
> > even if they are lexically distinct, they are not allowed or able
> > to bear any semantic distinction.
> 
> It doesn't use QNames in the model! The model doesn't say anything about
> the QNames that are used in the syntax... the two are entirely separate.

Of course they are. And this isn't about the model. It's about
the *standard* representation of that model in XML serialization.

And I'm not saying that RDF should use QNames in the model. It
should use URIs. But since QNames are an unavoidable mechanism
of the XML Serialization, the employment of a fully regular mapping 
between QNames in serialization and QName URIs in the model/graph
would alleviate certain problems (with no change in either the
serialization syntax or the formal model).

> If anyone infers that two lexically distinct QNames are the same just
> because using the RDF concatenation mechanism they come out to be the
> same URI, then they are totally wrong. [Of course, it may well be that
> the QNames are semantically equivalent in some context, but that cannot
> and should not be inferred from RDF's use of QNames to form URI
> references.]

Here you're arguing my point precisely. If one source says

   <rdf:RDF ... xmlns:foo="urn:x:abc">
      <rdf:Description about="http://xyz.com/my_resource">
         <foo:def>123</foo:def>
      </rdf:Description>
   </rdf:RDF>

and a totally unrelated source on the other side of the planet says

   <rdf:RDF ... xmlns:bar="urn:x:abcd">
      <rdf:Description 
           about="http://xyz.com/my_resource"
           bar:ef="booga"/>
   </rdf:RDF>

and those two sources are syndicated at run-time, where both creators
of the knowledge were unaware of the other's use of QNames, and therefore
could not foresee any potential problem of collision, we end up with the 
following ambiguous RDF triples:

   [http://xyz.com/my_resource, urn:x:abcdef, "123"]
   [http://xyz.com/my_resource, urn:x:abcdef, "booga"]

And then a SW application that is expecting integers for the distinct
and well defined (insofar as the creator believes) property (urn:x:abc)def 
asks for those values via the ambiguous URI 'urn:x:abcdef' and in addition
to an integer gets a string! Oops.
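
The collision can be reproduced in a few lines. What follows is only a
rough sketch, assuming the mapping is plain string concatenation of
namespace URI and local name as discussed in this thread; the function
names (concat_uri, to_triples) and the tuple layout are made up for
illustration and are not taken from any RDF toolkit.

```python
# Sketch of the concatenation rule under discussion: the property URI is
# the namespace URI directly followed by the local name.
# (Hypothetical helper names; not an API of any real RDF library.)
def concat_uri(namespace: str, local_name: str) -> str:
    return namespace + local_name


# The two independent sources from the example, written as
# (subject, (namespace, local name), object) statements.
source_a = [("http://xyz.com/my_resource", ("urn:x:abc", "def"), "123")]
source_b = [("http://xyz.com/my_resource", ("urn:x:abcd", "ef"), "booga")]


def to_triples(statements):
    """Turn QName-based statements into URI-based triples."""
    return [(s, concat_uri(ns, name), o) for s, (ns, name), o in statements]


# Syndicate both sources into one set of triples.
merged = to_triples(source_a) + to_triples(source_b)

# Ask for every value of what the first author believes is (urn:x:abc)def.
values = [o for s, p, o in merged if p == "urn:x:abcdef"]
print(values)  # ['123', 'booga']
```

The consumer asked for one property and received values from two, because
after concatenation nothing is left to tell them apart.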

This *is* a serious problem, even if it is one that at the moment does not
appear to be so serious, because folks are not at the moment encountering
it in their present systems.

RDF is supposed to provide the foundation for encoding and interchanging
knowledge on the global SW -- so one would expect that issues such as
data integrity would be of chief importance. And even though chances for
collision might be considered to be small, the fact that such a possibility
is *known* to exist should cause great concern for the RDF and SW
communities.

If it were known, e.g. that Oracle or MySQL sometimes dropped a bit on very
large integers -- but "that's no big deal, since data seldom contains such
large integers" -- do you think that *anyone* would use them for serious
applications? And if it were an international standard specifying that
"it's OK to sometimes drop bits on very large integers", do you think
anyone would take that standard seriously for global deployment as the
primary backbone of a global database? I don't think so...

I will confess that I have been playing "devil's advocate" about this
issue in order to view it from all sides -- and I will agree that in
practice, it is unlikely that such collisions would occur in most contexts.
However, in the interest of achieving as solid a foundation for the SW as
possible, the problem should not IMO simply be ignored or dismissed, just
because it isn't troubling anyone at the moment. Such would be a
short-sighted and dangerous attitude with respect to an international
standard -- and fortunately one that clearly is not maintained by the
membership of the RDF Core working group.

The one point of "saving grace" regarding such potential collisions is that
they would occur at the boundary of namespace and name for namespaces which
(at least according to every URI scheme I've seen) fall within the same
scope of authority -- and to that end, that authority has the opportunity
(however burdensome or impractical) to ensure the integrity of all QNames
created within the scope of that authority. I.e. "urn:x:abc" and
"urn:x:abcd" both belong to the authority which "owns" the urn URI prefix
"urn:x:". Likewise, in similar examples, "http://xyz.com/abc" and
"http://xyz.com/abce" both belong to the authority having the domain
"xyz.com", etc. Still, this is a far weaker assurance of data integrity
than that which would be provided by a completely collision-safe mapping
function -- and at the very least, any need for, or obligation of, such an
authority to ensure against QName collisions should be highlighted in the
official RDF documentation.

(and of course, this presumes that it is considered improper or even
illegal to define and use namespaces belonging to third party authorities
without their permission and approval)
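
As an illustration of what such policing by an authority might look like,
here is a small sketch. It relies on one fact about plain concatenation:
two different namespace URIs can only yield the same URI if one of them is
a string prefix of the other. That is a necessary condition, not a
sufficient one (the leftover characters plus the other local name must
still form a legal name). The function names and the list of namespaces
are invented for the example.

```python
from itertools import combinations


def may_collide(ns_a: str, ns_b: str) -> bool:
    """Necessary condition for a concatenation collision: the two
    namespaces differ and one is a string prefix of the other."""
    return ns_a != ns_b and (ns_a.startswith(ns_b) or ns_b.startswith(ns_a))


def risky_pairs(namespaces):
    """List every pair of namespaces that could collide under concatenation."""
    return [(a, b) for a, b in combinations(namespaces, 2) if may_collide(a, b)]


# Hypothetical set of namespaces an authority (or an aggregator) knows about.
namespaces = [
    "urn:x:abc",
    "urn:x:abcd",
    "http://xyz.com/abc",
    "http://xyz.com/abce",
    "urn:y:abc",
]

for pair in risky_pairs(namespaces):
    print(pair)
```

Run over the namespaces above, this flags the "urn:x:" pair and the
"xyz.com" pair and nothing else, which matches the observation that the
risky pairs sit within a single scope of authority.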

> I can imagine Bart writing on the blackboard, "I will not confuse RDF
> Syntax with the RDF Model". 

And I can hear Bart saying "Hey man, eat my shorts" ;-)

But seriously...

> RDF says nothing about the QNames that it uses in the syntax, full stop.

The model says nothing about it, but "RDF" (in its entirety) does. It
says that QNames become URIs by direct concatenation of namespace and
name.

> In the syntax, we can do anything we want with the QNames, as long as
> the model is consistent.

Agreed, but you are presuming a level of control and omniscience over the
selection of namespaces and names that cannot exist on a truly global and
semi-chaotic SW.

My example above should *not* be able to happen. Period. It should not
matter *what* URI is used for a namespace, so long as the namespace URI
itself maintains global uniqueness.

Collisions should not happen. Ever. Just because the current RDF spec
allows them does not mean they are acceptable. 

> The syntax forms the data, but the model *is* the data,

But syntax is needed to get data into the model, and therefore it is
inseparable from the model on the practical level.

This will remain a problem so long as the syntactic representation
of resources differs from the model representation.

The problem goes away as soon as you e.g. disallow QNames for identifying
resources in XML serialization. But doing so is a greater change to the
standard, since that invalidates all of the existing instances already
defined. By simply adopting a better mapping function, the data remains
valid, the model remains valid and only the parsers have to be tweaked.

> and semantic constructs such as the notion of equivalence can only be
> derived from there, not from the method of going from the syntax to the
> model.

But if two disparate content creators unknowingly define what they think is
disjunct knowledge, and that collides to create ambiguity, then the path
from syntax to model is broken and has to be fixed.

Syntax is the doorway into the model. It is not "irrelevant". Just because
a problem doesn't exist in the model doesn't mean that it can't impact
the knowledge represented by that model.
 
> ... I know that you're asking for an automated conversion of QNames to
> URIs in the RDF syntax, ... If you do that, then you introduce
> horrendous backwards incompatibilities, because we're already using the
> QNames to form URIs.

I stated that myself from the start. 

> But nothing stops you from creating the QN URIs from the QNames.

True, but that does not achieve global portability and consistency 
of resource identity across the SW.

> Indeed, you can do that automatically using XSLT: converting an XML
> document into a list of QN URIs for all QNames in the document.

I can do pretty much anything in my own kitchen, but that doesn't
mean that from the same ingredients I will get the same dish in any 
arbitrary restaurant on the planet. A clear and precise recipe is
needed.

The key issue is what is standardized, not what potentially could
be done in any given localized context.

> > The mapping from QName to qn URI has to be the
> > "official" mapping.
> 
> RDF has already chosen to use QNames to create URI references,

Which doesn't preserve data integrity.

> at the cost that it is slightly more expensive for people to refer to
> QNames, since they now have to convert them into QN URIs first.

?? Folks aren't using qn URIs yet. And the idea was for the RDF
parser to do the conversion -- and to allow folks to just use
prefixed QNames in their queries, instances, schemas and leave
the qn URI representation to the triples space (graph) itself,
allowing systems to benefit from the explicit and regular mapping
between QName and qn URI. 

People like to interact with RDF knowledge using QNames rather
than URIs. Look at the number of RDF tools that allow you to
define namespace prefixes for use in queries and interfaces. It's
a pain in the rear end to type long complex URIs rather than
nsprefixed names -- especially since a lot of (most?) folks think
in terms of prefixed names rather than URIs (i.e. they're thinking
dc:title and not "http://purl.org/dc/elements/1.1/title", even if
the latter is the official identity of the resource).

I was simply saying, let's make this use of QNames as universal
identifiers more "native" to RDF and more consistent and explicit
by having a "proper" URI for QNames rather than just straight
concatenation. That's all.

But as Dan has clarified, this may be difficult to actually achieve
in practice because there are already numerous "interpretations"
of QName identity which are not fully compatible, and thus any
qn URI scheme may either only be workable for RDF or may become
grossly complex to meet all interpretations and needs of all the
current standards using namespaces.

> But data on the Semantic Web is represented by URIs,

Agreed. And a qn URI is a URI.

> and we can afford to make it more difficult to represent QNames. And
> blimey, it's not that much more difficult; you've only got to learn the
> new URI scheme that you came up with.

But that presumes that folks will either (a) never use actual
QNames in their RDF instances, using only explicit URIs, which
actually is not reasonable to presume, or (b) provide localized
custom conversion of RDF instances into triples where every QName
is mapped to a qn URI, i.e. write their own RDF parser.

If QNames are to be mapped to qn URIs, then it has to be done by 
every RDF parser in a fashion mandated by the standard (though it 
is highly unlikely that that would ever happen).
 
> The wonderful upshot of all this is that it's basically tough: RDF is
> not going to change, and that's it.

It certainly looks that way. Which (to play devil's advocate some
more) may preclude it from serious consideration as the ideal or
primary vehicle for the global backbone of the SW...

> You may as well just learn how to cope with representing QNames in the
> RDF model by identifying them with URIs. RDF is not going to use QNames
> in the syntax to map to QNames in the model, doesn't have to, and
> shouldn't have to.

Firstly, I'm not asking that RDF have QNames explicitly in the model.

If you got that impression, then I must have not expressed myself
sufficiently well. The adoption of an explicit, standardized mapping
(during parsing only) from XML QNames to qn URIs requires *no* change
whatsoever to either the syntax or to the model. The syntax continues
to use QNames as-is. The model continues to use URIs as-is. The proposal 
was simply to have a more explicit, bidirectional, and collision-safe 
URI derived from the QName than is now derived via direct concatenation.

No more. No less.
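
To make "explicit, bidirectional, and collision-safe" concrete, here is a
minimal sketch of one mapping with those properties. It is NOT the syntax
of the qn URI scheme from the original proposal: the scheme name
"x-qn-sketch:", the choice of percent-encoding, and both function names
are assumptions made purely for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical scheme name, for illustration only.
SCHEME = "x-qn-sketch:"


def qname_to_uri(namespace: str, local_name: str) -> str:
    """Map (namespace, local name) to a URI that keeps the boundary visible.

    Percent-encoding the namespace with safe="" removes every "/" from it,
    so the first "/" in the result always marks the namespace/name boundary.
    """
    return SCHEME + quote(namespace, safe="") + "/" + local_name


def uri_to_qname(uri: str):
    """Recover (namespace, local name) from a URI built by qname_to_uri."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a sketch-scheme URI: " + uri)
    encoded_ns, sep, local_name = uri[len(SCHEME):].partition("/")
    if not sep:
        raise ValueError("missing namespace/name separator: " + uri)
    return unquote(encoded_ns), local_name


a = qname_to_uri("urn:x:abc", "def")
b = qname_to_uri("urn:x:abcd", "ef")
print(a)  # x-qn-sketch:urn%3Ax%3Aabc/def
print(b)  # x-qn-sketch:urn%3Ax%3Aabcd/ef
print(uri_to_qname(a))  # ('urn:x:abc', 'def')
```

The two QNames that collide under concatenation now map to different
URIs, and a serializer can get the original namespace and name back. The
graph still holds only opaque URIs; only the parser and serializer need to
know about the structure.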

I never honestly expected that such an alternate mapping function would
be adopted by RDF. As I mentioned in my original post regarding this qn 
URI scheme proposal, it was intended as food for discussion, providing an 
alternate perspective on what remains a problem, based on an alternate
mapping function that does not share the shortcomings of the current
function.

I.e. I attempted to show what RDF might be like had such a qn URI based
mapping approach been adopted from the start.

> > > It doesn't keep the QName information because it is
> > > irrelevant;
> >
> > If lexically distinct QNames are capable of bearing distinct
> > semantics, then their distinction cannot be considered
> > irrelevant.
> 
> It's irrelevant to the RDF processor. 

Per my above example of lexically distinct QNames from totally
disparate sources, I hardly think that such a distinction could
be considered irrelevant.

> The RDF processor merely sees QNames as a method of creating URI
> references. That's all it can grok: URI references. If you want to use
> QNames, you have to identify them as URI references.

That's precisely what I was doing. I'm confused why that is not
crystal clear.

The qn URIs are totally opaque in the RDF model. They only have
structure relevant to a parser mapping serializations to graphs
or a serializer mapping a graph to a serialization. Just as
an HTTP URL is opaque in the model, but has structure relevant
to an HTTP server when dereferencing it.
 
> [...]
> > I think you're missing the point entirely here. It's about
> > preservation of identity as defined by QNames within the RDF URI space.
> 
> Once again, if you use your URI scheme, the identity of the QNames is
> preserved.

But once again, not in a standardized, global fashion, unless it
is done the same way by every RDF parser for every serialized
instance.

> > [...] We're just getting to a stage now where certain cracks
> > are showing, but only if you're standing on the right side of
> > the building do you see them.
> 
> I think that the fact that you have struggled to come up with a few
> contrived examples of these cracks is telling enough that these cracks
> you see are no more than lines that you've painted on the wall.

They are real. They may not mean an impending total collapse of
the structure, but they are real.

> If you can provide a decent example of where some real-world
> application breaks because of the way RDF currently is, then I shall
> accept that something needs to be changed.

Fair enough, but I think that might be fairly construed to be a
narrow and somewhat short-sighted view of the matter.
 
> Where are you getting these analogies from? :-) They're cool, but just
> a product of some sloppy arguing on my behalf. Let's just stick to the
> technical details.

I keep trying to do that. Sorry if I waxed philosophical a few times...
 
> [...]
> > > Once again, it does not declare the QNames to be identical.
> > > It simply uses the QNames to form URI references.
> >
> > Once again, if it maps lexically distinct QNames to the same URI,
> > then RDF declares them identical.
> 
> But the QNames aren't in the model. 

I never (intentionally) said they were.

> If you mean "identical to an RDF parser that is handling the syntax",
> then yes, it sees them as identical

Exactly. 

> , but that's obvious.

Obvious perhaps to folks building parsers but not necessarily to
people (now or in the future) creating content based on the syntax.

> The model is the important bit of RDF; the syntax is just a means to an
> end, to get triples.

I agree with that wholeheartedly, but as syntax is the doorway
to the model, it cannot be tossed aside as irrelevant.
 
> [...]
> > > But all data on the Semantic Web are resources, which may be
> > > identified by URI references.
> >
> > Right, data that is created as, stored as, and exchanged as
> > serialized XML instances.
> 
> XML RDF instances, right.

Yes, but XML instances nonetheless, with QNames. Not NTriples instances.
Not N3 encoded "instances". Not any other alternate non-standard 
serialization, but XML instances as defined by the RDF spec.

> [...]
> > If there is a lexical (and hence potentially semantic) distinction
> > between two QNames in the serialization and my local RDF
> > engine knows how to preserve that distinction
> 
> If your "local RDF engine" preserves QNames that are in the syntax on
> forming the model, then it is a non-conformant RDF parser, i.e.
> hopelessly broken.

It's not preserving QNames. It's preserving distinctions in identity
inherent in different QNames, by arriving at different URIs in
the transition from serialization to graph. 

But, I agree, any parser (or rather parsing process from instance to
triples) that behaves in a fashion contrary to that specified and 
allowed by the spec is broken. Of course. 

Yet in the same way, applying e.g. XSLT scripts to valid RDF XML instances
to change resource identity prior to production of triples (as suggested
by you and Dan), such that those changes are not mandated and defined by 
the spec, is also "broken" in that it results in a set of triples from 
valid serialized RDF instances which will be different from any other 
RDF system that strictly follows the spec. No?

If the creator of knowledge encoded as an RDF XML instance says that
an X is an X, then changing that to a Y during de-serialization to
triples is contrary to the standard. If you have a specialized 
application that needs to make such changes to identity, fine, but
it must be accepted that such changes impinge on the sanctity of
the original knowledge and result in a representation of that
original knowledge that is not globally consistent.

My proposal was to achieve global consistency of representation
across the syntax:model boundary.

> > [...] then we have failed to maintain the integrity of resource
> > identity (and hence the integrity of knowledge) on the SW.
> 
> Yeah, if your parser is broken and then you get people to use it, of
> course you're going to mess up the Semantic Web.

That's been my whole point. Whatever method is used, it has to be
the *standardized* way of doing things. Custom, localized solutions
are unacceptable if we wish to have global consistency in our
knowledge representation.

> > [...] Identity cannot vary from agent to agent according to
> > localized interpretations of QName to URI mapping!
> 
> The RDF specification is *very* clear about that mapping; and there is
> absolutely no room for "localized interpretations". 

I fully agree. But that's just what XSLT preprocessing results in, 
namely localized interpretations of QNames.

> If you preserve the identity of syntax QNames in the model, then your
> parser is non-conformant.

Non-conformant to the present spec, agreed. My proposal was to change the
spec to preserve that distinct identity.

> You either comply or you don't, there is no "two ways about it".

We're not at all in disagreement about this.
 
> > [...] If identity is not consistent, the SW won't work.
> >
> > No?
> 
> Of course not. If you want to sit there and write a parser which
> ignores the RDF specification,

I think you've misunderstood what I was suggesting.

> But the Semantic Web uses URI references, not QNames:

Again, I wasn't proposing using "QNames", but URIs that are
strongly equivalent to QName identity, providing reliable
mapping between such URIs and QNames.

And the SW uses opaque universal identifiers, not URI refs.

It *adopts* URI refs as the realization of its identifiers
to take advantage of (a) that URI refs are intended to be globally
unique, and hence meet the uniqueness requirement for universal
identifiers, and (b) that to some applications, such URI refs
may provide additional utility which is secondary to their primary
role as universal identifiers (namely, you might be able to
dereference them).

But the SW per se does not use URI references. All resource identity
within the SW is opaque. The SW and RDF could have used some completely
different mechanism for defining its resource identifiers (though I
agree that using URI refs adds a lot of utility and is a good choice).

> the semantics come from interpreting the mass of data made by triples
> of those URI references, as handled by parsers and inference engines
> etc. That's the beauty of it all.

No disagreement there.

> > [...] RDF has no reliable means of re-serialization
> > that guarantees the same QNames it got on input.
> 
> Since it doesn't use QNames in the model, what does it matter?

As you've misunderstood me to be proposing QNames as first class
objects in the RDF model (which I didn't) I'll just skip this
part of the discussion...

[snip]

> But the Semantic Web uses DLGs... so why are you talking about XML
> QNames?

Because RDF also employs XML serialization and aspects of
that serialization can impact the body of knowledge derived
from the serialization into the graph.
 
> [...]
> > Therefore it is possible to assign non-ambiguous semantics
> > in an XML serialization which becomes ambiguous in an RDF
> > graph, and therefore there is potential loss of information and
> > unintended introduction of ambiguity into the SW.
> 
> Well, the trick is to convert the identifiers in the XML serialization
> into a form that is fit for the Semantic Web. In other words, you have to
> convert the QNames into URIs that represent those QNames, and then use
> the URIs instead. RDF doesn't allow it any other way.

Argh. But if you have to do that, then you have to do that with
every parser in every case!

The point of the RDF syntax was to define a form that is "fit for the
Semantic Web". Are you now then agreeing that the current RDF syntax
does not provide a form for identities that is fit for the semantic web?

You just above argued that localized "tricks" will create a mess
on the SW (with which I fully agree) so how can localized tricks now be
the solution to this loss of distinction?!
 
> I agree that to a small extent this is a conceptual problem, but there
> are no particular use cases that spring to mind that would prove that
> this is a sufficiently difficult mechanism to use in order that we have
> to make backwards incompatible changes to RDF. The fact is that it's not
> impossible to identify QNames in RDF, just slightly twisted, in that you
> have an extra processing step to do.

But the extra processing step is non-standard and therefore
unacceptable if it is necessary to preserve such distinctions
of identity globally in any SW application utilizing that
knowledge.
 
> > This issue has been discussed elsewhere on this list at great
> > length. I won't re-address it here.
> 
> Please, at least provide references.

Cf.
http://lists.w3.org/Archives/Public/www-rdf-comments/2001JulSep/0124.html
and the threads referenced therein.
 
> > [...] The XML spec defines a way to achieve a
> > consistent representation of structured data. To
> > randomize it violates that fundamental goal.
> 
> The RDF specification is very, very, clear about the QName to URI
> reference mapping, and as such cannot be considered "random" at all.

I didn't say RDF was random. Please re-read what I said. I gave
a (bogus) example of how an *XML* parser (not RDF parser) might
misbehave.

> > [...] But if it *is* an error, then it needs to be addressed
> > and (hopefully) fixed.
> 
> It's not an error.

That's certainly one view.

But whether it is or not, it's a separate question whether it is
something that will change.

I don't (and never did) expect that it would -- but hopefully it
will be sufficiently addressed in revisions of the spec.

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Friday, 24 August 2001 04:02:40 UTC