RE: QName URI Scheme Re-Visited, Revised, and Revealing from Patrick.Stickler@nokia.com on 2001-08-23 (www-rdf-comments@w3.org from July to September 2001)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 23 Aug 2001 15:21:17 +0300
To: sean@mysterylights.com
Cc: www-rdf-interest@w3.org, www-rdf-comments@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114BFAB@trebe003.NOE.Nokia.com>
> > Uhhh... so even if the NS spec says X and Y are different,
> > RDF can do whatever it likes with them, including saying
> > X = Y [...]
> 
> Of course it doesn't say that two names are equivalent; it 
> simply uses the
> QNames to form URI references.

But if the two QNames are mapped to the same URI, then RDF *is*
saying that they are equivalent -- or rather that even if they
are lexically distinct, they are not allowed or able to bear any
semantic distinction.
 
> [...]
> > If potentially four lexically distinct QNames (i.e. two QNames
> > which collide on direct concatenation used both for elements
> > and global attributes) are merged by RDF into one single URI
> > derived by the present RDF function, then how can you possibly
> > make statements about them to differentiate them [...]
> 
> Er, perhaps using the URI scheme that you just invented? 

No. If we use the qn URI scheme, then we don't *need* to make such
statements, since that information is explicit in the qn URI.

If we use the current RDF mapping function, then all four lexically
distinct QNames are mapped to the same URI and thus we are unable
to make any statements to differentiate them after we're in RDF-land 
since they no longer have distinct identity in the RDF space to serve 
as subjects in those differentiating statements.
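
As a rough illustration of that collision (a sketch only, not taken from
any spec: the example.org namespaces and the helper name rdf_concat are
made up for this example, and the element/attribute tag merely stands in
for the two kinds of use), direct concatenation can be modelled as:

```python
from itertools import product


def rdf_concat(namespace: str, local: str) -> str:
    """Map a QName to a URI by directly concatenating namespace and name."""
    return namespace + local


# Two hypothetical namespace/local-name partitions that collide
# once they are concatenated.
partitions = [
    ("http://example.org/ab", "c"),
    ("http://example.org/a", "bc"),
]
kinds = ["element", "attribute"]

# Four lexically distinct QNames: each partition used both as an
# element name and as a global attribute name.
qnames = [(kind, ns, local) for kind, (ns, local) in product(kinds, partitions)]

# The concatenation keeps neither the partition nor the kind of use.
uris = {rdf_concat(ns, local) for _, ns, local in qnames}

print(len(set(qnames)))  # number of distinct QNames going in
print(len(uris))         # number of distinct URIs coming out
```

Any statement made about the single resulting URI applies to all four
QNames at once, so nothing said in RDF can tell them apart.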

The mapping from QName to qn URI has to be the "official" mapping.

> It doesn't 
> keep the QName
> information because it is irrelevant; 

If lexically distinct QNames are capable of bearing distinct
semantics, then their distinction cannot be considered
irrelevant.

> we are not using QNames on the
> Semantic Web, we're using URI references. And if you want to identify
> QNames, partitioned and all, well we have your URI scheme for that.

I think you're missing the point entirely here. It's about preservation
of identity as defined by QNames within the RDF URI space. The current
system potentially introduces ambiguity which does not exist in the
serialization, and cannot exist because the NS spec clearly defines
their distinction.

> The weird thing is that for years no one had had a problem 
> with this. 

Perhaps because for years, folks have been building small, closed systems 
using HTTP URI fragments with HTML fragment syntax and the clever hack
of adding the '#' at the end of the namespace URI -- or just because they 
have been lucky.

> That
> doesn't necessarily mean that people have understood it, I 
> agree,

Exactly. I don't think a majority of folks have understood the problem,
both because it has been hidden by the use of HTTP URLs and HTML fragment
syntax and because the SW hasn't yet scaled up to a truly global
context. We're just getting to a stage now where certain cracks are showing,
but only if you're standing on the right side of the building do you
see them.

>  but no
> one has had a problem implementing the QName concatenation 
> thing that RDF
> does, 

Just because no one has fallen into the hole yet, doesn't mean it
isn't there or that it shouldn't be filled.

The hole is at one end of the playground, and most folks have been
playing at the other end, but those of us who have been scoping out
the whole playground have seen the hole. Some of us have fallen into
it ;-)

> and no one has moaned that RDF violates any Web axioms, 
> and there are
> a lot of people at the W3C who would do just that if they thought the
> slightest little rule was being broken. 

As much respect as I have for most of the folks I have met
who work for or with the W3C, no person or group is omniscient
and all standards and technologies are imperfect. The scope and
definition of the SW is evolving and being refined, as is all 
of the Web, and we are collectively learning new things every
day. Each new attempt to push the envelope a little further
reveals shortcomings, holes, and implicit but invalid assumptions
in the standards that must be addressed to move onwards.

It may simply be that the folks at the W3C haven't looked
at the problem from the perspective needed to see it.

But to ignore warnings from folks who have seen the problem is
like a wagon train riding in the dark of night ignoring warnings 
from their scouts that they're headed for a cliff. The drivers of 
the wagon train may not see the cliff, but that doesn't mean they 
won't fall off it when they get to it. Eh?

> It's not as if the 
> people who came
> up with the concatenation mechanism weren't aware of exactly 
> what was going
> on.

Don't be so sure. The folks that were there will have to comment
on that themselves, though. Maybe they were. Maybe they weren't.

Standards are born of perspective, goals, needs,
time, and energy. That's why we have to update and fix them
from time to time, as needs change, perspectives broaden, and
the passage of time brings deeper understanding.

> > But then we get that nasty problem of element and global
> > attribute QNames having identical semantics according
> > to RDF's condensed serialization syntax [...]
> 
> Once again, it does not declare the QNames to be identical. 
> It simply uses
> the QNames to form URI references.

Once again, if it maps lexically distinct QNames to the same URI,
then RDF declares them identical.
 
> > [...] If I can't rely on *every* RDF engine used by every
> > SW agent to interpret my data exactly as I have defined it,
> > then the SW has no data integrity. Again, it's about global
> > consistency of data.
> 
> But all data on the Semantic Web are resources, which may be 
> identified by
> URI references. 

Right, data that is created as, stored as, and exchanged as serialized
XML instances.

The XML instance precedes the knowledge base of triples. That's the
way it works in the real world.

If there is a lexical (and hence potentially semantic) distinction
between two QNames in the serialization and my local RDF engine
knows how to preserve that distinction but some remote RDF engine
does not, then we have failed to maintain the integrity of resource
identity (and hence the integrity of knowledge) on the SW.

There has to be consistency of identity and knowledge representation
across the entire SW as dictated by the standards. Identity cannot
vary from agent to agent according to localized interpretations of
QName to URI mapping!

It's not about tools. It's not about systems. It's not about applications.
It's about the standardized representation and interchange of knowledge
on a global basis. If identity is not consistent, the SW won't work.

No?

> Perhaps you are getting confused that XML has 
> no inherent
> semantics, and that therefore is not the primary candidate 
> for the Semantic
> Web? On the Semantic Web, all knowledge is grounded in URI 
> space, not XML
> space. 

No, in fact, having a background in computational linguistics, I
have a very solid understanding of the relationship between syntax
and semantics.

I might suggest that you don't, if you think that a URI identifying
a resource bears any more semantics than a QName (please don't throw 
any heavy objects at me ;-)

> The fact that it gets *serialized* into XML for transfer is
> incidental, 

It should be incidental, but it's not, because it's not fully 
regular, bi-directional, consistent, etc. RDF has no reliable
means of re-serialization that guarantees the same QNames it
got on input. It can serialize from a URI to "some" QName
which when de-serialized again back into triples gets the same
URI (i.e. URI->QName->URI is reliable), but there is no guarantee 
that in a QName->URI->QName round trip transformation we will get 
the same QName out that we put in! This is because the RDF QName
to URI mapping function loses the explicit partition between
namespace and name, which is a fundamental, defining characteristic
of the QName itself.
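
A minimal sketch of the round-trip problem, assuming a hypothetical
serializer that splits a URI after its last '#' or '/' (RDF does not
mandate any particular split; the names to_uri and to_qname and the
example.org namespace are invented here):

```python
def to_uri(namespace: str, local: str) -> str:
    """QName -> URI by direct concatenation."""
    return namespace + local


def to_qname(uri: str) -> tuple[str, str]:
    """URI -> "some" QName, using a hypothetical heuristic:
    split right after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


original = ("http://example.org/ab", "c")

uri = to_uri(*original)
recovered = to_qname(uri)

# URI -> QName -> URI gives back the same URI ...
print(to_uri(*recovered) == uri)

# ... but QName -> URI -> QName need not give back the same QName,
# because the partition was discarded by the concatenation.
print(recovered == original)
```

Whatever split rule is chosen, the serializer has to guess where the
partition was, since that information is no longer in the URI.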

> and you should not get hung up on the fact that the
> serialization involves using QNames to form the URIs... and 
> yet you seem to
> get hung up time after time.

QNames and URIs are just lexical forms. They may have structure
at some lower level that might be important for other operations,
but insofar as either an XML application or RDF application are 
concerned, they are just unique identifiers of structural components.

Such identifiers serve as constructs for representing semantics.
The semantics is clearly not inherent in the identifiers themselves,
no more so than I am contained within my social security number.

In the case of XML, QNames identify nodes in a tree. In the case
of RDF, URIs identify nodes in a graph. There's no fundamental
difference between them whatsoever, insofar as their role as
identifiers is concerned.

I am "hung up" on the issue that RDF employs both forms of identifiers,
and defines that there is an equivalence relation between them
(hence the existence of QName to URI mapping function) -- but that
the equivalence relation is many-to-one rather than one-to-one;
and that some of that many-to-one mapping is accidental and
potentially hidden from content producers whose knowledge participates
in multisource syndication in some remote SW agent, and who therefore
cannot even prepare for and avoid the potential collision.
 
Therefore it is possible to assign unambiguous semantics in
an XML serialization which become ambiguous in an RDF graph,
and therefore there is potential loss of information and unintended
introduction of ambiguity into the SW.

I see that as something worth getting "hung up" over (though I'm 
just as likely to simply get "hung" over it ;-)
 
> > [...] However, since people are already thinking in terms
> > of QNames when defining their Web based ontologies,
> 
> What??? That's blatantly false! 

I was mostly thinking of the XML community at large, but my response
to your comment is the same...

The existence of countless scripts and hacks that scan for prefix:name
patterns rather than (namespace)name pairs -- and the arguments one sees
from time to time about "is this prefix reserved", etc. clearly
show that (normal) folks working with XML instances think in terms of
qualified names -- whether they do so correctly and whether the Gods
of the Web do so or not is beside the point -- and I would argue that 
a lot of folks even find the need to declare the namespace prefixes an 
inconvenience.

Real users are *not* thinking (http://purl.org/dc/elements/1.1/)title 
but rather 'dc:title'!

Just *try* to pass around DC RDF instances that use e.g. 'foo:title'
instead of 'dc:title' and listen to the people *scream*!

If folks didn't think in terms of not only QNames but minimized prefixed
QNames when designing SW ontologies, then why would e.g. both the DC
and PRISM specs (and I'm sure many others) recommend the use of specific 
namespace prefixes in the interest of consistency of readability!?

And what do you think folks would say about the following perfectly
legal use of ns prefixes:

   <dc:RDF xmlns:dc="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdf="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:rdfs="http://purl.org/dc/elements/1.1/">
      <dc:Description dc:about="urn:foo:bar">
         <rdfs:title> Ain't this a hoot! </rdfs:title>
      </dc:Description>
      <dc:Description dc:about="http://purl.org/dc/elements/1.1/title">
         <rdf:subPropertyOf dc:resource="urn:foo:bar:bas"/>
      </dc:Description>
   </dc:RDF>

The reason why the above is so distasteful is because we *do* think
in terms of minimized, prefix based QNames in our ontologies, regardless
of their form.
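
For what it's worth, a namespace-aware parser has no trouble at all with
that instance. The sketch below feeds the same document to Python's
standard xml.etree.ElementTree, which reports names in {namespace}local
form; the variable names are just for illustration:

```python
import xml.etree.ElementTree as ET

# The same instance as above, with the "wrong" prefixes bound to the
# RDF, RDFS and DC namespaces.
doc = """<dc:RDF xmlns:dc="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdf="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:rdfs="http://purl.org/dc/elements/1.1/">
  <dc:Description dc:about="urn:foo:bar">
    <rdfs:title> Ain't this a hoot! </rdfs:title>
  </dc:Description>
  <dc:Description dc:about="http://purl.org/dc/elements/1.1/title">
    <rdf:subPropertyOf dc:resource="urn:foo:bar:bas"/>
  </dc:Description>
</dc:RDF>"""

root = ET.fromstring(doc)

# Expanded element names in document order; the prefixes are gone.
expanded = [el.tag for el in root.iter()]
for name in expanded:
    print(name)

# Expanded attribute names on the first Description element.
first_attrs = list(root[0].attrib)
print(first_attrs)
```

The machine sees the RDF, RDFS and DC names exactly as intended; it is
only the human reader who is thrown by the prefixes.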

True, use of a specific prefix is *not* required, but it is *pervasive*
common practice to recommend that a consistent prefix be used, because
folks do not want to be thinking (namespace)name if they can think
prefix:name. In either case, though, they're still thinking in terms
of QNames, and not URIs. 

When folks encode RDF statements in an XML instance and specify a URI, 
it's usually the URL of some web page, not a URN of some abstract concept. 
Subjects are HTTP URLs, predicates are QNames, and values are either HTTP 
URLs or literals. That's probably true of most of the RDF presently
defined on the planet.

Whether you like it or not, QNames are *the* primary naming construct
of the XML world, not URIs, and because RDF adopts XML serialization
with namespaces, that means that QNames are also the primary naming
construct of RDF properties or any other resource which may have
QName identity in an RDF serialization.

I see no reason why QNames (represented as URIs) should not become
a primary naming construct for abstract resources such as properties
in the RDF world. It would achieve perfect consistency with all
serializations and follow common usage in the XML community at large.

QNames are great universal identifiers. We need good URN schemes. Let's
use QNames as URIs.

> > Folks using XML *think* in terms of QNames insofar
> > as their data models, vocabularies, and ontologies are
> > concerned.
> 
> I'll bet that not many XML developers really understand much about
> namespace partitions and so on.

Probably not, since they are non-normative, and many folks consider
non-normative to equal not-required. But then, it's really XML parser
developers who have to worry about such nuts-n-bolts issues, not
XML developers in general, unless it impacts their own work. 
Apparently, the XML parser developers *do* understand QName partitions,
and so things relying on the parsers don't blow up. 

> > [...] The fact that there is *not* an official, standard
> > URI representation for QNames is what is surprising...
> 
> Well, no one has really had a use for it, so not really.

You may be right on that point. RDF may very well be the first
standard to need an explicit QName to URI mapping -- in which
case, it's even more important that RDF "get it right" as it
sets a precedent for future standards and methodologies.
 
> [...]
> > > Using the concatenation mechanism is an excellent and
> > > quick way to form those URIs out of QNames.
> >
> > Quick? Maybe. Excellent? No!
> >
> > It was a very clever hack that works with HTTP URLs
> > using HTML fragment syntax, [...]
> 
> I think you'll find that a) FragID syntax is independent of 
> URI scheme and b) the "hack" works with a wide range, indeed a gross
> majority of URI schemes. ...
> it's not much of a problem!

I disagree. In various ways for various reasons.

This issue has been discussed elsewhere on this list at great length. 
I won't re-address it here.

> > It also does not maintain lexical distinctions defined by the
> > NS spec, and for that reason alone, its validity is suspect.
> 
> The NS specifications says nothing about what processors 
> should do with
> QNames, it simply defines what QNames are.

But if processors can disregard the distinctness of QNames, then
there is no *purpose* for the NS spec, because *all* it does is
define the distinctness of QNames!

It's like saying the XML Spec doesn't say you can't deliberately
randomize every instance tree when parsing, so it's OK to do that
and no application using your freaked out parser can complain
because the spec doesn't explicitly rule such behavior out (that 
I know of, but if so, then hats off to the XML spec authors ;-)

There is a certain degree of common sense that must be applied
when interpreting standards, and that includes not violating the
fundamental goals of the standard. The XML spec defines a way to
achieve a consistent representation of structured data. To randomize it
violates that fundamental goal. The NS spec defines the mechanisms
for QName distinction -- that distinction being the very goal of
the NS spec -- and to discard such distinctions is a violation of
the NS spec.

If the creators of the RDF spec made a boo boo because they failed
to take in all perspectives and considerations that now face the
SW, fair enough, we're all human. But if it *is* an error, then it
needs to be addressed and (hopefully) fixed.

> > If it's not standardized and mandated for all RDF applications, it's
> > not a solution to the present problem(s).
> 
> It's just a new URI scheme, and URIs are opaque. All RDF applications
> already handle it.

It's not enough to just handle it. That's been the whole point of
this entire mapping issue.

If the requirement that all RDF parsers map QNames to qn URIs in triples
is *not* part of the standard, then the consistency of QName lexical
identity that the qn URI offers cannot be achieved globally throughout
the SW.

> > Parentheses and escaping should do just fine [...]
> 
> Cool.

At least that part is clear ;-)

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Thursday, 23 August 2001 08:31:17 UTC