RE: Summary of the QName to URI Mapping Problem from Patrick.Stickler@nokia.com on 2001-08-30 (www-rdf-comments@w3.org from July to September 2001)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 30 Aug 2001 11:47:38 +0300
To: www-rdf-comments@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114BFB9@trebe003.NOE.Nokia.com>
> The problem is statements such as "the semantics at the lower level is
> machinery at the higher level."  This is a close relative of
> statements of this sort: "RDF ontologies provide a semantics for (pick
> one) the Web, XML, ...."  The problem with both of these is that they
> confuse notations and algorithms with semantics.
> ...
> and 'mother' might mean "favorite beverage brand," ...

I think I follow what you are saying here. Though for me, I like to
differentiate clearly between the (formal) semantics of RDF and the 
(informal?) semantics attached to a given ontology. The RDF semantics
is concerned with things such as statements, predicates, subjects,
objects, axioms, etc. and how other ontologies are defined in terms of 
that semantics; i.e. RDF is a semantics for defining labeled graphs,
right? (just as XML provides a semantics for defining labeled trees).

My understanding has always been that non-RDF semantics which is 
associated with (represented by) the terms in a given ontology (i.e. the 
"real world" concepts that those terms represent) is irrelevant to the 
core machinery (i.e. semantics) of RDF proper. But the issue of maintaining 
integrity (uniqueness) of terms is crucial to the proper operation of 
that machinery.

So it doesn't matter whether a term 'mother' actually represents the
concept "MOTHER" or "FAVORITE BEVERAGE BRAND", per se, but ideally it
should not accidentally and unintentionally represent both. Note the
words 'accidentally' and 'unintentionally', i.e. it's not that someone
is incorrectly asserting they are the same, but that no one asserts
that they are the same but they end up being treated the same because
they end up with the same identity in RDF space. That is the
whole gist of my issue with the QName to URI syntax mapping.
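
A minimal sketch of the kind of collision described above, assuming the
mapping is plain concatenation of namespace URI and local name. The
namespace URIs and the function name are hypothetical and serve only as
illustration:

```python
def qname_to_uri(namespace_uri: str, local_name: str) -> str:
    """Map a QName to a URI by plain concatenation (assumed mapping)."""
    return namespace_uri + local_name


# Two different (namespace, local name) pairs under the same authority.
# example.org and both namespaces are placeholders, not real vocabularies.
family_mother = ("http://example.org/family/", "mother")
family_m_other = ("http://example.org/family/m", "other")

uri_a = qname_to_uri(*family_mother)
uri_b = qname_to_uri(*family_m_other)

# The QNames are distinct, yet both end up as the same string, so a
# graph keyed on that string would treat them as one resource.
print(family_mother != family_m_other)
print(uri_a == uri_b)
```

Nobody asserted that the two terms are the same; they merely share an
identity after the mapping is applied.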

It has become clear from these discussions, though, that although the
risk of such collisions is still real with the present mapping function
employed by RDF, the chances of one occurring are much smaller than I
originally thought. Any collisions would also happen (presumably)
within the URI space of the same authority, and hence there is a
reasonable way to address the risk; namely, document in the standard
that there *is* such a (remote) risk and that specific practices should
be followed to avoid it.

> Note, however, that part of the ambiguity is syntactic, if I
> understand correctly, and not semantic at all.  That is, through the
> XML lens one sees *different symbols in different arrangements* than
> one sees through the RDF lens.  Isn't this what the QName controversy
> is about?  A given expression of the form QName:id is broken down into
> components in different ways in XML and RDF.

Yes. And a side issue (that I think still needs at least some clarification
in the official documentation somewhere) is to what degree the RDF treatment
of QNames differs from other key interpretations (e.g. by XML Schema, XPath,
DOM, etc.) so that folks used to deploying systems utilizing those other
XML technologies are not accidentally carrying over presumptions about
QName interpretation into RDF (as I did ;-)

It is an unfortunate case that many folks who approach RDF from the XML
side, as opposed to the conceptual model side, struggle with issues
relating to the representation of RDF graphs and resource identities
in the serialization model. Not everyone does, but many do. QName to
URI mapping is one issue. Two sets of terminology is another issue (i.e.
the conceptual model talks of statements, predicates, subjects, objects,
etc. but the serialization talks of descriptions, values, about, and
predicate resources have an alternate representation from other resources,
etc.). Multiple variant representations is another issue. 

I think that a lot of folks (many of whom I've heard from first hand) 
have tossed RDF aside because they can't get past the serialization into 
the model, which is a great pity, because the conceptual model is IMO 
fantastic. Jaane Saarela recently shared with me that because of this 
problem, when he teaches RDF, he focuses almost exclusively on the 
conceptual model and tells folks to not worry about the serialization;
which is how I wish *I* had first approached RDF, as opposed to trying 
to grok RDF based on the syntax model and serialized examples...  ;-)

Unfortunately, XML folks tend to approach a new XML application by
first looking at the DTD and example instances. As I've said before, the
serialization of RDF is the doorway into the conceptual model, so if that 
doorway isn't inviting, or is too hard to open, not many folks are going 
to come inside, no matter how grand the interior is. No?

> Even so, I think Patrick's point is correct: Nothing prevents a piece
> of software from reading QNames in the XML way at the same time it
> reads them in the RDF way, and the resulting synergy might be quite
> valuable.  

Quite so. I was mostly trying to clarify a common misconception of
some of my earlier posts -- where some folks were misunderstanding me
to be proposing that QName semantics have some formal place within
RDF semantics, which I never suggested nor ever meant to suggest.

I had only made the proposal (and not a serious proposal at that)
that if RDF utilized a special URI scheme rather than direct
concatenation, then several of the mapping issues would
go away, the mapping would be perfectly bi-directional (for
consistency in re-serialization), and some useful information
would be available for operations above the RDF layer which chose 
to look into the URI to access the QName structure. That's all.
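
To make the idea concrete, here is a rough sketch of what a
bi-directional mapping could look like. The scheme name "x-qname", the
percent-encoding of the namespace, and both function names are
assumptions invented for this illustration; the message does not
specify any particular scheme:

```python
from urllib.parse import quote, unquote

SCHEME = "x-qname"  # hypothetical scheme name, not a registered one


def qname_to_scheme_uri(namespace_uri: str, local_name: str) -> str:
    """Encode both parts so the only unescaped ':' after the scheme separates them."""
    return f"{SCHEME}:{quote(namespace_uri, safe='')}:{quote(local_name, safe='')}"


def scheme_uri_to_qname(uri: str) -> tuple[str, str]:
    """Recover the original (namespace URI, local name) pair."""
    scheme, _, rest = uri.partition(":")
    if scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URI: {uri!r}")
    namespace_part, _, local_part = rest.rpartition(":")
    return unquote(namespace_part), unquote(local_part)


# Pairs that collide under plain concatenation stay distinct here.
pair_a = ("http://example.org/family/", "mother")
pair_b = ("http://example.org/family/m", "other")

encoded_a = qname_to_scheme_uri(*pair_a)
encoded_b = qname_to_scheme_uri(*pair_b)

print(encoded_a != encoded_b)
print(scheme_uri_to_qname(encoded_a) == pair_a)
```

Because the separator cannot occur unescaped inside either part, the
split point is unambiguous and re-serialization can return the same
QName that was read.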

>    Secondly, I was referring mostly to semantics associated with
>    ontologies and identified by both URIs in the graph and QNames in
>    serializations, and not the semantics of RDF itself -- which I see
>    as yet a third layer/level of semantics that is disjunct from
>    either URI Scheme semantics or specific ontological semantics.
> 
> You've lost me here.  I suspect the problem lies in the phrase
> "semantics associated with ontologies," which sounds like it's
> infected with the confusion I described above.

Acutely infected ;-)

I'll try again in my admittedly coarse layman's terms -- I meant the 
semantics that is associated with (represented by) globally unique 
identifiers, which take one of two forms: (1) QNames in XML (if 
identifying predicate resources), and (2) URIs in RDF Graphs. Such
semantics is separate from (and irrelevant to) the specific semantics
of RDF -- though the terms representing that non-RDF semantics are 
themselves within the scope of RDF semantics as they participate in 
statements defined and manipulated using RDF semantics. Or is that 
even more unclear? ;-)

Presumably the association with a concept (resource) assigned to an 
identifier, whether it be in QName or URI form, remains constant 
regardless of the form the identifier takes (graph or XML), and those
alternate forms would also presumably have a 1:1 relation. Thus,
one would expect that these two sets of forms would have the same number
of members. But this is not the case (technically), in that there could
be a many to one mapping of QName to URI and also (depending on the
splitting function) a one to many mapping of URI to (autogenerated)
QName. Now, it has become apparent that this imbalance of mapping is
irrelevant, as apparently only the set of URIs is considered to be the
official, sole representation of resources. Fair enough, technically
speaking.
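
The one to many direction can be sketched as follows. This assumes a
naive splitting function that accepts any suffix that looks like a
local name; the ASCII-only name pattern is a deliberate simplification
of the XML NCName rules, and the function name and URI are hypothetical:

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (an assumption;
# the real production also allows many non-ASCII characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def candidate_splits(uri: str) -> list[tuple[str, str]]:
    """Return every (namespace, local name) split whose suffix is name-like."""
    return [
        (uri[:i], uri[i:])
        for i in range(1, len(uri))
        if NCNAME.fullmatch(uri[i:])
    ]


splits = candidate_splits("http://example.org/family/mother")
for namespace_uri, local_name in splits:
    print(namespace_uri, local_name)
```

For this URI the sketch yields six candidate pairs, from
("http://example.org/family/", "mother") down to
("http://example.org/family/mothe", "r"), so an autogenerated QName
depends entirely on which split the serializer happens to choose.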

However...

Problems (for many real folks building systems using RDF or defining 
knowledge in RDF) arise because they approach RDF from the perspective
of the serialization, and in XML land, QNames typically are the official
representation of resources or other informational components; thus it
is a fair assumption that those would have some inherent value in RDF. 
The fact that they do not is missed by a lot of folks (including myself).

Whether such non-1:1 mappings *should* occur in practice is a separate
issue from the fact that they *could* occur in practice, and the fact
that they should not occur in practice is only apparent when approaching
RDF from the conceptual model side -- i.e. starting with URIs -- and
is not apparent when starting from the XML serialization side, starting
with QNames, which is unfortunately where many if not most folks start.

> All these semantic issues are interrelated.  If we spell out formally
> what sorts of things a URI denotes, then a given URI must denote one
> of those things.

What I meant was that the semantics of the URI Scheme is "hidden" in
RDF-land such that the "things a URI denotes" within an RDF graph
is totally disjunct from any interpretation, analysis, parsing, or
dereferencing of that URI according to the URI Scheme. Insofar as
RDF is concerned, it's not a URI, it's just a unique string. Right?
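
If that reading is right, the difference between the two views can be
sketched like this. The helper name and the URIs are hypothetical, and
the lowercasing of the host is only one example of a scheme-level
equivalence, not a full normalization:

```python
from urllib.parse import urlsplit


def scheme_aware_key(uri: str) -> str:
    """Apply one scheme-level rule: scheme and host compare case-insensitively."""
    parts = urlsplit(uri)
    return parts._replace(
        scheme=parts.scheme.lower(), netloc=parts.netloc.lower()
    ).geturl()


uri_upper = "http://EXAMPLE.org/family/mother"
uri_lower = "http://example.org/family/mother"

# Opaque-string view: two different labels, hence two different nodes.
nodes = {uri_upper, uri_lower}
print(len(nodes))

# Scheme-aware view: both strings identify the same thing.
print(scheme_aware_key(uri_upper) == scheme_aware_key(uri_lower))
```

A graph that compares labels as plain strings never performs the second
comparison, which is the sense in which the URI Scheme semantics is
"hidden".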

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Thursday, 30 August 2001 04:47:49 UTC