Subject: Re: QName URI Scheme Re-Visited, Revised, and Revealing

From: Sean B. Palmer <sean@mysterylights.com>
Date: Thu, 23 Aug 2001 15:31:07 +0100
To: <Patrick.Stickler@nokia.com>
Cc: <www-rdf-interest@w3.org>, <www-rdf-comments@w3.org>
Message-ID: <020601c12be0$ba56bd80$e9da93c3@Palmer>
> > Of course it doesn't say that two names are equivalent; it
> > simply uses the QNames to form URI references.
>
> But if the two QNames are mapped to the same URI, then
> RDF *is* saying that they are equivalent -- or rather that
> even if they are lexically distinct, they are not allowed or able
> to bear any semantic distinction.

It doesn't use QNames in the model! The model doesn't say anything about
the QNames that are used in the syntax... the two are entirely separate. If
anyone infers that two lexically distinct QNames are the same just because
using the RDF concatenation mechanism they come out to be the same URI,
then they are totally wrong. [Of course, it may well be that the QNames are
semantically equivalent in some context, but that cannot and should not be
inferred from RDF's use of QNames to form URI references.]
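
To make the concatenation point concrete, here is a minimal sketch. It
assumes the mapping is plain string concatenation of the namespace URI and
the local name, as described above; the function name and the example.org
namespaces are hypothetical.

```python
def qname_to_uri(namespace_uri: str, local_name: str) -> str:
    """Form a URI reference by joining namespace URI and local name."""
    return namespace_uri + local_name


# Two lexically distinct (namespace, local name) pairs; both are made up.
first = qname_to_uri("http://example.org/ab", "c")
second = qname_to_uri("http://example.org/a", "bc")

# The model only ever sees the resulting URI reference, so it has nothing
# to say about the two QNames that happened to produce it.
same_uri = first == second
```

The function returns only a string: nothing about the original pair
survives into the model, which is why no equivalence between the QNames
can be read off from it.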

I can imagine Bart writing on the blackboard, "I will not confuse RDF
Syntax with the RDF Model". RDF says nothing about the QNames that it uses
in the syntax, full stop. In the syntax, we can do anything we want with
the QNames, as long as the model is consistent. The syntax forms the data,
but the model *is* the data, and semantic constructs such as the notion of
equivalence can only be derived from there, not from the method of going
from the syntax to the model.

[...]
> If we use the current RDF mapping function, then all four
> lexically distinct QNames are mapped to the same URI
> and thus we are unable to make any statements to
> differentiate them after we're in RDF-land

So use the URI scheme instead. You *cannot* use arbitrary XML QNames in RDF
and expect them to be preserved as QNames. I know that you're asking for an
automated conversion of QNames to URIs in the RDF syntax, but you're going
about it all the wrong way (IMHO). If you do that, then you introduce
horrendous backwards incompatibilities, because we're already using the
QNames to form URIs.

But nothing stops you from creating the QN URIs from the QNames. Indeed,
you can do that automatically using XSLT: converting an XML document into a
list of QN URIs for all QNames in the document.
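
As a rough illustration of that conversion, here is a sketch in Python
rather than XSLT. The exact QN URI syntax is not given in this message, so
the "qn:" prefix, the "(namespace)name" shape, and the percent-escaping
below are assumptions, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def expanded_names(xml_text: str) -> set:
    """Collect every element and attribute name as '{namespace}local'."""
    names = set()
    for elem in ET.fromstring(xml_text).iter():
        names.add(elem.tag)
        names.update(elem.attrib.keys())
    return names


def to_qn_uri(expanded: str) -> str:
    """Turn '{namespace}local' into an assumed 'qn:(namespace)local' form."""
    if expanded.startswith("{"):
        namespace, local = expanded[1:].split("}", 1)
    else:
        namespace, local = "", expanded  # name in no namespace
    # Escape everything reserved so the parentheses stay unambiguous.
    return "qn:(" + quote(namespace, safe="") + ")" + quote(local, safe="")


qn_uris = sorted(to_qn_uri(name) for name in expanded_names(DOC))
```

The prefixes themselves are dropped here: ElementTree reports names in
their expanded form, so the sketch identifies the namespace and local name
pair, not the particular prefix the author typed.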

> The mapping from QName to qn URI has to be the
> "official" mapping.

But URIs can identify anything, and QNames are just bits of syntactic junk
in XML. RDF has already chosen to use QNames to create URI references, at
the cost that it is slightly more expensive for people to refer to QNames,
since they now have to convert them into QN URIs first. But data on the
Semantic Web is represented by URIs, and we can afford to make it more
difficult to represent QNames. And blimey, it's not that much more
difficult; you've only got to learn the new URI scheme that you came up
with.

The wonderful upshot of all this is that it's basically tough: RDF is not
going to change, and that's it. You may as well just learn how to cope with
representing QNames in the RDF model by identifying them with URIs. RDF is
not going to use QNames in the syntax to map to QNames in the model,
doesn't have to, and shouldn't have to.

> > It doesn't keep the QName information because it is
> > irrelevant;
>
> If lexically distinct QNames are capable of bearing distinct
> semantics, then their distinction cannot be considered
> irrelevant.

It's irrelevant to the RDF processor. The RDF processor merely sees QNames
as a method of creating URI references. That's all it can grok: URI
references. If you want to use QNames, you have to identify them as URI
references.

[...]
> I think you're missing the point entirely here. It's about preservation
> of identity as defined by QNames within the RDF URI space.

Once again, if you use your URI scheme, the identity of the QNames is
preserved.

> > The weird thing is that for years no one had had a problem
> > with this.
>
> Perhaps because for years, folks have been building small, closed
> systems using HTTP URI fragments with HTML fragment syntax
> and the clever hack of adding the '#' at the end of the namespace
> URI -- or just because they have been lucky.

Well, perhaps that was a weak argument, but that doesn't stop you from
being wrong :-)

> [...] We're just getting to a stage now where certain cracks
> are showing, but only if you're standing on the right side of
> the building do you see them.

I think that the fact that you have struggled to come up with a few
contrived examples of these cracks is telling enough that these cracks you
see are no more than lines that you've painted on the wall. If you can
provide a decent example of where some real-world application breaks
because of the way RDF currently is, then I shall accept that something
needs to be changed.

[...]
> The hole is at one end of the playground,

I thought they were cracks on the side of the building? I'd get a new
contractor, if I were you. Quick, before I call Health & Safety! :-)

[...]
> But to ignore warnings from folks who have seen the
> problem is like a wagon train riding in the dark of night
> ignoring warnings from their scouts that they're headed
> for a cliff.

Where are you getting these analogies from? :-) They're cool, but just a
product of some sloppy arguing on my part. Let's just stick to the
technical details.

[...]
> > Once again, it does not declare the QNames to be identical.
> > It simply uses the QNames to form URI references.
>
> Once again, if it maps lexically distinct QNames to the same URI,
> then RDF declares them identical.

But the QNames aren't in the model. There are no QNames for RDF to say are
identical. If you mean "identical to an RDF parser that is handling the
syntax", then yes, it sees them as identical, but that's obvious. However,
in the *model*, it says nothing about the QNames at all, let alone going so
far as to say that they're identical. The model is the important bit of
RDF; the syntax is just a means to an end, to get triples.

[...]
> > But all data on the Semantic Web are resources, which may be
> > identified by URI references.
>
> Right, data that is created as, stored as, and exchanged as serialized
> XML instances.

XML RDF instances, right.

[...]
> If there is a lexical (and hence potentially semantic) distinction
> between two QNames in the serialization and my local RDF
> engine knows how to preserve that distinction

If your "local RDF engine" preserves QNames that are in the syntax on
forming the model, then it is a non-conformant RDF parser, i.e. hopelessly
broken.

> [...] then we have failed to maintain the integrity of resource
> identity (and hence the integrity of knowledge) on the SW.

Yeah, if your parser is broken and then you get people to use it, of course
you're going to mess up the Semantic Web.

> [...] Identity cannot vary from agent to agent according to
> localized interpretations of QName to URI mapping!

The RDF specification is *very* clear about that mapping; and there is
absolutely no room for "localized interpretations". If you preserve the
identity of syntax QNames in the model, then your parser is non-conformant.
You either comply or you don't; there are no two ways about it.

> [...] If identity is not consistent, the SW won't work.
>
> No?

Of course not. If you want to sit there and write a parser which ignores
the RDF specification, all current implementations of the RDF
specification, and a corpus of knowledge built up and tested over many
years, and then try to implement that broken parser, then of course you're
going to harm the Semantic Web. I don't see why anyone would want to do so
though, and I'm sure that they would meet heavy resistance.

[...]
> I might suggest that you don't, if you think that a URI
> identifying a resource bears any more semantics than
> a QName (please don't throw any heavy objects at me ;-)

Nope, of course I don't think that. In fact, I was going to add that fact
as a caveat, but I thought that it was a given. But the Semantic Web uses
URI references, not QNames: the semantics come from interpreting the mass
of data made by triples of those URI references, as handled by parsers and
inference engines etc. That's the beauty of it all.

> [...] RDF has no reliable means of re-serialization
> that guarantees the same QNames it got on input.

Since it doesn't use QNames in the model, what does it matter? Absolutely
no information is lost because there are no QNames in the model in the
first place. The original serialization is lost, of course, but the
serialization is just a means to an end: it's a method of storing that bit
of model, that bundle of triples.

> Such identifiers serve as constructs for representing semantics.
> The semantics is clearly not inherent in the identifiers themselves,
> no more so than I am contained within my social security number.

Yes, I'm aware of all of this.

> In the case of XML, QNames identify nodes in a tree. In the case
> of RDF, URIs identify nodes in a graph. There's no fundamental
> difference between them whatsoever, insofar as their role as
> identifiers is concerned.

But the Semantic Web uses DLGs (directed labeled graphs)... so why are you
talking about XML QNames?

[...]
> Therefore it is possible to assign non-ambiguous semantics
> in an XML serialization which becomes ambiguous in an RDF
> graph, and therefore there is potential loss of information and
> unintended introduction of ambiguity into the SW.

Well, the trick is to convert the identifiers in the XML serialization into
a form that is fit for the Semantic Web. In other words, you have to
convert the QNames into URIs that represent those QNames, and then use the
URIs instead. RDF doesn't allow it any other way.

I agree that to a small extent this is a conceptual problem, but there are
no particular use cases that spring to mind that would prove that this is a
sufficiently difficult mechanism to use that we have to make
backwards incompatible changes to RDF. The fact is that it's not impossible
to identify QNames in RDF, just slightly twisted, in that you have an extra
processing step to do.

[...]
> The existence of countless scripts and hacks that scan
> for prefix:name patterns rather than (namespace)name
> pairs [...] Real users are *not* thinking
> (http://purl.org/dc/elements/1.1/)title but rather 'dc:title'!

Oh dear... Those "Real users" should be forced to read the XML Namespace
specification a few times :-)

> Just *try* to pass around DC RDF instances that use
> e.g. 'foo:title' instead of 'dc:title' and listen to the people
> *scream*!

Well, RDF processors shouldn't care too much about it, and they're the
things that have to process this stuff. Semantic Web - a machine
processable Web of data.
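
A small sketch of why a namespace-aware processor need not care about the
prefix, using Python's standard XML parser as a stand-in for an RDF parser.
Treating the property URI as namespace plus local name is an assumption
carried over from the concatenation rule discussed above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Same namespace URI, different prefixes.
with_dc = ET.fromstring(f'<dc:title xmlns:dc="{DC}">Example</dc:title>')
with_foo = ET.fromstring(f'<foo:title xmlns:foo="{DC}">Example</foo:title>')

# The parser reports the expanded name, so the prefix has already vanished.
same_name = with_dc.tag == with_foo.tag

# Assumed RDF-style property URI: namespace URI followed by the local name.
namespace, local = with_foo.tag[1:].split("}", 1)
property_uri = namespace + local
```

Scripts that scan for the literal text "dc:title" would miss the second
document, while anything working from the expanded name treats the two
alike.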

> If folks didn't think in terms of not only QNames but
> minimized prefixed QNames when designing SW
> ontologies, then why would e.g. both the DC and
> PRISM specs (and I'm sure many others) recommend
> the use of specific namespace prefixes in the interest
> of consistency of readability!?

Consistency of readability is one thing, consistency of parsing is quite
another. It is not a conformance rule that you have to use certain prefixes
in any given XML language. If it is, then it is in violation of the XML
Namespaces specification.

I'm not sure how you can use this point in an argument about consistent
data. If people don't know about the simplest concepts in the XML
Namespaces specification, then a) they shouldn't try to implement it, b)
they may mess their applications up if they do, and c) we should attempt to
educate them wherever possible. To err is human, and some people don't have
the time to read specifications. That does not mean that we have to write
parsers that handle their gross misunderstandings of a W3C recommendation!

> I see no reason why QNames (represented as URIs) should
> not become a primary naming construct for abstract resources
> such as properties in the RDF world.

1) Adding QNames to the syntax such that they convert into QName URIs in
the model would introduce backwards incompatible, and therefore damaging,
features to the RDF specification.
2) We're already using URIs with no particular disadvantages.
3) One can already use QNames in the model thanks to your URI scheme; don't
fix it if it ain't broke.

> QNames are great universal identifiers. We need good URN
> schemes. Let's use QNames as URIs.

Well, I'm not stopping you. No one cares if you invent a new URI scheme to
handle QNames and use that in the current RDF M&S fashion. The world will
come down on you like a pile of bricks if you try to make silly backwards
incompatible changes in RDF for no good reason. But go ahead: try writing a
*new* specification that uses the QNames in the syntax to form URIs in the
model that are of the QN URI variety. Try writing some parsers that can
cope with that. If you manage, then I shall a) be impressed, and b)
congratulate you, and you'll have my full support in doing so. But you
can't just change RDF and expect people to go "oh, so I've got to change
all of my code to cope with this change for no reason? Fair enough".

I think you should be realistic; what do you expect to change? Anything? I
think that it would be pretty incredible if anything did. But instead you
can continue to work through this QN URI scheme idea, and make sure that
people implement it correctly.


> > > [...] The fact that there is *not* an official, standard
> > > URI representation for QNames is what is surprising...
> >
> > Well, no one has really had a use for it, so not really.
>
> You may be right on that point. RDF may very well be the first
> standard to need an explicit QName to URI mapping [...]

I think it is, yes.

[...]
> > I think you'll find that a) FragID syntax is independent of
> > URI scheme and b) the "hack" works with a wide range,
> > indeed a gross majority of URI schemes. ...
> > it's not much of a problem!
>
> I disagree. In various ways for various reasons.
>
> This issue has been discussed elsewhere on this list at great
> length. I won't re-address it here.

Please, at least provide references.

> [...] The XML spec defines a way to achieve a
> consistent representation of structured data. To
> randomize it violates that fundamental goal.

The RDF specification is very, very, clear about the QName to URI reference
mapping, and as such cannot be considered "random" at all.

> The NS spec defines the mechanisms for QName
> distinction -- that distinction being the very goal of
> the NS spec -- and to discard such distinctions is
> a violation of the NS spec.

No, the NS spec is very careful not to limit people in the ways that they
use QNames. It sets out the partition, but doesn't say what you have to do
with those QNames once you have partitioned them.

> [...] But if it *is* an error, then it needs to be addressed
> and (hopefully) fixed.

It's not an error.

[...]
> > > Parentheses and escaping should do just fine [...]
> >
> > Cool.
>
> At least that part is clear ;-)

:-)

I wonder if we could get a reference to this thread from the RDF issues
list [1]? We're discussing "rdfms-uri-substructure" again!

[1] http://www.w3.org/2000/03/rdf-tracking/

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://webns.net/roughterms/> .
:Sean :hasHomepage <http://purl.org/net/sbp/> .
Received on Thursday, 23 August 2001 10:33:59 UTC