RE: Syntax vs Semantics vs XML Schema vs RDF Schema vs QNames vs URIs (was RE: Using urn:publicid: for namespaces) from Patrick.Stickler@nokia.com on 2001-08-15 (www-rdf-interest@w3.org from August 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 15 Aug 2001 08:07:27 +0300
To: sean@mysterylights.com, scranefield@infoscience.otago.ac.nz, www-rdf-interest@w3.org, www-rdf-logic@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114BF84@trebe003.NOE.Nokia.com>
> > BUT that form of resolution/concatenation has been shown to
> > be unreliable and capable of producing ambiguous URIs! It's
> > broken and *must* be replaced by something else.
> 
> I'm really starting to wonder what you're going on about. It
> concatenates QName pairs into URIs, that's it. And when you want to
> refer to a QName, you can do so using the model. I think you're
> blowing this out of proportion somewhat.

If you had read my proposal, you would have seen that plain concatenation
('ns' + 'name' -> 'nsname') can produce collisions.

To repeat one example from my proposal:

--

Claim 4: The current methodology employed by RDF to attempt to create
a resource identity by direct concatenation of namespace
and name does not ensure the preservation of the uniqueness of namespace
qualified names.

E.g., both of the following valid yet distinct syntactic forms
are mapped to the same semantic resource URI, resulting in an
RDF-internal naming collision:

   <x:varovasti xmlns:x="http://x.com/z#aja">
   -> "http://x.com/z#ajavarovasti"

   <x:rovasti xmlns:x="http://x.com/z#ajava">
   -> "http://x.com/z#ajavarovasti"!

The fact that the above example is contrived in no way
invalidates the fact that the present RDF methodology is
unreliable and can result in unintended semantic ambiguity
from distinct syntactic forms.

--
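To make the collision concrete, here is a minimal Python sketch of naive concatenation applied to the two QNames above. It models only the string operation, not the code of any particular RDF parser.

```python
# Naive mapping: the namespace URI and the local name are simply concatenated.
def concat_uri(namespace: str, local_name: str) -> str:
    return namespace + local_name

# The two distinct QNames from Claim 4, as (namespace, local name) pairs.
qname_a = ("http://x.com/z#aja", "varovasti")
qname_b = ("http://x.com/z#ajava", "rovasti")

uri_a = concat_uri(*qname_a)
uri_b = concat_uri(*qname_b)

print(uri_a)  # http://x.com/z#ajavarovasti
print(uri_b)  # http://x.com/z#ajavarovasti
# Distinct QNames, one URI: the partition between the two parts is lost.
print(qname_a != qname_b and uri_a == uri_b)  # True
```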

Thus, no matter what scheme you use to derive a URI from 
a namespace + name pair, it must maintain their partitioning
explicitly.
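A minimal sketch of one mapping that keeps the partition explicit: percent-encode both parts so that the separator can never occur inside either of them, which also makes the mapping reversible. The urn:example:qname: prefix and the encoding are assumptions chosen purely for illustration; this is neither the mapping from the proposal referred to above nor any standardized one.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix, for illustration only; not a registered mapping.
PREFIX = "urn:example:qname:"

def qname_to_uri(namespace: str, local_name: str) -> str:
    # quote(..., safe="") percent-encodes ':', '/' and '#', so the ':' inserted
    # below is the only unescaped separator between the two encoded parts.
    return PREFIX + quote(namespace, safe="") + ":" + quote(local_name, safe="")

def uri_to_qname(uri: str) -> tuple[str, str]:
    # Reverse the mapping: split on the single unescaped ':' and decode both halves.
    if not uri.startswith(PREFIX):
        raise ValueError("not produced by qname_to_uri: " + uri)
    encoded_ns, sep, encoded_name = uri[len(PREFIX):].partition(":")
    if not sep:
        raise ValueError("missing separator: " + uri)
    return unquote(encoded_ns), unquote(encoded_name)

uri = qname_to_uri("http://x.com/z#aja", "varovasti")
print(uri)                # urn:example:qname:http%3A%2F%2Fx.com%2Fz%23aja:varovasti
print(uri_to_qname(uri))  # ('http://x.com/z#aja', 'varovasti')
```

Because the mapping is reversible, two distinct (namespace, name) pairs can never yield the same URI.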

Other concatenation or combinatoric schemes can produce URIs that are
invalid according to the specification of their URI scheme.
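For example, inserting '#' (the convention XML Schema uses for its datatype URIs) does separate the two parts, but when the namespace URI already contains a fragment, the result has two '#' characters, which RFC 2396 does not permit. A Python sketch; the check below tests only that single rule and is not a full URI validator.

```python
# '#'-insertion mapping, in the style of XML Schema datatype URIs.
def hash_join(namespace: str, local_name: str) -> str:
    return namespace + "#" + local_name

def has_at_most_one_hash(uri: str) -> bool:
    # A URI reference may contain at most one '#', which starts the fragment.
    # This is one necessary condition only, not a complete validity check.
    return uri.count("#") <= 1

xsd = hash_join("http://www.w3.org/2001/XMLSchema", "string")
odd = hash_join("http://x.com/z#aja", "varovasti")
print(xsd, has_at_most_one_hash(xsd))  # http://www.w3.org/2001/XMLSchema#string True
print(odd, has_at_most_one_hash(odd))  # http://x.com/z#aja#varovasti False
```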

It is true that, so long as RDF URIs are unique, it matters less whether
those URIs are valid; but in that case, let's just change the RDF spec
to speak of a "unique global identifier" instead. If it is a URI, it
should be a valid one; and even though an RDF engine will treat it as
opaque, that opacity should not prevent an application from examining
and using it.

Building a complex Namespace + Name QName structure and giving it
an identifier only complicates the situation. Either you have to
write every axiom that mentions ns:name using the complex structure,
or you have to standardize the identifier given to the root node
of the structure -- in which case you're right back where we started.
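To illustrate the complex-structure alternative: a QName represented as an anonymous node with separate namespace and name properties. The property URIs under http://example.org/qname# are hypothetical, invented for this sketch. Finding "the" QName requires joining two triples, and the blank node label that results has no meaning outside this one graph.

```python
# Triples as (subject, predicate, object) tuples; "_:" marks a blank node.
# Hypothetical property URIs, invented for this sketch.
EX_NS = "http://example.org/qname#namespace"
EX_NAME = "http://example.org/qname#name"

triples = [
    ("_:q1", EX_NS, "http://x.com/z#aja"),
    ("_:q1", EX_NAME, "varovasti"),
    ("_:q2", EX_NS, "http://x.com/z#ajava"),
    ("_:q2", EX_NAME, "rovasti"),
]

def find_qname_node(triples, namespace, local_name):
    # A QName is identified only by a node carrying BOTH properties, so every
    # lookup is a two-triple join rather than a single URI comparison.
    with_ns = {s for s, p, o in triples if p == EX_NS and o == namespace}
    with_name = {s for s, p, o in triples if p == EX_NAME and o == local_name}
    matches = with_ns & with_name
    return matches.pop() if len(matches) == 1 else None

# "_:q1" is local to this graph; another document cannot refer to it.
print(find_qname_node(triples, "http://x.com/z#aja", "varovasti"))  # _:q1
```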

And... it still doesn't solve the literal to URI mapping issue.
 
> > The current "popular" proposal, à la XML Schema,
> > inserting a '#' character is unacceptable [...]
> 
> That isn't a proposal at all, it's something that is specific to XML
> Schema, and indeed, they didn't have to just add a "#" in there. They
> could have said that the QName:-
> 
>    {http://www.w3.org/2001/XMLSchema}string
> 
> maps to:-
> 
>    http://www.w3.org/2001/05/blargh#stringthing
> 
> but that would have been a smidge more difficult for people to
> remember than just bunging a "#" in there. 

True! But... what matters is that there is a single, standardized, explicit,
regular, and collision-free method for such a mapping!

And, BTW, my proposed rdf:Map element would allow folks to define
such kinds of seemingly arbitrary mappings ;-)

> I'm disappointed that there
> is no RDF available on the W3C site (correct me if I'm wrong) that
> defines the URIs that they've set out, or indeed maps back to the
> QNames, but then anyone implementing XML Schema is going to have to
> read the specification anyway. So yes, there is a problem in that
> there's no machine readable way of getting to those QNames, but it's a
> big leap from there to the spurious suggestion that XML Schema implies
> that all QName pairs in XML should be concatenated with a "#" in the
> middle of them.

Several months ago, in a discussion of these concatenation issues,
it was proposed (and, as I recall, fairly well received) that RDF might
consider adopting the '#' partitioning character, both to synchronize
with XML Schema and to solve the above collision problem. If that has
never actually been officially considered, then my apologies for
being a bit out of touch since then. Still, how XML Schema does
things is neither here nor there, especially since that mapping scheme
is specific to a particular MIME content type.

> > [...] It got RDF started, but cannot carry RDF through to
> > a mature and functional SW.
> 
> I hope you're not one of these people who believe that there are some
> things which can be identified and yet cannot be identified with a
> URI... an axiom of URIs is that they can identify anything. That
> includes a QName; we just don't have a decent model for it yet.

Not at all. I fully believe that URIs can identify anything (though I
may be wrong ;-)

I'm talking here about the method of QName to URI mapping only. It is
far too simple to address the full scope of the issue, and it is also
broken, since it can create collisions. That's not to say that it doesn't
work for a lot of cases. Clearly it does. But if we're talking about
scaling up RDF to a pervasive, global, all-encompassing realization
of the Semantic Web, it just doesn't cut it. Sorry, but it doesn't.

I know that a lot (most?) of folks working in the SW arena are focusing
on "bigger" issues (from their perspective), but IMO this is a serious
problem that has to be fixed -- one that I've been beating myself
bloody over in trying to actually use RDF to implement large scale
metadata-driven documentation management and distribution systems. It
may not be critical to theorem proving or web crawlers or many other
applications of RDF, but in a context where there are hundreds of
human authors needing to produce metadata (i.e. they need to type
things like 'en' and not long URIs) in a multilingual, multinational,
globally distributed environment with multiple related ontologies and
localized vocabularies, and with systems managing millions of media
objects -- it's a very big problem.

So long as everything is in triples with fully defined URIs, all is
well and everything looks fine. But getting from serialized QNames
to URIs for any valid but arbitrary ontology quickly shows that
there is a real problem.

> > Hello? What? QNames *in* RDF?! I don't think so!
> 
> <rdf:Description>
>    <foaf:name>Sean B. Palmer</foaf:name>
> </rdf:Description>
> 
> Spot the QNames. rdf:Description is a syntax only QName (creates an
> anonymous node), and foaf:name is a QName that is important to the
> model (as a predicate).

Uhmmm... as far as I understand things, there is no mapping from 
'rdf:Description' to any URI because even though that is a QName, it 
is itself not the serialized identity of a resource. I thought that 
distinction was clear -- the QName to URI mapping is done only for the 
QName-defined serialization of the identity of a resource, not for every
QName in a given XML instance.

The fact that an anonymous node is created from the above example
has nothing whatsoever to do with the entire QName to URI mapping issue.
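A toy Python sketch of the distinction, using the standard-library ElementTree parser on Sean's fragment: the property element's QName is mapped to a URI by concatenation, while rdf:Description simply yields a blank node. The namespace declarations are added here as assumptions (the quoted fragment omits them), and the code is nowhere near a complete RDF/XML parser: it ignores rdf:about, attributes, and nesting.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sean's fragment wrapped in rdf:RDF; the xmlns declarations are assumptions.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description>
    <foaf:name>Sean B. Palmer</foaf:name>
  </rdf:Description>
</rdf:RDF>"""

def split_clark(tag):
    # ElementTree reports element names in "{namespace}local" form.
    namespace, _, local = tag[1:].partition("}")
    return namespace, local

def triples_from(doc):
    root = ET.fromstring(doc)
    triples = []
    for i, desc in enumerate(root.findall("{%s}Description" % RDF_NS)):
        # rdf:Description with no rdf:about: a blank node, no QName-to-URI mapping.
        subject = "_:b%d" % i
        for prop in desc:
            ns, local = split_clark(prop.tag)
            # Only the property QName is mapped to a URI (plain concatenation).
            triples.append((subject, ns + local, prop.text))
    return triples

print(triples_from(DOC))
# [('_:b0', 'http://xmlns.com/foaf/0.1/name', 'Sean B. Palmer')]
```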

> > QNames are a creature of the SYNTAX ONLY! They have
> > no, and should have no, realization in the set of triples derived
> > from a serialized instance! [...]
> 
> Ah, that's a good point, and it's one of the reasons that XML RDF
> sucks for certain tasks. Just try expressing a predicate that ends
> with a ":" in XML RDF. But that's why I proposed BSWL, and it's why
> SWAG is working on another format, which is just NTriples in XML.

I'm glad you agree that XML RDF is not perfect (what model is?), but
again, this misses the point entirely!

Serializations of triples are after the fact. If you serialize triples
for consumption by computers, you will *not* be distilling URIs into
QNames, right? And even if a human is creating a serialization of
triples with BSWL or NTriples they will likely be required to use
fully specified URIs, no? So it's a non-issue insofar as those 
"alternate" serialization models are concerned. Please correct me
if I'm wrong here about not using QNames for resource identity in 
either BSWL or NTriples.
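For comparison, a minimal N-Triples serializer in Python: every URI is written out in full between angle brackets, so no QName appears anywhere. The sketch handles only blank-node or URI subjects with plain literal objects.

```python
def escape_literal(text):
    # N-Triples string escapes; backslash must be escaped first.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def ntriple(subject, predicate, literal):
    # Subject: a blank node label ("_:...") or a full URI; predicate: always a full URI.
    subj = subject if subject.startswith("_:") else "<%s>" % subject
    return '%s <%s> "%s" .' % (subj, predicate, escape_literal(literal))

print(ntriple("_:b0", "http://xmlns.com/foaf/0.1/name", "Sean B. Palmer"))
# _:b0 <http://xmlns.com/foaf/0.1/name> "Sean B. Palmer" .
```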

> > Resources are identified by *URI*s, [...]
> 
> They are? Aw man, someone should have told me earlier! :-)

Right. But do you then consider a complex structure of an
anonymous node with namespace and name nodes dangling below
it to be a URI?! Sorry, I just don't accept that as being
equivalent.

> > Adopting XML Schema data types in RDF doesn't provide
> > any actual validation, [...]
> 
> Tools will come for datatypes.

But if data type validation is not provided for in the actual
standards specifications, then we will have little to no
interoperability between those tools. With all due respect
(and with no disparagement to DAML), DAML is not RDF. Just
because DAML decides to reference XML Schema data types does
not mean that *every* compliant RDF engine will provide XML Schema
based validation of literals. What I want to see is data types
for literals (or at least basic validation mechanisms such as
regular expressions) addressed by the RDF/RDF Schema standards
themselves.
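As a sketch of the kind of regular-expression validation meant here: a table from datatype URI to lexical pattern. The table and function names are hypothetical, not part of any RDF specification; the two patterns follow the XML Schema lexical spaces for xsd:integer and xsd:boolean, ignoring whitespace normalization.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table from datatype URI to a pattern over the lexical form.
LEXICAL_PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),    # optional sign, decimal digits
    XSD + "boolean": re.compile(r"true|false|1|0"),
}

def is_valid_literal(lexical_form, datatype_uri):
    pattern = LEXICAL_PATTERNS.get(datatype_uri)
    if pattern is None:
        raise KeyError("no pattern known for " + datatype_uri)
    # fullmatch: the whole literal must match, not just a prefix of it.
    return pattern.fullmatch(lexical_form) is not None

print(is_valid_literal("-42", XSD + "integer"))  # True
print(is_valid_literal("4.2", XSD + "integer"))  # False
print(is_valid_literal("yes", XSD + "boolean"))  # False
```

Keying the table on datatype URIs means that any engine sharing the same table would accept and reject exactly the same literals, which is the interoperability point above.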

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Wednesday, 15 August 2001 01:07:32 UTC