Re: CURIEs vs. QNames from Norman Walsh on 2005-11-09 (public-rdf-in-xhtml-tf@w3.org from November 2005)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Wed, 09 Nov 2005 08:54:33 -0500
To: "Mark Birbeck" <mark.birbeck@x-port.net>
Cc: <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <87zmoeorja.fsf@nwalsh.com>
/ "Mark Birbeck" <mark.birbeck@x-port.net> was heard to say:
| First I have to apologise; I wouldn't normally reply with such pedantry as
| follows,

No worries.

| but since one of my main arguments in favour of CURIEs is that we
| need a way to abbreviate URIs in a manner that has *already* become
| established practice via QNames, then I feel I have to take issue with your
| statements about N3. I'll then go on to the main part of the argument, which
| is whether there could be a misunderstanding if CURIEs were to 'look like'
| QNames.
|
| I wrote:
|> | (This *would be* N3, except it isn't, because N3 is another
|> | example of a language that mandates QNames...which sort of makes
|> | the point that I've been trying to argue; when a non-XML language
|> | like N3 restricts the way that 'abbreviated URIs' are formed, to
|> | that used in XML+Namespaces element and attribute name formation,
|> | you know something has gone wrong!)
|
| You replied: 
|> That argument doesn't hold for N3. N3 is a serialization format
|> for RDF.
|
| Yes, that's right...for RDF, not RDF/XML. There is no reason to import
| QNames into N3 since it has nothing to do with XML (just like RDF has
| nothing to do with XML).

Ok. At the highest level, you're right. RDF is just a bunch of triples
where some of the components are literal values and some are URIs (and
some are blank nodes, but I doubt that's relevant).

I note however that N3 and RDF/XML both appear to assume that the URIs
can be represented as QNames.

In fact, it appears that the triple

  (<http://example.org/>, <http://127.0.0.1/1234>, "foo")

can't be expressed in RDF/XML, because the predicate has to be written
as an element name and "1234" is not a legal local name.
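
To make that concrete, here is a minimal sketch of the constraint. It
assumes the predicate URI must be split into a namespace name plus a
local name that is an NCName, as RDF/XML property elements require. The
function name split_for_rdfxml and the ASCII-only NCName pattern are
illustrative assumptions, not part of any specification or library.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production
# (the real production also allows many non-ASCII characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def split_for_rdfxml(uri):
    """Return (namespace, local) using the longest NCName suffix, or None.

    None means no split exists, so the URI cannot be written as an
    element name of the form prefix:local.
    """
    for i in range(1, len(uri)):
        local = uri[i:]
        if NCNAME.fullmatch(local):
            return uri[:i], local
    return None


if __name__ == "__main__":
    print(split_for_rdfxml("http://example.org/name"))
    print(split_for_rdfxml("http://127.0.0.1/1234"))
```

Every suffix of the second URI either starts with a digit or contains
"/" or ":", so under these assumptions no usable split exists.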

|> RDF has an XML syntax for which N3 is attempting to be an alternate
|> serialization.
|
| No. N3 and RDF/XML are *both* alternate serialisations (to each other) for
| RDF.
|
|> It seems entirely reasonable in that circumstance that N3 should
|> have more restrictions than are actually, grammatically necessary
|> in N3.
|
| I can't see why. N3 only represents RDF--it has no relationship to XML
| whatsoever. You could get rid of RDF/XML tomorrow, and both RDF (the
| underlying information) and N3 (one possible serialisation of it) would be
| untouched.

Indeed. I can find no constraint in the N3 grammar[1] that prevents symbols
of the form "x:1234". And I expect most N3 processors build "the right
graph" when they are encountered.

| That's the pedantry out of the way...so, to summarise, N3 *already* shows
| that there is a glaringly obvious need for a way to abbreviate URIs. But
| since there was no readily available mechanism to do this, N3 (in common
| with others) opted to use QNames. This is particularly bad in general, since
| there is no relationship between the URI expressed and an XML+Namespaces
| element or attribute name (as indicated in the TAG report). But it is also
| particularly bad in the specific context of N3, since this syntax doesn't
| even have anything to do with XML.

I think "particularly bad" is a bit pejorative. It's an incomplete
answer because there are some URIs that have no QName shortcut.

| I feel I have established in many ways now, in many emails, both the need
| and the precedent, so hopefully the question now becomes, where do we go
| from here?

To summarize:

1. You want a syntactic shortcut for URIs
2. If you used QNames, you would want to use the QName to URI mapping rule
   "concatenate the namespace name and local name"
3. You aren't satisfied with QNames because with that rule, there are some
   URIs which have no QName.

Right?

| To recap the argument, my proposal uses the same *mechanism* as QNames, but
| produces a wider range of URIs. Your argument is that this is wrong since a
| CURIE and a QName could get mixed up, to which I would say:
|
|    * QNames are already being used (inappropriately) as URI
|      abbreviations, without confusion.

Whether it's inappropriate or not is an open question. My point is
that the reason it isn't confusing is that the QNames so used are
always valid QNames.

|      This is because when the
|      context is a 'real' QName (an element or attribute name with a
|     'namespace part') the local name and namespace part are never
|      actually combined in the way that has become common practice in
|      RDF/XML, XForms, N3, and so on. In other words, whether x:y maps
|      to { http://x#, y } or http://x#y is already given by context;
|
|    * any situation where a CURIE is valid, a QName (in the *proper*
|      sense) will also be valid, and any situation where a QName
|      *only* is valid, then only CURIEs that conform to a QName will
|      be valid (just in the same way that only numbers that conform to
|      the positive integer pattern match correctly for a positive
|      integer).
|
| As I've said before, the horse has already bolted when it comes to the use
| of QNames to represent URIs. My proposal does not seek to make the problem
| worse by polluting the *genuine* QName space with relaxed rules for the
| local-name part; rather it proposes a new way of representing URIs in
| abbreviated form, using the widely understood *mechanism* popularised by
| QNames.

I understood that one part of the proposal was to use *a different mechanism*.
The mechanism by which a QName can be decomposed into a namespace name and
a local name is as follows:

 1. Take the part before the colon, call that the prefix.
 2. Look in the in-scope namespaces for the namespace binding for that prefix.
    It is an error if no such binding exists.
 3. The namespace-name and local name for this QName are the namespace URI
    discovered in step 2 and the part after the colon in the QName.

The way a single URI is composed from these two parts is undefined.
Often simple concatenation is used, but other ways exist.
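
The three steps above, plus the optional concatenation, can be sketched
roughly as follows. The names resolve_qname and concatenate are
hypothetical, the in-scope namespaces are modelled as a plain dict, and
rejecting unprefixed names is a simplifying assumption made only for
this sketch.

```python
def resolve_qname(qname, in_scope):
    """Decompose a prefixed QName into (namespace name, local name)."""
    # Step 1: the part before the colon is the prefix.
    prefix, colon, local = qname.partition(":")
    if not colon:
        # Assumption for this sketch: only prefixed names are handled.
        raise ValueError("no prefix in %r" % qname)
    # Step 2: look the prefix up in the in-scope namespaces;
    # it is an error if no binding exists.
    if prefix not in in_scope:
        raise LookupError("no namespace binding for prefix %r" % prefix)
    # Step 3: the result is a pair, not a single URI.
    return in_scope[prefix], local


def concatenate(namespace, local):
    """One possible pair-to-URI rule: simple concatenation."""
    return namespace + local


if __name__ == "__main__":
    pair = resolve_qname("x:y", {"x": "http://x#"})
    print(pair)
    print(concatenate(*pair))
```

Keeping the two functions separate mirrors the point being made: the
decomposition is defined, while the choice of how to build one URI from
the pair is a separate decision.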

| One way out of this would be to define the *mechanism* (a namespace prefix,
| followed by a colon, followed by a local part)

That's not a mechanism, that's a lexical form.

| as some sort of standalone thing. And then to layer onto that, what
| one does with the bits you end up with after parsing the
| abbreviation:
|
|                  /         x:y
|                  |          |
|  The mechanism: <           |
|                  |          |
|                  \  { http://x#, y }
|                     /         |    \
|                    /          |     \
| QName:      { http://x#, y }  |      \
|                               |       \
| XPath function:       { http://x#, y } \
|                                         \
| URI:                                  http://x#y
|
| The mechanism just converts prefixes and local parts to an 'object'.
| QNames-proper leave that object intact and make use of it, as do XPath
| functions, whilst CURIEs would add another processing step and convert that
| object to a URI. I feel this is *clearer* than current practice, since now
| 'QNames' really are QNames.

I think the definition of QName is well established.

I have two concerns:

1. As I said, I thought part of the proposal was to use a different
   mechanism to find the URI associated with a prefix in a CURIE (in
   an XML document). If, in fact, the prefix is identified by looking
   in the in-scope namespaces, one of my concerns is substantially
   reduced. If an alternate mechanism is employed, such as looking for
   <ns prefix="x" uri="http://x"/> elements elsewhere in the document,
   then I remain deeply concerned.

   Such an alternate mechanism makes the interpretation of x:foo in
   this document ambiguous:

      <example xmlns:x="http://somehost/">
        <ns prefix="x" uri="http://otherhost/"/>
        <body ref="x:foo"/>
      </example>

   And I just don't think that's an acceptable state of affairs.

2. Allowing lexical representations of the form "x:1234" is going to
   cause naive users to believe that "x:1234" is a valid QName and
   this is going to come back to bite them in unpleasant ways when
   they try to use <x:1234/>.
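
Both concerns can be illustrated with a short sketch. It assumes
simple concatenation as the expansion rule, treats the <ns> element as
the hypothetical alternate mechanism from the example document, ignores
namespace scoping (the document has a single scope), and uses an
ASCII-only approximation of NCName. The names two_readings and is_qname
are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

DOC = """<example xmlns:x="http://somehost/">
  <ns prefix="x" uri="http://otherhost/"/>
  <body ref="x:foo"/>
</example>"""

# Simplified, ASCII-only approximation of the XML NCName production.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def two_readings(doc):
    """Expand the ref attribute once per prefix-binding mechanism."""
    data = doc.encode("utf-8")
    # Reading 1: bindings taken from the xmlns declarations.
    xmlns = dict(item for _, item in
                 ET.iterparse(io.BytesIO(data), events=("start-ns",)))
    # Reading 2: bindings taken from the hypothetical <ns> elements.
    root = ET.fromstring(data)
    declared = {e.get("prefix"): e.get("uri") for e in root.iter("ns")}
    prefix, _, local = root.find("body").get("ref").partition(":")
    return xmlns[prefix] + local, declared[prefix] + local


def is_qname(text):
    """True if text matches the (approximated) QName lexical form."""
    return QNAME.fullmatch(text) is not None


if __name__ == "__main__":
    print(two_readings(DOC))
    print(is_qname("x:foo"), is_qname("x:1234"))
```

The two readings of x:foo disagree, which is the ambiguity in concern
1, and "x:1234" fails the QName check, which is the trap in concern 2.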

                                        Be seeing you,
                                          norm

[1] http://www.w3.org/2000/10/swap/grammar/n3-report.html

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.
Received on Wednesday, 9 November 2005 13:54:33 UTC