RE: Clarification of charmod-uri from Jeremy Carroll on 2002-05-01 (w3c-rdfcore-wg@w3.org from May 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Wed, 1 May 2002 09:47:10 +0100
To: "Aaron Swartz" <me@aaronsw.com>, "RDF Core" <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDOEOFCDAA.jjc@hplb.hpl.hp.com>
> > The standard treatment of URIrefs is to do as little processing as
> > possible. So xml namespaces differ if the uri-ref differs in
> spelling, not
> > intent. In particular:
> > http://example.org/#Andr%c3%a9 and http://example.org/#Andr%C3%A9
> > are different as far as XML Namespaces goes.
> >
> > If we assert that these are both identical to http://example.org/#Andre
> > we need to account for how they are the same under RDF.
>
> I don't see it as our job to assert that they are identical. They are
> clearly different character strings, and that's how we're comparing
> identifiers. No one is asking RDF to conclude that
> http://www.w3.org/TheProject and http://www.w3.org/ are identical.


We should be clarifing which nodes are identical in:


<rdf:RDF
    xmlns:eg="http://example.org/#">
  <eg:Andre rdf:about="http://example.org/#x">
    <eg:foo rdf:resource="http://example.org/#Andr%c3%a9"/>
    <eg:bar rdf:resource="http://example.org/#Andr%C3%A9"/>
  </eg:Andre>
</rdf:RDF>

is the rdf:type of eg:x the same as its <eg:foo> or its <eg:bar> value or
neither.

The WG decision clarified along the "They are clearly different character
strings" route that you advocated for the <eg:foo> and <eg:bar> values.

It is somewhat contrived, but illustrative that the common understanding of
M&S as doing the % encodings before forming the graph is not clear.

Options in the face of this example include:
- decide the example is illegal (e.g. eg:Andre does not match the typed
node production)
- decide that % encoding is upper case
- decide not to do % encoding when forming the graph (what the group chose)
- decide that the example is non-deterministic
- decide to recode %c3%a9 in upper case and that % encode e in upper case
(this is what ARP does)
- decide that the example is not important


None of these is cost free, none is backwardly compatible.


>
> > A less significant reason is showing that preserving the original input
> > characters is mandatory (these are the most useful way to
> display the URI
> > on output).
>
> Mandatory by whom? Is there no %-encoding way to preserve these
> characters?

No, not robustly. To reverse the %-encoding you have to assume UTF-8.
I believe that Nestl(e-accute)(reg) in 8839-1 is also legal UTF-8.

> We're no longer talking about URIs, but instead these
> magical identifiers that can have spaces and accents in them. This is
fine
> for a client or other user applications, but to mandate it for the base
of
> RDF seems to simply be asking for trouble.

> If the i18n group wants to change what URIs are, they should get it
passed
> thru the IETF, not making systems slightly incompatible with each other
one
> spec at a time. I do not feel comfortable making a change to something as
> big as URIs in a group as small as ours.

We are deciding that what RFC 2396 calls "original character sequences" are
more important for us than the %-encoded forms. This is not a change to RFC
2396 nor is it incompatible with it.

This change is already in XML (through erratum 26), it is already in xlink.
It is in many other W3C specs.

The logic of it is that there is a need for an update to RFC 2396 that
recognises this migration; but we, a small group, are a small part of a
bigger picture of change.

Jeremy
Received on Wednesday, 1 May 2002 04:47:21 UTC