Re: Clarification of charmod-uri from Aaron Swartz on 2002-04-30 (w3c-rdfcore-wg@w3.org from April 2002)

From: Aaron Swartz <me@aaronsw.com>
Date: Tue, 30 Apr 2002 11:03:27 -0500
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B8F42A7F.33AEC%me@aaronsw.com>
On 2002-04-29 03:42 PM, "Jeremy Carroll" <jjc@hplb.hpl.hp.com> wrote:

>> 1) Is there some reason why these Unicode characters cannot be %encoded? I
>> thought someone said something to this effect on the telecon, but I didn't
>> catch it. If not, what's the rationale for insisting on a
>> backwards-incompatible change, when the (comparatively) backwards-compatible
>> %encoding works just as well?
> 
> One reason concerns normal form C issues, and how %-encoded URIs get
> displayed.
> 
> In a system that %-encodes URIs for storage and reasoning it is highly
> desirable that they get unencoded for display.

I don't see that as our concern. Since URIs are verbose, I expect that a
system which displays them to an i18n user will likely pick some sort of
compatible abbreviation, similar to how XML namespaces work for users with
our alphabet.

> As we have seen, there are
> multiple ways under Unicode of representing characters such as é. If these
> characters are retained as Unicode, it is possible to specify, and
> realistically expect implementations to enforce, the normal form C
> constraint, i.e. that the Unicode must be in normal form C. This constraint
> becomes significantly more difficult to check (i.e. less something that can
> realistically be expected of a Unicode library) if the check is instead
> "does this %-encoded URI correspond to a UTF-8 encoding of a Unicode string
> that is not in NFC?"
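To make the contrast concrete, here is a minimal Python sketch (an illustration, not something proposed in the thread). On a Unicode string the NFC test is a single library call; on a %-encoded URI the checker must first undo the %-escapes, decode the bytes as UTF-8, and only then test normalization. The helper names `is_nfc` and `encoded_uri_is_nfc` are hypothetical.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def is_nfc(text: str) -> bool:
    # Direct NFC check on a Unicode string (Python 3.8+).
    return unicodedata.is_normalized("NFC", text)


def encoded_uri_is_nfc(uri: str) -> bool:
    # Hypothetical helper: undo %-escapes to raw bytes, decode them as UTF-8,
    # then test NFC. Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    decoded = unquote_to_bytes(uri).decode("utf-8")
    return is_nfc(decoded)


composed = "http://example.org/#Andr\u00e9"     # é as a single code point
decomposed = "http://example.org/#Andre\u0301"  # e + COMBINING ACUTE ACCENT

print(is_nfc(composed), is_nfc(decomposed))                    # True False
print(encoded_uri_is_nfc("http://example.org/#Andr%C3%A9"))    # True  (é = C3 A9)
print(encoded_uri_is_nfc("http://example.org/#Andre%CC%81"))   # False (U+0301 = CC 81)
```

The extra decoding step is what moves the check out of "ordinary Unicode library" territory: the library has to know the escaping convention and the byte encoding before it can say anything about normalization.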

Ah, so you are worried that an RDF system will have %-encoded URIs entered
into it by humans that are not in NFC? But I thought you said that this
decision did not affect %-encoded URIs, and that they were still legal.

That in itself is a quagmire, since it makes it extremely confusing, if not
impossible, to encode %-encodings into URIs.

[...]
> The standard treatment of URIrefs is to do as little processing as
> possible. So XML namespaces differ if the URI reference differs in
> spelling, not intent. In particular:
> http://example.org/#Andr%c3%a9 and http://example.org/#Andr%C3%A9
> are different as far as XML Namespaces goes.
> 
> If we assert that these are both identical to http://example.org/#André
> we need to account for how they are the same under RDF.
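A short Python illustration (not part of the thread) of the distinction being drawn: compared character by character, as XML Namespaces does, the two spellings are different strings; only after an extra decoding step, which is an illustrative assumption here and not something XML Namespaces performs, do both become the same identifier ending in #André.

```python
from urllib.parse import unquote

lower = "http://example.org/#Andr%c3%a9"
upper = "http://example.org/#Andr%C3%A9"

# XML Namespaces-style comparison: plain string identity.
print(lower == upper)                    # False

# Extra step, not done by XML Namespaces: decode the %-escapes as UTF-8.
print(unquote(lower) == unquote(upper))  # True
print(unquote(upper))                    # http://example.org/#André
```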

I don't see it as our job to assert that they are identical. They are
clearly different character strings, and that's how we're comparing
identifiers. No one is asking RDF to conclude that
http://www.w3.org/TheProject and http://www.w3.org/ are identical.
 
> A less significant reason is that preserving the original input
> characters is mandatory (they are the most useful way to display the URI
> on output).

Mandatory by whom? Is there no %-encoding way to preserve these characters?

>> 2) Am I correct in saying that this means that RDF will no longer be using
>> URI-refs to identify Resources? Is this consistent with our charter?
> Misha argues that RDF M&S has already had its meaning "clarified" in this
> way by erratum 26 against the XML Second Edition. (I confess to having a
> sense of a non-backward-compatible clarification, rather like the
> unqualified attributes issue!)
> See: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0012.html

This argument seems specious, at best. RDF says that implementers should use
the "XML convention" to avoid "URIs that are not allowed by [URI]". It says
nothing about following XML instructions for _when_ to do this encoding.
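For readers unfamiliar with it, the escaping convention under discussion can be sketched roughly as follows (Python, illustration only): each disallowed character is encoded as UTF-8 and each resulting byte is written as %HH. The function name `escape_disallowed` and the exact set of ASCII characters treated as disallowed are assumptions for this sketch, and the function only shows the transformation itself, not when a processor should apply it.

```python
# Assumed set of disallowed printable ASCII characters; control characters,
# space, DEL and all non-ASCII characters are also escaped below.
DISALLOWED_ASCII = set('<>"{}|\\^`')


def escape_disallowed(iri: str) -> str:
    # Hypothetical helper: convert each disallowed character to UTF-8 and
    # write every byte as %HH (uppercase hex); leave other characters alone.
    out = []
    for ch in iri:
        if ord(ch) <= 0x20 or ord(ch) >= 0x7F or ch in DISALLOWED_ASCII:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_disallowed("http://example.org/#Andr\u00e9"))  # http://example.org/#Andr%C3%A9
print(escape_disallowed("http://example.org/a b"))          # http://example.org/a%20b
```

Note that existing `%` characters pass through untouched, so an already-escaped URI is left as it is; deciding whether and at what stage to run such a step is exactly the question the convention leaves open.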

[...]

> I also note that I am influenced by a sense that it is inappropriate for a
> historic limitation of the US phone system (that the eighth bit used to be
> dirty) to limit the functionality available to web users around the
> world. If this has been significant in our voting, then perhaps it could
> raise charter issues.

This has not influenced my voting at all. If that were the issue, I would
clearly support the change. My true concern is that we are really changing
what RDF means. We're no longer talking about URIs, but instead these
magical identifiers that can have spaces and accents in them. This is fine
for clients or other user applications, but mandating it at the base of
RDF seems simply to be asking for trouble.

If the i18n group wants to change what URIs are, they should get that change
passed through the IETF, rather than making systems slightly incompatible
with each other one spec at a time. I do not feel comfortable making a
change to something as big as URIs in a group as small as ours. I fear many
unseen ramifications of this change (incompatibility with previous encoding
schemes, different encoding mechanisms across specs, etc.) that, because of
our smaller visibility (compared to URIs), may not be noticed in time.

I ask the group to reconsider this decision,
-- 
[ "Aaron Swartz" ; <mailto:me@aaronsw.com> ; <http://www.aaronsw.com/> ]
Received on Tuesday, 30 April 2002 12:03:30 UTC