- From: Aaron Swartz <me@aaronsw.com>
- Date: Tue, 30 Apr 2002 11:03:27 -0500
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, RDF Core <w3c-rdfcore-wg@w3.org>
On 2002-04-29 03:42 PM, "Jeremy Carroll" <jjc@hplb.hpl.hp.com> wrote: >> 1) Is there some reason why these Unicode characters cannot be %encoded? I >> thought someone said something to this effect on the telecon, but I didn't >> catch it. If not, what's the rationale for insisting on a >> backwards-incompatible change, when the (comparatively) backwards-compatible >> %encoding works just as well? > > One reason concerns normal form C issues, and how %-encoded URIs get > displayed. > > In a system that %-encodes URIs for storage and reasoning it is highly > desirable that they get unencoded for display. I don't see that as our concern. Since URIs are verbose, I expect that a system which displays them to an i18n user will likely pick some sort of compatible abbrevaition, similar to how XML namespaces works for users with out alphabet. > As we have seen there are > multiple ways under unicode of representing characters such as e. Retaining > these within unicode it is possible to specify and realistically expect > implementations of the normal form C constraint - i.e. that the unicode > must be normal form C. This constraint becomes significantly more difficult > to check (i.e. less something that can realistically be expected of a > unicode library) if the check is does this %-encoded uri correspond to a > UTF-8 encoding of a unicode string that is not in NFC. Ah, so you are worried that an RDF system will have %-encoded URIs entered into it by humans that are not in NFC? But I thought you said that this decision did not affect %-encoded URIs, and that they were still legal. That in itself is a quagmire, since it makes it extremely confusing, if not impossible to encode %-encodings into URIs. [...] > The standard treatment of URIrefs is to do as little processing as > possible. So xml namespaces differ if the uri-ref differs in spelling, not > intent. In particular: > http://example.org/#Andr%c3%a9 and http://example.org/#Andr%C3%A9 > are different as far as XML Namespaces goes. > > If we assert that these are both identical to http://example.org/#Andre > we need to account for how they are the same under RDF. I don't see it as our job to assert that they are identical. They are clearly different character strings, and that's how we're comparing identifiers. No one is asking RDF to conclude that http://www.w3.org/TheProject and http://www.w3.org/ are identical. > A less significant reason is showing that preserving the original input > characters is mandatory (these are the most useful way to display the URI > on output). Mandatory by whom? Is there no %-encoding way to preserve these characters? >> 2) Am I correct in saying that this means that RDF will no longer be using >> URI-refs to identify Resources? Is this consistent with our charter? > Misha argues that RDF M&S has already had its meaning "clarified" in this > way by errata 26 against XML second edition. (I would confess to having > sense of a non-backwardly compatible clarification, rather like the > unqualified attributes issue!). > See: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0012.html This argument seems specious, at best. RDF says that implementers should use the "XML convention" to avoid "URIs that are not allowed by [URI]". It says nothing about following XML instructions for _when_ to do this encoding. [...] > I also note that I am influenced by a sense of the inappropriateness of an > historic limitation of the US phone system (that the eighth bit used to be > dirty) should limit the functionality available to web users around the > world. If this has been significant in our voting then perhaps that could > raise charter issues. This has not influenced my voting at all. If that were the issue, I would clearly support the change. My true concern is that we are really changing what RDF means. We're no longer talking about URIs, but instead these magical identifiers that can have spaces and accents in them. This is fine for a client or other user applications, but to mandate it for the base of RDF seems to simply be asking for trouble. If the i18n group wants to change what URIs are, they should get it passed thru the IETF, not making systems slightly incompatible with each other one spec at a time. I do not feel comfortable making a change to something as big as URIs in a group as small as ours. I fear many unseen ramifications of this change (incompatibility with previous encoding schemes, different encoding mechanisms across specs, etc.) that because of our smaller visibility (compared to URIs) may not be noticed in time. I ask the group to reconsider this decision, -- [ "Aaron Swartz" ; <mailto:me@aaronsw.com> ; <http://www.aaronsw.com/> ]
Received on Tuesday, 30 April 2002 12:03:30 UTC