Re: [charmodReview-17] replacing all URIs with IRIs from Stefan Eissing on 2002-05-28 (www-tag@w3.org from May 2002)

From: Stefan Eissing <stefan.eissing@greenbytes.de>
Date: Tue, 28 May 2002 10:52:41 +0200
To: Aaron Swartz <me@aaronsw.com>
Cc: Misha.Wolf@reuters.com, www-tag@w3.org
Message-Id: <450CE986-7218-11D6-AB6D-00039384827E@greenbytes.de>

On Sunday, 26 May 2002, at 20:48, Aaron Swartz wrote:

> re: n-triples supporting character escapes. Yes, my point is that 
> the software lags far behind the specs.
>
> On Saturday, May 25, 2002, at 05:51  AM, Misha.Wolf@reuters.com wrote:
>> RFC 2396, in specifying the use of %HH escaping, does not confine its
>> use to UTF-8.  There are plenty of URIs out there which use %HH to
>> escape other character encodings.  Once you have a %HH-escaped URI,
>> there is no way back, unless you know how it was created.  If an RDF
>> database contains some %HH-escaped URIs, how can anyone know whether
>> they arrived %HH-escaped, or whether the %HH-escaping was applied just
>> before their insertion in the database?
>
> I've heard some rumblings about updating RFC2396 to require UTF-8...
>
> But even so, why does it matter? The worst effect I can see is
> that some (broken) URIs are displayed a little funny. Is software
> going to be peeking into these URIs for some reason?

Think, for example, about a WebDAV file system. The fs driver needs
to convert back and forth between local filenames and server URIs.

Think about an HTTP server sitting on a file system. Apache 1.3.x on
a Windows box will convert euro signs in filenames to %80, which
is neither ISO-8859-x nor UTF-8.
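
To make the %80 case concrete, here is a rough sketch in Python
(standard library only). The helper names escape_with and try_decode
are made up for illustration. It assumes the %80 comes from the
Windows-1252 (cp1252) byte for the euro sign, and compares that with
what ISO-8859-15 and UTF-8 would produce:

```python
from urllib.parse import quote, unquote_to_bytes

EURO = "\u20ac"  # the euro sign


def escape_with(charset: str) -> str:
    """Encode the euro sign in `charset`, then %HH-escape the bytes."""
    return quote(EURO, encoding=charset)


def try_decode(raw: bytes, charset: str):
    """Return the decoded text, or None if `raw` is invalid in `charset`."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# The same character yields three different escapes.
for charset in ("cp1252", "iso-8859-15", "utf-8"):
    print(charset, escape_with(charset))
# cp1252 %80
# iso-8859-15 %A4
# utf-8 %E2%82%AC

# Going the other way, %80 only maps back to the euro sign if the
# reader happens to know that cp1252 was used.
raw = unquote_to_bytes("%80")
print(try_decode(raw, "cp1252") == EURO)  # True
print(try_decode(raw, "utf-8"))           # None: 0x80 alone is not valid UTF-8
```

Nothing in the escaped URI itself says which of the three was used.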

In order for the WebDAV server and the fs driver to work together, they
have to agree on a charset for the URI encoding. Since charset parameters
in URIs are messy, UTF-8 seems the best choice.
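
A minimal sketch of what that agreement buys, again in Python and
again with hypothetical names (URI_CHARSET, filename_to_uri_path,
uri_path_to_filename); it is not how any particular WebDAV driver
does it. Both sides use UTF-8, so the mapping round-trips, while a
segment escaped with another charset is detected instead of being
silently mangled:

```python
from urllib.parse import quote, unquote

URI_CHARSET = "utf-8"  # assumption: the charset both sides agreed on


def filename_to_uri_path(name: str) -> str:
    """Map a local filename to one %HH-escaped URI path segment."""
    return quote(name, safe="", encoding=URI_CHARSET)


def uri_path_to_filename(segment: str) -> str:
    """Map a URI path segment back; raise if it is not valid UTF-8."""
    return unquote(segment, encoding=URI_CHARSET, errors="strict")


name = "price in \u20ac.txt"
segment = filename_to_uri_path(name)
print(segment)  # price%20in%20%E2%82%AC.txt
print(uri_path_to_filename(segment) == name)  # True

# A segment produced by a server that escaped cp1252 bytes instead.
foreign = quote(name, safe="", encoding="cp1252")
try:
    uri_path_to_filename(foreign)
except UnicodeDecodeError:
    print("cannot map", foreign, "back to a filename")
```

Using errors="strict" is a deliberate choice here: the default for
unquote is to substitute replacement characters, which would hide
the mismatch.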

//Stefan

Received on Tuesday, 28 May 2002 04:53:17 UTC