Re: I18N Consensus - Generic Syntax Document

Roy T. Fielding (fielding@kiwi.ICS.UCI.EDU)
Fri, 07 Mar 1997 07:28:58 -0800


To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
Cc: URI List <uri@bunyip.com>
In-Reply-To: Your message of "Fri, 07 Mar 1997 14:50:36 +0100."
             <Pine.SUN.3.95q.970307134328.245D-100000@enoshima> 
Message-Id:  <9703070729.aa02583@paris.ics.uci.edu>

>> >+ It is recommended that UTF-8 [RFC 2044] be used to represent characters
>> >+ with octets in URLs, wherever possible.
>> >
>> >+ For schemes where no single character->octet encoding is specified,
>> >+ a gradual transition to UTF-8 can be made by servers making resources
>> >+ available with UTF-8 names on their own, on a per-server or a
>> >+ per-resource basis. Schemes and mechanisms that use a well-
>> >+ defined character->octet encoding which is however not UTF-8 should
>> >+ define the mapping between this encoding and UTF-8, because generic
>> >+ URL software is unlikely to be aware of and to be able to handle
>> >+ such specific conventions.
>> 
>> Here is where you lose me.
>
>Don't worry. I hope we will have you back soon again :-).
>
>> I have no desire to add a UTF-8 character
>> mapping table to our server.
>
>There is no need to do so. The above is only a *recommendation*.

Sorry, I misread the paragraph.  It would be clearer to say

   URL creation mechanisms that generate the URL from a source which
   is not restricted to a single character->octet encoding are
   encouraged, but not required, to transition resource names toward
   using UTF-8 exclusively.

   URL creation mechanisms that generate the URL from a source which 
   is restricted to a single character->octet encoding should use UTF-8
   exclusively.  If the source encoding is not UTF-8, then a mapping
   between the source encoding and UTF-8 should be used.
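As a rough illustration of the second paragraph above, here is a minimal Python sketch. It assumes a server whose filenames are stored as ISO-8859-1 octets (an assumption; no particular charset is named here), and both function names are hypothetical. It maps the source octets to UTF-8 and %-escapes them for the URL, then reverses the process when a request comes in. ISO-8859-1 is the easy case, because its octets correspond one-to-one to U+0000..U+00FF, so no mapping table is needed.

```python
from urllib.parse import quote, unquote_to_bytes


def latin1_name_to_url_segment(raw: bytes) -> str:
    """Hypothetical helper: ISO-8859-1 filename octets -> UTF-8 %-escaped URL segment."""
    # Each ISO-8859-1 octet is the Unicode code point of the same value,
    # so this decode can never fail and needs no lookup table.
    text = raw.decode("latin-1")
    # Re-encode the characters as UTF-8 and %-escape every octet that is
    # not an unreserved ASCII character (safe="" also escapes "/").
    return quote(text.encode("utf-8"), safe="")


def url_segment_to_latin1_name(segment: str) -> bytes:
    """Hypothetical helper: UTF-8 %-escaped URL segment -> ISO-8859-1 filename octets."""
    # Undo the %-escapes to recover the raw octets, then require them to be
    # valid UTF-8 (raises UnicodeDecodeError otherwise).
    text = unquote_to_bytes(segment).decode("utf-8")
    # Characters above U+00FF have no ISO-8859-1 form; this raises
    # UnicodeEncodeError, which a server could treat as "no such resource".
    return text.encode("latin-1")


if __name__ == "__main__":
    name = b"r\xe9sum\xe9.txt"  # "résumé.txt" as ISO-8859-1 octets
    seg = latin1_name_to_url_segment(name)
    print(seg)  # r%C3%A9sum%C3%A9.txt
    assert url_segment_to_latin1_name(seg) == name
```

For any other source charset, the `decode("latin-1")` and `encode("latin-1")` steps would have to be replaced by a real conversion table. That is where the memory and time cost of a filename<->URL mapping comes from.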

And please cut the self-righteous crap in your replies.  I am fully aware
of why people want to localize their URLs, and I am in a better position
to know what the implementation issues are when doing filename<->URL
mapping.  I have yet to see a memory+time efficient mapping from
arbitrary charset to UTF-8, and I have a lot more faith in standards
based on running code than on supposition.

.....Roy