Re: update RFC 2396 from Bjoern Hoehrmann on 2002-05-04 (uri@w3.org from May 2002)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 04 May 2002 19:06:53 +0200
To: "Roy T. Fielding" <fielding@apache.org>
Cc: <LMM@acm.org>, <hardie@oakthorn.com>, <uri@w3.org>, "'Tim Berners-Lee'" <timbl@w3.org>
Message-ID: <ig48du0bfvninndelriitqqfh3qjf5gog7@4ax.com>

* Roy T. Fielding wrote:
>On Wednesday, May 1, 2002, at 01:27  PM, Larry Masinter wrote:
>
>> Trying to redefine "URI" as the "same" protocol element
>> leads to insanity, since there's no versioning.
>> The only way of cutting the knot (after several years of
>> discussion) was to be clear that an "IRI" was a different
>> protocol element from a "URI".
>
>I don't understand.  The vast majority of stuff in IRI is simply how
>to display one.  We don't need to include that.  The only thing I want
>to include is the default: %xx means the character encoded as xx in
>UTF-8.  That is already the default for MSIE and should be for other
>browsers as well, and will simplify the specification.

I disagree. While it's the default in MSIE for URIs the user enters
into the address bar, it's not the default for the vast majority of
%xx encoded octets requested by MSIE. Those originate from HTML forms,
where MSIE uses the document's or the user-selected character encoding
scheme to generate the octets; hence most %xx encoded octets
representing non-ASCII characters are not part of valid UTF-8
sequences. There is no facility to declare any encoding other than
UTF-8, hence applications assuming UTF-8 encoding are bound to fail.
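
To illustrate the form case, here is a minimal sketch in Python. It is
an illustration only, not taken from any browser. The helper names
form_encode and is_valid_utf8 are hypothetical, and ISO-8859-1 is just
an assumed example of a document encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def form_encode(text, page_charset):
    # Hypothetical helper: escape text the way a form submission would,
    # using the character encoding of the document (an assumption here).
    return quote(text, encoding=page_charset)


def is_valid_utf8(escaped):
    # Undo the %xx escaping to raw octets, then try to read them as UTF-8.
    try:
        unquote_to_bytes(escaped).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "\u00f6" is LATIN SMALL LETTER O WITH DIAERESIS.
from_latin1_page = form_encode("\u00f6", "iso-8859-1")
from_utf8_page = form_encode("\u00f6", "utf-8")

print(from_latin1_page, is_valid_utf8(from_latin1_page))  # %F6 False
print(from_utf8_page, is_valid_utf8(from_utf8_page))      # %C3%B6 True
```

The same character yields different octets depending on the encoding
of the submitting document, and the recipient cannot tell which one was
used from the URI alone.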

Received on Saturday, 4 May 2002 13:07:41 UTC