Re: update RFC 2396 from Martin Duerst on 2002-05-29 (uri@w3.org from May 2002)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 29 May 2002 17:26:40 +0900
To: "Roy T. Fielding" <fielding@apache.org>, <LMM@acm.org>
Cc: <hardie@oakthorn.com>, <uri@w3.org>, "'Tim Berners-Lee'" <timbl@w3.org>
Message-Id: <4.2.0.58.J.20020529171551.0273df78@localhost>

Hello Roy, others,

At 13:53 02/05/01 -0700, Roy T. Fielding wrote:
>On Wednesday, May 1, 2002, at 01:27  PM, Larry Masinter wrote:
>
>>Trying to redefine "URI" as the "same" protocol element
>>leads to insanity, since there's no versioning.
>>The only way of cutting the knot (after several years of
>>discussion) was to be clear that an "IRI" was a different
>>protocol element as a "URI".
>
>I don't understand.  The vast majority of stuff in IRI is simply how
>to display one.

Yes and no. XML, XML Schema, and XLink allow IRIs to be included
directly in the XML. From a protocol point of view, that may still
be 'display', but many people will see it as more than just
display.

>We don't need to include that.  The only thing I want
>to include is the default: %xx means the character encoded as xx in
>UTF-8.  That is already the default for MSIE and should be for other
>browsers as well, and will simplify the specification.

It is the default when converting from IRIs to URIs.
But it is not the default for an arbitrary %hh that is
out there.
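
A rough sketch of that asymmetry, assuming Python's standard
urllib.parse. The helper name iri_to_uri and the example.org paths are
hypothetical, chosen only for illustration. Going from IRI to URI can
always use UTF-8, but a %hh found in an existing URI cannot be assumed
to be UTF-8:

```python
from urllib.parse import unquote_to_bytes

def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: escape each non-ASCII character as %hh over its UTF-8 octets."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)  # ASCII, including existing %hh escapes, passes through
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

uri = iri_to_uri("http://example.org/r\u00e9sum\u00e9")
print(uri)

# The reverse is not safe for an arbitrary %hh: %E9 may have come from
# iso-8859-1 (or some other encoding), and the lone octet is not valid UTF-8.
legacy = unquote_to_bytes("/r%E9sum%E9")
try:
    legacy.decode("utf-8")
    legacy_is_utf8 = True
except UnicodeDecodeError:
    legacy_is_utf8 = False
print(legacy_is_utf8)
```

This sketch does not handle characters that are ASCII but still need
escaping, or the host name part; it only shows the UTF-8 step.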

I would be extremely delighted if we could just go and say
"it's UTF-8, and nothing else". Unfortunately, that's not
possible. But I think it's a very good idea to make clear
in the revision that UTF-8 is where things are moving,
rather than just the current text:

"For example, UTF-8 [UTF-8] defines a mapping from sequences
of octets to sequences of characters in the repertoire of ISO 10646."

>>IRI would recycle us at Proposed. I'm opposed to
>>including IRI in the URI draft if we're trying to
>>move URI to Standard.
>
>The deciding factor on when a change causes a reversion of status is
>still very unclear to me even after all of these years.  All I know is
>that this clarifies how the server component should generate and
>interpret encoded URI characters outside of the iso-latin-1 subset
>of utf-8.

Please be careful. This should be 'outside of the us-ascii
subset of utf-8'. iso-latin-1 and utf-8 are not compatible.
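
To make the point concrete, here is a small check (a sketch only; the
function name same_octets is made up for this example). The two
encodings produce the same octets for U+0000 to U+007F and different
octets for everything from U+0080 to U+00FF:

```python
def same_octets(cp: int) -> bool:
    """True if code point cp encodes to identical octets in iso-8859-1 and UTF-8."""
    ch = chr(cp)
    return ch.encode("latin-1") == ch.encode("utf-8")

# us-ascii range: every code point agrees.
ascii_agree = all(same_octets(cp) for cp in range(0x00, 0x80))
# Upper half of iso-8859-1: does any code point agree?
upper_agree = any(same_octets(cp) for cp in range(0x80, 0x100))
print(ascii_agree, upper_agree)

# e-acute is one octet in iso-8859-1 but two octets in UTF-8.
e_latin1 = "\u00e9".encode("latin-1").hex()
e_utf8 = "\u00e9".encode("utf-8").hex()
print(e_latin1, e_utf8)
```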

>In other words, it doesn't change the parsers.  It is
>certainly far less of a change than introducing [IPv6] notation
>within the authority component.

While we are at it, what about changes due to Internationalized
Domain Names?
http://search.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt
proposes to lift the restriction that %hh cannot be used in the
host name part. [Currently, the draft allows only %80 and higher,
but I plan to change that because it would really be silly to
keep it that way.]

>>The IRI draft still has several unresolved issues,
>>which I hope can be resolved quickly. They may be
>>obscure, but still can't be left open, e.g., RTL languages
>>in IRIs: if they're allowed, what is the bidi algorithm
>>to be used in rendering them?
>
>Those kinds of things should still be specified elsewhere.

I agree.

Regards,    Martin.

Received on Wednesday, 29 May 2002 06:13:38 UTC