- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 29 May 2002 17:26:40 +0900
- To: "Roy T. Fielding" <fielding@apache.org>, <LMM@acm.org>
- Cc: <hardie@oakthorn.com>, <uri@w3.org>, "'Tim Berners-Lee'" <timbl@w3.org>
Hello Roy, others,

At 13:53 02/05/01 -0700, Roy T. Fielding wrote:
>On Wednesday, May 1, 2002, at 01:27 PM, Larry Masinter wrote:
>
>>Trying to redefine "URI" as the "same" protocol element
>>leads to insanity, since there's no versioning.
>>The only way of cutting the knot (after several years of
>>discussion) was to be clear that an "IRI" was a different
>>protocol element as a "URI".
>
>I don't understand. The vast majority of stuff in IRI is simply how
>to display one.

Yes and no. XML, XML Schema, and XLink allow to include IRIs
directly in the XML. From a protocol point, that may still be
'display', but many people will see it as more than just display.

>We don't need to include that. The only thing I want
>to include is the default: %xx means the character encoded as xx in
>UTF-8. That is already the default for MSIE and should be for other
>browsers as well, and will simplify the specification.

It is the default when converting from IRIs to URIs. But it is not
the default for an arbitrary %hh that is out there. I would be
extremely delighted if we could just go and say "it's UTF-8, and
nothing else". Unfortunately, that's not possible. But I think it's
a very good idea to make clear in the revision that UTF-8 is where
things are moving, rather than just the current
"For example, UTF-8 [UTF-8] defines a mapping from sequences of
octets to sequences of characters in the repertoire of ISO 10646."

>>IRI would recycle us at Proposed. I'm opposed to
>>including IRI in the URI draft if we're trying to
>>move URI to Standard.
>
>The deciding factor on when a change causes a reversion of status is
>still very unclear to me even after all of these years. All I know is
>that this clarifies how the server component should generate and
>interpret encoded URI characters outside of the iso-latin-1 subset
>of utf-8.

Please be careful. This should be 'outside of the us-ascii subset
of utf-8'. iso-latin-1 and utf-8 are not compatible.
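
To illustrate the point about us-ascii versus iso-latin-1, here is a minimal sketch using Python's standard urllib.parse. The helper names (percent_encode, decodes_as_utf8) and the sample character are assumptions chosen for illustration; they do not come from any draft discussed in this thread. The sketch shows that the same character yields different %hh sequences under the two encodings, that the two agree for us-ascii characters, and that a latin-1 escape is not valid UTF-8.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str, charset: str) -> str:
    """Percent-encode every non-unreserved character using the given charset."""
    return quote(text, safe="", encoding=charset)


def decodes_as_utf8(escaped: str) -> bool:
    """Return True if the %hh octets in `escaped` form valid UTF-8."""
    try:
        unquote(escaped, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, outside us-ascii

# Same character, two different octet sequences.
utf8_form = percent_encode(e_acute, "utf-8")         # two octets
latin1_form = percent_encode(e_acute, "iso-8859-1")  # one octet

# For us-ascii characters the two encodings give identical escapes.
ascii_utf8 = percent_encode("a b", "utf-8")
ascii_latin1 = percent_encode("a b", "iso-8859-1")

print(utf8_form, latin1_form, ascii_utf8 == ascii_latin1)
print(decodes_as_utf8(utf8_form), decodes_as_utf8(latin1_form))
```

A receiver that assumes UTF-8 for an arbitrary %hh sequence cannot interpret the latin-1 form, which is why the common subset has to be described as us-ascii.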

>In other words, it doesn't change the parsers. It is
>certainly far less of a change than introducing [IPv6] notation
>within the authority component.

While we are at it, what about changes due to Internationalized
Domain Names?
http://search.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt
proposes to lift the restriction that %hh cannot be used in the
host name part. [Currently, only %80 and higher are allowed, but
I plan to change that because it would really be silly to keep
it that way.]

>>The IRI draft still has several unresolved issues,
>>which I hope can be resolved quickly. They may be
>>obscure, but still can't be left open, e.g., RTL languages
>>in IRIs: if they're allowed, what is the bidi algorithm
>>to be used in rendering them?
>
>Those kinds of things should still be specified elsewhere.

I agree.

Regards,    Martin.

Received on Wednesday, 29 May 2002 06:13:38 UTC