- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Mon, 24 Sep 2001 16:20:23 +0100
- To: "Graham Klyne" <GK@ninebynine.org>
- Cc: <uri@w3.org>
> > >All refs to [RFC 2396] include the extension given by [RFC 2732] > > (a) I don't see what significance RFC 2732 has regarding this I18N > debate. Or am I missing something? Isn't that the one that adds '[' and ']' to the allowed characters. > > >IRI decoding algorithm > [...] > >The output MAY be a UTF-8 encoded string. > > (b) If specifications are going to be changed to allow IRIs, then I think > the opportunity should also be grasped to insist that the octet-sequence > encoding of an IRI MUST BE UTF-8. > The point about this MAY is simply that the algorithms doesn't always work :-(. For better or worse there are legacy URIs that are not UTF-8 encoded, e.g. in my example, http://lists.w3.org/Archives/Public/uri/2001Sep/0010.html the URI http://iso-8859-1.example.org/D%FCrst/braces%7B%7D cannot be unencoded to include the u umlaut without knowledge of the iso-8859-1 encoding scheme. IRI's, because of the idempotency, can include any URI. Some of these URIs are the encoding of an IRI without any % escapes other than %25; some of them such as ones with D%FCrst in them are not. Even ones which are the UTF-8 encoding of some IRI might not have that as the intended unencoded form, only the end server knows how to correctly decode a URI. The decode algorithm correctly works for any URI that is UTF-8 encoded. The IRI itself certainly need not be UTF-8 encoded, the rule is that as part of the encoding to URI syntax the IRI is transcoded to UTF-8. A stronger requirement that IRIs are UTF-8 unnecessarily prevents the inclusion of IRIs in non UTF-8 documents. One of the points about my proposal, is that current practice, (which is all the proposed algorithm amounts to), is in fact, very robust, and can work despite multiple character encodings used by servers and multiple character encodings used for the IRIs themselves. This robustness depends on URIs not containing '%' which hence has completely special treatment in my proposal. Jeremy
Received on Monday, 24 September 2001 11:21:01 UTC