- From: Erik van der Poel <erikv@google.com>
- Date: Sun, 1 Nov 2009 07:41:25 -0800
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Cc: Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Thanks for the new iri-bis-07 draft. Many of the changes are in the right direction.

It's great that there are detailed steps for conversion between IRIs and URIs (in both directions), but to ensure interoperability (while maintaining security), we need to know how to convert the domain name part of a URI into a DNS packet (or other name lookup protocol). We also need to know how to convert the domain name to the HTTP Host: header. I suppose the HTTP-specific rules should be specified in the HTTP spec(s), but we probably don't want to put DNS-specific rules into the main DNS spec(s), do we?

In particular, I'm thinking about the recommendations and rules regarding such things as %2E (%-encoded dot). Although we probably want to recommend "pure" IRIs and URIs (to content producers), we will find mixtures of %-encoded and not-%-encoded text in the real world. We probably need to be a bit more explicit about rules and recommendations for this in the URI <-> IRI conversions (in both directions).

In the IRI to URI conversion steps, we now parse the IRI before performing any Punycoding and %-encoding. This matches current implementations. However, I believe we need the analogous change in the URI to IRI conversion steps. I.e. we need to parse the URI and then use a single character encoding (charset) for each URI component (mainly /path and ?query).

The current draft says "Re-percent-encode any octet produced in step 2 that is not part of a strictly legal UTF-8 octet sequence." This would break some URIs, since it specifies a per-octet rule rather than the per-component rule.

In the IRI to URI conversion, we only have one charset (the "document" charset), but in the URI to IRI conversion, we potentially have more than one charset (e.g. /path is UTF-8 and ?query is GB2312). Such mixtures are rare, and content producers should be warned not to use them, but implementers need to know how to process such exceptions.

Erik

On Thu, Oct 29, 2009 at 12:18 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
> On 2009/10/29 10:20, Larry Masinter wrote:
>>
>> Due to some personal difficulties, the split of the document
>> into three parts (parsing, domain names, BCP on character
>> handling, BIDI, etc.) didn't happen. However, Martin did
>> heroically get a new draft out based on some of the
>> interim work.
>
> I admit that I got a new draft out, but I have to strongly deny
> "heroically". Most of the changes are from Larry, and the only thing I did
> was to tweak a few things where I had opinions that differed somewhat from
> Larry, and to submit a draft before the deadline just so that we have
> something in the repository.
>
> Anyway, please have a look and comment!
>
> Regards, Martin.
>
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
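
To make the per-component rule from the message above concrete, here is a minimal Python sketch of a URI to IRI conversion that parses first and then decides per component, rather than per octet. It is an illustration only, not text from the draft: the names uri_to_iri and _component_to_iri are hypothetical, UTF-8 is the only charset attempted, and the draft's other steps (for example keeping characters that are not allowed in IRIs escaped, and all domain name handling) are left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit, unquote_to_bytes

# Runs of escapes whose octet value is >= 0x80 (first hex digit 8-F).
# ASCII escapes such as %2E or %2F are deliberately never matched.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _component_to_iri(component: str) -> str:
    """Unescape non-ASCII octets only if the WHOLE component is valid UTF-8."""
    try:
        # Look at the component's octets as a unit (the per-component rule).
        unquote_to_bytes(component).decode("utf-8")
    except UnicodeDecodeError:
        # Some other charset (e.g. GB2312) or a mixture: leave it untouched.
        return component
    # The component is UTF-8 as a whole, so every run of high octets is
    # itself a sequence of complete UTF-8 characters and can be decoded.
    return _HIGH_RUN.sub(
        lambda m: bytes.fromhex(m.group(0).replace("%", "")).decode("utf-8"),
        component,
    )


def uri_to_iri(uri: str) -> str:
    """Hypothetical helper: parse the URI first, then convert each component."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # domain name handling is out of scope for this sketch
        _component_to_iri(parts.path),
        _component_to_iri(parts.query),
        _component_to_iri(parts.fragment),
    ))


if __name__ == "__main__":
    # /path is UTF-8, ?query is GB2312: only the path is unescaped.
    print(uri_to_iri("http://example.com/%E4%B8%AD?q=%D6%D0"))
    # A query mixing valid UTF-8 with a stray octet is left entirely as-is,
    # whereas a per-octet rule would unescape %C3%A9 and keep only %E9.
    print(uri_to_iri("http://example.com/?q=%C3%A9%E9"))
```

With the per-octet wording quoted from the draft, the second URI would come out half converted, and a later IRI to URI round trip could no longer tell which octets belonged to the original non-UTF-8 string. Deciding once per component avoids that, at the cost of leaving some convertible escapes untouched.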
Received on Sunday, 1 November 2009 15:41:59 UTC