- From: Erik van der Poel <erikv@google.com>
- Date: Tue, 15 Sep 2009 08:43:08 -0700
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Cc: Larry Masinter <masinter@adobe.com>, "Roy T. Fielding" <fielding@gbiv.com>, "Henry S. Thompson" <ht@cogsci.ed.ac.uk>, "tag@w3.org" <tag@w3.org>, "public-iri@w3.org" <public-iri@w3.org>, Michel SUIGNARD <Michel@suignard.com>
Yes, a relative must definitely be parsed before absolutizing. Two
backslashes after http: are also treated as (forward) slashes by major
browsers.

Erik

On Mon, Sep 14, 2009 at 8:32 PM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
> [I have added public-iri@w3.org to the cc list.]
>
> On 2009/09/03 9:12, Larry Masinter wrote:
>>
>> This took a while, but here's the next cut at recasting IRIs and dealing
>> with "web address":
>>
>>
>> http://larry.masinter.net/iribis-hack.html
>> http://larry.masinter.net/iribis-hack.txt
>> http://larry.masinter.net/iribis-hack.xml
>
>>
>> http://tools.ietf.org/rfcdiff?url1=draft-duerst-iri-bis.txt&url2=http://larry.masinter.net/iribis-hack.txt
>
>
> I have read through this new draft, trying to concentrate on the changed
> pieces.
>
> Overall, I'm scared about the tendency to use MUST without much more careful
> examination.
>
> I have nothing against *allowing* scheme-specific short-cuts, optimizations,
> or short-time backwards compatibility variants, but in the long term, UTF-8
> is much more important than punycode, and scheme-independent processing
> isn't something to be thrown away easily.
>
> But I'm quite confident that such a result can be obtained with a new
> version of the draft which is much closer to the current -06.txt than the
> one discussed here.
>
> In some more detail:
>
> Title: I strongly suggest removing "URI" from the title and limit the
> current effort on the "allow scheme-specific conversion" part and the
> "LEIRI/Web address" part. These alone are quite serious, and we can always
> make another rewrite effort once these have been addressed really
> successfully, although I can't see the need for getting too much involved in
> URIs at all at the moment.
>
> Abstract: "TO ALLOW RECONCILIATION WITH CURRENT PRACTICE": This is too
> strong.
> At least change to "SOME CURRENT PRACTICE", as there are definitely
> implementations that can handle %-encoding in regnames, and there will be
> more as time goes on. (see Roy's point about the Host header in HTTP).
>
> Abstract, shortest para: 'addition' ... 'additional': reword to avoid
> repetition.
>
> Document structure: Section 5.5 is the wrong place for LEIRIs and friends
> (I'll call these legacy addresses from now on). What we need is a short
> notice in section 5 (Normalization/Comparison) about legacy addresses, but
> legacy addresses in and by themselves need a separate section (the more I
> think about it, the more my conclusion is that an appendix is the best
> place).
>
> Introduction: "increasing numbers of protocols" -> "an increasing number of
> protocols" (one number, many protocols) (that may have been in there for
> ages)
>
> Definitions: "parsed IRI component": Don't start a definition with
> "similarly". (definitions should be reasonably usable outside of context)
>
> 3., before 3.1: This clearly needs more text talking about the overall
> choices and procedures.
>
> 3.1 "Convert to UCS" -> "Converting to UCS" (some other titles have the same
> problem; verbs don't work well in titles)
>
> 3.1, first para: Remove "or octet stream...". Of course the "sequence of
> Unicode characters" will be represented somehow, but that's not relevant
> here.
>
> 3.12 para 2: "benormalized" -> "be normalized"
>
> 3.2, para 1: "IRI. this" -> "IRI. This"
>
> 3.2, para 1: Is the intent to say that for relative URIs, they should be
> absolutized first, and then parsed? If yes, then say so. If no, say what
> else. I'm absolutely not sure that this will work; we have to very carefully
> check all kinds of interactions (relative -> absolute does some parsing as
> far as I understand, and HTML5 tries to convert '\' to '/' in paths, which
> probably also interacts).
>
> 3.2, para 2: What about unknown schemes? Simply give up, or what?
>
> 3.2, para 3: Needs much more care and detail, and can't stay a Note.
>
> 3.2, para 4: "Subseqent processing rules may be used to define other
> syntactic components.": What exactly is this supposed to mean???
>
> 3.3, para 3 (NOTE): Why is a MAY harmful? IRIs are well-defined, and we have
> to allow implementations to process only valid ones, and not other garbage.
> "The non-printable characters should be stripped by most software, so by the
> time you get here...": This reads like a "survival of the fittest" for
> control characters.
>
> 3.3: "Hex encode" -> %-encode (That's what both RFC 3986 and 3987 have used,
> and even if many people (incl. me) don't like it, there's no reason to
> change it just to create even more confusion.)
>
> 3.4, para 1: Again an unjustified MUST. There are implementations that don't
> do this, for good reasons, and they work and shouldn't be made
> nonconformant. Also, we have to work on what to do for IDNA2008 here.
>
> 3.4, para 2: If ToASCII fails, then it fails. End of story. That's another
> reason why converting to %-encoding makes sense; IRIs/URIs cannot and
> shouldn't be concerned with the details of the various namespaces that they
> contain or grandfather.
>
> 3.4, Note 1: "The server side implementation would be responsible": "would
> be" -> "is".
>
> 3.4, Note 2: What about e.g. http://r%C3%A9sum%C3%A9.example.org in an IRI?
> Will that get converted to punycode, or not?
>
> 3.4, Note 3: This needs to go somewhere else, it doesn't fit here.
>
> 3.5: This is webaddress-specific, needs to be moved.
>
> 3.6: Now we suddenly have a SHOULD. Does this trump all the MUSTs in the
> details, or what?
>
> 3.7.1, last example: There's some inconsistency re. "natto" (maybe from a
> long time ago)
>
>
> 7.: Clarifying that URI schemes are also IRI schemes is a good idea.
> But this does it the wrong way: It separates URI schemes and IRI schemes, and
> claims that only four schemes (ftp, http, https, impa) can be used with IRIs
> when actually there are quite a few more. (what was the criterion for
> obtaining the above small list?)
>
>
> That's what I have for the moment.
>
>
> Regards, Martin.
>
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
>
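
For illustration only, here is a minimal Python sketch of two points raised in this thread: that a relative reference containing backslashes resolves differently depending on whether the backslashes are handled before or after absolutizing, and what one possible answer to the "3.4, Note 2" question (a %-encoded host converted to punycode) could look like. The helper names resolve_like_browser and host_to_ascii are hypothetical and come from neither draft. The whole-string backslash replacement is a crude stand-in for what browsers do, and Python's built-in "idna" codec implements IDNA2003, not IDNA2008, so this is not a statement of what the draft requires.

```python
from urllib.parse import unquote, urljoin, urlsplit, urlunsplit


def resolve_like_browser(base: str, ref: str) -> str:
    """Hypothetical helper: rewrite '\\' as '/' in the reference, then resolve.

    Assumption: every backslash is rewritten, which is cruder than real
    browser behaviour (for example, it also touches query and fragment).
    """
    return urljoin(base, ref.replace("\\", "/"))


def host_to_ascii(url: str) -> str:
    """Hypothetical helper: percent-decode the host, then apply ToASCII.

    Uses Python's built-in 'idna' codec (IDNA2003). Userinfo in the
    authority is dropped to keep the sketch short.
    """
    parts = urlsplit(url)
    host = unquote(parts.hostname or "")          # e.g. %C3%A9 -> é
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    base = "http://example.org/a/b"
    # Absolutizing first treats '..\c' as a single opaque path segment.
    print(urljoin(base, "..\\c"))
    # Handling the backslash first lets '..' act as a dot-segment.
    print(resolve_like_browser(base, "..\\c"))
    # Two backslashes after 'http:' read as the start of an authority.
    print(resolve_like_browser(base, "http:\\\\example.org\\x"))
    # One possible treatment of a %-encoded host.
    print(host_to_ascii("http://r%C3%A9sum%C3%A9.example.org"))
```

The sketch only shows that the order of steps changes the result; whether the host conversion should happen at all is exactly the open question in the review above.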
Received on Tuesday, 15 September 2009 15:43:55 UTC