- From: Larry Masinter <masinter@adobe.com>
- Date: Fri, 28 Aug 2009 17:39:40 -0700
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, "Roy T. Fielding" <fielding@apache.org>, John C Klensin <klensin@jck.com>
One way to think about what I'm talking about: RFC 3987 and the current draft do

  IRI --translating--> URI
  and then (as necessary):
  URI --parsing--> parsed-URI-component(s)

but most deployed browsers actually do

  IRI --parsing--> parsed-IRI-component(s)
  and then (as necessary):
  parsed-IRI-component --translating--> parsed-URI-component

where 'translating' might actually be different for different components (hostname, form query parameters).

> Yes, that's the principle. But please note that even now, RFC 3987 says:
>    Systems accepting IRIs MAY convert the ireg-name component of an IRI
>    as follows (before step 2 above) for schemes known to use domain
>    names in ireg-name, if the scheme definition does not allow
>    percent-encoding for ireg-name:

I think this should be a MUST rather than a MAY.

> On the other hand, I think it would be a huge overkill to require that
> every scheme be defined twice (once for URIs and once for IRIs).

New schemes should be defined as IRIs if that's applicable. The old schemes mainly need a general update based on the new IRI generic syntax. There are a few special cases, but they should be addressed specially.

> Looking at Erik's mail
> (http://lists.w3.org/Archives/Public/public-iri/2009Aug/0012.html),
> implementations seem to be everything else but consistent. Why not have
> them move in the right direction?

I think "parse then escape" is more common than "escape then parse", so I think this is the "right direction".

> For (2), RFC 3987
> already sins a bit with regards to absolute scheme-independency, and we
> can sin a bit more in iri-bis if that's deemed necessary.

I think it's just a matter of going the whole way, or at least we should look at what the spec looks like when it proposes that.
At this point, I'm thinking of also updating RFC 4395,

  http://www.rfc-editor.org/rfc/rfc4395.txt
  "Guidelines and Registration Procedures for New URI Schemes"

to encourage scheme definitions to

* be explicit about the applicability or processing methods for Unicode strings (default: not allowed)
* be explicit about HTTP-like "operations" like GET and POST (default: not defined)

and of starting a review of registered schemes

  http://www.iana.org/assignments/uri-schemes.html

to update any that need IRI definitions.

I think for consistency that the IRI document should acknowledge that these are often popularly called "URLs", but that the term is used only loosely, and that formal specifications should distinguish between URL, URI, IRI, LEIRI, HREF and the various other non-terminals. The HTML document attempts to be precise in so many places that using a precise term, rather than a loose one, where precision is called for seems more appropriate. But I hope to push off that discussion until I have at least rough drafts of the updated IRI document and a new registry doc.

Larry
--
http://larry.masinter.net
Received on Saturday, 29 August 2009 00:40:34 UTC