- From: Larry Masinter <masinter@adobe.com>
- Date: Mon, 24 Aug 2009 19:27:18 -0700
- To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
I've continued to mull over the contradictory requirements for IDNA and IRI parsing:

1) IDNA requires (requests, demands, whatever) that a %xx percent-hex-encoded version of an Internationalized Domain Name never be presented to a DNS resolver.

2) Traditionally, though, IRIs have been defined as requiring a scheme-independent (and syntax-independent) translation from an IRI (or IRI-like thing) to a URI.

3) URI schemes have host names in many places, not just one:
   mailto:person@host
   ftp://user@host/path
   http://host1/path?location=http://host2/path2

I don't think these three things are compatible. If IRIs are defined by mapping to URIs using (2), then Internationalized Domain Names in different schemes (3) will translate to percent-hex-encoded domain names in their corresponding URIs, violating requirement (1).

I can't see any way around (1) or (3), so this leaves me with the uncomfortable choice of abandoning (2), which does major violence to the IRI spec. So here's a swipe at how this might work (please don't shoot me yet):

NO LONGER define an IRI by a generic IRI -> URI mapping. INSTEAD, IRI parsing is *scheme-specific*.

"Internationalized" (IRI) versions of URI schemes are defined as follows: for each URI scheme, there is a corresponding IRI scheme. The grammar for the IRI scheme *MUST* be exactly the same as the grammar for the URI scheme of the same name, except that:

(a) every syntactic component in the URI scheme that allows "unreserved" characters from the URI spec should, in the IRI form, allow "unreserved" characters from the IRI repertoire.

In general, the mapping for handling and interpreting IRIs CAN be defined using a generic IRI-string to URI-string conversion based on percent-encoding, with the exception that host names are translated to the right IDNA format string.

URI schemes do not AUTOMATICALLY get equivalent IRI schemes. So data:, cid:, mid:, tag:, etc. do *not* have IRI equivalents automatically (if needed, someone can define them). Instead, we define IRI versions of "http:" and "https:" and (maybe) "file:", "ftp:" and "mailto:" using the new generic IRI definition.

This is painful, of course, but at least it seems to be more consistent with what's implemented.

Larry
--
http://larry.masinter.net
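To make the conflict concrete, here is a minimal Python sketch contrasting a generic, scheme-independent IRI -> URI mapping (which percent-encodes the host too, running into requirement 1) with a hypothetical scheme-specific mapping for http: that sends the host through IDNA and percent-encodes everything else. Assumptions: Python's built-in "idna" codec (IDNA 2003 ToASCII, not the newer IDNA rules) stands in for "the right IDNA format string"; the function names and the SAFE set of characters left unescaped are made up for illustration; IPv6 literals are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Hypothetical choice: leave URI delimiters and existing "%" escapes untouched.
SAFE = ":/?#[]@!$&'()*+,;=%"

def generic_iri_to_uri(iri: str) -> str:
    """Scheme-independent mapping: UTF-8 percent-encode every non-ASCII
    character, including the ones in the host name."""
    return quote(iri, safe=SAFE)

def http_iri_to_uri(iri: str) -> str:
    """Hypothetical http:-specific mapping: the authority's host goes through
    IDNA ToASCII, while userinfo, path, query and fragment are percent-encoded."""
    parts = urlsplit(iri)
    userinfo, at, _ = parts.netloc.rpartition("@")
    # urlsplit lower-cases .hostname; the stdlib "idna" codec yields xn-- labels.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = quote(userinfo, safe=SAFE) + at + host
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))

if __name__ == "__main__":
    iri = "http://bücher.example/straße?q=ü"
    print(generic_iri_to_uri(iri))  # host comes out as b%C3%BCcher.example
    print(http_iri_to_uri(iri))     # host comes out as xn--bcher-kva.example
```

Note that even the scheme-specific version only knows about the host in the authority component. A host embedded elsewhere, such as the second URL in http://host1/path?location=http://host2/path2 or the domain in a mailto: address, would still be percent-encoded, which is exactly point (3).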
Received on Tuesday, 25 August 2009 02:27:55 UTC