- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 22 Mar 2004 11:59:43 -0500
- To: Larry Masinter <LMM@acm.org>, "'Michel Suignard'" <michelsu@windows.microsoft.com>, public-iri@w3.org, uri@w3.org
At 09:20 04/03/19 -0800, Larry Masinter wrote: >It seems like a pretty big change to the IRI concept to have >IRI -> URI transformations use scheme-specific knowledge. >Formerly, IRI -> URI transformation was specified as scheme >independent. Yes. I definitely want to keep it that way as much as possible. At most, there should be a single bit per scheme that says whether punycode should be applied to the 'host' part. But hopefully, even that should be transitory. In no way do I want this to be a slippery slope. >I understand that this seems necessary because of IDN, but >it's a big concern. > >And to use "SHOULD" would leave us subject to >the indeterminate knowledge of which method is going to >be used. Yes. But to a large extent, this is actually an implementation issue. For all those cases where IRIs are resolved directly, it just boils down to whether you implement the IDN->Punycode conversion before passing 'host' to your dns resolver code, or whether you beef up the code that calls the dns resolver with IDN. That should be left to the application. Overall, there are three ways to use IRIs: 1) direct resolution (e.g. in a browser) 2) use just as an identifier (e.g. as a namespace) 3) conversion to URI (e.g. for proxies,...) The question really only turns up in case 3). In the IRI draft, everything is described in terms of 3), because this helps a lot in writing the spec, it avoids copying all the details about what an URI is, how the generic syntax works, and so on. In practice, straghtforward, full conversion is somewhat less important (which does not mean that it should not be defined). >If you believe that IRI->URI needs to be scheme specific, >then is it really tractable to define IRI as a generic concept? >Do we need a separate spec for "http:", "mailto:", "ftp:" IRIs, >where each specifies the punycode vs. hex-encoding of the >various parts? I very much hope not. As I have documented in the IRI draft, if you want to try and make everything just 'work', the scheme is not the limit. Host names can appear as query parameters in IRIs, and only (if necessary) an update of the query interface (and script behind it) can actually let users use IRI functionality in that case. Look for the 'validator' example in the draft. >For example, a 'data' IRI might need a separate >specification for the default MIME type (since text/plain >defaults US-ASCII). Maybe yes. For the moment, one would have to write: data:text/plain;charset=utf-8,SOME%20DATA (where SOME and DATA would be non-ASCII characters). But how much is data: actually used just for text? >And an IRI URN might also have a >different mapping, etc. As mentioned in the IRI draft, URNs are already defined to use UTF-8 if there are some non-ASCII characters. So they work together perfectly with IRIs. Regards, Martin.
Received on Monday, 22 March 2004 15:36:04 UTC