RE: Are IDNs allowed in http IRIs?

From: Larry Masinter <LMM@acm.org>
Date: Fri, 19 Mar 2004 09:20:38 -0800
To: "'Michel Suignard'" <michelsu@windows.microsoft.com>, public-iri@w3.org, uri@w3.org
Message-id: <0HUU00EHY2UIRK@mailsj-v1.corp.adobe.com>

It seems like a pretty big change to the IRI concept to have
IRI -> URI transformations use scheme-specific knowledge.
Formerly, IRI -> URI transformation was specified as scheme
independent.

I understand that this seems necessary because of IDN, but
it's a big concern.

And using "SHOULD" would leave us unable to know which
method a given implementation is going to use.

If you believe that IRI->URI needs to be scheme specific,
then is it really tractable to define IRI as a generic concept?
Do we need a separate spec for "http:", "mailto:", "ftp:" IRIs,
where each specifies the punycode vs. hex-encoding of the
various parts? For example, a 'data' IRI might need a separate
specification for the default MIME type (since text/plain
defaults to US-ASCII). And an IRI URN might also have a
different mapping, etc.

Larry


> 
> 
> Adam, I think you have a valid point, I would however make a 
> simpler suggestion, which is two fold:
> 
> - introduce the concept of IRI used as presentation element 
> of URI protocol element. In that sense 
> http://josé.example.net/ is a presentation element for the 
> following protocol element http://xn--jos-dma.example.net/ 
> and as you noted http://jos%C3%A9.example.net/ is not a 
> correct URI (per RFC 2616 referring to host itself defined in 
> RFC 2396). I have suggested text in that sense to the IRI 
> main editor (Martin). Having the concept of presentation 
> element validates http IRIs which exist de facto, whether we 
> like it or not.
> 
> - add text in the IRI spec saying the following:
> <<
> When an IRI is converted to a URI, the conversion SHOULD use 
> scheme-specific knowledge to convert appropriate components 
> where the scheme syntax prevents the usage of percent-encoded 
> text into such components. Lack of scheme-specific knowledge 
> (or failure to use it) can cause valid IRIs to be converted 
> to invalid URIs that contain percent-encoded non-ASCII text 
> where they are not permitted.
> >>
> It is my opinion that anybody in their right mind would 
> implement IRI to URI mapping considering all the schemes 
> where 'host' is used and map accordingly (i.e. use Punycode). 
> My text avoids direct reference to ACE which is in my opinion 
> unnecessary in the IRI spec and also makes the suggestion to 
> use scheme aware mapping much stronger (SHOULD instead of 
> MAY). It is a SHOULD instead of a MUST simply because a 
> scheme may be updated in the future, making the scheme 
> awareness eventually not necessary.
> 
> Michel
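
To make the two mappings under discussion concrete, here is a rough
Python sketch, not text from any draft. The function name iri_to_uri,
the scheme_aware flag and the HOST_SCHEMES set are hypothetical names
chosen for illustration. It assumes Python's built-in "idna" codec
(IDNA as in RFC 3490) is an acceptable stand-in for the Punycode host
mapping, and it ignores userinfo and many other edge cases.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Assumption: schemes whose authority component carries a DNS host name.
# This set is illustrative only, not an exhaustive or normative list.
HOST_SCHEMES = {"http", "https", "ftp"}


def iri_to_uri(iri, scheme_aware=True):
    """Map an IRI to a URI; hypothetical helper for illustration.

    scheme_aware=True  -> host is converted with IDNA (Punycode labels).
    scheme_aware=False -> host is percent-encoded as UTF-8, like every
                          other component (the scheme-independent mapping).
    Userinfo, if present, is dropped to keep the sketch short.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""

    if scheme_aware and parts.scheme in HOST_SCHEMES:
        # Scheme-specific step: IDNA ToASCII on the host name.
        ascii_host = host.encode("idna").decode("ascii")
    else:
        # Generic step: UTF-8 percent-encoding of the host.
        ascii_host = quote(host, safe="")

    netloc = ascii_host
    if parts.port is not None:
        netloc += f":{parts.port}"

    # Remaining components: percent-encode non-ASCII as UTF-8 and keep
    # existing "%" so already-escaped text is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://jos\u00e9.example.net/"))
print(iri_to_uri("http://jos\u00e9.example.net/", scheme_aware=False))
```

The first call yields the xn--jos-dma form quoted above, the second the
jos%C3%A9 form that RFC 2396 does not allow in 'host'. Which of the two
a converter produces depends entirely on whether it knows the scheme,
which is the indeterminacy at issue in this thread.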
Received on Friday, 19 March 2004 12:21:18 GMT
