
Re: Using Punicode for host names in IRI -> URI translation

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 23 Nov 2009 20:05:34 +0900
Message-ID: <4B0A6C7E.30801@it.aoyama.ac.jp>
To: Erik van der Poel <erikv@google.com>
CC: Shawn Steele <Shawn.Steele@microsoft.com>, Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, Pete Resnick <presnick@qualcomm.com>, Ted Hardie <ted.ietf@gmail.com>
Hello Erik,

I agree that for HTTP proxies, and for the Host: header in the HTTP 
protocol, at the current point in time, using Punycode is the most 
prudent thing to do. I don't have any problem putting this as an example 
into the new spec, but I don't want this current state of affairs to 
prevent implementations from moving to a cleaner state.
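To make concrete what "using Punycode" means on the wire, here is a minimal Python sketch. It assumes the host name arrives as Unicode and the path is already %-encoded. `ace_host` and `request_head` are hypothetical helpers, not part of any spec. Python's built-in `idna` codec implements IDNA 2003, not IDNA 2008. The direct and proxy cases follow the distinction Erik draws: the Host header alone, versus an absolute URI in the request line.

```python
def ace_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII-compatible (Punycode) form.

    Uses Python's built-in "idna" codec (IDNA 2003 rules).
    """
    return host.encode("idna").decode("ascii")


def request_head(host: str, path: str, via_proxy: bool) -> str:
    """Build the start of an HTTP/1.1 GET request (hypothetical helper).

    Assumes `path` is already %-encoded ASCII.
    """
    ascii_host = ace_host(host)
    if via_proxy:
        # Through a proxy, the request line carries the absolute URI,
        # so the host name also appears there in Punycode form.
        target = f"http://{ascii_host}{path}"
    else:
        # Directly, the request line carries only the path and the
        # host name travels in the Host header.
        target = path
    return f"GET {target} HTTP/1.1\r\nHost: {ascii_host}\r\n\r\n"


print(request_head("bücher.example", "/index.html", via_proxy=True))
```

Because both forms carry only ASCII host names, an older proxy that knows nothing about IDNA can still forward the request.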

On the other hand, I also have to agree with Shawn. The various ways in 
which IRIs and URIs and their components can be used within an 
application are simply too many for us to prescribe "one true way" to 
handle this.

In my own implementation experience, when I added IDNA support to Amaya, 
I relied on Amaya to convert IRIs internally to %-encoding (without 
trying to analyze the IRI further), then caught that %-encoding deep 
down in libwww (the network library on which Amaya relies) and converted 
it back to UTF-8 and then to Punycode.
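A rough Python sketch of this two-stage approach follows. It assumes the host name was UTF-8 before %-encoding. The function names are hypothetical and are not Amaya or libwww APIs. Python's `idna` codec implements IDNA 2003.

```python
from urllib.parse import quote, unquote, urlsplit


def iri_to_uri(iri: str) -> str:
    """Stage 1 (application): %-encode non-ASCII characters as UTF-8
    without analyzing the IRI's components (hypothetical helper)."""
    # Reserved characters and '%' are kept, so existing escapes survive.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%-._~")


def host_for_network(uri: str) -> str:
    """Stage 2 (network library): recover the %-encoded host name,
    decode it back to Unicode via UTF-8, then convert it to Punycode."""
    host = urlsplit(uri).hostname  # lowercased, still %-encoded
    if host is None:
        raise ValueError("URI has no host")
    unicode_host = unquote(host, encoding="utf-8")
    return unicode_host.encode("idna").decode("ascii")


uri = iri_to_uri("http://bücher.example/straße")
print(uri)                    # http://b%C3%BCcher.example/stra%C3%9Fe
print(host_for_network(uri))  # xn--bcher-kva.example
```

The application layer never needs to know which part of the IRI is a host name. Only the network layer, which must resolve the name anyway, does the host-specific conversion.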

I expect that other applications may do similar things, or completely 
different things, because they are structured differently. 
The various buggy behaviors that I observed when testing %-encoding in domain 
names with Firefox and Safari seem to support Shawn's point that 
various different forms and conversions may exist inside an application.

Regards,    Martin.

On 2009/11/22 13:24, Erik van der Poel wrote:
> On Sat, Nov 21, 2009 at 11:02 AM, Shawn Steele
> <Shawn.Steele@microsoft.com>  wrote:
>> I'm still not sure that requiring Punycode for URIs is helpful.
>> [...]
>> So saying "you MUST" do .... when converting an IRI to a URI doesn't
>> seem very helpful to me.  If IDN use doesn't currently do that already
>> I don't think people are going to change the system, risking
>> instability, to fix (or maybe break) a downgrade scenario for
>> compatibility in older software.
> One scenario where an IRI is converted to a URI that contains a host
> name is when a browser is using an HTTP proxy. (When there is no
> proxy, the browser sends a relative URI in the GET request and puts
> the host name in the Host header.)
> So I tried IE8 with an HTTP proxy, and it turns out that it converts
> the host name to Punycode. Do you think IE9 should send the host name
> in UTF-8 when using a proxy? What if the proxy is old, and doesn't
> know how to convert from UTF-8 to Punycode?
> Erik

#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Monday, 23 November 2009 11:06:21 UTC
