Re: Using Punycode for host names in IRI -> URI translation

From: Erik van der Poel <erikv@google.com>
Date: Mon, 23 Nov 2009 08:47:52 -0800
Message-ID: <c07a32650911230847v42ade471habb3459f095d9797@mail.gmail.com>
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: Shawn Steele <Shawn.Steele@microsoft.com>, Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, Pete Resnick <presnick@qualcomm.com>, Ted Hardie <ted.ietf@gmail.com>

Hello Martin,

I agree that there is no need for a standards track document to
prohibit moving to a cleaner state. But in the meantime, do you think
we should document the "most prudent thing to do"? Would that be in a
BCP? Should that BCP be updated when the most prudent thing to do
changes?

I also agree that there are many different things that we do with
IRIs, URIs and their components. I guess they cannot all be covered by
one document. Some of these things should probably be in an HTTP spec,
some should be in a mailto: spec, and so on. Maybe that's just stating
the obvious.

Erik

On Mon, Nov 23, 2009 at 3:05 AM, "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:
> Hello Erik,
>
> I agree that for HTTP proxies, and for the Host: header in the HTTP
> protocol, at the current point in time, using Punycode is the most prudent
> thing to do. I don't have any problem putting this as an example into the
> new spec, but I don't want this current state of affairs to prevent
> implementations from moving to a cleaner state.
>
> On the other hand, I also have to agree with Shawn. The various ways in
> which IRIs and URIs and their components can be used within an application
> are simply too many for us to prescribe "one true way" to handle this.
>
> In my own implementation experience, when I added IDNA support to Amaya, I
> relied on it to convert IRIs internally to use %-encoding (without trying to
> analyze the IRI further), and then caught that %-encoding deep down in
> libwww (the network library on which Amaya relies) and converted it back to
> UTF-8 and then to Punycode.
>
> I expect that other applications may do similar things, or they may do
> completely different things, because they have a different structure. The
> various buggy behaviors that I got when testing %-encoding in domain names
> with Firefox and Safari seem to support Shawn's point that, inside an
> application, various different forms and conversions may exist.
>
> Regards,    Martin.
>
> On 2009/11/22 13:24, Erik van der Poel wrote:
>>
>> On Sat, Nov 21, 2009 at 11:02 AM, Shawn Steele
>> <Shawn.Steele@microsoft.com>  wrote:
>>>
>>> I'm still not sure that requiring Punycode for URIs is helpful.
>>> [...]
>>> So saying "you MUST" do .... when converting an IRI to a URI doesn't
>>> seem very helpful to me.  If IDN use doesn't currently do that already
>>> I don't think people are going to change the system, risking
>>> instability, to fix (or maybe break) a downgrade scenario for
>>> compatibility in older software.
>>
>> One scenario where an IRI is converted to a URI that contains a host
>> name is when a browser is using an HTTP proxy. (When there is no
>> proxy, the browser sends a relative URI in the GET request and puts
>> the host name in the Host header.)
>>
>> So I tried IE8 with an HTTP proxy, and it turns out that it converts
>> the host name to Punycode. Do you think IE9 should send the host name
>> in UTF-8 when using a proxy? What if the proxy is old, and doesn't
>> know how to convert from UTF-8 to Punycode?
>>
>> Erik
>>
>
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
>
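
For readers following the thread, here is a minimal Python sketch of the two things discussed above: the conversion chain Martin describes for Amaya (IRI host to %-encoding, back to UTF-8, then to Punycode) and the two request forms Erik mentions (absolute URI when talking to a proxy, relative URI plus Host header otherwise). It is not code from Amaya, libwww, or any browser. The function names and the host "bücher.example" are made up for illustration, and Python's built-in "idna" codec implements IDNA 2003, which may differ from what a given implementation uses.

```python
from urllib.parse import quote, unquote


def iri_host_to_percent_encoded(host: str) -> str:
    """Step 1: %-encode the UTF-8 bytes of every non-ASCII character."""
    # Letters, digits and "_.-~" are left alone, so the label dots survive.
    return quote(host, safe="")


def percent_encoded_host_to_punycode(host: str) -> str:
    """Steps 2 and 3: undo the %-encoding, then convert each label to Punycode."""
    unicode_host = unquote(host, encoding="utf-8", errors="strict")
    # The "idna" codec (IDNA 2003) splits on dots and adds the "xn--" prefix.
    return unicode_host.encode("idna").decode("ascii")


def request_head(host: str, path: str, via_proxy: bool) -> str:
    """Build the request line and Host header with a Punycode host name."""
    ascii_host = percent_encoded_host_to_punycode(
        iri_host_to_percent_encoded(host)
    )
    # A proxy receives the absolute URI; an origin server only the path.
    target = f"http://{ascii_host}{path}" if via_proxy else path
    return f"GET {target} HTTP/1.1\r\nHost: {ascii_host}\r\n\r\n"


if __name__ == "__main__":
    print(request_head("b\u00fccher.example", "/", via_proxy=True))
    print(request_head("b\u00fccher.example", "/", via_proxy=False))
```

In both request forms the host name goes out in ASCII, so an older proxy that cannot convert UTF-8 to Punycode never has to.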
Received on Monday, 23 November 2009 16:48:28 GMT
