RE: Using Punycode for host names in IRI -> URI translation

From: Shawn Steele <Shawn.Steele@microsoft.com>
Date: Mon, 23 Nov 2009 17:42:02 +0000
To: Erik van der Poel <erikv@google.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>
CC: Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, Pete Resnick <presnick@qualcomm.com>, Ted Hardie <ted.ietf@gmail.com>
Message-ID: <E14011F8737B524BB564B05FF748464A04455C9B@TK5EX14MBXC139.redmond.corp.microsoft.com>

I wasn't quite sure how to respond to your HTTP proxy example, since that seems to be a case where Punycode is clearly useful, but I think Martin summed up my subconscious concern that it is hard to prescribe "one true way" to handle this.

I think a BCP has to apply to all users of the functionality, and while it may be clear that this is "best" for proxy requests, I'm not at all convinced it is best in all cases. Mentioning it as a possible solution or example for some cases, as Martin suggests, would be helpful, but I think a BCP is a bit too strong.

- Shawn
________________________________________
From: Erik van der Poel [erikv@google.com]
Sent: Monday, November 23, 2009 8:47 AM
To: Martin J. Dürst
Cc: Shawn Steele; Larry Masinter; PUBLIC-IRI@W3.ORG; Pete Resnick; Ted Hardie
Subject: Re: Using Punycode for host names in IRI -> URI translation

Hello Martin,

I agree that there is no need for a standards track document to
prohibit moving to a cleaner state. But in the meantime, do you think
we should document the "most prudent thing to do"? Would that be in a
BCP? Should that BCP be updated when the most prudent thing to do
changes?

I also agree that there are many different things that we do with
IRIs, URIs and their components. I guess they cannot all be covered by
one document. Some of these things should probably be in an HTTP spec,
some should be in a mailto: spec, and so on. Maybe that's just stating
the obvious.

Erik

On Mon, Nov 23, 2009 at 3:05 AM, "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:
> Hello Erik,
>
> I agree that for HTTP proxies, and for the Host: header in the HTTP
> protocol, at the current point in time, using punycode is the most prudent
> thing to do. I don't have any problem putting this as an example into the
> new spec, but I don't want this current state of affairs to prevent
> implementations from moving to a cleaner state.
>
> On the other hand, I also have to agree with Shawn. The various ways in
> which IRIs and URIs and their components can be used within an application
> are simply too many for us to prescribe "one true way" to handle this.
>
> In my own implementation experience, when I added IDNA support to Amaya, I
> relied on it to convert IRIs internally to use %-encoding (without trying to
> analyze the IRI further), and then caught that %-encoding deep down in
> libwww (the network library on which Amaya relies) and converted it back to
> UTF-8 and then to punycode.
>
> I expect that other applications may do similar things, or they may do
> completely different things, because they have a different structure. The
> various buggy behaviors that I got when testing %-encoding in domain names
> with Firefox and Safari seem to support Shawn's point that internally to the
> application, various different forms and conversions may exist.
>
> Regards,    Martin.
>
> On 2009/11/22 13:24, Erik van der Poel wrote:
>>
>> On Sat, Nov 21, 2009 at 11:02 AM, Shawn Steele
>> <Shawn.Steele@microsoft.com>  wrote:
>>>
>>> I'm still not sure that requiring Punycode for URIs is helpful.
>>> [...]
>>> So saying "you MUST" do .... when converting an IRI to a URI doesn't
>>> seem very helpful to me.  If IDN use doesn't currently do that already
>>> I don't think people are going to change the system, risking
>>> instability, to fix (or maybe break) a downgrade scenario for
>>> compatibility in older software.
>>
>> One scenario where an IRI is converted to a URI that contains a host
>> name is when a browser is using an HTTP proxy. (When there is no
>> proxy, the browser sends a relative URI in the GET request and puts
>> the host name in the Host header.)
>>
>> So I tried IE8 with an HTTP proxy, and it turns out that it converts
>> the host name to Punycode. Do you think IE9 should send the host name
>> in UTF-8 when using a proxy? What if the proxy is old, and doesn't
>> know how to convert from UTF-8 to Punycode?
>>
>> Erik
>>
>
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
>
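
For readers following the thread, the round trip Martin describes for Amaya (a %-encoded host name converted back to UTF-8 and then to Punycode) can be sketched roughly as below. This is only an illustration in Python, not Amaya's or libwww's actual code: the function names host_to_ascii and request_head and the host bücher.example are made up for the example, and Python's built-in "idna" codec implements the older IDNA 2003 rules (RFC 3490). The second function shows the two request forms Erik mentions: an absolute URI when talking to a proxy, and a relative URI plus the Host header otherwise.

```python
from urllib.parse import unquote


def host_to_ascii(host: str) -> str:
    """Turn a (possibly %-encoded) host name into its ASCII/Punycode form.

    Hypothetical helper: assumes the %-escapes encode UTF-8 bytes.
    """
    # Undo the %-encoding; fail loudly if the bytes are not valid UTF-8.
    unicode_host = unquote(host, encoding="utf-8", errors="strict")
    # The built-in "idna" codec applies nameprep and Punycode per label
    # (IDNA 2003); plain ASCII labels pass through unchanged.
    return unicode_host.encode("idna").decode("ascii")


def request_head(host: str, path: str, via_proxy: bool) -> str:
    """Build the start of an HTTP/1.1 GET request (hypothetical helper)."""
    ascii_host = host_to_ascii(host)
    # With a proxy the request line carries the absolute URI;
    # without one it carries only the path.
    target = f"http://{ascii_host}{path}" if via_proxy else path
    return f"GET {target} HTTP/1.1\r\nHost: {ascii_host}\r\n\r\n"


if __name__ == "__main__":
    print(host_to_ascii("b%C3%BCcher.example"))
    print(request_head("b%C3%BCcher.example", "/", via_proxy=True))
```

In both request forms the host name reaches the wire in ASCII, which is the behaviour Erik reports for IE8 with a proxy. Whether an application should convert at this point or elsewhere is exactly what the thread leaves open.
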
Received on Monday, 23 November 2009 17:42:40 GMT
