Re: Using Punycode for host names in IRI -> URI translation

From: Erik van der Poel <erikv@google.com>
Date: Mon, 23 Nov 2009 12:05:06 -0800
Message-ID: <c07a32650911231205y3d33aa84i2964a0dea932dfe8@mail.gmail.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, Pete Resnick <presnick@qualcomm.com>, Ted Hardie <ted.ietf@gmail.com>
I thought BCP was not quite as "strong" as Standards Track, and that
BCP might be the right kind of document for "the prudent thing to do"
(currently), since it stands for Best Current Practice. If we had a
BCP for IRI issues in HTTP, and another for email, and so on, we might
end up with too many small documents?

By the way, another scenario where URLs are converted from non-ASCII
to ASCII is a search engine. I had a look at Microsoft's bing.com and
it appears to send Punycode to end-users, probably because IE6 does
not support IDNA.
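
A minimal sketch of that kind of host-name conversion, in Python, assuming the standard-library "idna" codec (which implements IDNA 2003 per RFC 3490, and is not necessarily what any search engine or browser mentioned here uses). The function name host_to_punycode is hypothetical. Only the host is converted; userinfo and any non-ASCII path or query are out of scope:

```python
from urllib.parse import urlsplit, urlunsplit


def host_to_punycode(url: str) -> str:
    """Return url with its host name converted to ASCII (Punycode) labels.

    Hypothetical helper for illustration. Userinfo is dropped, and
    non-ASCII characters outside the host are left untouched.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        # No authority component, so there is nothing to convert.
        return url
    # The stdlib "idna" codec applies ToASCII to each label (IDNA 2003).
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(host_to_punycode("http://b\u00fccher.example/path"))
```

An all-ASCII host passes through unchanged, so the same function can be applied to every URL before it is sent to a client that lacks IDNA support.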

Erik

On Mon, Nov 23, 2009 at 9:42 AM, Shawn Steele
<Shawn.Steele@microsoft.com> wrote:
> I wasn't quite sure how to respond about your HTTP proxy example as that seems a case where punycode is clearly useful, but I think Martin summed up my subconscious concern that it's "hard to prescribe 'one true way' to handle this."
>
> I think a BCP has to apply to all users of the functionality, and, while it may be clear that this is "best" for proxy requests, I'm not at all convinced it is best in all cases.  I think mentioning it as a possible solution or example for some cases, as Martin suggests, is helpful.  I think a BCP is a bit too strong though.
>
> - Shawn
> ________________________________________
> From: Erik van der Poel [erikv@google.com]
> Sent: Monday, November 23, 2009 8:47 AM
> To: Martin J. Dürst
> Cc: Shawn Steele; Larry Masinter; PUBLIC-IRI@W3.ORG; Pete Resnick; Ted Hardie
> Subject: Re: Using Punycode for host names in IRI -> URI translation
>
> Hello Martin,
>
> I agree that there is no need for a standards track document to
> prohibit moving to a cleaner state. But in the meantime, do you think
> we should document the "most prudent thing to do"? Would that be in a
> BCP? Should that BCP be updated when the most prudent thing to do
> changes?
>
> I also agree that there are many different things that we do with
> IRIs, URIs and their components. I guess they cannot all be covered by
> one document. Some of these things should probably be in an HTTP spec,
> some should be in a mailto: spec, and so on. Maybe that's just stating
> the obvious.
>
> Erik
>
> On Mon, Nov 23, 2009 at 3:05 AM, "Martin J. Dürst"
> <duerst@it.aoyama.ac.jp> wrote:
>> Hello Erik,
>>
>> I agree that for HTTP proxies, and for the Host: header in the HTTP
>> protocol, at the current point in time, using punycode is the most prudent
>> thing to do. I don't have any problem putting this as an example into the
>> new spec, but I don't want this current state of affairs to prohibit
>> implementations from moving to a cleaner state.
>>
>> On the other hand, I also have to agree with Shawn. The various ways in
>> which IRIs and URIs and their components can be used within an application
>> are simply too many for us to prescribe "one true way" to handle this.
>>
>> In my own implementation experience, when I added IDNA support to Amaya, I
>> relied on it to convert IRIs internally to use %-encoding (without trying to
>> analyze the IRI further), and then caught that %-encoding deep down in
>> libwww (the network library on which Amaya relies) and converted it back to
>> UTF-8 and then to punycode.
>>
>> I expect that other applications may do similar things, or they may do
>> completely different things, because they have a different structure. The
>> various buggy behaviors that I got when testing %-encoding in domain names
>> with Firefox and Safari seem to support Shawn's point that internally to the
>> application, various different forms and conversions may exist.
>>
>> Regards,    Martin.
>>
>> On 2009/11/22 13:24, Erik van der Poel wrote:
>>>
>>> On Sat, Nov 21, 2009 at 11:02 AM, Shawn Steele
>>> <Shawn.Steele@microsoft.com>  wrote:
>>>>
>>>> I'm still not sure that requiring Punycode for URIs is helpful.
>>>> [...]
>>>> So saying "you MUST" do .... when converting an IRI to a URI doesn't
>>>> seem very helpful to me.  If IDN use doesn't currently do that already
>>>> I don't think people are going to change the system, risking
>>>> instability, to fix (or maybe break) a downgrade scenario for
>>>> compatibility in older software.
>>>
>>> One scenario where an IRI is converted to a URI that contains a host
>>> name is when a browser is using an HTTP proxy. (When there is no
>>> proxy, the browser sends a relative URI in the GET request and puts
>>> the host name in the Host header.)
>>>
>>> So I tried IE8 with an HTTP proxy, and it turns out that it converts
>>> the host name to Punycode. Do you think IE9 should send the host name
>>> in UTF-8 when using a proxy? What if the proxy is old, and doesn't
>>> know how to convert from UTF-8 to Punycode?
>>>
>>> Erik
>>>
>>
>> --
>> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
>> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
>>
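
The two-stage path Martin describes for Amaya (IRI to %-encoding first, then %-encoding back to UTF-8 and on to Punycode further down the stack) can be sketched roughly as follows. This is Python rather than Amaya's or libwww's actual code, the function names are hypothetical, and the standard-library "idna" codec (IDNA 2003) stands in for whatever conversion libwww performs:

```python
from urllib.parse import quote, unquote


def iri_host_to_percent(host: str) -> str:
    """Stage 1 (hypothetical): percent-encode the UTF-8 bytes of any
    non-ASCII character, without analyzing the host name further."""
    return quote(host, safe=".-")


def percent_host_to_punycode(host: str) -> str:
    """Stage 2 (hypothetical): undo the %-encoding to recover the UTF-8
    text, then convert each label to Punycode with the "idna" codec."""
    return unquote(host).encode("idna").decode("ascii")


if __name__ == "__main__":
    escaped = iri_host_to_percent("b\u00fccher.example")
    print(escaped)
    print(percent_host_to_punycode(escaped))
```

The intermediate %-encoded form is plain ASCII, which is why it can pass through layers that know nothing about IDNA before being converted at the point where the host name is actually resolved.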
Received on Monday, 23 November 2009 20:05:40 GMT