Re: IDNA and IRI document way forward

Hello Larry,

On 2009/07/29 16:07, Larry Masinter wrote:
> I confess that I'm just coming back up to speed on the
> issues, and hope you'll forgive me for missing some of
> the history,

No problem with that.

> It seems there are at least two communities (IDN/IDNA and
> IRI/WEB) which should have been working together for
> the past many years, haven't been, and we're now facing
> some difficulties in bringing their perspectives together,
> especially when those perspectives have been built
> into long-standing and finely argued documents.

I guess it's possible to see the situation this way, but I think it's 
also possible to see the glass as (at least) half full. Browser vendors 
have been implementing both IDNs and IRIs. Michel Suignard in particular 
was involved in both efforts. I also was and am on both lists, and tried 
to notice any relevant issues (which doesn't mean that they all made it 
into the current draft).

> I'm not entirely sure of the use case and difficulties,
> which I will try to track down in more detail.

Great. Very much looking forward to that.

> Just as personal speculation, however,
> I could easily imagine some problems if it were
> possible to register domain names which actually
> contained percent-hex-hex sequences.
>
> www.%77%33.org vs www.w3.org?

That was the direction that my speculation would go too.
It is certainly true that the DNS as such easily accepts any byte 
sequence in its labels. As far as I understand, that even includes null 
bytes, because the packets that the DNS sends use an initial byte to 
indicate the length of a label (taking two bits for other purposes, that 
results in the label length limit of 63 bytes).
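
To make the length-byte point concrete, here is a minimal sketch of how a name is laid out on the wire. It assumes only the basic uncompressed format from RFC 1035; the function name encode_name is made up for illustration and does not come from any DNS library.

```python
def encode_name(labels):
    """Encode a list of byte-string labels into uncompressed DNS wire format."""
    out = bytearray()
    for label in labels:
        # The length byte keeps its top two bits at 00 for a plain label,
        # leaving six bits for the length, hence the 63-byte limit.
        if not 1 <= len(label) <= 63:
            raise ValueError("label must be 1..63 bytes")
        out.append(len(label))
        out += label  # any byte value is accepted, including 0x00
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


wire = encode_name([b"www", b"w3", b"org"])
odd = encode_name([b"a\x00b"])  # a label containing a null byte
```

Because the length is carried explicitly, nothing in this layout needs a terminator inside a label, which is why arbitrary bytes are possible at this level.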

However, "%77%33" was never legal in the host part of a URI before RFC 3986
made it mean the same as "w3". Also, "%77%33" is definitely not something any 
DNS registrar would allow for registration, nor is it something that 
other IETF protocols would accept as a host name if they put any 
restrictions on it at all.

> Perhaps that would be a problem not just for IRIs
> but for other kinds of processing too.  Can this
> be disallowed at the URI parsing level? Only at
> the IRI level?

Well, "%77%33" as such would not need to be disallowed.
One would just have to write "%2577%2533" if one really had a need to 
express it (which I guess would be the case extremely rarely).
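
A small sketch of the escaping described above, using Python's urllib.parse as a stand-in for generic RFC 3986 percent-decoding. The helper names decode_host and escape_literal are hypothetical, and real URI processors may treat the host component differently.

```python
from urllib.parse import quote, unquote


def decode_host(host: str) -> str:
    """Percent-decode a host string once."""
    return unquote(host)


def escape_literal(host: str) -> str:
    """Escape a host so that a literal '%' survives one round of decoding."""
    return quote(host, safe="")


same_as_w3 = decode_host("www.%77%33.org")  # decodes to "www.w3.org"
escaped = escape_literal("%77%33")          # "%" becomes "%25"
literal = decode_host(escaped)              # back to the literal "%77%33"
```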

> I see the difficulties of creating a provision for
> scheme-specific parsing and restrictions on host names
> containing %xx hex-encoded bytes in URIs are even
> greater than what I imagined.

I very much have to agree that we have to weigh the difficulties 
against the benefits.


>> That would be
>> http://validator.w3.org/check?uri=http://恵比寿駅.jp/

Or http://%E6%81%B5%E6%AF%94%E5%AF%BF%E9%A7%85.jp/ when escaped
(using UTF-8, as prescribed by RFC 3986 and RFC 3987).
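
For reference, the escaped form above can be reproduced with a few lines. This is only a sketch using Python's urllib.parse; the helper name escape_label is made up here, and it covers just the UTF-8 percent-encoding step, not any IDNA conversion.

```python
from urllib.parse import quote, unquote


def escape_label(label: str) -> str:
    """Percent-encode the UTF-8 bytes of a host label."""
    return quote(label, safe="", encoding="utf-8")


escaped_host = escape_label("恵比寿駅") + ".jp"
round_trip = unquote(escaped_host, encoding="utf-8")
```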

> I'm sure there are difficulties even in circumstances that
> don't use "?", but this is especially difficult since the
> HTML-URL/HREF/WebAddress handling of non-ASCII query parameters
> adds some ambiguity to the translation of this into URI space.

Yes. But we should try to have these localized to those contexts,
if possible.

>> It's very clearly impossible to rule this out.
>
> Difficult, but not impossible.
>
>> But even before that, doing scheme-wise processing
>>   kills the U in URIs.
>
> And the I in Internationalized and several other things. Let's
> stick to identifying issues and alternatives.

I agree.

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Wednesday, 29 July 2009 11:29:46 UTC