
RE: IDNA and IRI document way forward

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 29 Jul 2009 00:07:27 -0700
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, URI <uri@w3.org>, "John Klensin (klensin@jck.com)" <klensin@jck.com>, Vint Cerf <vint@google.com>
Message-ID: <8B62A039C620904E92F1233570534C9B0118D818046A@nambx04.corp.adobe.com>

I confess that I'm just coming back up to speed on the
issues, and I hope you'll forgive me for missing some of
the history.

It seems there are at least two communities (IDN/IDNA and
IRI/WEB) that should have been working together for
many years but haven't been, and we're now facing
some difficulties in bringing their perspectives together,
especially when those perspectives have been built
into long-standing and finely argued documents.

I'm not yet entirely sure of the use cases and the
difficulties; I will try to track them down in more detail.

Just as personal speculation, however,
I could easily imagine some problems if it were
possible to register domain names which actually
contained percent-hex-hex sequences.

www.%77%33.org vs www.w3.org?

Perhaps that would be a problem not just for IRIs
but for other kinds of processing too.  Can this
be disallowed at the URI parsing level? Only at
the IRI level?
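A minimal Python sketch of the ambiguity in the www.%77%33.org example, using only the standard library's `urllib.parse`. RFC 3986's reg-name grammar permits pct-encoded octets in a host, and its percent-encoding normalization (section 6.2.2.2) treats %77 and %33 as equivalent to "w" and "3". So a consumer that decodes before comparing sees one host where a byte-wise comparison sees two. The helper names `host_of` and `decoded_host` are illustrative only, not taken from any specification.

```python
from urllib.parse import urlsplit, unquote

def host_of(uri: str) -> str:
    """Host component as parsed, without percent-decoding (urlsplit lowercases it)."""
    host = urlsplit(uri).hostname
    if host is None:
        raise ValueError(f"no host in {uri!r}")
    return host

def decoded_host(uri: str) -> str:
    """Host after percent-decoding, as a normalizing consumer might see it."""
    return unquote(host_of(uri)).lower()

a = "http://www.%77%33.org/"
b = "http://www.w3.org/"

print(host_of(a) == host_of(b))            # False: the raw host strings differ
print(decoded_host(a) == decoded_host(b))  # True: both decode to "www.w3.org"
```

Whether the two are "the same host" therefore depends on which layer does the comparison, which is exactly the question of where such names could be disallowed.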

I see that the difficulties of creating a provision for
scheme-specific parsing, and of restricting host names
that contain %xx hex-encoded bytes in URIs, are even
greater than I imagined.
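One way to picture such a scheme-specific provision is a parser that applies the generic RFC 3986 reg-name grammar to every URI, plus an extra rule that fires only for schemes it knows use DNS host names. The `DNS_SCHEMES` set and the "no %xx in DNS hosts" rule are hypothetical, chosen for illustration and not proposed in this thread; the point is that the extra check works only if the parser knows the scheme, which a generic URI parser does not.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 reg-name: *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")

# Hypothetical: schemes this sketch assumes resolve hosts through DNS
DNS_SCHEMES = {"http", "https", "ftp"}

def host_allowed(uri: str) -> bool:
    parts = urlsplit(uri)
    host = parts.hostname or ""
    # Generic rule, applicable to any scheme
    if not REG_NAME.fullmatch(host):
        return False
    # Hypothetical scheme-specific rule: no %xx escapes in DNS host names
    if parts.scheme in DNS_SCHEMES and "%" in host:
        return False
    return True

print(host_allowed("http://www.w3.org/"))      # True
print(host_allowed("http://www.%77%33.org/"))  # False: %xx rejected for a DNS scheme
print(host_allowed("foo://www.%77%33.org/"))   # True: unknown scheme, generic rule only
```

A purely generic processor would accept all three hosts; only scheme-aware code rejects the `http` case.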


> That would be
> http://validator.w3.org/check?uri=http://恵比寿駅.jp/


I'm sure there are difficulties even in cases that
don't use "?", but this one is especially hard, since the
HTML-URL/HREF/WebAddress handling of non-ASCII query parameters
adds some ambiguity to how this is translated into URI space.
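To illustrate that ambiguity, the sketch below builds three plausible URI forms for the validator link quoted above. Each form rests on a labeled assumption: the UTF-8 form percent-encodes UTF-8 octets, as the IRI-to-URI mapping does; the legacy form assumes a page served as Shift_JIS whose query is encoded in the page's charset; the IDNA form assumes the nested host is converted with Python's built-in `idna` codec, which implements IDNA 2003. The function names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://validator.w3.org/check?uri="
NESTED_HOST = "恵比寿駅.jp"  # host of the nested IRI in the quoted example

def utf8_form(host: str) -> str:
    # IRI-to-URI style: non-ASCII -> UTF-8 octets -> %XX ("." stays literal)
    return BASE + "http://" + quote(host, safe="", encoding="utf-8") + "/"

def legacy_charset_form(host: str, charset: str) -> str:
    # Assumed HTML-style behavior: query bytes taken from the page's charset
    return BASE + "http://" + quote(host, safe="", encoding=charset) + "/"

def idna_form(host: str) -> str:
    # Nested host converted to ASCII-compatible "xn--" labels;
    # Python's built-in "idna" codec implements IDNA 2003, not IDNA 2008
    ace = host.encode("idna").decode("ascii")
    return BASE + "http://" + ace + "/"

forms = {
    "utf8": utf8_form(NESTED_HOST),
    "shift_jis": legacy_charset_form(NESTED_HOST, "shift_jis"),
    "idna": idna_form(NESTED_HOST),
}

for name, uri in forms.items():
    print(f"{name:10} {uri}")
```

All three strings differ, so a receiver cannot tell from the query alone which convention the sender used.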

> It's very clearly impossible to rule this out.

Difficult, but not impossible.

> But even before that, doing scheme-wise processing
>  kills the U in URIs.

And the I in Internationalized and several other things. Let's
stick to identifying issues and alternatives.

>> I think this is unfortunate and a pretty drastic change
>> to the IRI document, but I don't think we're going to make
>> progress if we don't take the bull by the horns.

> Before taking anything by the horns (or the tail, or whatever) I'd like 
> to know in great details what exactly the actual (or pretended) bull is.

And if there are two bulls, there would be
four horns, a Tetralemma, and quite a bit of BS.

Larry
-- 
http://larry.masinter.net


Received on Wednesday, 29 July 2009 07:09:26 GMT