
Re: Standardizing on IDNA 2003 in the URL Standard

From: Andrew Sullivan <ajs@anvilwalrusden.com>
Date: Thu, 16 Jan 2014 11:11:08 -0500
To: "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>
Cc: IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
Message-ID: <20140116161108.GI22344@crankycanuck.ca>

Apologies for any duplicates; I originally sent this from an
unsubscribed address.

Hi,

On Thu, Jan 16, 2014 at 11:48:45AM +0000, Anne van Kesteren wrote:
> The point is that in practice, it isn't fixed to Unicode 3.2. I have
> yet to encounter an IDNA2003 implementation that does that. It turns
> out the setup we have in practice is a compatible evolution.

Maybe.  First, see Mark Davis's remarks.  Second, please tell me what
these implementations are supposed to do with, say, the 2,088
characters that were added in Unicode 6.0, among which are the emoji
symbols and the Rupee Sign.  Do they all do the same thing?  How do
you know?  Why do you think they always will?  The behaviour is
undefined under IDNA2003.  As you noted in this thread, the point of
having standards is exactly that you have an answer to these things so
that everyone can interoperate without asking everyone else who has
ever implemented the same functionality, "Pssst! What did you do with
U+1F301?"

The reason to go to IDNA2008 is that it is supposed to provide an
answer to this sort of question in a completely general way.  Despite
the fact that IDNA2008 came out before Unicode 6.0, it has an answer
to the question, "What do I do with the emoji?"  And sure enough, the
(non-normative) derived properties database that IANA helpfully
provides lists exactly what you would expect those characters to do.
But your implementation wouldn't need to wait for the registry to be
updated to know.
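
To make that concrete, here is a rough sketch of how a status can be
derived from Unicode properties alone, in the spirit of the RFC 5892
derivation.  It is deliberately simplified and is not the full
algorithm: it assumes only three of the rule categories (Unassigned,
Unstable, LetterDigits) and skips the exceptions table, LDH, the join
controls, the ignorable properties and blocks, and Old Hangul Jamo.
The names simplified_status and is_unstable are made up for
illustration, and the result depends on the Unicode version shipped
with the Python runtime.

```python
import unicodedata

# General categories that RFC 5892 groups under "LetterDigits".
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_unstable(ch: str) -> bool:
    """True if NFKC(casefold(NFKC(ch))) differs from ch."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def simplified_status(ch: str) -> str:
    """Simplified, illustrative derivation; NOT the complete RFC 5892 rules."""
    category = unicodedata.category(ch)
    if category == "Cn":
        # Not assigned in the Unicode version this runtime knows about.
        # (Simplification: noncharacters also report Cn here.)
        return "UNASSIGNED"
    if is_unstable(ch):
        # Changes under normalization or case folding (e.g. uppercase).
        return "DISALLOWED"
    if category in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    # Everything else, including symbols, falls through to the default.
    return "DISALLOWED"


if __name__ == "__main__":
    for ch in ["a", "A", "\u20b9", "\U0001F301"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), simplified_status(ch))
```

The point of the sketch is the last line of simplified_status: a
symbol added in a later Unicode version lands in the default branch
without anyone having to revise the rules.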

I cannot take seriously the argument that this is all about
compatibility if that argument depends on using a standard that simply
leaves out thousands of characters, and under which applications have
to make up their own handling rules for those.  That is no promise of
compatibility at all.

I believe stability of URIs is really important, and I think backward
compatibility with deployed code is just as important.  But if there
is any opportunity to fix this properly, it is now.  If we don't
embrace that, the problem will be much worse in the future --
especially as more IDNs show up at the top level of the DNS.  In my
opinion, there is a responsibility to embrace IDNA2008 now, because it
is the best approach we were able to come up with given the conflicts
between internationalization and localization.

Best regards,

Andrew


-- 
Andrew Sullivan
ajs@anvilwalrusden.com
Received on Thursday, 16 January 2014 16:11:42 UTC
