
Re: Standardizing on IDNA 2003 in the URL Standard

From: Mark Davis ☕ <mark@macchiato.com>
Date: Thu, 16 Jan 2014 14:24:07 +0100
Message-ID: <CAJ2xs_HW7zu98yv7S2-9kxnHofD7Y8-DgS+2NqCk1Z5+crp-xA@mail.gmail.com>
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Gervase Markham <gerv@mozilla.org>, John C Klensin <klensin@jck.com>, yaojk <yaojk@cnnic.cn>, Paul Hoffman <paul.hoffman@vpnc.org>, "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>, IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
> The point is that in practice, it [IDNA2003] isn't fixed to Unicode 3.2.

It is quite possible that an implementation you think is following
IDNA2003 (with a non-standard, larger repertoire) is actually following UTS
46.

If you were reverse-engineering to find out which standard an
implementation was following, you'd need to query certain characters to see
if they were supported, and how. UTS 46 also allows two processing modes,
transitional and nontransitional, that you'd have to test. There is a table in
http://unicode.org/reports/tr46/#Table_IDNA_Comparisons that illustrates
this. (You'd have to look at the data tables to get a full listing.) And,
of course, it is clearly possible for an implementation to be
non-conformant to all of the standards we are talking about (IDNA2003, UTS
46, and IDNA2008).
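
As a rough illustration of that kind of probing (a minimal sketch, not a
conformance test): the classify() helper, the two probe strings, and the
assumption that a converter signals rejection by raising ValueError are all
hypothetical. It only checks one deviation character and one non-ASCII
uppercase letter, so it can narrow down the candidates but not settle them.

```python
from typing import Callable

# Hypothetical probe strings chosen for this sketch.
DEVIATION_PROBE = "fa\u00df.de"    # contains U+00DF, a deviation character
UPPERCASE_PROBE = "\u00d6BB.at"   # contains a non-ASCII uppercase letter


def classify(to_ascii: Callable[[str], str]) -> str:
    """Guess which family of behavior a to-ASCII converter follows.

    Assumes the converter raises ValueError (UnicodeError is a subclass)
    when it rejects its input.
    """
    try:
        to_ascii(UPPERCASE_PROBE)
    except ValueError:
        # IDNA2008 without a mapping layer disallows uppercase letters.
        return "rejects uppercase: consistent with IDNA2008 and no mapping"

    try:
        converted = to_ascii(DEVIATION_PROBE).lower()
    except ValueError:
        return "inconclusive: deviation probe rejected"

    if converted == "fass.de":
        # IDNA2003 and UTS 46 transitional both map U+00DF to "ss".
        return "maps U+00DF: consistent with IDNA2003 or UTS 46 transitional"
    return "keeps U+00DF: consistent with UTS 46 nontransitional"


# Python's built-in "idna" codec implements IDNA2003 (RFC 3490).
print(classify(lambda s: s.encode("idna").decode("ascii")))
```

A fuller probe would repeat this for the other deviation characters and for
characters added in later Unicode versions, and compare against the data
tables rather than a single expected string.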

As previously noted, however, casing differences and the 4 deviation
characters (U+00DF, U+03C2, U+200C, and U+200D) take some careful checking,
since there is a difference between
what the implementation accepts and what goes out 'over the wire'. And the
implementation may also not be using the latest version of Unicode, which
would make a difference for UTS 46 and IDNA2008.
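
To make the accepted-versus-wire distinction concrete, here is a small sketch
using Python's built-in "idna" codec, which implements IDNA2003. The
wire_form() helper and the probe names are made up for this example; other
implementations may well produce different output for the same input.

```python
def wire_form(name: str) -> str:
    """Return the ASCII form that Python's IDNA2003 codec produces."""
    return name.encode("idna").decode("ascii")


# Hypothetical inputs: each is accepted, but what is sent differs from it.
PROBES = {
    "casing": "B\u00fccher.example",
    "sharp s (U+00DF)": "fa\u00df.de",
    "ZWJ (U+200D)": "a\u200db.example",
}

for label, typed in PROBES.items():
    print(f"{label}: accepted {typed!r}, sent {wire_form(typed)!r}")
```

Under IDNA2003 the uppercase letter is folded, U+00DF becomes "ss", and the
joiner is dropped, so checking only whether the input is accepted says little
about what is actually looked up.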

BTW, there's an online demo of Unicode properties that can be used to see
differences. The categories are slightly different than what is shown in
the above chart, but you can get a sense for the differences:

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{any}&abb=on&g=idna2003+uts46+idna2008

One way to look at UTS 46 is as a migration layer to support client
implementations during the transition of registries from IDNA2003 to
IDNA2008, plus a mapping layer that can be used with straight IDNA2008.

> I think I did mention earlier on UTS46 might be okay, depending on the
> details. I am hoping to hear from Mark on the matter.

I'm not sure what specific questions you have about UTS 46. Can you
reiterate them?




Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*


On Thu, Jan 16, 2014 at 12:48 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> On Thu, Jan 16, 2014 at 11:36 AM, Gervase Markham <gerv@mozilla.org>
> wrote:
> > On 16/01/14 11:17, Anne van Kesteren wrote:
> >> It's not worse if it's fully backwards compatible and mostly
> >> interoperable across all major clients. At that point the standard is
> >> just wrong.
> >
> > And having a standard fixed to Unicode 3.2 is not also "just wrong"?
>
> The point is that in practice, it isn't fixed to Unicode 3.2. I have
> yet to encounter an IDNA2003 implementation that does that. It turns
> out the setup we have in practice is a compatible evolution.
>
>
> > And I refer you to my comments above. Problems like lowercasing (for
> > better or worse) are punted by IDNA2008 and are labelled as an
> > application-level problem. In practice, what everyone should do for best
> > interoperability is implement the same application-level mappings, and
> > implement ones which are as compatible as possible with IDNA2003.
> > Hence.... UTS46.
>
> I think I did mention earlier on UTS46 might be okay, depending on the
> details. I am hoping to hear from Mark on the matter.
>
>
> --
> http://annevankesteren.nl/
>
>
Received on Thursday, 16 January 2014 13:24:38 UTC
