Re: Standardizing on IDNA 2003 in the URL Standard from Anne van Kesteren on 2013-08-22 (uri@w3.org from August 2013)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Thu, 22 Aug 2013 16:11:15 +0100
To: Gervase Markham <gerv@mozilla.org>
Cc: Mark Davis ☕ <mark@macchiato.com>, Shawn Steele <Shawn.Steele@microsoft.com>, IDNA update work <idna-update@alvestrand.no>, "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>, John C Klensin <klensin@jck.com>, Peter Saint-Andre <stpeter@stpeter.im>, Marcos Sanz <sanz@denic.de>, Vint Cerf <vint@google.com>, "www-tag.w3.org" <www-tag@w3.org>
Message-ID: <CADnb78jceuqQQVK9RQ+cY=FK8veqK+03=67hBxAz0PBmtGbHww@mail.gmail.com>

On Thu, Aug 22, 2013 at 1:59 PM, Gervase Markham <gerv@mozilla.org> wrote:
> Are you sure that "as deployed" is interoperable, or have different
> browsers done the "add new Unicode to IDNA2003" step differently?

Relatively certain, though I've not tested extensively. Unassigned
code points are allowed, so the Unicode 3.2 repertoire does not matter
there. The other case where Unicode 3.2 matters is normalization.
Browsers just use their internal NFKC algorithm for that, which is not
bound to any particular version of Unicode; it's whatever the latest
version of Unicode is that they implement.
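
As an illustration only (not what any browser actually runs), the
version dependence of NFKC can be sketched in Python. This assumes
CPython's unicodedata module, which ships both its current Unicode
tables and a frozen Unicode 3.2 database as unicodedata.ucd_3_2_0.
The helper name nfkc_both is hypothetical.

```python
import unicodedata

def nfkc_both(text):
    """Return NFKC of text under the current and the Unicode 3.2 tables.

    Hypothetical helper for illustration; not part of any standard.
    """
    return {
        "current": unicodedata.normalize("NFKC", text),
        "3.2": unicodedata.ucd_3_2_0.normalize("NFKC", text),
    }

# The "current" version depends on the Python build in use, much as a
# browser's NFKC depends on whichever Unicode version it implements.
print(unicodedata.unidata_version, unicodedata.ucd_3_2_0.unidata_version)

# U+FB01 (LATIN SMALL LIGATURE FI) predates Unicode 3.2, so both
# databases know its compatibility decomposition.
print(nfkc_both("\ufb01"))
```

For code points assigned after Unicode 3.2, the two results may
differ, which is the gap between IDNA2003 as written and as deployed.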

> Have you been arguing for 2 because you don't want 1? I'm not sure
> anyone's been arguing for 1. It's always been about 3.

I argued for 1 because I've previously gotten signals from Apple &
Google that they don't see much benefit in moving. It seems in the
case of Google this might have been incorrect. It's also still unclear
to me what the drawback of IDNA2003 is given existing practice. What
Vint Cerf keeps saying is true: IDNA2003 is bad because it relies on
Unicode 3.2. But I don't think IDNA2003 as written is what's under
discussion here, which makes matters confusing. What matters is
IDNA2003 as implemented and deployed throughout the DNS.

On Thu, Aug 22, 2013 at 2:05 PM, Gervase Markham <gerv@mozilla.org> wrote:
> AIUI, assuming we write our replacement for the STD3ASCIIRules to
> disallow "/" in hostnames, we should be fine. When UseSTD3ASCIIRules is
> false, "℁" (U+2101) will map to "a/s", and then the "/" will be disallowed.

I think we should write the actual rules in the standard rather than
have each implementer come up with their own UseSTD3ASCIIRules
replacement. The standard should be fully deterministic: exact
algorithms from a /domain name/ to an /ASCII domain name/ and a
/Unicode domain name/, as well as when either would return failure.
I.e. the rules we want the URL parser to use (not necessarily the
address bar I suppose, that can be "magic").
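
A rough Python sketch of what such a deterministic /domain name/ to
/ASCII domain name/ step could look like, using the "℁" (U+2101)
example quoted above. Everything here is an assumption made for
illustration: the function name, the forbidden set, and the use of
plain NFKC plus lower() in place of the full IDNA2003 mapping. It is
not the algorithm of any standard.

```python
import unicodedata

# Hypothetical forbidden set, for illustration only; the actual list
# would have to be spelled out in the standard.
FORBIDDEN = set("/?#@:\\ ")

def domain_to_ascii(domain):
    """Map a domain name to an ASCII domain name, or None on failure."""
    # Compatibility mapping: NFKC turns U+2101 into "a/s".
    # lower() is a simplification of IDNA2003 case folding.
    mapped = unicodedata.normalize("NFKC", domain).lower()

    # Check *after* mapping, so a "/" introduced by the mapping is caught.
    if any(ch in FORBIDDEN for ch in mapped):
        return None

    labels = []
    for label in mapped.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # Punycode-encode non-ASCII labels and add the ACE prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

print(domain_to_ascii("\u2101.example"))
print(domain_to_ascii("b\u00fccher.example"))
```

The point of the sketch is only that the failure condition is part of
the algorithm itself, so two implementations cannot disagree on it.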

-- 
http://annevankesteren.nl/

Received on Thursday, 22 August 2013 15:11:51 UTC