Re: Unicode IDNA feedback from Anne van Kesteren on 2017-02-13 (www-archive@w3.org from February 2017)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Mon, 13 Feb 2017 10:33:47 +0100
To: Mark Davis ☕️ <mark@macchiato.com>
Cc: Sebastian Mayr <github@smayr.name>, www-archive <www-archive@w3.org>, Markus Scherer <mscherer@google.com>, Jungshik Shin (신정식, 申政湜) <jungshik@google.com>, Michel SUIGNARD <Michel@suignard.com>, Alex Christensen <achristensen@apple.com>, Domenic Denicola <d@domenic.me>, Simon Montagu <smontagu@smontagu.org>
Message-ID: <CADnb78gK5PC8M5WUjFMFg4qhgxLmqR2kPA-3CfGXzTFcrLco9w@mail.gmail.com>

On Sat, Jan 7, 2017 at 12:04 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
> On Sat, Jan 7, 2017 at 11:32 AM, Mark Davis ☕️ <mark@macchiato.com> wrote:
>> So it isn't missing anything that you know of?
>
> While working on ASCII-only tests for the host parser Domenic and I
> discovered the issue with "-" yesterday that he reported in the
> document.

I left a comment in the document as well, but since a couple of
people copied here might not see it, I thought I'd raise it here
too.

I think one major issue with ToASCII and browsers is that browsers do
not appear to invoke it for ASCII-only input. Browsers do appear to
check for a leading hyphen (at least sometimes) and to enforce the
maximum of 63 code points per label, but only if the input is
non-ASCII.

See https://annevankesteren.nl/2017/02/idna-toascii-differences for
the six relatively simple tests I used to demonstrate this.
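
To make the discrepancy concrete, here is a rough Python sketch. It is
a toy model, not a real ToASCII implementation and not what any
browser actually runs: the function names are made up for
illustration, only the two checks mentioned above are modeled, and
non-ASCII labels are converted with Python's built-in "punycode" codec
without any of the mapping or normalization a real implementation
would apply.

```python
from typing import List

MAX_LABEL_LENGTH = 63  # per-label limit mentioned in the message


def to_ascii_label(label: str) -> str:
    """Toy conversion: Punycode-encode non-ASCII labels, no mapping step."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def validate_always(domain: str) -> List[str]:
    """Hypothetical strict behavior: check every label of every input."""
    errors = []
    for label in domain.split("."):
        ascii_label = to_ascii_label(label)
        if ascii_label.startswith("-"):
            errors.append("leading hyphen")
        if len(ascii_label) > MAX_LABEL_LENGTH:
            errors.append("label too long")
    return errors


def validate_like_browsers(domain: str) -> List[str]:
    """Hypothetical model of the observed behavior (an assumption):
    ASCII-only input skips the checks entirely."""
    if domain.isascii():
        return []
    return validate_always(domain)


# Same kind of problem, different outcome depending on the input:
print(validate_always("-a.example"))         # ['leading hyphen']
print(validate_like_browsers("-a.example"))  # []
print(validate_like_browsers("-a.exämple"))  # ['leading hyphen']
```

The only difference between the two functions is the early return for
ASCII-only input, which is the gap described above.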

I don't really know what a good solution to this problem is or to what
extent browsers are willing to make changes to apply the same kind of
validation to all domain input.

One obvious solution is for the URL Standard to invoke ToASCII only
if the input contains non-ASCII code points, but that feels a little
wrong.

-- 
https://annevankesteren.nl/

Received on Monday, 13 February 2017 09:34:17 UTC