- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Thu, 16 Jan 2014 14:27:03 +0000
- To: Mark Davis ☕ <mark@macchiato.com>
- Cc: Gervase Markham <gerv@mozilla.org>, John C Klensin <klensin@jck.com>, yaojk <yaojk@cnnic.cn>, Paul Hoffman <paul.hoffman@vpnc.org>, "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>, IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
On Thu, Jan 16, 2014 at 1:24 PM, Mark Davis ☕ <mark@macchiato.com> wrote:
> It is not unlikely that an implementation that you think is following
> IDNA2003 (with a non-standard, larger repertoire) is actually following UTS
> 46.

I know for a fact that Gecko has not changed its implementation (though it has updated its Unicode data since the release of IDNA2003, doh). It "passes" the Pile of Poo Test™:

<a href="http://💩.com/">test</a>
<script>alert(document.querySelector("a").host)</script>

Alerts: xn--ls8h.com

Chrome alerts the same and has reportedly updated to UTS46 (compatible mode), so, as you point out, the differences are probably minor and only show up when checking some of the more obscure code points.

> There is a table in
> http://unicode.org/reports/tr46/#Table_IDNA_Comparisons

That is an interesting table. Ⅎ (line c) does indeed seem to be disallowed in Chrome, yet 㛼 (line d), which should also be disallowed per that table, works fine. Both work fine in Firefox. Both Chrome and Firefox map ! (line b) to ! and do not cause parsing to fail because of it, even though the table suggests it should. (Presumably this is because the table makes assumptions about ASCII that browsers do not share.) Firefox and Safari map (line i) and Chrome does not.

> One way to look at UTS 46 is as a migration layer to support client
> implementations during the transition of registries from IDNA2003 to
> IDNA2008, plus a mapping layer that can be used with straight IDNA2008.

I'm not sure what this means. Do you think we will ever stop mapping U+3002 to U+002E? Or A to a?

>> I think I did mention earlier on UTS46 might be okay, depending on the
>> details. I am hoping to hear from Mark on the matter.
>
> I'm not sure what specific questions you have about UTS 46. Can you
> reiterate them?

You keep talking about UTS 46 as if it were a migration layer, which suggests it might go away. That does not really seem acceptable to me.
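The Pile of Poo Test result is Punycode (RFC 3492) applied to the non-ASCII label: U+1F4A9 encodes to `ls8h`, which gets the `xn--` prefix. A minimal JavaScript sketch follows. It assumes a hypothetical `domainToASCIISketch` helper (not taken from UTS46 or the URL Standard) that only treats the IDNA2003 label separators U+3002, U+FF0E and U+FF61 as dots, ASCII-lowercases each label (the "A to a" case), and Punycode-encodes labels that still contain non-ASCII code points. It deliberately omits UTS46's mapping table, normalization, length and validity checks.

```javascript
// Minimal sketch, NOT UTS46: no mapping table, normalization, or validity checks.

// Punycode parameters from RFC 3492, section 5.
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 0x80;

// Bias adaptation (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - TMIN) * TMAX) / 2)) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Digit values 0..25 become "a".."z", 26..35 become "0".."9".
function encodeDigit(d) {
  return String.fromCharCode(d < 26 ? 0x61 + d : 0x16 + d);
}

// Punycode-encode one label (RFC 3492, section 6.3), without the "xn--" prefix.
function punycodeEncode(label) {
  const cps = Array.from(label, (c) => c.codePointAt(0));
  // Copy the basic (ASCII) code points first, followed by "-" if there were any.
  let output = cps.filter((cp) => cp < 0x80).map((cp) => String.fromCharCode(cp)).join("");
  const basicCount = output.length;
  if (basicCount > 0) output += "-";
  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, h = basicCount;
  while (h < cps.length) {
    // Smallest code point not yet handled.
    const m = Math.min(...cps.filter((cp) => cp >= n));
    delta += (m - n) * (h + 1);
    n = m;
    for (const cp of cps) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          output += encodeDigit(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += encodeDigit(q);
        bias = adapt(delta, h + 1, h === basicCount);
        delta = 0;
        h++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Hypothetical helper: map the IDNA2003 label separators to ".", ASCII-lowercase
// each label, and Punycode-encode labels that still contain non-ASCII code points.
function domainToASCIISketch(domain) {
  return domain
    .replace(/[\u3002\uFF0E\uFF61]/g, ".")
    .split(".")
    .map((label) => {
      const lower = label.replace(/[A-Z]/g, (c) => c.toLowerCase());
      return /^[\x00-\x7F]*$/.test(lower) ? lower : "xn--" + punycodeEncode(lower);
    })
    .join(".");
}

console.log(domainToASCIISketch("💩.com")); // "xn--ls8h.com"
console.log(domainToASCIISketch("EXAMPLE\u3002com")); // "example.com"
```

Only ASCII letters are lowercased here; full case folding and the rest of the UTS46 mapping are exactly the parts this toy leaves out, so it should agree with a URL Standard host parser (e.g. `new URL("http://💩.com/").host`) only for simple inputs like these.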
UTS46 enforces DNS length restrictions on domain names (IDNA2003 did the same), which browsers do not appear to implement: they are fine with a label longer than a hundred code points. I don't think this should be outlawed at the parsing layer, because the name might be used outside the DNS.

I wish it contained the actual ASCII restrictions we need in practice rather than deferring those to the application, but I suppose I can define those in the URL Standard and use UseSTD3ASCIIRules=false.

Another wish I have is that the algorithms were a bit clearer in terms of input and output. What argument does ToASCII take? What about ToUnicode? E.g. how would you replace "domain to ASCII" and "domain to Unicode" in http://url.spec.whatwg.org/#concept-host-parser with UTS46 and ensure the algorithm still has the same kind of expected output? It's not entirely clear to me how to make use of your work.

-- 
http://annevankesteren.nl/
Received on Thursday, 16 January 2014 14:27:31 UTC