- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Thu, 12 Sep 2013 10:56:32 +0100
- To: Ian Hickson <ian@hixie.ch>
- Cc: whatwg <whatwg@lists.whatwg.org>, Boris Zbarsky <bzbarsky@mit.edu>, Adam Barth <w3c@adambarth.com>
On Wed, Sep 11, 2013 at 7:21 PM, Ian Hickson <ian@hixie.ch> wrote:
> Surely the consistency of the API matching the input is more important
> than the consistency of the API _not_ matching the input...

The input will be mangled anyway. E.g. domain label separators are
normalized to ".". And all kinds of other parts of the URL undergo
normalization.

>> It means the entire URL is effectively a byte sequence.
>
> I don't know what you mean here.

No code point is higher than 7F. And given the way HTTP operates on
URLs, and the way we extract data from data URLs, making it a byte
sequence might not be such a bad idea...

>> And it's very clear what the DNS lookup will be.
>
> Why do you think people care more about that than about the URL matching
> what they wrote in the markup?

It won't match that anyway.

>> And given that they keep insisting on changing what certain code points
>> map to over in IETF-land (with limited support from browser vendors :/),
>> it seems safer too.
>
> I don't understand what is safer. Surely if the punycoding step keeps
> changing, it's less safe, since it'll mean that the results will change
> without the author expecting it. If we don't punycode in the API, then the
> result will be the same regardless of the punycode step.

It depends on what you do with the result, I suppose.
https://groups.google.com/a/chromium.org/forum/?fromgroups=#!topic/blink-dev/fBsVRcEOTWM
seems relevant. http://url.spec.whatwg.org/ defines ASCII at the
moment.

The other reason, which I just remembered, is that ToASCII can fail,
and at that point we want to return failure for the URL. I suppose we
could run ToASCII and then ToUnicode...

--
http://annevankesteren.nl/
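[Editor's note: a minimal sketch of the three behaviors Anne describes, using
the `URL` constructor as it later shipped in browsers and Node.js. The host
strings are illustrative examples, not taken from the thread.]

```ts
// 1. Input is normalized: the scheme is lowercased, and U+3002 (an
//    ideographic domain label separator) is mapped to "." — so the
//    parsed URL never round-trips the author's markup exactly.
const a = new URL("HTTP://example。com/path");
console.log(a.href); // "http://example.com/path"

// 2. Non-ASCII hosts are punycoded (IDNA ToASCII), so no code point in
//    the serialized URL is above 7F.
const b = new URL("http://例え.jp/");
console.log(b.host); // "xn--r8jz45g.jp"

// 3. Parsing can fail outright — for example on a forbidden host code
//    point, or when ToASCII fails — and the API surfaces that as a
//    thrown TypeError rather than returning a mangled URL.
try {
  new URL("http://exa mple.com/"); // space is a forbidden host code point
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

Running ToASCII and then ToUnicode, as suggested at the end of the message,
would still reject inputs that fail ToASCII while exposing a Unicode host in
the API.]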
Received on Thursday, 12 September 2013 09:57:00 UTC