- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Thu, 12 Sep 2013 10:56:32 +0100
- To: Ian Hickson <ian@hixie.ch>
- Cc: whatwg <whatwg@lists.whatwg.org>, Boris Zbarsky <bzbarsky@mit.edu>, Adam Barth <w3c@adambarth.com>
On Wed, Sep 11, 2013 at 7:21 PM, Ian Hickson <ian@hixie.ch> wrote:
> Surely the consistency of the API matching the input is more important
> than the consistency of the API _not_ matching the input...

The input will be mangled anyway. E.g. domain label separators are
normalized to ".". And all kinds of other parts of the URL undergo
normalization.

>> It means the entire URL is effectively a byte sequence.
>
> I don't know what you mean here.

No code point is higher than 7F. And given the way HTTP operates on
URLs, and the way we extract data from data URLs, making it a byte
sequence might not be such a bad idea...

>> And it's very clear what the DNS lookup will be.
>
> Why do you think people care more about that than about the URL matching
> what they wrote in the markup?

It won't match that anyway.

>> And given that they keep insisting on changing what certain code points
>> map to over in IETF-land (with limited support from browser vendors :/),
>> it seems safer too.
>
> I don't understand what is safer. Surely if the punycoding step keeps
> changing, it's less safe, since it'll mean that the results will change
> without the author expecting it. If we don't punycode in the API, then the
> result will be the same regardless of the punycode step.

It depends on what you do with the result, I suppose.
https://groups.google.com/a/chromium.org/forum/?fromgroups=#!topic/blink-dev/fBsVRcEOTWM
seems relevant. http://url.spec.whatwg.org/ defines ASCII at the
moment.

The other reason, which I just remembered, is that ToASCII can fail,
and at that point we want to return failure for the URL. I suppose we
could run ToASCII and then ToUnicode...

--
http://annevankesteren.nl/
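[Editor's note: a minimal sketch of the three behaviors Anne describes, using
the `URL` constructor as it later shipped in browsers and Node.js. The host
strings are illustrative examples, not taken from the thread.]

```ts
// 1. Input is normalized: the scheme is lowercased, and U+3002 (an
//    ideographic domain label separator) is mapped to "." — so the
//    parsed URL never round-trips the author's markup exactly.
const a = new URL("HTTP://example。com/path");
console.log(a.href); // "http://example.com/path"

// 2. Non-ASCII hosts are punycoded (IDNA ToASCII), so no code point in
//    the serialized URL is above 7F.
const b = new URL("http://例え.jp/");
console.log(b.host); // "xn--r8jz45g.jp"

// 3. Parsing can fail outright — for example on a forbidden host code
//    point, or when ToASCII fails — and the API surfaces that as a
//    thrown TypeError rather than returning a mangled URL.
try {
  new URL("http://exa mple.com/"); // space is a forbidden host code point
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

Running ToASCII and then ToUnicode, as suggested at the end of the message,
would still reject inputs that fail ToASCII while exposing a Unicode host in
the API.]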
Received on Thursday, 12 September 2013 09:57:00 UTC