
[whatwg] URL: percent-encoded host

From: Anne van Kesteren <annevk@annevk.nl>
Date: Mon, 19 Nov 2012 22:12:48 +0100
Message-ID: <CADnb78g54Zryb9WYa=Kqrw1PgcVjM2mLQ9e3uS7_h6Y-D4MMwQ@mail.gmail.com>
To: WHATWG <whatwg@whatwg.org>
What to do with percent-encoded bytes in the host? Examples (prepend
http:// for a full URL):

1. x%2ex
2. %80
3. %41
4. %C3%A9

(Appending % to each of them gives confusing results; e.g. Opera handles
%41% and %C3%A9% differently.
http://dump.testsuite.org/url/inspect.html can be used for testing.)
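To make the examples concrete, a quick check can show which bytes each one encodes. Python's `urllib.parse.unquote_to_bytes` is used here only as a convenient stand-in for percent-decoding, not as any browser's host parser, and `is_valid_utf8` is a hypothetical helper. `x%2ex` decodes to `x.x`, `%80` to a lone 0x80 byte that is not valid UTF-8, `%41` to `A`, and `%C3%A9` to the UTF-8 encoding of `é`. This particular decoder leaves a trailing lone `%` untouched.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The four examples, plus two of the variants with a trailing %.
    for host in ["x%2ex", "%80", "%41", "%C3%A9", "%41%", "%C3%A9%"]:
        raw = unquote_to_bytes(host)  # %XX escapes -> raw bytes
        print(f"{host!r:10} -> {raw!r:16} valid UTF-8: {is_valid_utf8(raw)}")
```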

There's a bunch of different approaches we could take (and most of
these seem to be done in one way or another):

1. Convert percent-encoded bytes to bytes, decode as utf-8.
2. Convert percent-encoded bytes to bytes, only decode those as utf-8
that represent a valid sequence.
3. Ignore percent-encoded bytes.
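The three approaches can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any browser's actual code. The function names are hypothetical, and `urllib.parse.unquote_to_bytes` stands in for a percent-decoder. Approach 1 assumes a decoder error yields U+FFFD rather than failing, which is one possible reading of "without the weirdness". Approach 2 assumes escapes separated by literal characters are decoded separately. Approach 3 assumes "ignore" means leaving the escapes exactly as written.

```python
import re
from urllib.parse import unquote_to_bytes

# A run of one or more consecutive %XX escapes (hex digits in either case).
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def approach_1(host: str) -> str:
    """Percent-decode every escape, then decode the bytes as UTF-8.

    Assumption: invalid UTF-8 becomes U+FFFD instead of failing.
    """
    return unquote_to_bytes(host).decode("utf-8", errors="replace")


def _decode_valid_sequences(data: bytes) -> str:
    """Decode valid UTF-8 sequences; re-escape every other byte as %XX."""
    out = []
    i = 0
    while i < len(data):
        # A UTF-8 sequence is 1 to 4 bytes long; try the shortest first so
        # each successful decode consumes exactly one character.
        for length in range(1, 5):
            try:
                out.append(data[i:i + length].decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += length
            break
        else:
            # No valid sequence starts here: keep this byte percent-encoded.
            out.append(f"%{data[i]:02X}")
            i += 1
    return "".join(out)


def approach_2(host: str) -> str:
    """Percent-decode each run of escapes, but only turn valid UTF-8 into text.

    Assumption: escape runs separated by literal characters are handled
    independently.
    """
    return ESCAPE_RUN.sub(
        lambda m: _decode_valid_sequences(unquote_to_bytes(m.group(0))), host
    )


def approach_3(host: str) -> str:
    """Assumption: "ignore" means the escapes are left exactly as written."""
    return host


if __name__ == "__main__":
    for host in ["x%2ex", "%80", "%41", "%C3%A9", "%41%", "%C3%A9%"]:
        print(repr(host), repr(approach_1(host)),
              repr(approach_2(host)), repr(approach_3(host)))
```

Under these assumptions, `%80` becomes U+FFFD with approach 1 and stays `%80` with approaches 2 and 3, while `%C3%A9` becomes `é` with approaches 1 and 2.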

Chrome seems to do 1 (although with weird results when you hit a decoder error).
Opera and Safari seem to do 2.
Firefox seems to do 3.

Personally I'm leaning towards either 1 (without the weirdness) or 3.
One potential downside is that with either of those you can no
longer transmit bytes higher than 0x7F over DNS or an equivalent system.
Not sure if that's a problem in practice as user agents seem to mostly
fail already...

Received on Monday, 19 November 2012 22:18:55 UTC
