Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-25 (public-whatwg-archive@w3.org from September 2012)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Tue, 25 Sep 2012 17:03:06 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: David Sheets <kosmo.zb@gmail.com>, whatwg <whatwg@whatwg.org>
Message-ID: <CADnb78jCxBQeZ8uE2YFymh8Non12awmoR8pWJpzw-jJB=uT5wg@mail.gmail.com>

On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson <ian@hixie.ch> wrote:
> Not necessarily, but that's certainly possible. Personally I would
> recommend that we not change the definition of what is conforming from the
> current RFC3986/RFC3987 rules, except to the extent that the character
> encoding affects it (as per the HTML standard today).
>
>    http://whatwg.org/html#valid-url

FWIW, browsers happily send requests to servers with characters in
the URL that are "invalid" per the RFC (they are not percent-encoded),
and servers handle them fine, so I think we should make the syntax
more lenient. E.g. allowing [ and ] in the path and query components
is fine, I think.
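
To make the bracket case concrete, here is a rough sketch in Python. The
regular expressions are my own transcription of the RFC 3986 "pchar" and
"query" productions, and the helper name is_rfc3986_valid is made up for
this example. Python's urllib.parse is used only as a convenient lenient
splitter; it is not an implementation of the URL Standard.

```python
import re
from urllib.parse import quote, urlsplit

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# "[" and "]" are gen-delims, so they are not part of pchar.
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
RFC3986_PATH = re.compile(rf"(?:/|{_PCHAR})*\Z")      # segments joined by "/"
RFC3986_QUERY = re.compile(rf"(?:[/?]|{_PCHAR})*\Z")  # query = *( pchar / "/" / "?" )


def is_rfc3986_valid(path: str, query: str) -> bool:
    """Hypothetical helper: True if path and query use only RFC 3986 characters."""
    return bool(RFC3986_PATH.match(path)) and bool(RFC3986_QUERY.match(query))


# A lenient splitter accepts the raw brackets without complaint.
parts = urlsplit("http://example.org/a[1]?q=[2]")
print(parts.path, parts.query)

# The strict grammar rejects them until they are percent-encoded.
print(is_rfc3986_valid(parts.path, parts.query))
print(is_rfc3986_valid(quote(parts.path), quote(parts.query, safe="=&")))
```

The gap between what the splitter accepts and what the grammar allows is
the leniency question above.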

As for the question of why not build this on top of RFC 3986: it
does not handle non-ASCII code points. RFC 3987 does, but is not a
suitable starting point either. As shown in http://url.spec.whatwg.org/ it is
quite trivial to combine parsing, resolving, and canonicalizing into a
single algorithm (and deal with URI/IRI, now URL, as one). Trying to
somehow patch the language in RFC 3987 to deal with the encoding
problems for the query component, to deal with parsing
http:example.org when there is a base URL with the same scheme versus
when there isn't, etc. is way more of a hassle, I think, though I am
happy to be proven wrong.
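
As an illustration of the http:example.org case, the sketch below uses
Python's urllib.parse. That is an assumption of convenience: it is a
legacy-lenient parser, not the algorithm in the URL Standard, so its
results need not match what browsers do. It only shows that the same
input is handled differently depending on whether a base URL with the
same scheme is present. The base URLs are made up for the example.

```python
from urllib.parse import urljoin, urlsplit

REL = "http:example.org"

# No base URL: there is no "//", so urllib.parse reports an empty
# authority and treats "example.org" as a path.
alone = urlsplit(REL)
print(repr(alone.netloc), repr(alone.path))

# Base URL with the same scheme: the scheme is dropped and the rest is
# resolved as a relative path against the base.
same_scheme = urljoin("http://example.com/a/b", REL)
print(same_scheme)

# Base URL with a different scheme: the input is returned unchanged.
other_scheme = urljoin("ftp://example.com/a/b", REL)
print(other_scheme)
```

A single algorithm that parses and resolves in one pass has to make this
base-dependent decision explicitly.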

-- 
http://annevankesteren.nl/

Received on Tuesday, 25 September 2012 15:04:07 UTC