Re: [whatwg] New URL Standard

From: David Sheets <kosmo.zb@gmail.com>
Date: Tue, 25 Sep 2012 11:20:15 -0700
Message-ID: <CAAWM5TwOeyQ1xQiqFRCEfgKTpk6im1Md8PD6kGpqZF2=nNq_jg@mail.gmail.com>
To: Anne van Kesteren <annevk@annevk.nl>
Cc: whatwg <whatwg@whatwg.org>, Ian Hickson <ian@hixie.ch>

On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
> On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson <ian@hixie.ch> wrote:
>> Not necessarily, but that's certainly possible. Personally I would
>> recommend that we not change the definition of what is conforming from the
>> current RFC3986/RFC3987 rules, except to the extent that the character
>> encoding affects it (as per the HTML standard today).
>>
>>    http://whatwg.org/html#valid-url
>
> FWIW, given that browsers happily do requests to servers with
> characters in the URL that are "invalid" per the RFC (they are not URL
> escaped) and servers handle them fine I think we should make the
> syntax more lenient. E.g. allowing [ and ] in the path and query
> component is fine I think.

I believe this would make parsing URI references ambiguous.
Is "[::1]" an authority or a path segment?
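
To make the question concrete, here is a minimal sketch that applies the
non-validating splitting regex from RFC 3986, Appendix B to a few
references. The helper name split_reference is hypothetical and only for
illustration; the regex does not validate, so this shows how components
are located, not which references are conforming.

```python
import re

# Non-validating component-splitting regex from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Return (scheme, authority, path) as the Appendix B regex sees them.

    Hypothetical helper for illustration; components that are absent
    come back as None.
    """
    m = URI_REFERENCE.match(ref)
    return m.group(2), m.group(4), m.group(5)


for ref in ("//[::1]/x", "/[::1]/x", "[::1]"):
    print(repr(ref), "->", split_reference(ref))

# '//[::1]/x' -> (None, '[::1]', '/x')      brackets land in the authority
# '/[::1]/x'  -> (None, None, '/[::1]/x')   brackets land in the path
# '[::1]'     -> ('[', None, ':1]')         "[" is taken as a scheme
```

With this regex, only the leading "//" decides whether the bracketed text
is read as an authority, and the bare form is split at its first colon. A
grammar that admits brackets in path segments has to say which of these
readings is intended.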

> As for the question about why not build this on top of RFC 3986. That
> does not handle non-ASCII code points. RFC 3987 does, but is not a
> suitable start either. As shown in http://url.spec.whatwg.org/ it is
> quite trivial to combine parsing, resolving, and canonicalizing into a
> single algorithm (and deal with URI/IRI, now URL, as one).

Composition is often trivial but unenlightening. There is necessarily
less information in a partially evaluated function composition than in
the functions in isolation.

Defining a formal language accurately and in a broadly understandable
manner is nontrivial. Your task is nontrivial.

> Trying to
> somehow patch the language in RFC 3987 to deal with the encoding
> problems for the query component, to deal with parsing
> http:example.org when there is a base URL with the same scheme versus
> when there isn't, etc. is way more of a hassle I think, though I am
> happy to be proven wrong.

I believe the encoding problems are handled by a normalization
algorithm and parsing relative references is handled by the base
scheme module.
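
As a rough illustration of the same-scheme case, the sketch below uses
Python's urllib.parse.urljoin, which as far as I know follows the
non-strict branch of the reference resolution algorithm in RFC 3986,
section 5.2.2 (a reference scheme equal to the base scheme is ignored).
The base URLs are made up for the example, and this is one existing
implementation's behavior, not a statement of what any spec requires.

```python
from urllib.parse import urljoin

# Hypothetical base URLs, chosen only for this example.
same_scheme_base = "http://example.com/a/b"
other_scheme_base = "ftp://example.com/a/b"
ref = "http:example.org"

# Same scheme as the base: the scheme is dropped and "example.org"
# is resolved as a relative path against the base.
print(urljoin(same_scheme_base, ref))   # http://example.com/a/example.org

# Different scheme, or no base at all: the reference is returned as-is.
print(urljoin(other_scheme_base, ref))  # http:example.org
print(urljoin("", ref))                 # http:example.org
```

The result depends on the base, and the dependence sits in the resolution
step, not in the grammar.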

What is the acceptable trade-off between (y)our hassle and the time of
technologists in the coming decades? Will you make it easier or harder
for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

> --
> http://annevankesteren.nl/
Received on Tuesday, 25 September 2012 18:20:47 UTC