
[whatwg] URL parsing and same-document references [was: Re: Citing multiple <blockquote> elements in HTML5]

From: Nils Dagsson Moskopp <nils-dagsson-moskopp@dieweltistgarnichtso.net>
Date: Sat, 13 Dec 2008 04:07:18 +0100
Message-ID: <1229137638.4894.15.camel@desudesudesu>
On Friday, 12 Dec 2008, at 20:36 +0100, Calogero Alex wrote:
> The above (but the 'double check' I was suggesting) is about the way 
> Firefox (2.x and 3.0.4) behaves (both href="#foo%20bar" and, in a 
> different page, href="./example.html#foo%20bar" match id="foo bar"), 
> while IE7 and Opera 9.x perform an exact comparison, and show, in the 
> address bar, an url with eventual blank spaces, thus applying the 
> relaxation allowed by URL parsing rules, but not conforming to RFC 3986, 
> as a complete URI string.
Whenever I copy and paste a URI from the address bar into another program, I
am severely annoyed by this, especially when spaces (delimiters!) are
part of the fake URI. A chat or office program, for example, can no longer
highlight the fake URI (how could it?), and pasting it
into source code can create all kinds of validation errors. And whenever
I get a bastardized URI via chat or mail, only a part of it is

Can someone from the web browser faction please state whether there is any
data to support breaking RFC compatibility? As I see it, it is
something that makes the address look nicer but breaks whenever URIs are to be
transferred or communicated.

Getting to the problem mentioned here: the robustness principle says
that id="foo bar" should be accepted but is nevertheless invalid, because
a fragment containing a space can never be part of a URI. So IMHO, any
program should strive to accept broken URIs if they are unambiguous
(which they are here, because the address bar can hold only one URI at a
time), but never output them.
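
To make the "accept leniently, emit strictly" idea concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions, not how any browser actually does it: the function names
emit_uri and fragment_matches are hypothetical, only the fragment is
handled (not spaces elsewhere in the reference), and a stray "%" that is
not part of an escape is left untouched.

```python
from urllib.parse import quote, unquote

# Characters RFC 3986 permits unescaped in a fragment besides the unreserved
# set: sub-delims, ":", "@", "/" and "?". "%" is listed as safe so that
# existing percent-escapes are not encoded a second time (assumption: any
# "%" in the input already starts a valid escape).
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"


def emit_uri(uri_like: str) -> str:
    """Strict output: percent-encode the fragment before showing or copying."""
    base, sep, fragment = uri_like.partition("#")
    if not sep:
        return uri_like  # no fragment, nothing to repair in this sketch
    return base + "#" + quote(fragment, safe=FRAGMENT_SAFE)


def fragment_matches(uri_like: str, element_id: str) -> bool:
    """Lenient input: decode the fragment, then compare it to the id."""
    _, _, fragment = uri_like.partition("#")
    return unquote(fragment) == element_id


if __name__ == "__main__":
    print(emit_uri("./example.html#foo bar"))
    print(fragment_matches("#foo%20bar", "foo bar"))
    print(fragment_matches("#foo bar", "foo bar"))
```

With this split, both href="#foo bar" and href="#foo%20bar" would still
find id="foo bar", while anything copied out of the program would carry
the escaped form.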

Nils Dagsson Moskopp
Received on Friday, 12 December 2008 19:07:18 UTC
