[whatwg] Proposal for improved handling of '#' inside of data URIs from Mikko Rantalainen on 2011-09-13 (public-whatwg-archive@w3.org from September 2011)

From: Mikko Rantalainen <mikko.rantalainen@peda.net>
Date: Tue, 13 Sep 2011 10:41:08 +0300
Message-ID: <4E6F0914.3020707@peda.net>

2011-09-11 00:15 EEST: Daniel Holbert:
> Browsers handle the "#" character in data URIs very differently, and the
> arguably "correct" behavior is probably not what authors actually want
> in many cases.
> 
> This could be more intuitive/do-what-I-mean if we restricted the cases
> under which "#" is treated as a fragment-ID delimiter inside of data
> URIs.  In particular: when a "#" character is followed by ">" or "<" in
> a data URI, I propose that we *don't* treat the "#" as a delimiter, and
> instead just treat it as part of the encoded document.

Please, no. We already have WAY too much "do what I mean" behavior in
HTML, and it clearly does not work in the long run. The only sane
interpretation of a literal "#" in a URI is to always treat it as the
separator for the fragment identifier. If some user agent does not follow
this logic, that user agent should be fixed.
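
To illustrate the strict interpretation, here is a minimal sketch. It
assumes Python's urllib.parse as a stand-in for a generic URI parser (it is
not what any browser uses), and the helper names split_fragment and
make_data_uri are hypothetical. It shows that a literal "#" ends the
document, and that an author who wants "#" inside the data has to write it
as "%23".

```python
from urllib.parse import quote, unquote, urlsplit


def split_fragment(uri):
    """Split at the first literal '#': returns (uri_without_fragment, fragment)."""
    base, _, fragment = uri.partition("#")
    return base, fragment


def make_data_uri(document, mime="text/html"):
    """Build a data URI with a percent-encoded payload, so '#' becomes '%23'."""
    return "data:" + mime + "," + quote(document, safe="")


doc = "<a href='#top'>up</a>"

# Unescaped payload: everything after the first '#' is the fragment,
# so the encoded document is cut short.
naive = "data:text/html," + doc
base, fragment = split_fragment(naive)
print(base)      # data:text/html,<a href='
print(fragment)  # top'>up</a>

# The standard-library parser splits at the same place.
assert urlsplit(naive).fragment == fragment

# Escaped payload: no literal '#', so there is no fragment and the
# document survives a round trip.
safe = make_data_uri(doc)
assert urlsplit(safe).fragment == ""
assert unquote(safe.split(",", 1)[1]) == doc
```

With the escaped form there is nothing for a user agent to guess: the URI
contains no literal "#", so every conforming parser sees the same document.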

> When an author writes a data URI for a document that contains a "#"
> character, she may unintentionally end up with broken results (or at
> least inconsistently-handled results), because the "#" may be treated as
> the end of the document & the beginning of the URI's fragment identifier.

When an author writes invalid markup, the results should be invalid, too.
I agree with WHATWG/HTML5 that defining results for any binary input is
required, but I think that the spec should not try to second-guess the
author's intention. If the input does not make sense, the output must not
make sense either. We just need a spec that produces the *same*
output-that-doesn't-make-sense in every user agent. After that, the
author *will* notice her error and she *will* fix the markup.

-- 
Mikko

Received on Tuesday, 13 September 2011 00:41:08 UTC