[whatwg] Proposal for improved handling of '#' inside of data URIs

On Sat, 10 Sep 2011 17:15:09 -0400, Daniel Holbert <dholbert at mozilla.com>  
wrote:

> Browsers handle the "#" character in data URIs very differently, and the  
> arguably "correct" behavior is probably not what authors actually want  
> in many cases.
>
> This could be more intuitive/do-what-I-mean if we restricted the cases  
> under which "#" is treated as a fragment-ID delimiter inside of data  
> URIs.   In particular: when a "#" character is followed by ">" or "<" in  
> a data URI, I propose that we *don't* treat the "#" as a delimiter, and  
> instead just treat it as part of the encoded document.

Not only must "#" be encoded as "%23" if you don't want it treated as a  
fragment identifier, but ">" and "<" should also be encoded as "%3E" and  
"%3C".
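To make the difference concrete, here's a small sketch. It assumes a runtime with the WHATWG URL parser (a browser or Node.js), and the "<p>#1</p>" markup is just an illustration. A raw "#" splits the data URI and moves the rest into the fragment, while "%23" (plus "%3C"/"%3E") keeps everything in the payload:

```javascript
// Raw "#": the parser ends the data portion at "#" and treats
// everything after it as the fragment, so "1</p>" is lost from
// the encoded document.
const raw = new URL("data:text/html,<p>#1</p>");
console.log(raw.pathname); // "text/html,<p>"
console.log(raw.hash !== ""); // true

// "#" as "%23", "<" as "%3C", ">" as "%3E" (and "/" as "%2F"):
// no fragment is split off, and decoding gives back the markup.
const encoded = new URL("data:text/html,%3Cp%3E%231%3C%2Fp%3E");
console.log(encoded.hash); // ""
console.log(decodeURIComponent(encoded.pathname.slice("text/html,".length)));
// "<p>#1</p>"
```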

Encoding the data (markup, for example) for a data URI is simple: on a  
UTF-8 page, just run encodeURIComponent(markup) in JS. You still  
hand-author the markup; you just paste it into a textarea and let  
something (like encodeURIComponent()) percent-encode it for you.
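As a rough sketch of that workflow, the helper below is hypothetical (toDataURI is not an existing API). Adding ";charset=utf-8" is this sketch's own choice: encodeURIComponent() always emits UTF-8 escapes, so the URI declares that explicitly.

```javascript
// Hypothetical helper: wrap hand-authored markup in a data URI.
// encodeURIComponent() escapes "#", "<", ">", "%", spaces and
// non-ASCII characters (as UTF-8 %XX sequences), among others.
function toDataURI(markup, mimeType = "text/html") {
  // Declaring charset=utf-8 is an assumption of this sketch, chosen
  // to match the UTF-8 bytes encodeURIComponent() produces.
  return `data:${mimeType};charset=utf-8,` + encodeURIComponent(markup);
}

const uri = toDataURI("<p>#1 café</p>");
console.log(uri);
// data:text/html;charset=utf-8,%3Cp%3E%231%20caf%C3%A9%3C%2Fp%3E
console.log(new URL(uri).hash); // "" (no accidental fragment)
```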

Of course, if you percent-encode everything that needs it as you type,  
you can hand-author the URI data directly. But who wants to do that,  
except for simple data? It's like hand-authoring MIME messages, which is  
not something you would normally do to create an email or an MHT file.

If you need to encode the data as base64 instead, you can use  
encodeURIComponent(btoa(unescape(encodeURIComponent(markup)))) (on a  
UTF-8 page).
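Here's a sketch of what each layer of that expression does, with a hypothetical inverse used to check the round trip. It assumes btoa/atob and the legacy unescape/escape globals are available, as they are in browsers and Node.js 16+. The function names are illustrative only.

```javascript
// Layer by layer:
//   encodeURIComponent(markup) -> UTF-8 bytes as %XX escapes
//   unescape(...)              -> one char per byte ("binary string")
//   btoa(...)                  -> base64 of those bytes
//   encodeURIComponent(...)    -> escapes base64's "+", "/" and "="
function toBase64DataURI(markup, mimeType = "text/html") {
  const b64 = btoa(unescape(encodeURIComponent(markup)));
  return `data:${mimeType};charset=utf-8;base64,` + encodeURIComponent(b64);
}

// Hypothetical inverse: undo each layer in reverse order.
function fromBase64DataURI(uri) {
  const payload = uri.slice(uri.indexOf(",") + 1);
  return decodeURIComponent(escape(atob(decodeURIComponent(payload))));
}

console.log(toBase64DataURI("<p>#1</p>"));
// data:text/html;charset=utf-8;base64,PHA%2BIzE8L3A%2B
console.log(fromBase64DataURI(toBase64DataURI("<p>café</p>")));
// <p>café</p>
```

The unescape() step is what makes btoa() safe here. btoa() only accepts characters in the 0-255 range, so passing non-ASCII markup to it directly would throw.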

And there's already <http://software.hixie.ch/utilities/cgi/data/data>  
for this, too.

Given that, I personally don't think browsers should be too lax with  
authors who don't properly encode their data. JavaScript URI  
(bookmarklet) authors already get away with that (even though there are  
pages like <http://shadow2531.com/js/jsuri.html>), but they also often  
run into percent-decoding of the URI data before it's executed, which  
they don't expect.
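A small sketch of that bookmarklet pitfall follows. Browsers percent-decode the part of a javascript: URL after the scheme before running it. Here decodeURIComponent() stands in for that step as an approximation: unlike it, browsers don't throw on malformed escapes. The bookmarklet source and helper name are hypothetical.

```javascript
// Hypothetical bookmarklet source that contains "%20" in a string.
const code = 'alert("50%20off")';

// Approximation of the browser's percent-decoding of a javascript:
// URL before execution (decodeURIComponent is only a stand-in).
function simulateExecutedSource(jsURL) {
  return decodeURIComponent(jsURL.slice("javascript:".length));
}

// Pasted verbatim: "%20" becomes a space before the code runs.
console.log(simulateExecutedSource("javascript:" + code));
// alert("50 off")

// Percent-encoded first: the source survives intact.
console.log(simulateExecutedSource("javascript:" + encodeURIComponent(code)));
// alert("50%20off")
```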

-- 
Michael

Received on Sunday, 11 September 2011 07:21:48 UTC