[whatwg] Proposal for improved handling of '#' inside of data URIs from Glenn Maynard on 2011-09-11 (public-whatwg-archive@w3.org from September 2011)

From: Glenn Maynard <glenn@zewt.org>
Date: Sun, 11 Sep 2011 12:14:08 -0400
Message-ID: <CABirCh9cWN5V8qVTZy1m7u-kjbTT33di6q0GJcQWP4oArqs3zw@mail.gmail.com>

On Sat, Sep 10, 2011 at 5:15 PM, Daniel Holbert <dholbert at mozilla.com> wrote:

> This could be more intuitive/do-what-I-mean if we restricted the cases
> under which "#" is treated as a fragment-ID delimiter inside of data URIs.
>  In particular: when a "#" character is followed by ">" or "<" in a data
> URI, I propose that we *don't* treat the "#" as a delimiter, and instead
> just treat it as part of the encoded document.
>

An HTML document in a data: URI containing a # is probably followed by a >
or <; but that's an "if", not "iff".  It doesn't imply that a # followed by
a > or < is *always* intended as part of the data and not an actual
fragment.

data:text/html,foo<div style=height:3000px></div><span id='vector<int>'>bar</span>#vector<int>

I don't think adding black magic to URI parsing will make things less
confusing.
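
To make the ambiguity concrete, here is a minimal sketch comparing the ordinary rule (the first "#" starts the fragment) with one possible reading of the proposed heuristic. The function names are hypothetical, and reading "followed by" as "anywhere later in the URI" is an assumption about the proposal, not something it spells out.

```python
def split_ordinary(uri):
    """Ordinary rule: the first '#' starts the fragment."""
    body, sep, fragment = uri.partition("#")
    return body, (fragment if sep else None)


def split_heuristic(uri):
    """Hypothetical reading of the proposal: a '#' with '<' or '>'
    anywhere after it is kept as data, not treated as a delimiter."""
    pos = uri.find("#")
    while pos != -1:
        rest = uri[pos + 1:]
        if "<" not in rest and ">" not in rest:
            return uri[:pos], rest
        pos = uri.find("#", pos + 1)
    return uri, None


uri = ("data:text/html,foo<div style=height:3000px></div>"
       "<span id='vector<int>'>bar</span>#vector<int>")

print(split_ordinary(uri)[1])   # vector<int>
print(split_heuristic(uri)[1])  # None
```

Under this reading the heuristic swallows a fragment the author actually intended, which is the failure case in the example URI above.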

> Firefox parses fragment-identifiers strictly, potentially giving authors
> headaches and truncating content that renders fine in Opera/Webkit.
>

I'd say the opposite: WebKit breaks this author's expectations and
encourages headaches, by not parsing the above URIs in the ordinary way,
where Firefox matches my expectations.  I was certainly surprised to find
that Chrome fails the above.

On Sun, Sep 11, 2011 at 10:21 AM, Michael A. Puls II
<shadow2531 at gmail.com> wrote:

> Not only must "#" be "%23" if you don't want it as a frag id, but ">" and
> "<" should be "%3E" and "%3C".
>

I'm not sure about the spec on this, but Firefox actively decodes %3E and
%3C.  Pasting this into the address bar and copying it back out turns them
back into literal < and > characters:

data:text/html,foo<div style=height:3000px></div><span id='vector<int>'>bar</span>#vector%3Cint%3E

which suggests that escaping these characters isn't necessary or encouraged.
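
For reference, a small sketch of the percent-encoding involved, using Python's standard urllib.parse purely as an illustration (it says nothing about how any browser is implemented). The sample markup in `body` is made up for the example.

```python
from urllib.parse import quote, unquote

# %3C and %3E are just the escaped forms of '<' and '>'.
print(unquote("vector%3Cint%3E"))  # vector<int>

# A '#' that belongs to the document has to be escaped as %23.
print(quote("#", safe=""))          # %23

# Illustrative markup: escape only the '#', leave '<' and '>' literal.
body = "<a href=#top>up</a>"
uri = "data:text/html," + body.replace("#", "%23")
print(uri)                          # data:text/html,<a href=%23top>up</a>
```

Escaping only the "#" is enough to keep the ordinary first-"#" rule from cutting the document short.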

-- 
Glenn Maynard

Received on Sunday, 11 September 2011 09:14:08 UTC