- From: Daniel Holbert <dholbert@mozilla.com>
- Date: Sat, 10 Sep 2011 14:15:09 -0700
Hi whatwg,

I'm writing with a proposal to improve the handling of "#" in data URIs. I'm particularly looking for feedback from other browser vendors, but of course feedback from others is welcome as well.

SUMMARY:
========
Browsers handle the "#" character in data URIs very differently, and the arguably "correct" behavior is probably not what authors actually want in many cases. This could be more intuitive/do-what-I-mean if we restricted the cases under which "#" is treated as a fragment-ID delimiter inside data URIs.

In particular: when a "#" character is followed (anywhere later in the URI) by ">" or "<", I propose that we *don't* treat the "#" as a delimiter, and instead just treat it as part of the encoded document.

Now, a set of tests, to which I'll refer below:
http://people.mozilla.org/~dholbert/dataURIHashTests/tests_v1.xhtml

PROBLEM:
========
When an author writes a data URI for a document that contains a "#" character, she may unintentionally end up with broken results (or at least inconsistently handled results), because the "#" may be treated as the end of the document and the beginning of the URI's fragment identifier.

(I believe this to be the _technically_ correct, albeit unintuitive, behavior per the URI RFC [1]. It's the behavior we've implemented in Firefox 6 [2], and it's what I've described as "Correct" in my testcase, with quotes to indicate its unintuitiveness.)

Technically, the author *really* should encode the "#" character as "%23" if she doesn't want it to be a delimiter. However, this gotcha is easy to overlook, especially because Opera and WebKit are less strict than Firefox in this respect and will gladly accept "#" inside data URIs under some circumstances.

THE PROPOSAL & HOW IT HELPS:
============================
We can help out the author by relaxing our fragment-ID-parsing rules a bit here. Note that in cases where an author *accidentally* includes "#" inside their data URI (e.g. <body background="#f00">), there almost certainly will be more content following it: in particular, there will be an </html>, or an </svg>, or at least a ">" (if the "#" is inside the final tag) still to come.

So we can proactively check for ">" or "<" characters anywhere after the "#", and if we find any, we can pretty safely assume that the author intended the "#" to be part of the document, rather than a fragment-ID delimiter.

OVERVIEW OF BROWSERS' CURRENT HANDLING OF "#" IN DATA URIs:
===========================================================
url: http://people.mozilla.org/~dholbert/dataURIHashTests/tests_v1.xhtml

 * Firefox 6+ breaks the author's expectations in my tests A & B due to URI-parsing strictness. (But if we were to implement the above proposal, we'd match the author's expectations.) We pass test C by correctly trimming "#target" off the end and scrolling to the referenced element. And we fail test D only because of a bug in which same-origin checks are over-enforced. [3]

 * WebKit matches the author's expectations on A & B. However, that's only because they don't seem to support "#ref" suffixes on the ends of data URIs at all, so they _always_ include "#" in the document. (They *do* apparently support _relative_ references within data URI documents, e.g. xlink:href='#greenRect' as used in test B.) So WebKit ends up failing test C, because they don't strip off the "#target" suffix (resulting in broken XML). They fail test D, presumably for the same reason. (They also have some zooming issues on the <img> examples, but I'm ignoring those for the purposes of this post.)

 * Opera is interesting: it can exhibit either the Firefox or the WebKit behavior in tests A/B/C, depending on whether you view the data URI as an embedded element (via iframe/img) or directly. When you view it as an embedded element (as in my testcase), Opera matches WebKit on A/B/C (including the XML parse error on C). However, if you *directly view* the data URIs (right-click on the iframe, Frame | Open, focus the URL bar and hit Enter), then Opera matches Firefox. Also, Opera passes test D.

(I don't have results for IE. I briefly tried to support it in the test, but I had trouble getting data URIs to work there at all.)

CONCLUSION:
===========
So, to sum up the test results above: WebKit doesn't give "#" any special delimiter status in data URIs, which is a bug, but probably matches what authors intend a lot of the time; Opera sometimes behaves like WebKit and sometimes not; and Firefox parses fragment identifiers strictly, potentially giving authors headaches and truncating content that renders fine in Opera/WebKit.

With my proposal here (relaxing the situations under which "#" should be treated as a delimiter in a data URI), I think we'd better match author expectations and improve the browser-compatibility picture.

Thoughts?

Thanks,
Daniel Holbert
Mozilla Corporation

P.S. Thanks to Robert O'Callahan for coming up with this proposal a week or so back.

P.P.S. Browser versions that I tested (on Ubuntu 11.04 x86):
 Firefox 6.0.2
 Opera 11.51
 Chromium 14.0.835.126 (Developer Build 99097 Linux)

[1] https://www.ietf.org/rfc/rfc2396.txt (see section 4.1 and Appendix B, "Parsing a URI Reference with a Regular Expression", which show that "#" is technically disallowed up until the #reference at the end)
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=308590
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=686013
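The relaxed rule from THE PROPOSAL & HOW IT HELPS can be sketched as follows, next to the strict "first '#' starts the fragment" reading. This is a minimal Python illustration only, not Firefox's (or any browser's) actual implementation. The function names split_strict and split_relaxed are hypothetical, and the sketch assumes the URI is examined as a raw string before any percent-decoding. The key observation is that "a '#' with a '<' or '>' somewhere after it is content" means the delimiter, if any, is the first "#" that comes after the last angle bracket.

```python
from typing import Optional, Tuple


def split_strict(uri: str) -> Tuple[str, Optional[str]]:
    """Strict reading (hypothetical helper): the first '#' always starts
    the fragment identifier."""
    body, sep, frag = uri.partition("#")
    return body, (frag if sep else None)


def split_relaxed(uri: str) -> Tuple[str, Optional[str]]:
    """Proposed heuristic (hypothetical helper): a '#' that is followed
    anywhere later by '<' or '>' is treated as document content, so only
    a '#' after the last angle bracket can act as the delimiter."""
    # rfind returns -1 when the character is absent, so with no angle
    # brackets at all the search below starts at index 0.
    last_angle = max(uri.rfind("<"), uri.rfind(">"))
    hash_pos = uri.find("#", last_angle + 1)
    if hash_pos == -1:
        return uri, None  # no delimiter: the whole URI is the document
    return uri[:hash_pos], uri[hash_pos + 1:]


if __name__ == "__main__":
    uri = 'data:text/html,<body background="#f00"><p id="t">x</p></body>#t'
    print(split_strict(uri))   # ('data:text/html,<body background="', 'f00"><p id="t">x</p></body>#t')
    print(split_relaxed(uri))  # ('data:text/html,<body background="#f00"><p id="t">x</p></body>', 't')
```

Like the proposal itself, this is a heuristic: it keeps the accidental "#f00" in the document while still trimming a trailing "#t" reference. Percent-encoding the "#" as "%23" (for example with Python's urllib.parse.quote) avoids the ambiguity under either rule.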
Received on Saturday, 10 September 2011 14:15:09 UTC