Re: removing the srcdoc from Tab Atkins Jr. on 2010-01-26 (public-html@w3.org from January 2010)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 26 Jan 2010 10:48:22 -0600
To: Shelley Powers <shelley.just@gmail.com>
Cc: HTMLWG WG <public-html@w3.org>
Message-ID: <dd0fbad1001260848s27ff9a4ejde25b2e84d271100@mail.gmail.com>
On Tue, Jan 26, 2010 at 8:40 AM, Shelley Powers <shelley.just@gmail.com> wrote:
> There is a great deal of contention about the new srcdoc attribute. Do we
> have to go through the Decision Process in order to have it removed?
> Having to do this to reverse an edit that hasn't shown itself to be overly
> popular is becoming a little tedious.

The only reasoning you have provided for removing @srcdoc is that you
think the benefits of @sandbox aren't great enough to make it
worth using for blog comments and similar cases where you are
inserting content inline into your page.  Presumably the conclusion is
that we don't need to make this use-case easier to meet, and thus
@srcdoc isn't needed.

I think a much more prudent course of action is to first see if we can
extend @sandbox to be more useful to this use-case.  There has been
some discussion of this in a thread on the WHATWG list, and I've
gathered a list of those suggestions and added a few of my own,
including some suggestions derived from your own objections.

I believe the correct solution to "This doesn't do enough." is "Can we
make it do more?", not "Welp, time to dump it.".


Now, we could still potentially drop @srcdoc entirely, if data urls
are sufficient.  @srcdoc is pretty much just solving some problems
with data urls in @src, after all.  From what I've been able to
gather, these are the problems with data urls (and possible
solutions):

1. Data urls are understood by browsers that don't know about
@sandbox, and thus will render there without sandboxing protections.
They "fail open", as opposed to @srcdoc which will only be implemented
by browsers that have already done @sandbox.  This appears to be
fixable by using a text/html-sandboxed MIME type on the url instead of
text/html, though.  Apparently no current browser will execute a page
served with text/html-sandboxed, and we can hope that stays true.

2. Data urls have more difficult escaping requirements.  However, all
the major web languages have an appropriate url-escaping function that
can be easily used.  There are some wrinkles - PHP has two of them,
and the one you'd think to try first (urlencode(), rather than
rawurlencode()) does the wrong thing (it turns spaces into +, rather
than %20), but this is both immediately obvious ("frist+post+LOLOLOL")
and not a security issue.  Even if you use an inappropriate escaping
function, such as PHP's htmlspecialchars() function (escapes <>"& by
default), it seems to work decently in current browsers.  (The failure
mode there is that, by default, it doesn't escape ', but this will
fail just as quickly on innocuous content as @srcdoc does when you
forget to escape ".  Also, Opera apparently breaks if you don't
escape #, putting everything after it into the fragment, but that
merely results in a malformed page, not a security leak.)

3. Data urls have some annoying boilerplate - you'll have to start
every one of them with
"data:text/html-sandboxed;charset=utf-8,%3C%21DOCTYPE%20html%3E"
(maybe include a <title> too?).  We can't really fix this.  It's not a
*huge* deal, but it was a nice plus for @srcdoc.

4. Data urls are automatically unique-origin, and thus
allow-same-origin will have no effect by default.  Is it reasonable to
make allow-same-origin force data urls into the page's origin?
Perhaps only for the iframe's own @src, while data urls used inside
the content stay unique-origin?

5. URL size limits.  All the non-IE browsers appear to have fairly
large or unlimited URL size limits (at least 80KB, from a quick Google
search; I think Firefox is limited only by available memory).  We
don't have to worry about current versions of IE, since they don't
understand data: at all as an iframe @src, and hopefully IE9 will have
limits similar to the other browsers.  However, it's certainly possible
that we could run into size limits in the future.  One use-case where
this matters is wiki pages, which can be quite large indeed, especially
if data urls are used in the nested content.  An off-hand estimate from
Aryeh suggests that the largest Wikipedia pages may be in the 1MB range.

6. Fallback.  @srcdoc allows you to use @src to point to a page for
users with browsers that don't support @srcdoc.  This isn't much use
for actually *serving* the content (if they don't support @srcdoc, you
probably can't rely on them to support @sandbox either), but it at
least allows you to say "Sorry, your browser doesn't support secure
iframes.  Upgrade, I guess."  Data urls using text/html-sandboxed have
a failure mode of causing an attempted download, or, in IE, of causing
the browser to navigate to the data url and then say that the page
can't be displayed.  Both fallbacks are bad, but the @srcdoc one is
more user-friendly.
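To make #2, #3 and #5 a bit more concrete, here's a rough sketch in Python (not PHP, and not anything from the spec) of building both forms from an untrusted comment.  The helper names sandboxed_data_url, srcdoc_attribute and fits_limit are made up for illustration; text/html-sandboxed is only the MIME type proposed in #1, and the limit passed to fits_limit is a placeholder, since the real browser limits are exactly what's uncertain in #5.  urllib.parse.quote(safe="") behaves roughly like PHP's rawurlencode() here (spaces become %20, and #, ' and " are all escaped), while quote_plus() reproduces the urlencode()-style "+" mistake.

```python
import html
from urllib.parse import quote, quote_plus

# Boilerplate from #3: the proposed sandboxed MIME type (not a registered
# type), UTF-8 charset, and a percent-encoded "<!DOCTYPE html>".
DATA_URL_PREFIX = "data:text/html-sandboxed;charset=utf-8," + quote(
    "<!DOCTYPE html>", safe=""
)


def sandboxed_data_url(fragment: str) -> str:
    """Hypothetical helper: wrap an untrusted HTML fragment in a data: URL.

    quote(..., safe="") percent-encodes everything except letters, digits
    and "_.-~", so spaces become %20 (like PHP's rawurlencode()) and
    '#', "'" and '"' are escaped too, which sidesteps the Opera '#' issue.
    Non-ASCII text is encoded as UTF-8, matching the charset above.
    """
    return DATA_URL_PREFIX + quote(fragment, safe="")


def srcdoc_attribute(fragment: str) -> str:
    """Hypothetical helper: escape a fragment for a srcdoc="..." attribute.

    html.escape(..., quote=True) escapes &, <, > and both quote characters.
    """
    return html.escape(fragment, quote=True)


def fits_limit(url: str, limit_bytes: int) -> bool:
    """Hypothetical check for #5; limit_bytes is a placeholder, not a
    documented browser limit."""
    # A fully percent-encoded data: URL is pure ASCII, so chars == bytes.
    return len(url.encode("ascii")) <= limit_bytes


if __name__ == "__main__":
    comment = 'frist post #1 <b>"LOL"</b>'
    print(sandboxed_data_url(comment))
    print(srcdoc_attribute(comment))
    print(quote_plus("frist post"))  # the urlencode()-style mistake: frist+post
    # 80KB is only the rough lower bound from the quick search in #5.
    print(fits_limit(sandboxed_data_url(comment), 80 * 1024))
```

The @srcdoc path only needs ordinary HTML attribute escaping; the data url path needs URL escaping plus the fixed prefix, which is the boilerplate cost from #3.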

I think that's all the major issues.  #6 is a big one, #5 is
potentially bad depending on the actual limits that browsers employ,
and #3 is unfortunate.  #1, #2, and #4 look to be either non-issues or
potentially solvable by the spec.

~TJ
Received on Tuesday, 26 January 2010 16:49:15 UTC