
Re: <iframe doc="">

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sun, 24 Jan 2010 12:38:55 -0600
Message-ID: <dd0fbad1001241038m607bf849xbab2aa56d421d274@mail.gmail.com>
To: Shelley Powers <shelley.just@gmail.com>
Cc: Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
On Sun, Jan 24, 2010 at 12:04 PM, Shelley Powers <shelley.just@gmail.com> wrote:
> On Sun, Jan 24, 2010 at 11:14 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
>> On Sun, Jan 24, 2010 at 10:55 AM, Shelley Powers <shelley.just@gmail.com> wrote:
>>> This is an old issue. We have had software to sanitize comments for a
>>> long time. It's built into most CMS tools. And for those who disregard
>>> the use of such tools, they're not going to use this, either.
>> Indeed, there are nearly as many html-sanitizers as there are CMSes.
>> And they're pretty uniformly bad.  Most of them are built on fragile
>> regexps, if you're lucky.  They might just be a handful of string
>> replaces that address whatever problems the CMS author could think of
>> at the time.  The best of them address *currently known attack
>> vectors* decently enough, but are usually weak to *new* attacks.
> Most are not bad, many are good, a few are exceptional. I don't
> believe either Drupal or Wordpress are vulnerable to script attacks in
> comments. Do you have a demonstration how script attacks would
> circumvent the protections in place in these CMS? When they're using,
> oh, something like htmLawed?

htmLawed is large and regexp-based.  HTML is not a regular language,
and is too complex to be reliably handled even by the extended regexps
found in the wild (which recognize more than just regular languages).
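As a deliberately naive sketch (not htmLawed's actual code, just an
illustration of the single-pass regexp approach described above):

```python
import re

def naive_sanitize(html):
    # Strip <script>...</script> blocks with one regexp pass --
    # the fragile approach the paragraph above warns about.
    return re.sub(r'<script\b[^>]*>.*?</script>', '', html,
                  flags=re.IGNORECASE | re.DOTALL)

# The obvious attack is caught...
assert naive_sanitize('<script>alert(1)</script>') == ''

# ...but removing the inner <script></script> pairs from a nested
# payload reassembles a working script tag out of the leftovers:
payload = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>'
print(naive_sanitize(payload))  # <script>alert(1)</script>
```

A real sanitizer would loop until the output stops changing, but each
such patch only addresses the bypass its author thought of, which is
exactly the pattern described above.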

There is, indeed, almost certainly a flaw or gap in its sanitization
somewhere.  There always will be, unless you use a parser/tokenizer
that works the same way as the one that browsers use*.  @srcdoc just
offloads the job to the browser itself, so there's no chance of a
mismatch between sanitizer and browser.

*If your algorithm doesn't work the same way as the browsers', then
you can pass something through as harmless that the browser interprets
differently and permits an attack.  For a trivial example of this in a
similar domain, look at the history of PHP's mysql_escape_string() and
mysql_real_escape_string() functions.  The former didn't escape
certain multi-byte sequences that MySQL translated to 'normal' characters,
allowing SQL injection to still occur.  An escaping function must know
*exactly* how the target is going to process the code, or else it
risks precisely this sort of error.
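The same principle applies outside SQL.  A hedged illustration in
Python (the function name is made up for the example): an escaper that
only neutralizes angle brackets is "correct" for text content, but the
browser processes attribute values differently, so no angle brackets
are needed to break out:

```python
def escape_tags_only(s):
    # Neutralizes < and > -- sufficient for text content, but this
    # escaper doesn't know its output will land inside an attribute.
    return s.replace('<', '&lt;').replace('>', '&gt;')

user_input = 'x" onerror="alert(1)'
markup = '<img src="%s">' % escape_tags_only(user_input)
print(markup)  # <img src="x" onerror="alert(1)">
# A bare quote was enough: the browser sees a new onerror attribute,
# even though no tag characters survived the escaper.
```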

> Are you saying that this is the rationale for this change?

The rationale for @srcdoc is to be able to use the benefits of
@sandbox without incurring network requests (by linking to content as
a normal iframe).  The benefits of @sandbox have been discussed
previously.  The specific proposal for @srcdoc has its own benefits
over similar proposals that address the same issue; namely, it has
only two trivial escaping requirements, only one of which is relevant
for security.  As well, the security-relevant escape requirement
should fail very quickly and visibly if it's left out (as any
innocuous " in the content will cause the rest of the content to
drop), as opposed to many of the other proposals which fail much more
quietly, perhaps only when an actual attack takes place.
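A sketch of those two escapes in Python (the helper name is
illustrative, not from any spec):

```python
def escape_srcdoc(content):
    # The two trivial escapes: & (so existing entities round-trip) and
    # " (the security-relevant one -- a bare quote would close the
    # srcdoc attribute value).
    return content.replace('&', '&amp;').replace('"', '&quot;')

comment = 'He said "hello" & waved'
good = '<iframe sandbox srcdoc="%s"></iframe>' % escape_srcdoc(comment)
bad  = '<iframe sandbox srcdoc="%s"></iframe>' % comment  # " escape forgotten

print(good)
# In `bad`, the attribute value ends at the first bare quote, so
# everything from "hello" onward visibly drops out of the iframe --
# the loud failure mode described above.
print(bad)
```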

> If so, do you have specific examples of these commonly occurring
> vulnerabilities in existing sanitizer technologies? You have specific
> ways to circumvent the sanitizers?

Look into the history of nearly every XSS attack that has ever been
created.  Look at the changelogs of any specific widely-used
sanitizers, if they use a publicly-available source control.

>> On the other hand, @srcdoc makes this whole thing trivial, and allows
>> us to leverage the behavioral restraints of @sandbox as well.  It's a
>> win for everyone.  The only loss is if you were somehow silly enough
>> to write code with @srcdoc by hand, and I've already explained why
>> that's a silly thing to do.
> But people have to write the templates by hand. At some point in time,
> humans are involved in web pages. Whether they write the code to
> generate the content, design the templates to use the code, or yes,
> even create the web page by hand--humans are involved.

Indeed, the templates are written by hand.  I just did so a message or
two back to demonstrate precisely what is expected to be written.  I
also demonstrated that a template using @srcdoc is nearly identical to
a template not using it, except that the former gets to benefit from
the @sandbox security model.
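To make that concrete, a hypothetical pair of templates (the names and
markup are illustrative, not the earlier message's exact code):

```python
def escape_srcdoc(s):
    # The two srcdoc escapes discussed earlier.
    return s.replace('&', '&amp;').replace('"', '&quot;')

comment = '<p>Nice post!</p>'

# Without @srcdoc: comment markup is inlined directly (after whatever
# sanitizing the CMS attempts).
plain = '<div class="comment">%s</div>' % comment

# With @srcdoc: a nearly identical template, one extra escaping call,
# and the content now runs under the @sandbox security model.
sandboxed = '<iframe sandbox srcdoc="%s"></iframe>' % escape_srcdoc(comment)

print(plain)
print(sandboxed)
```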

If the entire page is written by hand, *then there's no need for
@srcdoc*.  You are writing the code yourself, so you don't have to
protect yourself from XSS attacks written by yourself to steal
information from yourself.  You just write the code.

So, I'm not sure what exactly you're objecting to.

> Ultimately, this stuff has to be meaningful for humans in order to
> work. This change, is not meaningful.

The templates, and how to produce them, are completely meaningful.
What part of the code I demonstrated is confusing?  The generated code
may be ugly, due to the extra escapes and long content in attributes,
but the generated code is only relevant for browsers.  Humans don't
look at that, they look at the generating code.

What, specifically, do you find not meaningful?

Received on Sunday, 24 January 2010 18:39:47 GMT
