Re: <iframe doc=""> from Shelley Powers on 2010-01-24 (public-html@w3.org from January 2010)

From: Shelley Powers <shelley.just@gmail.com>
Date: Sun, 24 Jan 2010 13:31:57 -0600
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <643cc0271001241131m38356187w5ca6d2c025640a83@mail.gmail.com>
On Sun, Jan 24, 2010 at 12:38 PM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
> On Sun, Jan 24, 2010 at 12:04 PM, Shelley Powers <shelley.just@gmail.com> wrote:
>> On Sun, Jan 24, 2010 at 11:14 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
>>> On Sun, Jan 24, 2010 at 10:55 AM, Shelley Powers <shelley.just@gmail.com> wrote:
>>>> This is an old issue. We have had software to sanitize comments for a
>>>> long time. It's built into most CMS tools. And for those who disregard
>>>> the use of such tools, they're not going to use this, either.
>>>
>>> Indeed, there are nearly as many html-sanitizers as there are CMSes.
>>> And they're pretty uniformly bad.  Most of them are built on fragile
>>> regexps, if you're lucky.  They might just be a handful of string
>>> replaces that address whatever problems the CMS author could think of
>>> at the time.  The best of them address *currently known attack
>>> vectors* decently enough, but are usually weak to *new* attacks.
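A minimal sketch of the "fragile regexps" point, using a deliberately naive filter. `naive_strip_scripts` is hypothetical, invented for illustration, and not taken from htmLawed or any CMS. It deletes `<script>` tags in a single regex pass, so a nested payload reassembles a working tag once the inner tags are removed.

```python
import re

# Hypothetical, deliberately naive filter: a single pass that deletes any
# <script ...> or </script> tag it can match. Not from any real CMS.
SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)

def naive_strip_scripts(markup: str) -> str:
    return SCRIPT_TAG.sub("", markup)

# Only the inner tags match; removing them joins the outer fragments.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"
print(naive_strip_scripts(payload))  # prints <script>alert(1)</script>
```

Re-running the filter until its output stops changing closes this particular hole, but each such patch only answers an attack someone has already thought of.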
>>>
>>
>> Most are not bad, many are good, a few are exceptional. I don't
>> believe either Drupal or WordPress is vulnerable to script attacks in
>> comments. Do you have a demonstration of how script attacks would
>> circumvent the protections in place in these CMSes? When they're using,
>> oh, something like htmLawed?
>
> htmLawed is large and regexp-based.  HTML is not a regular language,
> and is too complex to be handled reliably even by the extended regexps
> used in the wild (which recognize more than just regular languages).
>
> There is certainly a flaw or gap in its sanitization somewhere.
> There always will be unless you use a parser/tokenizer
> that works the same way as the one that browsers use*.  @srcdoc just
> offloads the job to the browser itself, so there's no chance of a
> mismatch.
>
> *If your algorithm doesn't work the same way as the browsers', then
> you can pass something through as harmless that the browser interprets
> differently and permits an attack.  For a trivial example of this in a
> similar domain, look at the history of PHP's mysql_escape_string() and
> mysql_real_escape_string() functions.  The former didn't escape
> certain UTF-7 sequences that MySQL translated to 'normal' characters,
> allowing SQL injection to still occur.  An escaping function must know
> *exactly* how the target is going to process the code, or else it
> risks precisely this sort of error.
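A minimal sketch of the same mismatch in an HTML context, using Python's standard `html.escape`. The `render_tooltip` helper is hypothetical. With `quote=False`, `html.escape` is adequate for element text but not for a double-quoted attribute, so a value containing `"` closes the attribute and injects a new one.

```python
import html

def render_tooltip(text: str, escape_quotes: bool) -> str:
    # html.escape always escapes &, < and >; with quote=True it also
    # escapes " and ' (as &quot; and &#x27;).
    safe = html.escape(text, quote=escape_quotes)
    return f'<span title="{safe}">hover me</span>'

attack = '" onmouseover="alert(1)'

# Escaper suited to element text: the " ends title early and the rest
# of the value becomes an event-handler attribute.
print(render_tooltip(attack, escape_quotes=False))

# Escaper suited to the attribute context: the whole value stays in title.
print(render_tooltip(attack, escape_quotes=True))
```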
>
>> Are you saying that this is the rationale for this change?
>
> The rationale for @srcdoc is to be able to use the benefits of
> @sandbox without incurring network requests (by linking to content as
> a normal iframe).  The benefits of @sandbox have been discussed
> previously.  The specific proposal for @srcdoc has its own benefits
> over similar proposals that address the same issue; namely, it has
> only two trivial escaping requirements, only one of which is relevant
> for security.  As well, the security-relevant escape requirement
> should fail very quickly and visibly if it's left out (as any
> innocuous " in the content will cause the rest of the content to
> drop), as opposed to many of the other proposals which fail much more
> quietly, perhaps only when an actual attack takes place.
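A minimal sketch of the generating side, assuming the two requirements are escaping `&` as `&amp;` and `"` as `&quot;` inside a double-quoted `srcdoc` attribute (the `"` escape being the security-relevant one). The helper names are hypothetical. `&` is replaced first so the `&quot;` entities produced by the second step are not themselves re-escaped.

```python
def escape_srcdoc(markup: str) -> str:
    # Order matters: escape & first so the &quot; produced below is not
    # turned into &amp;quot;.
    return markup.replace("&", "&amp;").replace('"', "&quot;")

def sandboxed_iframe(markup: str) -> str:
    # Hypothetical template helper. A sandbox attribute with no value
    # applies all sandbox restrictions to the framed content.
    return f'<iframe sandbox srcdoc="{escape_srcdoc(markup)}"></iframe>'

comment = '<p>Tom said "hi" &amp; left</p>'
print(sandboxed_iframe(comment))
```

If the `"` replacement is left out, the first `"` in any comment ends the attribute and the rest of the content drops out of the frame, which is the quick, visible failure described above.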
>
>> If so, do you have specific examples of these commonly occurring
>> vulnerabilities in existing sanitizer technologies? You have specific
>> ways to circumvent the sanitizers?
>
> Look into the history of nearly every XSS attack that has ever been
> created.  Look at the changelogs of any widely used sanitizer that
> keeps its source in a public repository.
>
>>> On the other hand, @srcdoc makes this whole thing trivial, and allows
>>> us to leverage the behavioral restraints of @sandbox as well.  It's a
>>> win for everyone.  The only loss is if you were somehow silly enough
>>> to write code with @srcdoc by hand, and I've already explained why
>>> that's a silly thing to do.
>>>
>>
>> But people have to write the templates by hand. At some point in time,
>> humans are involved in web pages. Whether they write the code to
>> generate the content, design the templates to use the code, or yes,
>> even create the web page by hand--humans are involved.
>
> Indeed, the templates are written by hand.  I just did so a message or
> two back to demonstrate precisely what is expected to be written.  I
> also demonstrated that a template using @srcdoc is nearly identical to
> a template not using it, except that the former gets to benefit from
> the @sandbox security model.
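A minimal sketch of the two templates side by side, assuming the comment body is user-supplied HTML and using the `srcdoc` name from the messages above. Both function names are hypothetical. The conventional template relies on the body having already passed through the CMS's own sanitizer; the `srcdoc` version differs only in its wrapper and one attribute escape.

```python
import html

def comment_template_div(sanitized_body: str) -> str:
    # Conventional template: the body must already have been run through a
    # server-side sanitizer, and the page trusts that sanitizer's output.
    return f'<div class="comment">{sanitized_body}</div>'

def comment_template_srcdoc(body: str) -> str:
    # Same shape; the browser itself parses the body in a sandboxed frame.
    # html.escape(quote=True) escapes &, <, >, " and ', a superset of the
    # & and " escapes srcdoc needs; attribute decoding undoes them all.
    return f'<iframe sandbox srcdoc="{html.escape(body, quote=True)}"></iframe>'

print(comment_template_div("<p>Nice post!</p>"))
print(comment_template_srcdoc("<p>Nice post!</p>"))
```

The second template's output is harder to read (the body arrives as `&lt;p&gt;Nice post!&lt;/p&gt;`), but only the browser needs to read it; the template itself stays a one-liner.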
>
> If the entire page is written by hand, *then there's no need for
> @srcdoc*.  You are writing the code yourself, so you don't have to
> protect yourself from XSS attacks written by yourself to steal
> information from yourself.  You just write the code.
>
> So, I'm not sure what exactly you're objecting to.
>
>> Ultimately, this stuff has to be meaningful for humans in order to
>> work. This change is not meaningful.
>
> The templates, and how to produce them, are completely meaningful.
> What part of the code I demonstrated is confusing?  The generated code
> may be ugly, due to the extra escapes and long content in attributes,
> but the generated code is only relevant for browsers.  Humans don't
> look at that, they look at the generating code.
>
> What, specifically, do you find not meaningful?
>
> ~TJ
>

Thanks for the rationale.

Shelley
Received on Sunday, 24 January 2010 19:32:32 UTC