Re: <iframe doc=""> from Maciej Stachowiak on 2010-01-24 (public-html@w3.org from January 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 24 Jan 2010 10:51:10 -0800
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: Shelley Powers <shelley.just@gmail.com>, Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
Message-id: <5E31C488-1B86-45DB-B28C-1BA856BE9817@apple.com>

On Jan 24, 2010, at 10:38 AM, Tab Atkins Jr. wrote:

> On Sun, Jan 24, 2010 at 12:04 PM, Shelley Powers <shelley.just@gmail.com> wrote:
>> On Sun, Jan 24, 2010 at 11:14 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
>>> On Sun, Jan 24, 2010 at 10:55 AM, Shelley Powers <shelley.just@gmail.com> wrote:
>>>> This is an old issue. We have had software to sanitize comments for a
>>>> long time. It's built into most CMS tools. And for those who disregard
>>>> the use of such tools, they're not going to use this, either.
>>> 
>>> Indeed, there are nearly as many html-sanitizers as there are CMSes.
>>> And they're pretty uniformly bad.  Most of them are built on fragile
>>> regexps, if you're lucky.  They might just be a handful of string
>>> replaces that address whatever problems the CMS author could think of
>>> at the time.  The best of them address *currently known attack
>>> vectors* decently enough, but are usually weak to *new* attacks.
>>> 
>> 
>> Most are not bad, many are good, a few are exceptional. I don't
>> believe either Drupal or Wordpress are vulnerable to script attacks in
>> comments. Do you have a demonstration how script attacks would
>> circumvent the protections in place in these CMS? When they're using,
>> oh, something like htmLawed?
> 
> htmLawed is large and regexp-based.  HTML is not a regular language,
> and is too complex to be addressed by regexps in the wild (which
> recognize more than just regular languages).
> 
> There is, indeed, certainly a flaw or lack in its sanitation
> somewhere.  There will always be unless you use a parser/tokenizer
> that works the same way as the one that browsers use*.  @srcdoc just
> offloads the job to the browser itself, so there's no chance of a
> mismatch.

It's not really just a matter of parsing, but also of knowing which constructs in the markup can cause script to run (or do other things that are desirable to block when sanitizing). That, though, is really a feature of sandboxed iframes, not of srcdoc per se. The browser knows definitively whenever it is about to run script, so it can definitively block every way of doing so without having to guess.
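
To make the division of labor concrete, here is a rough sketch of what the server side might look like under this model. It assumes the attribute ends up being spelled srcdoc and that an empty sandbox attribute disables script in the framed content; the function name sandboxed_comment_frame is hypothetical and only for illustration. The point is that the server escapes the comment for the attribute context only, and makes no attempt to decide which markup is dangerous.

```python
from html import escape


def sandboxed_comment_frame(untrusted_html: str) -> str:
    """Wrap untrusted markup in a sandboxed iframe (hypothetical helper).

    Assumption: the browser supports the sandbox and srcdoc attributes as
    discussed in this thread. No sanitizing happens here; blocking script
    is left entirely to the browser's sandbox.
    """
    # Escape only so the markup survives as an attribute value:
    # &, <, >, " and ' are replaced with character references.
    attr_value = escape(untrusted_html, quote=True)
    return f'<iframe sandbox srcdoc="{attr_value}"></iframe>'


if __name__ == "__main__":
    comment = '<p onclick="alert(1)">hi</p><script>alert(2)</script>'
    print(sandboxed_comment_frame(comment))
```

The only thing the author's code has to get right is the attribute escaping, which is a small, well-defined problem; whether the onclick handler or the script element would run is a question the browser answers, not a regexp.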

Regards,
Maciej

Received on Sunday, 24 January 2010 18:51:44 UTC