Re: Sanitising HTML content through sandboxing

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 8 Nov 2011 23:28:59 -0800
Message-ID: <CA+c2ei-XaQtRHdgpNHUHd=45mZyCFbahz0V30PcWtx9PmG-j8A@mail.gmail.com>
To: Ryan Seddon <seddon.ryan@gmail.com>
Cc: public-webapps <public-webapps@w3.org>
Given that this type of sandbox would work very differently from the
iframe sandbox, I think reusing the same attribute name would be
confusing.

Additionally, what's the behavior if you remove the attribute? What if
you do elem.innerHTML += "foo" on the element after having removed the
sandbox? Or on an element's parent?

Or what happens if you do foo.innerHTML = bar.innerHTML where a parent
of bar has sandbox set?

When sanitizing, I strongly feel that we should simply remove all
content that could execute script, so as to ensure that it doesn't leak
somewhere else when markup is copied. Trying to ensure that it never
executes, while still allowing it to exist, is too risky IMO.
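
To make the "remove it entirely" approach concrete, here is a minimal
sketch. It is not an API proposed in this thread: the function names, the
plain-object node shape, and the blocked-tag list are all assumptions made
for illustration, and the list of script vectors is deliberately
incomplete. The point it tries to show is that once the script-capable
content is gone from the tree, any markup copied out of it is inert no
matter where it ends up.

```javascript
// Hypothetical sketch: none of these names come from the thread or any spec.
// Nodes are plain objects: { text: "..." } for text, or
// { tag: "div", attrs: { name: value }, children: [...] } for elements.

// Elements that can run or load script (illustrative, not exhaustive).
const BLOCKED_TAGS = new Set(["script", "iframe", "object", "embed"]);

// True for on* event handlers and javascript: URLs (again, not exhaustive).
function isScriptAttr(name, value) {
  return /^on/i.test(name) || /^\s*javascript:/i.test(String(value));
}

// Returns a sanitized copy of the node, or null if the node itself is blocked.
function sanitize(node) {
  if (typeof node.text === "string") return { text: node.text };
  if (BLOCKED_TAGS.has(node.tag.toLowerCase())) return null;
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (!isScriptAttr(name, value)) attrs[name] = value;
  }
  const children = (node.children || [])
    .map(sanitize)
    .filter((child) => child !== null);
  return { tag: node.tag, attrs, children };
}

// Serializes a sanitized tree back to markup (no escaping; demo only).
function serialize(node) {
  if (typeof node.text === "string") return node.text;
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");
  const inner = node.children.map(serialize).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// Roughly the content from the quoted example, plus an inline handler.
const untrusted = {
  tag: "div",
  attrs: {},
  children: [
    { tag: "script", attrs: { src: "malicious.js" }, children: [] },
    { tag: "h1", attrs: { onclick: "steal()" }, children: [{ text: "content" }] },
  ],
};

const clean = sanitize(untrusted);
const markup = serialize(clean); // "<div><h1>content</h1></div>"
```

Because the script element and the handler no longer exist in `clean`,
copying `markup` into another element cannot reintroduce them, which is
the property a flag on the container would not give you.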

/ Jonas

On Tue, Nov 8, 2011 at 5:21 PM, Ryan Seddon <seddon.ryan@gmail.com> wrote:
> Right now there is no simple way to sanitise HTML content by stripping it of
> any potentially malicious HTML such as scripts etc.
>
> In the "innerHTML in DocumentFragment" thread I suggested following the
> sandbox attribute approach that can be applied to iframes. I've moved this
> out into its own thread, as Jonas suggested, so as not to dilute the
> innerHTML discussion.
>
> There was mention of a suggested API called innerStaticHTML as a potential
> solution to this. I personally would prefer to reuse the sandbox approach
> that iframes use.
>
> e.g.
>
> xhr.responseText = "<script src='malicious.js'></script><div><h1>content</h1></div>";
>
> var div = document.createElement("div");
>
> div.sandbox = ""; // Static content only
> div.innerHTML = xhr.responseText;
>
> document.body.appendChild(div);
>
> This could also apply to a DocumentFragment and any other applicable DOM
> APIs. Letting the HTML parser do what it does best would make
> sense.
>
> The advantage of this over a new API is that it would also allow the use of
> space-separated tokens to whitelist certain things within the HTML
> being parsed into the document, and would leave it open to future extension.
>
> -Ryan
>
Received on Wednesday, 9 November 2011 07:30:14 GMT