Re: Sanatising HTML content through sandboxing from Charles Pritchard on 2011-11-10 (public-webapps@w3.org from October to December 2011)

From: Charles Pritchard <chuck@jumis.com>
Date: Thu, 10 Nov 2011 01:12:03 -0800
To: Adam Barth <w3c@adambarth.com>
CC: Jonas Sicking <jonas@sicking.cc>, Ryan Seddon <seddon.ryan@gmail.com>, public-webapps <public-webapps@w3.org>
Message-ID: <4EBB9563.2000901@jumis.com>

+1 to an HTMLParser object.
Many of the other approaches end up loading resources as soon as image elements are created.
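
For illustration only, here is a minimal sketch of the parse-then-strip idea. It assumes a browser whose DOMParser accepts "text/html", which this thread does not establish. The helper name parseInert and its optional second parameter are hypothetical and are not part of any proposal here. A document produced this way has no browsing context, so scripts do not run and images are not fetched while the markup is parsed.

```javascript
// Hypothetical helper, not an API proposed in this thread.
// ParserCtor is an optional constructor used in place of the
// browser's DOMParser (an assumption made so the sketch can be
// exercised outside a browser).
function parseInert(html, ParserCtor) {
  var Parser = ParserCtor || DOMParser;

  // Parse into a separate document rather than into the live page.
  var doc = new Parser().parseFromString(html, "text/html");

  // Remove script elements so they cannot run if the markup is
  // later copied into the live document.
  var scripts = doc.querySelectorAll("script");
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return doc;
}
```

Removing script elements alone is not a complete sanitizer: event-handler attributes and javascript: URLs, among other things, would still need to be handled.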


On 11/8/11 11:54 PM, Adam Barth wrote:
> Also, a div doesn't represent a security boundary.  It's difficult to
> sandbox something unless you have a security boundary around it.
> IMHO, an easy way to solve this problem is to just expose an
> HTMLParser object, analogous to DOMParser, which folks can use to
> safely parse HTML, e.g., from XMLHttpRequest.
>
> Adam
>
>
> On Tue, Nov 8, 2011 at 11:28 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> Given that this type of sandbox would work very differently from the
>> iframe sandbox, I think reusing the same attribute name would be
>> confusing.
>>
>> Additionally, what's the behavior if you remove the attribute? What if
>> you do elem.innerHTML += "foo" on the element after having removed the
>> sandbox? Or on an element's parent?
>>
>> Or what happens if you do foo.innerHTML = bar.innerHTML where a parent
>> of bar has sandbox set?
>>
>> When sanitizing, I strongly feel that we should simply remove all
>> content that could execute script, so as to ensure that it doesn't leak
>> somewhere else when markup is copied. Trying to ensure that it never
>> executes, while still allowing it to exist, is too risky IMO.
>>
>> / Jonas
>>
>> On Tue, Nov 8, 2011 at 5:21 PM, Ryan Seddon <seddon.ryan@gmail.com> wrote:
>>> Right now there is no simple way to sanitise HTML content by stripping it of
>>> any potentially malicious HTML such as scripts etc.
>>>
>>> In the "innerHTML in DocumentFragment" thread I suggested following the
>>> sandbox attribute approach that can be applied to iframes. I've moved this
>>> out into its own thread, as Jonas suggested, so as not to dilute the
>>> innerHTML discussion.
>>>
>>> There was mention of a suggested API called innerStaticHTML as a potential
>>> solution to this. I personally would prefer to reuse the sandbox approach
>>> that iframes use.
>>>
>>> e.g.
>>>
>>> xhr.responseText = "<script src='malicious.js'></script><div><h1>content</h1></div>";
>>>
>>> var div = document.createElement("div");
>>>
>>> div.sandbox = ""; // Static content only
>>> div.innerHTML = xhr.responseText;
>>>
>>> document.body.appendChild(div);
>>>
>>> This could also apply to a DocumentFragment and any other applicable DOM
>>> APIs. Being able to let the HTML parser do what it does best would make
>>> sense.
>>>
>>> The advantage of this over a new API is that it would also allow the use of
>>> space-separated tokens to whitelist certain things within the HTML
>>> being parsed into the document, and leave it open to future extension.
>>>
>>> -Ryan
>>>
>>

Received on Thursday, 10 November 2011 09:12:32 UTC