Re: A perfect DOM sandbox from sird@rckc.at on 2011-02-15 (public-web-security@w3.org from February 2011)

From: <sird@rckc.at>
Date: Tue, 15 Feb 2011 08:45:06 +0100
To: gaz Heyes <gazheyes@gmail.com>
Cc: public-web-security@w3.org
Message-ID: <AANLkTinKfiEYjE5o1PnjpUCBTxPNKs6WT7s6hxte1qaL@mail.gmail.com>
> escaped HTML may become malicious HTML when read by innerHTML and so on.
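innerHTML and cssText are untrusted. They're actually treated as untrusted by
the HTML5 standard (for innerHTML, the spec notes that serializing a DOM and
reparsing the output may not reproduce the original tree), so modifying
innerHTML or copy-pasting it is unsafe.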

If you want to do this, you have to walk the generated DOM manually,
emulating a SAX parser if you wish; that should give you the granularity
you need.
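A minimal sketch of that approach in plain JavaScript. It relies only on standard DOM node properties (nodeType, nodeName, attributes, childNodes, nodeValue). The handler object and its startElement/text/endElement callbacks are hypothetical names chosen to mimic SAX:

```javascript
// Walk a DOM subtree depth-first and report it as SAX-style events.
// Uses only standard Node properties, so it can be pointed at the node
// returned by parseHTML(); the handler callback names are hypothetical.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function walkAsSax(node, handler) {
  if (node.nodeType === ELEMENT_NODE) {
    // Copy attributes into a plain object so the handler never touches live DOM.
    const attrs = {};
    for (let i = 0; i < node.attributes.length; i++) {
      const attr = node.attributes[i];
      attrs[attr.name] = attr.value;
    }
    const name = node.nodeName.toLowerCase();
    handler.startElement(name, attrs);
    for (let i = 0; i < node.childNodes.length; i++) {
      walkAsSax(node.childNodes[i], handler);
    }
    handler.endElement(name);
  } else if (node.nodeType === TEXT_NODE) {
    handler.text(node.nodeValue);
  }
  // Comments, processing instructions, etc. are silently skipped.
}
```

Because the handler only sees tag names, attribute values and text, it can rebuild whitelisted output itself instead of trusting what innerHTML serializes.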

Greetings!!

-- Eduardo



On Tue, Feb 15, 2011 at 8:40 AM, sird@rckc.at <sird@rckc.at> wrote:

> Hey!
>
> Try this function; does it meet your needs? (You can try it on
> http://0x.lv/shell.html.) It works on FF 4, IE 6/7/8, Safari, Opera and
> Chrome, though I haven't really tested how safe it is :) it just seems to
> work.
>
> Worth noting: it returns an HTML element belonging to a deleted
> document, with no Window associated with it. This means that its
> ownerDocument may be null in some browsers, and you can't appendChild it
> into your page directly (you need to importNode it first).
>
> Either way, it shouldn't execute anything if you play with it. Oh, also, you
> probably want to set the iframe's <base href>.
>
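> function parseHTML(src){
>     // Hidden scratch frame that will hold the untrusted markup.
>     var ifr = document.createElement("iframe");
>     // HTML5 sandbox: scripts stay disabled because "allow-scripts" is absent;
>     // "allow-same-origin" lets this page read the frame's document.
>     ifr.setAttribute("sandbox","allow-same-origin");
>     // IE-only: run the frame in the Restricted Sites zone (no script).
>     ifr.setAttribute("security","restricted");
>     // Firefox 4 ignores sandbox, so load a page from the author's test
>     // server instead (presumably served with restrictive headers such as CSP).
>     if(navigator.userAgent.match(/Firefox/))
>         ifr.setAttribute("src","/xss.php?csp&plain_text");
>     document.body.appendChild(ifr);
>     try {
>         // Parse the markup inside the frame, never in this document.
>         ifr.contentDocument.documentElement.innerHTML=src;
>     } catch(e) {
>         // Where <html>.innerHTML is read-only (e.g. IE), write instead.
>         ifr.contentDocument.write(src);
>         ifr.contentDocument.close();
>     }
>     // Deep-copy the parsed tree before the frame is destroyed.
>     var dom = ifr.contentDocument.documentElement.cloneNode(true);
>     document.body.removeChild(ifr);
>     return dom;
> }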
> parseHTML("<img src=/ onload=alert(1) onerror=alert(1)><script>alert(1)</script><iframe src=javascript:alert(1)></iframe><b>hello</b>").getElementsByTagName("b")[0].innerHTML;
> parseHTML("<xD/>").getElementsByTagName("*")[0].innerHTML="<img src=/ onload=alert(1) onerror=alert(1)><script>alert(1)</script><iframe src=javascript:alert(1)></iframe>";
>
> Greetings!!
>
> -- Eduardo
>
>
>
>
> On Mon, Feb 14, 2011 at 1:13 PM, gaz Heyes <gazheyes@gmail.com> wrote:
>
>> Hey all,
>>
>> Here are my thoughts on native browser sandboxing and its benefits.
>>
>> In order to create a perfect DOM sandbox you need the ability to read and
>> set exactly what the browser renders. Server-side filtering is always
>> doomed to fail because the various browsers all have different rendering
>> engines. The server cannot know all the quirks or custom functionality
>> that each browser adds, and these might change from month to month.
>>
>> If each browser had the ability to parse content without rendering it, we
>> could use this to create a perfect sandbox. To filter incoming data, the
>> browser would provide a special function that parses the data without
>> rendering it, like the following:
>>
>> parseHTML('<b style=color:#fff;-some-crazy-vendor-functionality:pwnd>test</b>');
>>
>> The parseHTML function would return an object (or serialized data) that
>> the sandbox could use to read the incoming HTML. We currently have the DOM
>> to do this dynamically, but it is broken in many places on every browser;
>> the main problem is that what the DOM exposes is not a true representation
>> of what the browser renders. Therefore one CSS rule may become two CSS
>> rules, escaped HTML may become malicious HTML when read by innerHTML, and
>> so on.
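No browser provides such a native parseHTML. As a rough illustration only, assume it returned a plain tree like {tag:'b', style:{...}, children:['test']}; a sandbox could then filter that tree against a whitelist along these lines. The tree shape, the whitelist format and the name filterParsed are all assumptions:

```javascript
// Hypothetical shape of what a native parseHTML() might return:
//   element: { tag: "b", style: { color: "#fff", ... }, children: [...] }
//   text:    a plain string
// whitelist maps each allowed tag to the CSS properties it may keep.
function filterParsed(node, whitelist) {
  if (typeof node === "string") return node; // text: escape it when rendering
  // Own-property check so tags like "constructor" are never treated as allowed.
  if (!Object.prototype.hasOwnProperty.call(whitelist, node.tag)) {
    return null; // tag not whitelisted: drop it together with its children
  }
  const allowedProps = whitelist[node.tag];
  const style = {};
  for (const prop of Object.keys(node.style || {})) {
    if (allowedProps.includes(prop)) style[prop] = node.style[prop];
  }
  const children = (node.children || [])
    .map((child) => filterParsed(child, whitelist))
    .filter((child) => child !== null);
  return { tag: node.tag, style, children };
}
```

With {b: ['color']} as the whitelist, the vendor-specific declaration from the parseHTML example is dropped and only color survives. Property values pass through unchanged here; validating them is left to the rendering step.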
>>
>> Once we have gathered the whitelisted data we wish to allow, the browser
>> also needs the ability to set the data; this could be done using
>> renderHTML, another native browser function.
>>
>> renderHTML([{tag:'b',style:{color:'#ccc;no new rule'}}]);
>>
>> Here we are defining a whitelist to render: the renderer should not render
>> anything other than a bold tag and one CSS color rule. The colour should
>> either be set to #ccc with the invalid trailing rule dropped, or the rule
>> should be dropped entirely because it contains multiple rules.
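A sketch of the stricter option in ordinary JavaScript, building an HTML string from a whitelisted description like the renderHTML call above. The function name renderWhitelisted, the extra text field, and the colour pattern (only #rgb or #rrggbb) are assumptions, not part of any browser API:

```javascript
// Only <b> with an optional, strictly validated color is rendered.
const ALLOWED_TAGS = new Set(["b"]);
// Accept only #rgb or #rrggbb; anything else ("#ccc;no new rule") is rejected.
const COLOR_RE = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

function escapeHTML(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderWhitelisted(nodes) {
  let html = "";
  for (const node of nodes) {
    if (!ALLOWED_TAGS.has(node.tag)) continue; // not whitelisted: render nothing
    const color = node.style && node.style.color;
    // Drop the rule entirely unless it is exactly one valid colour value.
    const styleAttr =
      typeof color === "string" && COLOR_RE.test(color) ? ` style="color:${color}"` : "";
    html += `<${node.tag}${styleAttr}>${escapeHTML(node.text || "")}</${node.tag}>`;
  }
  return html;
}
```

Rejecting the whole value, rather than trimming it back to #ccc, avoids having to guess where a given browser's CSS parser would split the string.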
>>
>> The same method could be applied to CSS using parseCSS()/renderCSS() and
>> to JavaScript using parseJavaScript()/executeJavaScript().
>>
>> I have created an imperfect DOM sandbox using these methods; it's imperfect
>> because I use the browser DOM to read and render HTML. Originally I tried
>> to use the DOM to set the HTML, but I found many inconsistencies in HTML
>> and CSS handling across browsers, so I had to generate output that the DOM
>> would render correctly without mutating it into malicious content.
>>
>> <http://www.businessinfo.co.uk/labs/HTMLReg/HTMLReg.html>
>>
>> You might have noticed two things about the demo. First, I modify the
>> classes/ids of DOM objects. This brings me to my next point: you cannot
>> trust DOM objects to be rendered with class or id attributes, as they can
>> conflict with other DOM objects or native JavaScript objects. We'd also
>> need the ability to define a prefix/suffix for each class or id used.
>> Developers could invent their own method, but they are likely to invent a
>> bad one or fail to account for pitfalls such as underscores not being
>> valid in CSS class names in some browsers. It would be nice to be able to
>> set a global prefix/suffix that the rendering functions apply
>> automatically.
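Until browsers offer such a global setting, a sandbox can do the prefixing itself. A minimal sketch, assuming a hypothetical prefix "sbx-" and a deliberately conservative token pattern (an ASCII letter first, then letters, digits or hyphens, so no underscores). Tokens that don't match are dropped rather than rewritten:

```javascript
// Prefix every class/id token so sandboxed content cannot collide with the
// host page's classes, ids, or id-named globals (browsers expose elements
// with an id as named properties on window).
const SANDBOX_PREFIX = "sbx-"; // hypothetical prefix
const SAFE_TOKEN = /^[A-Za-z][A-Za-z0-9-]*$/;

function prefixTokens(value, prefix = SANDBOX_PREFIX) {
  return value
    .split(/\s+/)
    .filter((token) => SAFE_TOKEN.test(token)) // drops "" and risky tokens
    .map((token) => prefix + token)
    .join(" ");
}
```

For an id attribute the result should contain at most one token; a sandbox would reject anything else rather than emit a multi-token id.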
>>
>> The second thing is that my demo proxies image requests: any images
>> specified go through the gmodules proxy. This prevents sandboxed content
>> from performing CSRF on other sites when a "harmless" img tag is rendered.
>> We also need the ability to do this natively in the browser. I thought of
>> an image-renderer protocol handler; another cool suggestion at the OWASP
>> summit was a "cookies" attribute that disables cookies on the image's HTTP
>> request. This would allow a sandbox to render images and other tags
>> without cookies being sent:
>>
>> <img src="//somesite/account?funds=1000&transfer=true" cookies="false">
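The proxying step itself is simple to sketch: rewrite every image URL so the browser requests it from a proxy instead of the target site, so the user's cookies for that site are never attached. The proxy endpoint below is a hypothetical placeholder (not the gmodules URL format), and only http(s) URLs are passed through:

```javascript
// Hypothetical proxy endpoint; the proxy fetches the image server-side,
// so the browser never sends its cookies for the image's own site.
const IMAGE_PROXY = "https://img-proxy.example/fetch?url=";

function proxyImageSrc(src, baseURL) {
  let url;
  try {
    url = new URL(src, baseURL); // resolve relative URLs against the sandbox base
  } catch (e) {
    return null; // unparsable URL: drop the image
  }
  // Refuse javascript:, data:, and any other non-HTTP scheme.
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return IMAGE_PROXY + encodeURIComponent(url.href);
}
```

Because the browser only ever talks to the proxy, the img tag in the example above no longer carries the user's session for somesite.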
>>
>> Integrating a JavaScript sandbox into user-supplied HTML could also be
>> done in the browser; take the following example:
>> parsedHTML=parseHTML('<a href="#" onclick="top.location=\'//evilserver\';alert(1);">test</a>');
>> //1st arg js, 2nd arg object/function whitelist
>> parsedOnclick=parseJavaScript(parsedHTML.onclick, {alert:true});//only allow alert
>> renderHTML([{tag:'a',href:'#',onclick:parsedOnclick}]);
>>
>>
>> We need these native sandboxing/parsing functions to provide a way to
>> guarantee safe output. As soon as HTML is stored somewhere and rendered
>> directly, the filtering performed by the server is already out of date.
>> The only true way to filter content is to do the filtering in the client.
>>
>> Cheers
>>
>> Gareth
>>
>
>
Received on Tuesday, 15 February 2011 07:50:29 UTC