- From: Mikko Rantalainen <mikko.rantalainen@peda.net>
- Date: Wed, 18 Jun 2008 10:19:51 +0300
Frode Børli wrote:
>>> I have been reading up on past discussions on sandboxing content, and
>>>
>>> My main arguments for having this feature (in one form or another) in
>>> the browser is:
>>>
>>> - It is future proof. Changes to browsers (for example adding
>>> expression support to css) will never again require old sanitizers to
>>> be updated.

Unless some braindead vendor adds a scripting-in-sandbox feature, which
would be just as braindead as unlimited expression() support in CSS. You
cannot be future proof unless you trust all the players, including ALL
possible browser vendors.

>> If the sanitiser uses a whitelist based approach that forbids everything by
>> default, and then only allows known elements and attributes; and in the case
>> of the style attribute, known properties and values that are safe, then that
>> would also be the case.
>
> I have written a sanitizer for html and it is very difficult -
> especially since browsers have undocumented bugs in their parsing.
>
> Example: <div colspan=&
> style=font-family=expression(alert("hacked"))
> colspan=&>Red</div>

Every real sanitizer MUST parse the input and generate its internal DOM.
If you then generate a known-good serialization of that DOM, there's no
way your sanitizer would ever output such code.

I, too, have written my own simplified HTML parser; it converts all
unknown parts to data (that is, it escapes all of the following
characters: "<>&'). Just parse the input into a DOM and only after that
check it for safe content. You cannot sanitize HTML using only string
replacements without generating a DOM. (The whole DOM doesn't need to be
in memory at once: you can process the input as a stream, handle one tag
at a time, and keep only a stack of open tag names.)

> The proof that sanitizing HTML is difficult is the fact that no major
> site even attempts it. Even wikipedia use some obscure wiki-language,
> instead of implementing a wysiwyg editor.
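A minimal sketch of the streaming parse-then-serialize approach: tokens are handled one at a time, only a stack of open whitelisted tag names is kept, and everything unrecognized is escaped into inert text. This is not the parser described above; it assumes Python's standard `html.parser.HTMLParser` as a stand-in tokenizer (which does not reproduce any particular browser's quirks), and `ALLOWED_TAGS`, `WhitelistSanitizer` and `sanitize` are hypothetical names. For simplicity it drops every attribute.

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only; choosing the real set is a per-site decision.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class WhitelistSanitizer(HTMLParser):
    """Streams through the input one token at a time, keeping only a stack of
    open whitelisted tag names, and re-serializes known-good output."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities before handle_data is called,
        # so every text run is re-escaped from its decoded form below.
        super().__init__(convert_charrefs=True)
        self.stack: list[str] = []
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Re-emit the tag in canonical form; all attributes are dropped.
            self.stack.append(tag)
            self.out.append(f"<{tag}>")
        else:
            # Unknown tag: turn its raw source text into inert data.
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <b/> or <img src=x/>.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}></{tag}>")
        else:
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close any tags opened after `tag`, then `tag` itself,
            # so the output is always properly nested.
            while self.stack:
                top = self.stack.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break
        else:
            # A stray or non-whitelisted end tag becomes data.
            self.out.append(html.escape(f"</{tag}>"))

    def handle_data(self, data):
        # html.escape (quote=True by default) escapes & < > " and '.
        self.out.append(html.escape(data))

    # Comments, doctypes and processing instructions are silently dropped,
    # because HTMLParser's default handlers for them do nothing.


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the input left open.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

With this sketch, `Some <b>more</b> content <i>here</i>.` passes through unchanged, while the `<div colspan=&...` example quoted above comes back with every `<` escaped, so it can only ever be displayed as text.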
Wikipedia does sanitize HTML in its content; it just supports its own
wiki-language in addition to HTML. For example, try entering the
following text as-is on the Wikipedia sandbox page and pressing "Show
preview":

***
> > Example: <div colspan=&
> style=font-family=expression(alert("hacked"))
> colspan=&>Red</div>

Some <b>more</b> content <i>here</i>.
***

Works just fine. The content is sanitized and unrecognized parts are
converted to data, while correctly written parts are used as HTML tags.

Trust me, it's really not that hard. The hard part is deciding which
tags, which attributes and which attribute values you want to allow.
And you have to decide that yourself; there's no magic silver-bullet
safe feature set that is suitable for every use and every site. If you
don't want to go through all this trouble, do not allow HTML or any
other markup in user-generated content unless you *really* trust your
users.

>> Note that sandboxing doesn't entirely remove the need for sanitising user
>> generated content on the server, it's just an extra line of defence in case
>> something slips through.
>
> Of course. However, the sandbox feature in browser will be fail safe if
> user generated content is escaped with &lt; and &gt; before being sent
> to the browser - as long as the browser does not have bugs of course.

That's a pretty big "if". If the page author / server application
programmer is always able to escape content correctly, how much harder
would it be to correctly escape and sanitize the content anyway? All
this sounds too much like magic_quotes in PHP...

>>> A problem with this approach is that developers might forget to escape
>>> tags, therefore I think browsers should display a security warning
>>> message if the character < or > is encountered inside a <data> tag.
>> If a developer forgot to escape the markup at all, then a user could enter
>> "</data><script>...</script>" and do anything they wanted.
>
> Yes, that is my point.
> That is why I want the sandbox to display a
> severe security warning if the developer has forgotten to escape it.

Isn't that a bit too late? If the developer doesn't test his application
before release, what's the point of breaking the whole site in the
user's browser as a result? It won't guard against XSS either, because
the user-generated content can *first* be used to end the sandbox and
*then* be used to insert an XSS attack. The browser sees only valid
content inside the sandbox, and the site is still under XSS attack.

> This method will be safe for all browsers that has ever existed and
> that will ever exist in the future. If new features are introduced in
> some future version of CSS or HTML - the sandbox is still there and
> the applications created today does not need to have their sanitizers
> updated, ever.

That's a pretty bold claim! I guess a similar claim could have been made
about CSS support before Microsoft added the "expression()" value
syntax. Can *you* guarantee that a random browser vendor won't implement
anything stupid for sandboxed content in the future?

--
Mikko
Received on Wednesday, 18 June 2008 00:19:51 UTC