- From: Alexey Feldgendler <alexey@feldgendler.ru>
- Date: Thu, 09 Mar 2006 22:57:31 +0600
On Mon, 06 Mar 2006 16:48:08 +0600, Gervase Markham <gerv at mozilla.org> wrote:

>> I never said that the website won't have to do HTML cleaning for
>> user-supplied content. But with HTML 5 reference parsing algorithm, such
>> cleaning is going to be much easier and straightforward: parse the text
>> into DOM (as if it was inside BODY, for example), remove or modify
>> forbidden elements, then serialize it. That way, </SANDBOX> will be
>> ignored as an easy parse error because it doesn't match an opening tag
>> within the user-supplied text. An unclosed comment will be ignored, too.

> Er, what defines "the user-supplied content"? Surely it's the <SANDBOX>
> tags? So how can you say "A </SANDBOX> inside the user-supplied content
> will be ignored", as you don't know whether a </SANDBOX> you encounter
> is the end of the sandbox or not?
>
> Or are you suggesting that only one sandbox per page is allowed, and the
> user agent should use the outermost </SANDBOX> tag?

It's my fault, I just didn't make it clear enough. Here is the scenario I had in mind.

Let's imagine a blogging website that allows anybody to create a blog which is available as http://www.example.com/blogs/username/. Many such sites allow various kinds of user customization, so imagine this site lets the blog owner supply custom HTML to display on top of the blog page. This is primarily used by blog authors to design stylish navigation. To make such navigation menus more attractive, the authors wish to use JavaScript and Flash, but unrestricted JavaScript would make it possible for the blog owner to steal visitors' session cookies.

The blog author logs in and opens some kind of customization screen:

    HTML to display on top of your blog:
    [TEXTAREA]
    [SUBMIT]

So, imagine the blog author enters into the textarea:

    Welcome to my blog!</sandbox><a href="#" onclick="alert(document.cookie)">Click here</a>

After submission, this code is fed to the HTML cleaner.
At present, HTML cleaners are usually complicated scripts which try to catch known quirks of the user agents, and still security holes keep being found in them one after another. See for example http://cvs.livejournal.org/browse.cgi/livejournal/cgi-bin/cleanhtml.pl. With the HTML 5 parsing spec, there will be one single algorithm for parsing HTML code, with well-defined error recovery.

So, the HTML cleaner on the server side runs the HTML 5 parser on the user-supplied text, which produces the following DOM:

    * Welcome to my blog!
    * A href="#" onclick="alert(document.cookie)"
      * Click here

The </sandbox> tag is ignored as an easy parse error because there is no matching <sandbox> tag in the user-supplied text.

After parsing, the HTML cleaner iterates through the tree, renaming potentially unsafe elements and attributes, producing the following:

    * Welcome to my blog!
    * A href="#" safe-onclick="alert(document.cookie)"
      * Click here

At the final stage, the HTML cleaner re-serializes the DOM into the following code, which is saved into the database:

    Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click here</a>

When the site renders the blog page, it puts the "HTML for page top" inside a sandbox:

    <body>
      <sandbox>
        Welcome to my blog!<a href="#" safe-onclick="alert(document.cookie)">Click here</a>
      </sandbox>
      ...
    </body>

Each blog entry is probably also contained in its own sandbox. This is even more important on the so-called friends pages, where entries by different authors are displayed on the same page.

When the page is rendered in a modern user agent which supports sandboxing, the safe-onclick attribute is interpreted exactly the same as onclick. When the user clicks the link, the event handler is executed. Because the code is inside the sandbox, it operates on a fake document object, so it doesn't retrieve the cookies (I think document.cookie should just return an empty string). The visitor's session cookies are safe.
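
Going back to the cleaning step on the server: the parse, rename, re-serialize pipeline described above could look roughly like the sketch below. It is only an illustration under several assumptions. It uses html.parser from the Python standard library, which is a tokenizer and not an implementation of the HTML 5 parsing algorithm, so the stack of open elements is tracked by hand. The names Cleaner and clean are hypothetical, and only two of the rules are covered: end tags with no matching start tag are dropped, and event-handler attributes get the safe- prefix.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Cleaner(HTMLParser):
    """Hypothetical cleaner: re-serializes input, neutralizing on* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the re-serialized markup
        self.open_tags = []  # elements opened inside the user-supplied text

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Conservative rule (an assumption): anything starting with
            # "on" is treated as an event handler and renamed.
            if name.startswith("on"):
                name = "safe-" + name
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag such as </sandbox>: ignore it
        # Close everything up to and including the matching element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(user_html):
    cleaner = Cleaner()
    cleaner.feed(user_html)
    cleaner.close()
    # Close whatever the user left open so nothing leaks past the snippet.
    while cleaner.open_tags:
        cleaner.out.append("</%s>" % cleaner.open_tags.pop())
    return "".join(cleaner.out)


print(clean('Welcome to my blog!</sandbox>'
            '<a href="#" onclick="alert(document.cookie)">Click here</a>'))
```

For the textarea input from the scenario, this prints the same string that the example stores in the database. A real cleaner would also need a whitelist of elements and attributes, which this sketch leaves out.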
When the page is rendered in an older user agent which doesn't support sandboxing, the safe-onclick attribute is ignored because it is unknown. When the user clicks the link, no event handler is executed, and the cookies are safe again.

--
Opera M2 8.5 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station [ICQ: 115226275] <alexey at feldgendler.ru>
Received on Thursday, 9 March 2006 08:57:31 UTC