- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Sun, 24 Jan 2010 15:42:22 -0500
- To: Shelley Powers <shelley.just@gmail.com>
- Cc: "Tab Atkins Jr." <jackalmage@gmail.com>, Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
On Sun, Jan 24, 2010 at 11:55 AM, Shelley Powers <shelley.just@gmail.com> wrote:
> The same tools also provide the code to sanitize comments before
> they're posted.

This is not the case. Every programming language includes a facility by which you can trivially convert " to &quot; and & to &amp;. None I know of provides a facility to sanitize HTML, for any definition of sanitize. MediaWiki has more than a thousand lines of code to do this. Other software probably needs less, since most software has much stricter whitelists (e.g., limited or no CSS), but it's still not easy at all to sanitize HTML.

It's almost impossible right now to just sit down and write some new blog software that allows comments, without either 1) restricting them to plaintext, or 2) risking security vulnerabilities. HTML is a very complicated format with lots of evil gotchas. For instance, you might think you can let HTML comments (<!-- -->) through -- but then you just opened up script execution in IE due to conditional comments.

<iframe sandbox> is a great way to fix that problem. Together with the seamless attribute and contenteditable, you could allow rich-text blog comments with very little effort, and with no security risk. But having to make a separate document for each blog comment to link to with <iframe src=""> is a pain in the neck, and also runs into risks if someone gets a direct link to one of those documents. So some way to specify the sandboxed content inline is what's needed to make this feature perfect.

Unfortunately, all possible ways to do that are horribly ugly. You can't use sane syntax like <iframe sandbox>literal sandboxed text</iframe>, because the comment author could just put in an </iframe> himself and break out of the sandbox. And you can't ask page authors to escape </iframe>, because they'll forget; they're much less likely to forget to escape ", because if they don't, content will break very quickly. So you're left with srcdoc="", which is pretty nasty, but I can't see a better answer.
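To make the srcdoc="" idea concrete, here is a minimal sketch in Python of what the server side might look like. The function name sandboxed_comment and the sample input are made up for illustration, and the markup assumes browsers implement sandbox and srcdoc as proposed. The only real work is the same trivial escaping mentioned above, here done with the standard library's html.escape.

```python
import html


def sandboxed_comment(comment_html: str) -> str:
    """Wrap untrusted comment markup in a sandboxed iframe via srcdoc.

    Hypothetical helper: it does no sanitizing at all and relies on the
    sandbox attribute to contain whatever the comment author wrote.
    """
    # html.escape replaces & first, then < and >, and with quote=True also
    # " and '. Only & and " are strictly required inside a double-quoted
    # attribute value; escaping the rest is harmless.
    escaped = html.escape(comment_html, quote=True)
    return f'<iframe sandbox srcdoc="{escaped}"></iframe>'


# A hostile comment that tries to close the iframe and run script.
hostile = '</iframe><script>alert("x")</script> & more'
print(sandboxed_comment(hostile))
```

Since the attribute value can no longer contain a literal ", the hostile </iframe> never reaches the parser as markup; the browser only sees it after unescaping the attribute, inside the sandboxed document.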
Received on Sunday, 24 January 2010 20:42:50 UTC