- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Sun, 24 Jan 2010 12:38:55 -0600
- To: Shelley Powers <shelley.just@gmail.com>
- Cc: Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
On Sun, Jan 24, 2010 at 12:04 PM, Shelley Powers
<shelley.just@gmail.com> wrote:
> On Sun, Jan 24, 2010 at 11:14 AM, Tab Atkins Jr.
> <jackalmage@gmail.com> wrote:
>> On Sun, Jan 24, 2010 at 10:55 AM, Shelley Powers
>> <shelley.just@gmail.com> wrote:
>>> This is an old issue. We have had software to sanitize comments
>>> for a long time. It's built into most CMS tools. And for those who
>>> disregard the use of such tools, they're not going to use this,
>>> either.
>>
>> Indeed, there are nearly as many HTML sanitizers as there are
>> CMSes. And they're pretty uniformly bad. Most of them are built on
>> fragile regexps, if you're lucky. They might just be a handful of
>> string replaces that address whatever problems the CMS author could
>> think of at the time. The best of them address *currently known
>> attack vectors* decently enough, but are usually weak to *new*
>> attacks.
>
> Most are not bad, many are good, a few are exceptional. I don't
> believe either Drupal or WordPress is vulnerable to script attacks
> in comments. Do you have a demonstration of how script attacks would
> circumvent the protections in place in these CMSes? When they're
> using, oh, something like htmLawed?

htmLawed is large and regexp-based. HTML is not a regular language,
and is too complex to be handled even by the regexps found in the
wild (which recognize more than just regular languages). There is
almost certainly a flaw or gap in its sanitization somewhere, and
there always will be unless you use a parser/tokenizer that works the
same way as the one browsers use*. @srcdoc just offloads the job to
the browser itself, so there's no chance of a mismatch.

*If your algorithm doesn't work the same way as the browsers', then
you can pass something through as harmless that the browser
interprets differently, permitting an attack. For a trivial example
of this in a similar domain, look at the history of PHP's
mysql_escape_string() and mysql_real_escape_string() functions. The
former didn't take the connection's character set into account, so
certain multibyte sequences could swallow the escaping backslash and
let SQL injection through anyway. An escaping function must know
*exactly* how the target is going to process the code, or else it
risks precisely this sort of error.
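To make the mismatch concrete, here is a minimal sketch in Python
rather than PHP; naive_escape() is a made-up name, and the byte pair
is the classic published GBK demonstration of this bug, not an
example taken from this thread:

```python
# Sketch: how a charset-unaware escaper loses to the target's decoder.
# naive_escape() is hypothetical; the 0xBF 0x27 payload is the classic
# published GBK demonstration, not an example from this thread.

def naive_escape(data: bytes) -> bytes:
    """Backslash-escape quotes without knowing the connection charset."""
    return data.replace(b"'", b"\\'")

payload = b"\xbf\x27 OR 1=1 -- "   # attacker input: 0xBF, quote, SQL
escaped = naive_escape(payload)    # now 0xBF, backslash, quote, SQL

# A GBK-speaking server reads 0xBF 0x5C as ONE character, so the
# backslash is absorbed and the quote survives, unescaped:
print(escaped.decode("gbk"))       # -> 縗' OR 1=1 --
```

The escaper and the server disagree about where the character
boundaries are, which is exactly the parser/tokenizer mismatch
described above.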
> Are you saying that this is the rationale for this change?

The rationale for @srcdoc is to be able to get the benefits of
@sandbox without incurring the network requests that linking to the
content as a normal iframe would require. The benefits of @sandbox
have been discussed previously. The specific proposal for @srcdoc has
its own benefits over similar proposals that address the same issue;
namely, it has only two trivial escaping requirements, only one of
which is relevant for security. As well, the security-relevant escape
should fail very quickly and visibly if it's left out (any innocuous
" in the content will cause the rest of the content to drop), as
opposed to many of the other proposals, which fail much more quietly,
perhaps only when an actual attack takes place.

> If so, do you have specific examples of these commonly occurring
> vulnerabilities in existing sanitizer technologies? You have
> specific ways to circumvent the sanitizers?

Look into the history of nearly every XSS attack that has ever been
created. Look at the changelogs of any widely-used sanitizer, if it
uses publicly available source control.

>> On the other hand, @srcdoc makes this whole thing trivial, and
>> allows us to leverage the behavioral restraints of @sandbox as
>> well. It's a win for everyone. The only loss is if you were somehow
>> silly enough to write code with @srcdoc by hand, and I've already
>> explained why that's a silly thing to do.
>
> But people have to write the templates by hand. At some point in
> time, humans are involved in web pages. Whether they write the code
> to generate the content, design the templates to use the code, or
> yes, even create the web page by hand--humans are involved.

Indeed, the templates are written by hand. I did exactly that a
message or two back, to demonstrate precisely what is expected to be
written. I also demonstrated that a template using @srcdoc is nearly
identical to a template not using it, except that the former gets the
benefit of the @sandbox security model.

If the entire page is written by hand, *then there's no need for
@srcdoc*. You are writing the code yourself, so you don't have to
protect yourself from XSS attacks written by yourself to steal
information from yourself. You just write the code. So I'm not sure
what exactly you're objecting to.

> Ultimately, this stuff has to be meaningful for humans in order to
> work. This change is not meaningful.

The templates, and how to produce them, are completely meaningful.
What part of the code I demonstrated is confusing? The generated
markup may be ugly, due to the extra escapes and long content in
attributes, but the generated markup is only relevant to browsers.
Humans don't look at that; they look at the generating code. What,
specifically, do you find not meaningful?
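For reference, here is a reconstruction in Python of the sort of
template under discussion; the exact example from the earlier message
isn't quoted in this section, so escape_srcdoc() and render_comment()
below are illustrative names, not that example:

```python
# Sketch of a comment template using @srcdoc + @sandbox.
# Illustrative reconstruction; escape_srcdoc() and render_comment()
# are made-up names, not the exact example from earlier in the thread.

def escape_srcdoc(untrusted_html: str) -> str:
    # The two trivial escapes @srcdoc asks for: ampersands for
    # fidelity, and double quotes (the one that matters for security)
    # so content can't break out of the attribute value.
    return untrusted_html.replace("&", "&amp;").replace('"', "&quot;")

def render_comment(untrusted_html: str) -> str:
    return ('<iframe sandbox srcdoc="%s"></iframe>'
            % escape_srcdoc(untrusted_html))

print(render_comment('Nice post! <script>steal()</script>'))
```

If escape_srcdoc() were forgotten, the first " in a comment would
visibly truncate everything after it, which is the loud failure mode
described above.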
~TJ

Received on Sunday, 24 January 2010 18:39:47 UTC