- From: Shelley Powers <shelley.just@gmail.com>
- Date: Sun, 24 Jan 2010 13:31:57 -0600
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- Cc: Ian Hickson <ian@hixie.ch>, "public-html@w3.org WG" <public-html@w3.org>
On Sun, Jan 24, 2010 at 12:38 PM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
> On Sun, Jan 24, 2010 at 12:04 PM, Shelley Powers <shelley.just@gmail.com> wrote:
>> On Sun, Jan 24, 2010 at 11:14 AM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
>>> On Sun, Jan 24, 2010 at 10:55 AM, Shelley Powers <shelley.just@gmail.com> wrote:
>>>> This is an old issue. We have had software to sanitize comments for a
>>>> long time. It's built into most CMS tools. And those who disregard
>>>> the use of such tools aren't going to use this, either.
>>>
>>> Indeed, there are nearly as many HTML sanitizers as there are CMSes,
>>> and they're pretty uniformly bad. Most of them are built on fragile
>>> regexps, if you're lucky; otherwise they're just a handful of string
>>> replaces that address whatever problems the CMS author could think of
>>> at the time. The best of them address *currently known attack
>>> vectors* decently enough, but are usually weak to *new* attacks.
>>
>> Most are not bad, many are good, and a few are exceptional. I don't
>> believe either Drupal or WordPress is vulnerable to script attacks in
>> comments. Do you have a demonstration of how script attacks would
>> circumvent the protections these CMSes have in place when they're
>> using, oh, something like htmLawed?
>
> htmLawed is large and regexp-based. HTML is not a regular language,
> and it is too complex to be handled even by real-world regexps (which
> recognize more than just regular languages).
>
> There is, indeed, almost certainly a flaw or gap in its sanitization
> somewhere. There always will be, unless you use a parser/tokenizer
> that works the same way as the one browsers use*. @srcdoc simply
> offloads the job to the browser itself, so there's no chance of a
> mismatch.
>
> * If your algorithm doesn't work the same way as the browsers', you
> can pass something through as harmless that the browser interprets
> differently, permitting an attack. For a trivial example of this in a
> similar domain, look at the history of PHP's mysql_escape_string() and
> mysql_real_escape_string() functions. The former didn't account for
> certain multi-byte character sequences that MySQL translated into
> 'normal' characters, allowing SQL injection to still occur. An
> escaping function must know *exactly* how the target is going to
> process the code, or else it risks precisely this sort of error.
>
>> Are you saying that this is the rationale for this change?
>
> The rationale for @srcdoc is to get the benefits of @sandbox without
> incurring network requests (as linking to the content from a normal
> iframe would). The benefits of @sandbox have been discussed
> previously. The specific proposal for @srcdoc has its own benefits
> over similar proposals that address the same issue; namely, it has
> only two trivial escaping requirements, only one of which is relevant
> for security. Moreover, the security-relevant escape should fail
> quickly and visibly if it's left out (any innocuous " in the content
> will cause the rest of the content to drop), whereas many of the other
> proposals fail much more quietly, perhaps only when an actual attack
> takes place.
>
>> If so, do you have specific examples of these commonly occurring
>> vulnerabilities in existing sanitizer technologies? Do you have
>> specific ways to circumvent the sanitizers?
>
> Look into the history of nearly every XSS attack ever created. Look
> at the changelogs of any widely used sanitizer, if it keeps a
> publicly available source control.
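As an illustration of the fragility described above, here is a minimal sketch in Python of a naive, single-pass regex stripper and a payload that slips through it. This is deliberately simplistic; it is not htmLawed's actual code, and the function name is hypothetical.

```python
import re

# A deliberately naive "sanitizer" of the kind criticized above:
# it strips <script>...</script> pairs and nothing else.
def naive_sanitize(html: str) -> str:
    return re.sub(r'<script.*?>.*?</script>', '', html,
                  flags=re.IGNORECASE | re.DOTALL)

# One pass removes the two inner <script></script> pairs, and the
# surviving fragments reassemble into a live script tag: harmless to
# the filter, dangerous to the browser.
payload = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>'
print(naive_sanitize(payload))  # -> <script>alert(1)</script>
```

The filter and the browser disagree about what the bytes mean, which is precisely the parser-mismatch problem that @srcdoc sidesteps by handing the parsing job to the browser itself.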
>>> On the other hand, @srcdoc makes this whole thing trivial, and lets
>>> us leverage the behavioral restraints of @sandbox as well. It's a
>>> win for everyone. The only loss is if you were somehow silly enough
>>> to write code with @srcdoc by hand, and I've already explained why
>>> that's a silly thing to do.
>>
>> But people have to write the templates by hand. At some point,
>> humans are involved in web pages. Whether they write the code that
>> generates the content, design the templates that use the code, or,
>> yes, even create the web page by hand, humans are involved.
>
> Indeed, the templates are written by hand. I wrote one a message or
> two back to demonstrate precisely what is expected to be written. I
> also demonstrated that a template using @srcdoc is nearly identical to
> a template not using it, except that the former gets to benefit from
> the @sandbox security model.
>
> If the entire page is written by hand, *then there's no need for
> @srcdoc*. You are writing the code yourself, so you don't have to
> protect yourself from XSS attacks written by yourself to steal
> information from yourself. You just write the code.
>
> So I'm not sure what exactly you're objecting to.
>
>> Ultimately, this stuff has to be meaningful for humans in order to
>> work. This change is not meaningful.
>
> The templates, and how to produce them, are completely meaningful.
> What part of the code I demonstrated is confusing? The generated
> markup may be ugly, due to the extra escapes and the long content in
> attributes, but the generated markup is only relevant to browsers.
> Humans don't look at that; they look at the generating code.
>
> What, specifically, do you find not meaningful?
>
> ~TJ

Thanks for the rationale.

Shelley
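For concreteness, here is a minimal sketch in Python of the two hand-written templates being compared, showing the two trivial escaping requirements. The helper name escape_for_srcdoc and both render functions are hypothetical, not code from this thread or the spec.

```python
# Escape untrusted HTML for embedding in a double-quoted srcdoc attribute.
def escape_for_srcdoc(untrusted_html: str) -> str:
    # Escape '&' first (correctness only), then '"' (the single
    # security-relevant step: it keeps untrusted content from
    # closing the srcdoc attribute early).
    return untrusted_html.replace('&', '&amp;').replace('"', '&quot;')

def render_comment_plain(comment_html: str) -> str:
    # Traditional template: a server-side sanitizer must catch everything.
    return '<div class="comment">%s</div>' % comment_html

def render_comment_srcdoc(comment_html: str) -> str:
    # @srcdoc template: nearly identical, but the browser's own parser
    # and the @sandbox restrictions do the heavy lifting.
    return ('<iframe sandbox srcdoc="%s"></iframe>'
            % escape_for_srcdoc(comment_html))

print(render_comment_srcdoc('<p>Nice "post"! <script>evil()</script></p>'))
```

The two templates differ only in the two character escapes and the sandbox attribute; the generated markup is uglier, but the generating code a human reads and maintains stays nearly the same.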
Received on Sunday, 24 January 2010 19:32:32 UTC