- From: Shelley Powers <shelley.just@gmail.com>
- Date: Mon, 25 Jan 2010 08:34:07 -0600
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: Lars Gunther <gunther@keryx.se>, "public-html@w3.org WG" <public-html@w3.org>
- Message-ID: <643cc0271001250634i366c8d11p5af2df80b4c68d20@mail.gmail.com>
On Mon, Jan 25, 2010 at 4:16 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Jan 25, 2010, at 2:03 AM, Lars Gunther wrote:
>
>> 2010-01-24 18:14, Tab Atkins Jr. wrote:
>>> Indeed, there are nearly as many html-sanitizers as there are CMSes. And they're pretty uniformly bad. Most of them are built on fragile regexps, if you're lucky. They might just be a handful of string replaces that address whatever problems the CMS author could think of at the time. The best of them address *currently known attack vectors* decently enough, but are usually weak to *new* attacks.
>>
>> There are whitelist approaches as well that one can use, and indeed that are being used. I know of and have written a few myself.
>>
>> Using XHTML syntax and XML tools makes this stuff easier to implement, in the absence of a "full HTML parser/tokenizer"!
>>
>> I am unconvinced about the usefulness of MOVING security to the browser. First of all it cannot be relied on, since we do not know for sure that all user agents implement it correctly. And it will take many years until 99% of all agents support this, and in the meantime we have to continue to do server-side checks anyway.
>>
>> This thing could work if seen as an extra layer of security. Defence in depth is always a good thing! But if it is marketed as something you'll do INSTEAD of server-side checks, it will actually be harmful to security on the web.
>
> The goal for sandboxed <iframe> is to promote and deploy it for defense in depth. It is not intended to be used as the sole security mechanism, since it will take years until browsers that do not support it are gone.
>
>> Besides, you will probably want to stop a lot of other things as well, like target="_blank" and <div style="display: none">Lots of links I use for black hat SEO here</div> even if it is inside an iframe, sandboxed or not.
>
> Sandboxed iframes will help you with targeted links. Check out the "sandboxed navigation browsing context flag" here: <http://dev.w3.org/html5/spec/Overview.html#attr-iframe-sandbox>. It will not help you strip out "display: none" content, but search engines will be able to make their own judgment based on the fact that it is sandboxed.
>
>> Summary: If this technology is about "offloading" security to the browser, it will be harmful to web security! If it is about adding an extra layer, and will be marketed only as such, it is OK.
>
> My understanding is that the purpose is solely defense in depth, at least until it is widely enough deployed that it can be relied on. Even then, it's probably best to combine it with a whitelist filter. Certainly the security experts I've talked to would promote its use as an additional mechanism, and if anyone asked me for advice on deploying sandboxed iframes, I would tell them the same.
>
> (To clarify the relation to the thread topic, this doesn't necessarily depend on doc/srcdoc, just on sandboxed iframes in general. As far as I'm aware, srcdoc is just there to make sandboxed iframes easier to use; it is not required to get the security benefits.)
>
> Regards,
> Maciej

I'm glad you noted that this approach won't really be worthwhile, or stand alone, for years because of existing browsers. But let's go further on this. Since I found out that the primary use case for this change, and in fact this whole sandbox issue, is comments, let's talk about comments. (As far as I can tell, the mechanism being described amounts to something like the sketch below.)
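Just so we're all looking at the same thing: the way I read it, the server escapes the untrusted comment and drops it into the srcdoc attribute of a sandboxed iframe. This is only my own minimal sketch of that, in Python -- the function name, the bare sandbox attribute, and the use of html.escape are my choices for illustration, not anything the spec or any CMS prescribes.

```python
# Minimal sketch only -- my own illustration of the mechanism being
# discussed, not code from the spec or from any CMS.
from html import escape

def wrap_comment_in_sandbox(comment_html):
    # The untrusted comment markup has to be attribute-escaped so it
    # can sit inside the srcdoc="" attribute.
    escaped = escape(comment_html, quote=True)
    # A bare sandbox attribute (no tokens) means scripts, forms,
    # plugins, and top-level navigation are all disabled in the frame.
    return f'<iframe sandbox srcdoc="{escaped}"></iframe>'

print(wrap_comment_in_sandbox('<p>Nice post! <script>alert(1)</script></p>'))
```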
I've had a weblog from various vendors for close to a decade now. Started out with one of Dave Winer's defunct hosting systems, went to Blogger, went to Radio, back to Blogger, to WordPress, to my own tool at one point, tried out ExpressionEngine, a couple of others I can't even remember, and now, Drupal. And I've had comments, off and on, since my early Blogger days. So, I know comments.

This is an approach to protect against things like, I believe, people inserting JavaScript into their comments? Does it also protect against SQL injection, which has been the primary problem in the past? I'm not sure how throwing all of this on the browsers is going to protect against SQL injection, but I'll make an assumption that yes, it protects against SQL injection.

So, let's talk about SQL injection. Or rather, let's talk about...viagra. Yes, viagra. The reason I want to talk about viagra is because the primary problem most people have had with things like comments in the last five years or so is less about XSS or SQL injection, because we got most of that under control a while back. I mean, the people that provide the most popular comment protection systems live and breathe this stuff, and are probably the most expert people in the world on comment security, so most of us aren't overly concerned about the script kiddies. Well, no more than most, since any tool that generates any comment is vulnerable to hackers -- including browsers.

No, the problem people have had in the last several years is spammers. Spammers, coming in with their links to viagra, or spammerisgood.com. So, will the browsers also protect us from spammers? I read the IRC, and I get the impression from this "fix" that you all don't think highly of the tools we're using now, and that we should trust the browsers to provide these protections in the future. But the input protection tools not only protect us against inserted JS or SQL injection, they also protect us from spammers. When we turn over management of comment security to the browser companies, will they/you also build in support against spammers? So, can we stop using Akismet?

We've talked about spammers and viagra, so let's talk about something else: whitelisted HTML elements. Right now, most of us only allow certain elements in our comments. We don't allow script, of course, but many of us don't like commenters inserting img elements either. You remember goatse? Many of us stopped allowing img insertion when people started inserting goatse images, or their like. Or images too big for the comment area, or inappropriate, or any number of things. So, when we turn our comments over to the browser companies, will they also provide the ability to whitelist which elements we'll allow or not? (What I mean by a whitelist filter is something like the bare-bones sketch below.)

Now, you mentioned allowing SVG in HTML. Of course, I'm assuming you won't allow script in the HTML. Or script in HTML that is added as a foreign object in the SVG. But you don't necessarily need script to cause problems with something like SVG. SVG has declarative animations. Even if they can't harm a site, they can still cause problems. Flash enough color fast enough, and you'll throw some folks into seizures. In addition, I have an SVG file that can pretty much take any browser and computer system down to a slow crawl because it has so many complex paths in it. I have another, much simpler one that not only crashed the first version of Chrome when it came out, it crashed my operating system because Chrome handled the rendering so poorly. (No worries, Chrome fixed that particular problem quite quickly.)
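To be concrete about what I mean by a whitelist filter, here is a bare-bones sketch -- my own illustration only. The allowed-element list and the class name are mine, and a real filter such as htmLawed does far more than this (attribute checking, URL scheme checking, tag balancing, and so on).

```python
# Bare-bones sketch of a whitelist filter -- my own illustration, far
# simpler than anything like htmLawed. Unknown elements (img, svg,
# script, and everything else not listed) are dropped.
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "em", "strong", "a", "blockquote", "code", "pre", "ul", "ol", "li"}

class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_data = 0  # depth inside script/style content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_data += 1
        elif tag in ALLOWED:
            # Keep href on links only; a real filter would also check
            # the URL scheme here (no javascript: links, for example).
            kept = [(k, v) for k, v in attrs if tag == "a" and k == "href" and v]
            attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_data = max(0, self.skip_data - 1)
        elif tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_data:
            self.out.append(escape(data))  # re-escape plain text content

def sanitize(comment_html):
    f = WhitelistFilter()
    f.feed(comment_html)
    f.close()
    return "".join(f.out)

print(sanitize('<p>Hi <img src="goatse.jpg"><script>alert(1)</script></p>'))
```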
I'm a huge supporter of SVG, but I never allowed SVG elements in my comments. And my comment protection software allowed me to pick and choose who received what, all based on their user setting, because my software, Drupal, allows me to create any number of user types and then assign different protection levels accordingly. So, can we count on browsers providing this level of support?

Speaking of which, I want to talk about Blogher, the site. That's http://blogher.com. Blogher is a Drupal site, where anyone can sign up for an account. Not only can you comment, but you can also write your own posts. Now, the posts won't show up on the front page, not until you get enough people following you. But you can, more or less, post on your page right from the start. Really helped Blogher build community. Of course, this required another level of input control, but this time on the weblog posts, not the comments. But that's OK, because the same software that scrubs the input for comments can also be used with the input for weblog posts, even though the template that serves the posts is different than the template that serves comments. Again, we can set different levels of security for different levels of weblog editors, as well as commenters. Will the browsers provide this level of support? How will we handle templates, then, if the template that serves up weblog posts for trusted users is the same template that serves up posts for not-yet-trusted users? Does this mean that we have to convert all of the weblog posts to iframes with escaped X/HTML in attributes?

Speaking of escaping...I serve my pages up as XHTML, that's with application/xhtml+xml. In fact, it's the default for all my pages at my sites. Yeah, it's a bit of work, and glitches come through, but it's a surefire way of ensuring my pages are pretty clean. But it makes it difficult to serve comments sometimes -- especially when you have readers like Sam Ruby and Jacques Distler. As soon as I would open comments and proclaim them safe, typically one or the other would come along and type in a character such as U+FFFE. This is OK if you're serving your pages up as HTML, but it will cause the YSOD if you serve your pages up as XHTML. This was particularly tricky, too, because for a time CMSes weren't particularly concerned about protecting your comments in an XHTML environment. When I switched to Drupal, though, the issue became less of a problem, because the Drupal folks seem to have a stronger interest in pages being proper XHTML. They may serve the pages as HTML, but they still want them to be proper XHTML.

I had a choice of several modules to use to protect my comments -- not only against script kiddies and SQL injection, spammers, and elements that can be abused, but also against these non-printing characters that play havoc in XHTML pages. I use htmLawed, and the one time stuff did get through that threw up a YSOD, the actual creator of the module contacted me to go over the particulars so he could ensure it wouldn't happen again. The folks that provide these plug-ins and modules -- you may not like the code, but this is their heart and soul. This is all they do: discover ways that comments and such can be broken, and work around the problems. Will the browser companies be as responsive?

So I have to escape my markup in the attribute because my pages are XHTML, but will the browsers take care of characters like U+FFFE? They haven't before this time. Seems to me that Firefox is happy to throw up a YSOD when it encounters this type of character. I don't have Firefox developers contacting me, asking how they can prevent such from happening. (The kind of scrubbing I'm talking about is something like the sketch below.)
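And for what it's worth, the kind of character scrubbing I'm describing looks roughly like this: strip the code points that XML 1.0 does not allow, U+FFFE among them, before a comment ever lands on an application/xhtml+xml page. The regex and function name are my own illustration, not how htmLawed actually does it.

```python
# Sketch of scrubbing characters that are not legal in XML 1.0 (such
# as U+FFFE) out of comment text before it lands on an
# application/xhtml+xml page. My own illustration only.
import re

# Everything outside the XML 1.0 Char production: most C0 controls,
# the surrogate range, and U+FFFE/U+FFFF.
XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def scrub_for_xhtml(text):
    return XML_ILLEGAL.sub("", text)

print(repr(scrub_for_xhtml("harmless comment\ufffe with a non-character")))
```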
The point I'm trying to laboriously make is that by the time I have to apply all of the filters that I'm pretty sure the browser companies won't be concerned about, using something like an iframe with escaped markup in an attribute not only won't add any additional security, it may just open up new security problems we're not aware of, because we're now moving security handling from these experienced tools to less experienced browser developers. Yes, less experienced. Browser developers have a different focus than security for comments. There are other areas of security these developers have to focus on, such as the security of the browser's own code -- such as what was used in the recent China/Google events. But the security of the page content has always been our concern -- we the authors, the tool makers, the CMS developers. And we've managed over the years. We've learned. And though you may not like the code for htmLawed, the thing works in all of the ways we need it to work.

So who is the customer for this change? I was surprised when I saw that the use case for this change was primarily weblog comments. My first question was: who asked for this? I don't know about others who use tools such as Drupal, but I'm certainly not going to code my templates to use iframes with escaped attribute text, rather than elements. I haven't a clue what this will do to search engines, much less the other bots that crawl my site, since I also serve up RDFa, and there are bots only interested in that (which they probably won't discover, since everything is escaped attribute text now). As for trusting the browser companies -- well, no offense, but you all do put out a lot of security releases. And I'm glad you do, but I would think that the browser companies have enough to worry about with their own code, much less now having to ensure the security of my page contents.

Naturally, if I can't convince you to remove this functionality from the spec, I will file a bug. But in the meantime, I really would like to know how you see all this working in today's systems, such as the use cases I just mentioned, since the point seems to be replacing tools like Akismet and htmLawed. And I really would like to know: who is the customer for this change?

Shelley

(Long email, probably many typos, sorry. Speaking of which, this is being typed into GMail -- will GMail use iframes with escaped content in attributes?)
Received on Monday, 25 January 2010 14:34:46 UTC