- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Tue, 13 Apr 2010 08:33:24 -0700
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: public-html@w3.org
On Tue, Apr 13, 2010 at 1:53 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Does that mean that you support the general direction of the change proposal
> submitted for issue 103?

I don't see anything particularly wrong with it. I don't have the expertise to know how correct it is. I'd like to see a breakdown of which of the additional characters are necessary to escape for security reasons, and which are necessary to escape just to maintain the content of the included page, though.

> Anyway, what should be mentioned in this context is that putting markup into
> attribute values is maybe the #1 anti-pattern in vocabulary design. It
> should be avoided. Really. To add something like this for the first time in
> HTML history just to add a feature where there's not even agreement that the
> feature is useful is very very strange.

I agree. In general, markup in attributes *is* a very bad idea. It's the least bad solution to this problem, though. Anything involving putting the content into the page normally involves either (a) much more complex escaping requirements (you need to count the occurrences of <iframe> and </iframe> in the content, and escape any unbalanced </iframe>s) or (b) the author successfully managing crypto-type things (the idea of putting in sufficiently unique and random tokens to delimit the start and end). Attributes happen to have the most minor escaping requirements of all.

As well, this is *never* intended to be hand-written. If you're hand-writing something, you implicitly trust it, and can just write it into the page. This feature is designed *exclusively* to be machine-generated from stored user-generated content and similar.

I feel it's very important to note, though, that data: url suffers from the exact same "markup in attributes" problem, just with slightly worse technical requirements. Anyone who objects to @srcdoc because it is "markup in attributes" but then suggests using data: urls would be confused, at best.
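
To make the comparison concrete, here is a minimal sketch of what a server-side generator might do for each approach. It is illustrative only: the function names are hypothetical, the attribute is assumed to be double-quoted, and the set of characters escaped for @srcdoc (`&` and `"`) is an assumption based on ordinary HTML attribute rules, not a statement of what the change proposal requires.

```python
from urllib.parse import quote


def srcdoc_iframe(content: str) -> str:
    """Embed untrusted markup via @srcdoc (hypothetical helper).

    Assumes a double-quoted attribute. Escaping '"' is the
    security-relevant step; escaping '&' only preserves the content.
    """
    # Escape '&' first so the entities added next are not re-escaped.
    escaped = content.replace("&", "&amp;").replace('"', "&quot;")
    return f'<iframe sandbox srcdoc="{escaped}"></iframe>'


def data_url_iframe(content: str) -> str:
    """Embed the same markup via a data: url in @src (hypothetical helper).

    Percent-encodes every character outside the URL unreserved set,
    so the result contains no '"' or '&' that HTML would interpret.
    """
    encoded = quote(content, safe="")
    return f'<iframe sandbox src="data:text/html;charset=utf-8,{encoded}"></iframe>'


comment = '<p>Tom & "Jerry"</p>'
print(srcdoc_iframe(comment))
print(data_url_iframe(comment))
```

If the `"` escaping is forgotten in the @srcdoc case, any content containing a quotation mark ends the attribute early and the page breaks visibly. A homebrew url-escaper that misses a character is harder to spot in ordinary content.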
> I think that's a very weak argument. I'm pretty sure that any feature we add
> can be used wrongly. What's much more important is that the languages allow
> to use it right, which appears to be the case. That being said, it even
> wouldn't be a problem if new libraries would be needed - after all we're
> talking about something which can be deployed only very slowly.

True. The main thrust of the problem, though, is that data: urls have more security-important escaping requirements than @srcdoc, and it isn't as easy to discover when you've gotten them wrong. @srcdoc, by contrast, has a single security-important escaping requirement, and getting it wrong will fail very quickly and obviously in ordinary content.

>> addition, despite both of these functions existing in PHP, multiple
>> homebrew url-escaping functions can be found across the web, which may
>> not escape everything that is necessary to escape. Some of these
>> lapses may result in non-obvious security holes that can be exploited
>> by attackers, allowing arbitrary code injection into a web page.
>> ...
>
> And these kind of security holes are impossible with @srcdoc?

Impossible? No, of course not. Much less likely, and much more likely to be discovered by the page author when they do exist? Yes.

>> 2. In legacy browsers, data: urls will "fail open"; that is, they will
>> display their contents even if the browser does not understand the
>> sandbox security model, potentially exposing users to attack. This
>> can be mitigated by specifying a text/html-sandboxed mime type in the
>> data: url, however.
>
> So it's no problem, right?

Assuming that browsers understand the sandbox security model and properly parse a text/html-sandboxed document, yes, it's not a direct problem. There is the minor issue of fallback, in that the failure mode for text/html-sandboxed is simply to not work at all.
Since data: urls will be specified in @src, there's nothing else the <iframe> can possibly display, not even a notice to the user that their browser doesn't have the correct features to safely view this content. @srcdoc can fall back to a text/html-sandboxed version of its content, or it can fall back to a different @src document that explains the problem.

>> ### @srcdoc is unneeded by the blogging community ###
>>
>> The creator of Wordpress, Matt Mullenweg, was asked about the need for
>> @srcdoc in the Wordpress software. He responded that Wordpress
>> maintains a sanitation library that appears to work adequately.
>>
>> This is, again, not an argument against @srcdoc, it is an argument
>> against the sandbox security model.
>
> Well, it is an argument against both.

No, it's still not. If the sandbox security model isn't needed, then of course @srcdoc isn't needed, since its sole reason for existing is to enable authors to shove user-generated content into the sandbox without incurring extra network requests. But problems with the sandbox security model do *not* belong in a proposal to remove @srcdoc only. They are completely irrelevant to the question of keeping @srcdoc itself, and serve only to confuse the matter and make it appear as though there are more problems with @srcdoc than there truly are.

> As far as I can tell, the opposition to @srcdoc in specific is stronger than
> iframe sandboxing in general due to the VERY controversial syntax.

I understand the basic rationale behind that. That still doesn't make it relevant to attack the sandbox security model in a change proposal that has nothing to do with the sandbox security model.

> If we have indication that one of the important uses cases that were
> nominated won't be a use case in practice then we should think about both
> @srcdoc and the sandboxing feature, true. The outcome might be that we want
> to get rid of the questionable syntax, but keep sandboxing.
>
> However, this issue is just about @srcdoc, so let's focus on this.

Another possible outcome is that we improve the sandbox security model to make it meet the use-case.

Note, though, that I specifically argued that Mullenweg's comment is not necessarily indicative of the use-case being invalid in general. Mullenweg represents a large organization with significant knowledge and resources to combat attacks as they arise. This doesn't help authors using Wordpress who don't update their systems, though. It doesn't help authors who aren't using Wordpress at all. It doesn't help authors who potentially have the knowledge to extract Wordpress's sanitation library, but are using something other than PHP on their server.

Every separate sanitation library that exists is a new and interesting source of bugs to exploit. Solving a large portion of this problem once, in the browser, can end up paying us back great dividends in terms of website security.

>> ...
>> Negative
>> : As with all new elements and attributes, implementing this requires
>> effort from implementors.
>> ...
>
> Lots of effort. Essentially it requires re-parsing attribute content with an
> HTML5 parser. This is an architectural problem; for instance, an application
> may be consuming SAX events from an HTML5 parser, but may not have direct
> access to the parser engine at all.

Wouldn't a problem of this nature be shared by *any* content pointed to by an <iframe>?

From the chatter I've seen about implementing @srcdoc, actually dealing with it isn't a huge issue. Depending on the underlying browser architecture, they may have to fake up a mock network resource with the @srcdoc contents and then just pass that to whatever machinery normally handles <iframe>s, or something like that. Additional effort, sure, but nothing horrifying. I haven't heard any browser vendor directly object to @srcdoc due to implementation difficulty.

~TJ
Received on Tuesday, 13 April 2010 15:34:23 UTC