- From: Mihai Sucan <mihai.sucan@gmail.com>
- Date: Mon, 13 Mar 2006 22:09:27 +0200
On Mon, 13 Mar 2006 10:16:55 +0200, Alexey Feldgendler <alexey at feldgendler.ru> wrote:

> On Fri, 10 Mar 2006 17:49:17 +0600, Mihai Sucan <mihai.sucan at gmail.com>
> wrote:
> <...>
>
> No, it's not really a change in getElementBy* functions. Because there
> have been no sandboxes before HTML 5, noone can really expect that these
> functions treat sandbox elements the same as all other elements. Well,
> sandboxes are "security barriers" by their nature, so it seems, at least
> to me, quite natural to have getElementBy* functions stop at them.

Yes... but the parent document still needs a way to control sandboxed content. Therefore the function needs a new parameter, for example: getElementById(string id, bool search_in_sandbox). Isn't that changing the getElementById function? Of course, this is only one way; it could probably be done differently, without changing the function(s).

> It's not to force them, it's to help them. Sanitizing user-supplied HTML
> is a very difficult task today, and new security holes in HTML cleaners
> of many web applications are found again and again. I think that the
> spec should make it easier to write a secure web application.

Yes, this is true.

>> Why do so? Authors already have to take care of not allowing some tags
>> and other tricks in the book (for example <meta refresh>). If the
>> author allows users to supply *any* tag (even the innocent <strong>),
>> then they already expose their app to potential security holes.
>
> Yes, I know, and I think it's wrong. The spec should make <strong>
> harmless, at least inside a sandbox.

How can it do so? By disallowing IDs, class names, ...? Or by changing the way getElement(s)By* work?

> CSS has properties that can be used to fit user-supplied content into a
> box and make it sit there quietly ("overflow: hidden" etc). The user can
> make whatever mess he wants of his own blog entry or whatever but it
> won't harm the rest of the page.

I'm not sure this works in all cases.
I haven't tested it, because I've never been in the position of allowing such user-supplied content in pages and "sandboxing" the user-styled content.

>> The spec can't do much in these situations. Shall the spec provide a
>> way for CSS files to *not* be applied in <sandbox>ed content?
>
> CSS3 already has negation selectors that can be used for this:
>
> *:not(sandbox) p { text-align: left; }
>
> This makes all paragraphs left-aligned except in sandboxes.

Yes, very interesting. I was aware of this, but I had forgotten about it. It would work better coupled with a suggestion made in the thread "styling the unstylable" (on www-style): style-blocks.

>> Generally authors just don't allows users to input HTML code at all (I
>> myself do that). It's the safest way and the easiest way.
>
> Well, of course plain text is the safest. But many applications require
> formatting markup in user-supplied text. Some applications don't try to
> deal with the security pitfalls of HTML and invent their own markup
> syntax (e.g. BBcode). However, there are two things wrong about these:

Many applications... the only ones I can currently think of are WYSIWYG editors, discussion forums and all those sites which provide user comments (blogs, image galleries, etc). Most of these applications, if not all, could allow the HTML counterparts (instead of inventing BBcode or some other custom markup), while removing all attributes except those allowed (a white list, not a black list, of attributes). I'd say this would be easier to implement, given that server-side technologies provide HTML and XML parsers; the manipulation of "user documents" would be easier and faster too (the parsers are usually much faster than the unoptimized regular expression matching, string parsing, ... coded by "average" web authors). Once the markup is parsed, removal of disallowed tags and attributes is trivial.
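
To illustrate the white-list approach described above, here is a minimal sketch in Python using the standard library's html.parser module. The tag and attribute lists, and the names WhitelistSanitizer and sanitize, are assumptions made up for this example; they do not come from any spec or existing application. A real cleaner would also have to validate attribute values (URLs, for instance), which this sketch does not attempt.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical white lists, chosen for illustration only.
ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li"}
ALLOWED_ATTRS = {"title"}            # note: no id, class or style
DROP_CONTENT = {"script", "style"}   # text inside these is discarded too


class WhitelistSanitizer(HTMLParser):
    """Re-emit only white-listed tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tag is dropped, but its text content is kept
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, so stray "<" or "&" cannot form markup.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text):
    """Return html_text with everything outside the white lists removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Calling sanitize() on a user comment keeps the formatting tags and silently drops id, class, style and event-handler attributes, along with any script element and its content. Being a white list, anything not explicitly named is removed by default, which is the property argued for above.
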

Also, the aforementioned applications are not currently required to allow user-supplied tags to contain IDs, class names, scripting and/or styling. I know you are now thinking of WYSIWYG editors ("they must allow users to style their documents"). True. But these web applications must also provide "WYSIWYG" editing capabilities for CSS; they can't expect average Jane and Joe to know CSS. Therefore the list of class names is already known to the WYSIWYG editor, which can easily check the class= attribute and allow *just* those class names (using the aforementioned parsers and server/client-side DOM manipulation, authors can easily limit the list of class names allowed). The same goes for IDs. As for scripting, if any user wants to post his/her script in a forum, then that's a problem. I wouldn't ever allow it (except probably for research purposes, such as "how users act when they are given all power" :) ).

> 1. We already have a great markup language, which is HTML, and there are
> many tools and libraries available that deal with it.

Exactly the point I've just made above: this can be done *today*. Authors just don't know of it and go the hard way, inventing new security holes in their own custom markup.

> 2. The WA1 spec defines facilities designed for WYSIWYG editing which
> encourage the use of HTML as the markup language for user-supplied
> content.

I find this a good thing.

<...>

> I've mentioned it in the original message. Though I find it too strict
> to strip all id and class attributes from user-supplied text. They
> usually do more good than bad.

I don't. It's not too strict at all. I actually find it very loose to allow these specific attributes. They should be allowed *only* when there are real requirements (especially IDs).

>> As Mikko said "allowing random user input with possibility to use user
>> supplied scripting is next to impossible to make secure".
>
> That's what I'm trying to do, and I'm not yet convinced that it's
> impossible.
> This is a hard task but I believe it's what the web needs.

Yes, this is good. Web-based viruses don't yet exist, but it's only a matter of time. Do you have any other ideas on how to do so, with regard to the duplicate IDs issue? Nothing is impossible, it's just *not yet* possible with current knowledge.

> BTW, my original message shows an exploit which is possible even if the
> HTML cleaner doesn't allow scripts.

Yes, true. I wasn't even talking about allowing user-supplied scripts. That's not even on the horizon of average Jane and Joe, in any of their wildest dreams about cutting-edge WYSIWYG editors :) - or, at least, it shouldn't be.

With regard to online WYSIWYG editors, I think we can classify them into three main categories (by capabilities):

- grade 1
  Easy to use, easy to make ones: for blog comments, image gallery comments, even forums.
  Scripting: none
  Styling: none
  Tags: p, strong, em, h1-h6, ol, ul, dl, li, dd, dt, ... (and similar)
  Attributes: whatever is "innocent", except IDs and anything the authors consider problematic, including, but not limited to: class and style.

- grade 2
  Full-blown ones: for blog articles, CMSs, ...
  Scripting: none
  Styling: yes
  Tags and attributes: same as grade 1, with the exception that these must allow class and style attributes.

- grade 3
  Web authoring tools: similar to NVU, Dreamweaver, ...
  Scripts, styling, tags and attributes: everything.
  Serious/powerful online web authoring tools do not currently exist (only shy "site builders").

Security concerns regarding scripting are eliminated in grade 1 and grade 2 WYSIWYG editors, because you can't really expect average Jane and Joe to want to do scripting for their articles and pages in CMSs. If they wanted that, they'd make their own site "by hand".

Grade 3 is another "story". These web apps must allow everything, they can't enforce much: everybody makes their own site exactly the way they want it.

P.S. You have sent the reply only to me.
I suppose it was by mistake (nothing personal was in it). I have sent my reply to your email back to WHATWG (I expect your future replies to go there as well - it's a public discussion).

--
http://www.robodesign.ro
ROBO Design - We bring you the future
Received on Monday, 13 March 2006 12:09:27 UTC