[whatwg] The problem of duplicate ID as a security issue from Mihai Sucan on 2006-03-13 (public-whatwg-archive@w3.org from March 2006)

From: Mihai Sucan <mihai.sucan@gmail.com>
Date: Mon, 13 Mar 2006 22:09:27 +0200
Message-ID: <op.s6dbl1k9mcpsjg@localhost.localdomain>
On Mon, 13 Mar 2006 10:16:55 +0200, Alexey Feldgendler
<alexey at feldgendler.ru> wrote:

> On Fri, 10 Mar 2006 17:49:17 +0600, Mihai Sucan <mihai.sucan at gmail.com>  
> wrote:
>
<...>
>
> No, it's not really a change in getElementBy* functions. Because there  
> have been no sandboxes before HTML 5, no one can really expect that these
> functions treat sandbox elements the same as all other elements. Well,  
> sandboxes are "security barriers" by their nature, so it seems, at least  
> to me, quite natural to have getElementBy* functions stop at them.

Yes... but the parent document still needs a way to control
sandboxed content. That calls for a new parameter, for example:
getElementById(string id, bool search_in_sandbox). Isn't that changing the
getElementById function? Of course, this is only one way; it could probably
be done differently, without changing the function(s).
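
To make the proposed behaviour concrete, here is a minimal sketch in
Python over a toy element tree. Everything in it is an assumption made for
illustration: the Node class, the get_element_by_id helper and the
search_in_sandbox flag are hypothetical and are not part of any DOM
specification or browser API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a DOM element (hypothetical, for illustration)."""
    tag: str
    id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def get_element_by_id(root, wanted, search_in_sandbox=False):
    """Return the first element with the given id, in document order.

    Unless search_in_sandbox is true, the walk does not descend into
    <sandbox> elements, so sandboxed content cannot shadow an id that
    belongs to the parent document.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if node.id == wanted:
            return node
        if node.tag == "sandbox" and not search_in_sandbox:
            continue  # treat the sandbox as a security barrier
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return None


# User-supplied content duplicates the id of the real login form.
page = Node("body", children=[
    Node("sandbox", children=[Node("div", id="login")]),
    Node("form", id="login"),
])
```

With this tree, the default lookup skips the sandboxed duplicate and
returns the form, while passing search_in_sandbox=True lets the parent
document reach the sandboxed div on purpose.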

> It's not to force them, it's to help them. Sanitizing user-supplied HTML  
> is a very difficult task today, and new security holes in HTML cleaners  
> of many web applications are found again and again. I think that the  
> spec should make it easier to write a secure web application.

Yes, this is true.

>> Why do so? Authors already have to take care of not allowing some tags  
>> and other tricks in the book (for example <meta refresh>). If the  
>> author allows users to supply *any* tag (even the innocent <strong>),  
>> then they already expose their app to potential security holes.
>
> Yes, I know, and I think it's wrong. The spec should make <strong>  
> harmless, at least inside a sandbox.

How can it do so? Disallowing IDs, class names, ...? Or by changing the  
way getElement(s)By* work?

> CSS has properties that can be used to fit user-supplied content into a  
> box and make it sit there quietly ("overflow: hidden" etc). The user can  
> make whatever mess he wants of his own blog entry or whatever but it  
> won't harm the rest of the page.

I'm not sure this works in all cases. I haven't tested because I've never  
been in the position of allowing such user-supplied content in pages and  
"sandboxing" the user-styled content.

>> The spec can't do much in these situations. Shall the spec provide a  
>> way for CSS files to *not* be applied in <sandbox>ed content?
>
> CSS3 already has negation selectors that can be used for this:
>
> *:not(sandbox) p { text-align: left; }
>
> This makes all paragraphs left-aligned except in sandboxes.

Yes, very interesting. I was aware of this, but I forgot about it.

This would work better coupled with a suggestion made in the thread
"styling the unstylable" (on www-style): style-blocks.

>> Generally authors just don't allow users to input HTML code at all (I
>> myself do that). It's the safest way and the easiest way.
>
> Well, of course plain text is the safest. But many applications require  
> formatting markup in user-supplied text. Some applications don't try to  
> deal with the security pitfalls of HTML and invent their own markup  
> syntax (e.g. BBcode). However, there are two things wrong about these:

Many applications... the only ones I can currently think of are WYSIWYG
editors, discussion forums and all those sites which accept user comments
(blogs, image galleries, etc).

Most of these applications, if not all, could allow the HTML
counterparts (instead of inventing BBcode or some other custom markup)
while removing all attributes except those explicitly allowed (a whitelist,
not a blacklist, of attributes). I'd say this would be easier to implement,
given that server-side technologies provide HTML and XML parsers, so
manipulating "user documents" would be easier and faster too (the
parsers are usually much faster than the unoptimized regular expression
matching, string parsing, ... coded by "average" web authors). Removing
disallowed tags and attributes is then trivial.
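
As a rough illustration of the whitelist approach, here is a sketch built
on the HTMLParser class from Python's standard library. The tag and
attribute lists, the WhitelistSanitizer class and the sanitize function
are assumptions chosen for the example, not a vetted policy. The sketch
also does not rebalance stray closing tags, which a production cleaner
would have to handle.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelists, for illustration only.
ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li"}
ALLOWED_ATTRS = {"title", "lang"}
# Elements whose text content is dropped together with the tag.
DROP_WITH_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                # Re-escape the value so it cannot break out of the quotes.
                kept.append(' %s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is always re-escaped, never passed through as markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Because only listed names are ever written out, id, class, style and
event-handler attributes disappear without having to be enumerated, which
is the advantage of a whitelist over a blacklist.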

Also, the aforementioned applications are not currently required to allow  
user-supplied tags to contain IDs, class names, scripting and/or styling.

I know you are now thinking of WYSIWYG editors ("they must allow users to
style their documents"). True. But these web applications must also provide
"WYSIWYG" editing capabilities for CSS; they can't expect average Jane and
Joe to know CSS. Therefore, the list of class names is already known to
the WYSIWYG editor, which can easily check the class= attribute and allow
*just* those class names (using the aforementioned parsers and
server/client-side DOM manipulation, authors can easily limit the list of
class names allowed). The same goes for IDs.
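
The class-name check described above could look roughly like this. The
filter_classes helper and the KNOWN_CLASSES set are hypothetical names
used only for this sketch; a real editor would take the list from its own
stylesheet.

```python
# Class names the editor itself offers to users (assumed for the example).
KNOWN_CLASSES = frozenset({"quote", "highlight", "note"})


def filter_classes(value, allowed=KNOWN_CLASSES):
    """Keep only the whitespace-separated class names found in `allowed`."""
    kept = [name for name in value.split() if name in allowed]
    return " ".join(kept)
```

An empty result means the class attribute can be dropped altogether. The
same check works for an id value, treated as a single name.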

As for scripting, if there's any user wanting to post his/her script in a  
forum, then that's a problem. I wouldn't ever allow it (except probably  
for research purposes, such as "how users act when they are given all  
power" :) ).

> 1. We already have a great markup language, which is HTML, and there are  
> many tools and libraries available that deal with it.

Exactly the point I've just made above: this can be done *today*. Authors
just don't know of it and go the hard way, inventing new security holes in
their own custom markup.

> 2. The WA1 spec defines facilities designed for WYSIWYG editing which  
> encourage the use of HTML as the markup language for user-supplied  
> content.

I find this a good thing.

<...>
> I've mentioned it in the original message. Though I find it too strict  
> to strip all id and class attributes from user-supplied text. They  
> usually do more good than bad.

I don't. It's not too strict at all. I actually find it very loose to  
allow these specific attributes. They should be allowed *only* when there  
are real requirements (especially IDs).

>> As Mikko said "allowing random user input with possibility to use user  
>> supplied scripting is next to impossible to make secure".
>
> That's what I'm trying to do, and I'm not yet convinced that it's  
> impossible. This is a hard task but I believe it's what the web needs.

Yes, this is good. Web-based viruses don't yet exist, but it's only a  
matter of time.

Do you have any other ideas on how to do so, with regard to the duplicate
IDs issue?

Nothing is impossible, it's just *not yet* possible with current knowledge.

> BTW, my original message shows an exploit which is possible even if the  
> HTML cleaner doesn't allow scripts.

Yes, true. I wasn't even talking about allowing user-supplied scripts.
That's not even on the horizon of average Jane and Joe, in any of their
wildest dreams about cutting-edge WYSIWYG editors :) - or, at least, it
shouldn't be.

In regards to online WYSIWYG editors, I think we can classify them into  
three main categories (by capabilities):

- grade 1
Easy to use, easy to make ones: for blog comments, image gallery comments,  
even forums.

Scripting: none
Styling: none
Tags: p, strong, em, h1-h6, ol, ul, dl, li, dd, dt, ... (and similar)
Attributes: whatever is "innocent", except IDs and anything the authors  
consider problematic, including, but not limited to: class and style.

- grade 2
Full-blown ones: for blog articles, CMSs, ...

Scripting: none
Styling: yes
Tags and attributes: same as grade 1, with the exception that these must
allow class and style attributes.

- grade 3
Web authoring tools: similar to NVU, Dreamweaver, ...

Scripts, styling, tags and attributes: everything.

Serious/powerful online web authoring tools do not currently exist (only  
shy "site builders").

Security concerns regarding scripting are eliminated in grade 1 and grade
2 WYSIWYG editors, because you can't really expect average Jane and Joe to
want to do any scripting for their articles and pages in CMSs. If they
wanted to, they'd make their own site "by hand".

Grade 3 is another "story". These web apps must allow everything, they  
can't enforce much: everybody makes their own site exactly the way they  
want it.
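
The three grades above can be written down as data, which makes the
differences easy to see and to enforce. This is only a sketch: the
EditorPolicy class, the attribute_allowed helper and in particular the
BASE_ATTRS set of "innocent" attributes are assumptions, since the list
above leaves that set to each author's judgment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed set of "innocent" attributes; each site would choose its own.
BASE_ATTRS = frozenset({"title", "lang", "dir"})


@dataclass(frozen=True)
class EditorPolicy:
    scripting: bool
    styling: bool
    # None means "no restriction" (grade 3 allows everything).
    allowed_attrs: Optional[FrozenSet[str]]


POLICIES = {
    1: EditorPolicy(scripting=False, styling=False, allowed_attrs=BASE_ATTRS),
    2: EditorPolicy(scripting=False, styling=True,
                    allowed_attrs=BASE_ATTRS | {"class", "style"}),
    3: EditorPolicy(scripting=True, styling=True, allowed_attrs=None),
}


def attribute_allowed(grade, name):
    """Tell whether an attribute may be kept under the given grade."""
    policy = POLICIES[grade]
    if policy.allowed_attrs is None:
        return True
    return name in policy.allowed_attrs
```

Note that id stays excluded in grades 1 and 2, and that event-handler
attributes such as onclick are rejected there simply because they are not
on the list.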

P.S. You sent your reply only to me. I suppose that was by mistake
(nothing personal was in it), so I have sent my reply to your email back to
WHATWG (I expect your future replies to go there as well - it's a public
discussion).

-- 
http://www.robodesign.ro
ROBO Design - We bring you the future
Received on Monday, 13 March 2006 12:09:27 UTC