W3C home > Mailing lists > Public > whatwg@whatwg.org > March 2006

[whatwg] The problem of duplicate ID as a security issue

From: Mihai Sucan <mihai.sucan@gmail.com>
Date: Tue, 14 Mar 2006 22:36:51 +0200
Message-ID: <op.s6e7jptdmcpsjg@localhost.localdomain>
On Tue, 14 Mar 2006 14:03:42 +0200, Alexey Feldgendler  
<alexey at feldgendler.ru> wrote:

> To access the nodes inside sandboxes, the script in the parent document  
> can either "manually" traverse the DOM tree or do the following: first  
> find all relevant elements in the main document (starting from the root  
> node), then find all sandboxes with getElementsByTagName() (which  
> doesn't dive inside sandboxes, but is able to return the sandboxes  
> themselves), then continue recursively from each sandbox found. This  
> involves somewhat more coding work, but I expect that finding all  
> matching elements across sandbox boundaries will be a significantly more  
> unusual task than finding elements in the parent document (outside  
> sandboxes) or within a given sandbox.

Yes, I saw Ric's reply. It's a nice suggestion, but does that imply  
<sandbox> is a documentElement in its own right, or would a DOMSandbox  
interface need to be defined?
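The two-step traversal described above can be sketched in JavaScript. Since the <sandbox> element is hypothetical, there is no real DOM API to call, so a plain object tree stands in for the document; the function names are illustrative only:

```javascript
// Collect nodes matching `pred`, WITHOUT descending into <sandbox> subtrees
// (mirrors the proposed behavior of getElement(s)By* in the parent document).
function findOutsideSandboxes(node, pred, out = []) {
  if (pred(node)) out.push(node);
  for (const child of node.children || []) {
    if (child.tag === 'sandbox') continue; // do not cross the boundary
    findOutsideSandboxes(child, pred, out);
  }
  return out;
}

// Collect the sandbox elements themselves (still not descending into them),
// as getElementsByTagName('sandbox') would.
function findSandboxes(node, out = []) {
  for (const child of node.children || []) {
    if (child.tag === 'sandbox') out.push(child);
    else findSandboxes(child, out);
  }
  return out;
}

// "Find all matches, crossing boundaries": search the main tree, then
// recurse into each sandbox found, exactly as the message describes.
function findEverywhere(root, pred) {
  let result = findOutsideSandboxes(root, pred);
  for (const sb of findSandboxes(root)) {
    result = result.concat(findEverywhere(sb, pred));
  }
  return result;
}
```

The point of the sketch is that crossing boundaries is opt-in and slightly more work, while the common cases (search outside sandboxes, or within one sandbox) stay simple.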

<...>
> I hope that defining getElement(s)By* to not cross sandbox boundaries  
> will do the work.

Yes.

<...>
> Anyway, even if there are cases when "sandbox {overflow: hidden}" is not  
> enough, the possible extent of damage from misplaced content that  
> visually "jumps" out of the sandbox is a whole order of magnitude less  
> than the extent of damage from the exploit shown in my original message.  
> It's more important to handle the latter.
>
> A side note: it may help to specify a set of default styling rules for  
> the sandbox element so that it doesn't allow visual leakage of content.

I agree.

>>>> The spec can't do much in these situations. Shall the spec provide a  
>>>> way for CSS files to *not* be applied in <sandbox>ed content?
>
>>> *:not(sandbox) p { text-align: left; }
>
>> Yes, very interesting. I was aware of this, but I had forgotten about it.
>>
>> This would be better used coupled with a suggestion made in a thread  
>> "styling the unstylable" (on www-style): style-blocks.
>
> Sorry, I must have completely missed that thread... Can you give me the  
> link?

http://lists.w3.org/Archives/Public/www-style/2006Mar/thread.html
http://lists.w3.org/Archives/Public/www-style/2006Mar/0035.html

See my last reply. In theory it's not even remotely related to this  
thread, but if you think about it, style-blocks would help with sandboxing  
user styling (I can explain it to you on ICQ or in private email).

<...>
> Wikis are a somewhat outstanding example. These traditionally use custom  
> markup languages (mainly to make hyperlinking easier), but many of them,  
> like MediaWiki, allow a subset of HTML as well. (MediaWiki uses the  
> "whitelist" approach, but it seems to be at least theoretically  
> vulnerable to the duplicate ID trick.)

Very good point.

<...>
> This is true, but there is a problem with the whitelisting approach: the  
> set of elements and attributes isn't in one-to-one correspondence with  
> the set of browser features. For example, one can't define a set of  
> elements and attributes which must be removed to prohibit scripting:  
> it's not enough to just remove <script> elements and on* attributes, one  
> must also check attributes which contain URIs to filter out  
> "javascript:". (I know it's a bad example because one would need to  
> convert javascript: to safe-javascript: anyway, but you've got the idea,  
> right?)
>
> While filtering the DOM tree by the HTML cleaner is easy, it approaches  
> the problem from the syntax point of view, not semantic. It's more  
> robust to write something like <sandbox scripting="disallow"> to  
> disallow all scripting within the sandbox, including any obscure or  
> future flavors of scripts as well as those enabled by proprietary  
> extensions (like MSIE's "expression()" in CSS). Browser developers know  
> better what makes "all possible kinds of scripts" than the web  
> application developers.
>
> Likewise, other browser features are better controlled explicitly ("I  
> want to disable all external content within this sandbox") than by  
> filtering the DOM tree. At least because not all new features, like new  
> ways to load external content, come with new elements or attributes  
> which aren't on the whitelist. Some features reuse existing syntax in  
> elegant ways.

Again, a good point, but this is not entirely related to "duplicate ID as  
a security issue": you are advocating for the <sandbox> element. That's  
something I also do, depending on how it's going to be defined (of  
course).

The <sandbox> element would make securing a web application against common  
security holes and other pitfalls much easier and more elegant. Of course,  
it would also solve the duplicate ID issue.
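Alexey's point about whitelisting can be made concrete with a minimal sketch. The element/attribute model below is simplified plain objects, and the names are illustrative, not any real sanitizer's API; it only shows why dropping <script> and on* attributes is insufficient without also checking URI-valued attributes for the javascript: scheme:

```javascript
// Attributes whose values are URIs and can smuggle script in via
// the javascript: scheme (a deliberately small, illustrative set).
const URI_ATTRS = new Set(['href', 'src', 'action']);

// Return a copy of the tree with scripting vectors removed:
// <script> elements, on* event handler attributes, and javascript: URIs.
function stripScripting(node) {
  if (node.tag === 'script') return null; // drop the element entirely
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (name.toLowerCase().startsWith('on')) continue; // onclick, onload, ...
    if (URI_ATTRS.has(name.toLowerCase()) &&
        value.trim().toLowerCase().startsWith('javascript:')) continue;
    attrs[name] = value;
  }
  const children = (node.children || [])
    .map(stripScripting)
    .filter(c => c !== null);
  return { tag: node.tag, attrs, children };
}
```

Even this sketch has to encode browser-specific knowledge (which attributes carry URIs), which is exactly the argument for a declarative <sandbox scripting="disallow"> handled by the browser itself.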

> IDs are useful to make anchors for navigation to sections of the page,  
> and class names are useful to style the content in uniformity with the  
> rest of the site (for example, Wikipedia's skins define the class  
> "wikitable" to make user tables look the same throughout the site).  
> These two features are good for the web. Taking them away for security  
> reasons would lower the quality of the web content. For example, if  
> Wikipedia disallowed the class attribute, then each such table would  
> have to bear physical formatting attached to it, which is a step behind.
>
> Of course, comments on forums don't need these features. But I'm talking  
> more of your "grade 2" applications.

Wikipedia is a special case I had forgotten about. It's very close to  
"grade 3", but not quite.

<...>
>> As for scripting, if there's any user wanting to post his/her script in  
>> a forum, then that's a problem. I wouldn't ever allow it (except  
>> probably for research purposes, such as "how users act when they are  
>> given all power" :) ).
>
> Scripting isn't useful for forum posts, but it is useful in  
> blogs/CMS/wikis, mainly because today's HTML sucks. People want things  
> like collapsible sections, popup menus, tables with changeable sort  
> order etc. (Some of these tasks won't require scripting according to  
> WA1).

I have to somewhat disagree with this, because blog, CMS and wiki  
applications must provide the scripts, the "toys", in a WYSIWYG  
environment. Those can be properly secured by the application authors,  
and user scripts should not be allowed. Table sorting, popup menus and  
the like are all toys. Does Wikipedia allow full scripting access? I  
believe they allow access to some toys only.

>>> I've mentioned it in the original message. Though I find it too strict  
>>> to strip all id and class attributes from user-supplied text. They  
>>> usually do more good than bad.
>
>> I don't. It's not too strict at all. I actually find it very loose to  
>> allow these specific attributes. They should be allowed *only* when  
>> there are real requirements (especially IDs).
>
> Navigational anchors is a real use case for IDs.
>
> Classes have many use cases, the primary being to avoid presentational  
> in favor of semantic formatting. Another harmless but useful way to  
> apply classes is the so-called microformats (see  
> http://microformats.org/).

True.

<...>
>> Yes, this is good. Web-based viruses don't yet exist, but it's only a  
>> matter of time.
>
> Java applets have existed for many years, but no viruses have been  
> distributed this way. The framework for Java applets is so  
> well-defined that it's just not possible.

I'd say it's just like viruses for Linux. Not many people want to write a  
virus for Linux; they all write viruses for Windows. If we all switched to  
Linux, we'd have many viruses for Linux too (it's not impossible, as you  
probably know). The same goes for the web, Java applets, etc. But this is  
off-topic and a very different story.

> Returning to the duplicate IDs, I think we should define some standard  
> behavior for getElementById() when there is more than one element with  
> the given ID. To lower the possible extent of duplicate ID attacks, I  
> propose that getElementById() should throw an exception in that case.  
> It's better to crash the script than to make it do what the attacker  
> wants.

Bad idea. I've just worked with a guy on a web application done the  
"industrial way" (as in "get it done ASAP, no matter how"). It was built  
entirely from copy/pasted frameworks, with Java on the server side, Dojo  
on the client side, and several more frameworks (5 to 10!). It was  
horrible: many duplicate IDs, slow loading ("web 2.0 ready with AJAX"),  
etc. I was amazed it even worked :). The guy wasn't fully aware of what  
happened behind the scenes (he didn't even see how bad the generated DOM  
looked in the browser).

The point is, web applications currently do rely on duplicate-ID support.  
Throwing exceptions (thus breaking scripts) would also badly break  
backwards compatibility. That web application is not the only one with  
such a badly coded backend; it's one of many (look at most corporate web  
sites done in "a snap" by "gurus").

<...>
>> - grade 2
>> Full-blown ones: for blog articles, CMSs, ...
>>
>> Scripting: none
>> Styling: yes
>> Tags and attributes: same as grade 1, with the exception that these  
>> must allow class and style attributes.
>
> For these applications, user-supplied JavaScript is highly demanded, and  
> it can't be fulfilled by a limited set of predefined JavaScript toys.
>
> They also need IDs for navigational purposes.

Predefined toys are enough. It's almost useless to allow scripts to run in  
a sandboxed, frame-like environment inside your blog article, without  
being able to interact with the page navigation (which is outside the  
sandbox) or do anything else.

>> - grade 3
>> Web authoring tools: similar to NVU, Dreamweaver, ...
>>
>> Scripts, styling, tags and attributes: everything.
>>
>> Security concerns regarding scripting are eliminated in grade 1 and  
>> grade 2 WYSIWYG editors, because you can't really expect the average  
>> Jane and Joe to want to do scripting for their articles and pages in  
>> CMSs. If they wanted to, they'd make their own site "by hand".
>
> They probably don't want to do "scripting", they just want these  
> interactive things like tables with changeable sort order. If they were  
> given the ability to use scripts in their articles, they would find a  
> nice JavaScript through a search engine and paste it on the site.

I disagree. Online web authoring tools must allow full scripting support,  
exactly as NVU does. As for grades 1 and 2, I said it above: just toys are  
enough (like table sorting, menus, etc.).

>> P.S. You have sent the reply only to me. I suppose it was by mistake  
>> (nothing personal was in it). I have sent my reply to your email back  
>> to WHATWG (I expect your future replies to go there as well; it's a  
>> public discussion).
>
> You're right, I've hit the wrong button. Thanks.

No problem, it has already happened to me twice on these mailing lists :(.


-- 
http://www.robodesign.ro
ROBO Design - We bring you the future
Received on Tuesday, 14 March 2006 12:36:51 UTC
