[whatwg] The problem of duplicate ID as a security issue from Alexey Feldgendler on 2006-03-14 (public-whatwg-archive@w3.org from March 2006)

From: Alexey Feldgendler <alexey@feldgendler.ru>
Date: Tue, 14 Mar 2006 18:04:48 +0600
Message-ID: <op.s6ejua2m1h6og4@localhost>
(This message was originally sent off-list by mistake.)

On Fri, 10 Mar 2006 17:49:17 +0600, Mihai Sucan <mihai.sucan at gmail.com>
wrote:

>> Another solution may be to define functions like getElementById(),  
>> getElementsByTagName() etc so that they don't cross sandbox boundaries  
>> during their recursive search, at least by default. (If the sandbox  
>> proposal makes it to the spec, of course.)

> This is something I'd opt for. But ... this would be really bad, since  
> the spec would have to change the way getElementBy* functions work. It's  
> bad because you shouldn't make a spec that breaks other specs you rely  
> upon (this has probably already been done in this very spec).

No, it's not really a change in the getElementBy* functions. Because there
were no sandboxes before HTML 5, no one can really expect these functions
to treat sandbox elements the same as all other elements. Sandboxes are
"security barriers" by their nature, so it seems quite natural, at least
to me, to have the getElementBy* functions stop at them.
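
As a rough illustration of a lookup that stops at sandbox boundaries, here
is a minimal JavaScript sketch over a toy tree rather than the real DOM.
The names el and getElementByIdScoped, the node fields, and the choice to
let the sandbox element itself remain findable are all assumptions made
for the example, not anything taken from a spec.

```javascript
// Toy element tree, not the real DOM. `tag`, `id`, and `children` are
// illustrative field names chosen for this sketch.
function el(tag, id, children = []) {
  return { tag, id, children };
}

// Depth-first search for `id` below `root` that does not descend into
// <sandbox> elements. Assumption: the sandbox element itself can still be
// found; only its contents are hidden from an outside search.
function getElementByIdScoped(root, id) {
  for (const child of root.children) {
    if (child.id === id) return child;
    if (child.tag === "sandbox") continue; // stop at the security barrier
    const found = getElementByIdScoped(child, id);
    if (found) return found;
  }
  return null;
}

const doc = el("body", null, [
  // User-supplied content that reuses an ID belonging to the page.
  el("sandbox", "comment", [el("form", "login"), el("p", "note")]),
  // The page's own form.
  el("form", "login"),
]);

const pageForm = getElementByIdScoped(doc, "login");
const hidden = getElementByIdScoped(doc, "note");
const inside = getElementByIdScoped(doc.children[0], "login");
```

Searched from the document root, "login" resolves to the page's own form
and "note" is not found at all, while a search started at the sandbox
element still sees the sandbox's contents.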

> Therefore, I'd say this security issue should be left to be taken care  
> of by web application authors themselves. It's impossible for specs to  
> force authors to make secured apps.

It's not to force them, it's to help them. Sanitizing user-supplied HTML
is a very difficult task today, and new security holes keep being found in
the HTML cleaners of many web applications. I think that the spec should
make it easier to write a secure web application.

> Why do so? Authors already have to take care of not allowing some tags  
> and other tricks in the book (for example <meta refresh>). If the author  
> allows users to supply *any* tag (even the innocent <strong>), then they  
> already expose their app to potential security holes.

Yes, I know, and I think it's wrong. The spec should make <strong>
harmless, at least inside a sandbox.

> Malicious users can insert IDs not only for abusing a specific security
> hole, but also just for the fun of breaking the page. They can also use
> class= and style= attributes for the sole purpose of (badly) breaking
> the layout of the targeted page.

CSS has properties that can be used to fit user-supplied content into a
box and make it sit there quietly ("overflow: hidden" etc). The user can
make whatever mess he wants of his own blog entry, but it won't harm the
rest of the page.

> The spec can't do much in these situations. Shall the spec provide a way  
> for CSS files to *not* be applied in <sandbox>ed content?

CSS3 already has negation selectors that can be used for this:

*:not(sandbox) p { text-align: left; }

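This is meant to make all paragraphs left-aligned except in sandboxes.
(As written, the descendant combinator matches any p that has at least one
non-sandbox ancestor, such as body, so the selector would need to be
restructured to exclude sandboxed paragraphs reliably.)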

> Generally authors just don't allow users to input HTML code at all (I  
> myself do that). It's the safest way and the easiest way.

Well, of course plain text is the safest. But many applications require
formatting markup in user-supplied text. Some applications don't try to
deal with the security pitfalls of HTML and invent their own markup syntax
(e.g. BBcode). However, two things are wrong with that approach:

1. We already have a great markup language, which is HTML, and there are
many tools and libraries available that deal with it.

2. The WA1 (Web Applications 1.0) spec defines facilities designed for
WYSIWYG editing which encourage the use of HTML as the markup language for
user-supplied content.

> If allowing some tags is a requirement for some app, then the author  
> already has to take care which tags s/he allows and which attributes.  
> Nothing special. If s/he doesn't think of removing some attributes or  
> checking for allowed values, then ... it's not the spec to be blamed for  
> the security issue.

I mentioned this in the original message, though I find it too strict to
strip all id and class attributes from user-supplied text. They usually do
more good than harm.
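
To make the allowlist approach discussed here concrete, the following is a
minimal JavaScript sketch that filters an already-parsed tree by tag and
attribute name while keeping id and class. The node shape, the clean
function, and both allowlists are hypothetical; parsing and checks on
attribute values are deliberately left out, so this is not a complete
HTML cleaner.

```javascript
// Hypothetical allowlists; a real application would choose its own.
const ALLOWED_TAGS = new Set(["p", "strong", "em"]);
const ALLOWED_ATTRS = new Set(["id", "class"]);

// Returns a cleaned copy of a parsed node, or null if its tag is not allowed.
// Element nodes are plain objects: { tag, attrs: {name: value}, children }.
// Text nodes are strings and pass through unchanged.
function clean(node) {
  if (typeof node === "string") return node;
  if (!ALLOWED_TAGS.has(node.tag)) return null; // drop element and subtree
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (ALLOWED_ATTRS.has(name)) attrs[name] = value; // keep id and class
  }
  const children = node.children.map(clean).filter((c) => c !== null);
  return { tag: node.tag, attrs, children };
}

const input = {
  tag: "p",
  attrs: { id: "intro", onclick: "steal()" },
  children: [
    "Hello ",
    { tag: "script", attrs: {}, children: ["steal()"] },
    { tag: "strong", attrs: { class: "hot" }, children: ["world"] },
  ],
};

const output = clean(input);
```

The script element and the onclick attribute are dropped, while the id and
class attributes survive, which is exactly why duplicate IDs remain a
concern even when the cleaner does not allow scripts.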

> As Mikko said "allowing random user input with possibility to use user  
> supplied scripting is next to impossible to make secure".

That's what I'm trying to do, and I'm not yet convinced that it's
impossible. This is a hard task but I believe it's what the web needs.

BTW, my original message shows an exploit which is possible even if the
HTML cleaner doesn't allow scripts.


-- Opera M2 9.0 TP2 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station at SW-Soft, Inc. [ICQ: 115226275]  
<alexey at feldgendler.ru>
Received on Tuesday, 14 March 2006 04:04:48 UTC