- From: Alexey Feldgendler <alexey@feldgendler.ru>
- Date: Fri, 10 Mar 2006 12:45:29 +0600
Does the current version of the spec define what happens to elements with duplicate ID values?

Duplicate IDs aren't just another case where well-defined error recovery would be nice for the sake of uniformity. In some cases they should be treated as a security concern.

Consider a script that augments the HTML page after it has been parsed, by attaching event listeners to elements in the DOM tree, inserting new nodes into the tree, and so on. This is common practice, for example, in many web-based WYSIWYG editors. In this scenario, any method the script uses to identify the DOM nodes it wants to augment is vulnerable to spoofing by user-supplied content present on the same page.

For example, imagine a script which finds a button by ID and attaches an event listener to it. A possible markup looks like this:

    <div> ...blog entry body... </div>
    <button id="addtomemories">Add this entry to memories</button>
    <script>
    document.getElementById('addtomemories').addEventListener('click', doSomeNiceAJAX);
    </script>

So a malicious blog author can make the following entry:

    I have found a <a href="#" id="addtomemories">cool website</a>.

Depending on how the browser handles duplicate IDs, either or both of the following unwanted effects may occur:

1. Clicking the link in the blog entry adds the entry to the reader's memories list.
2. Clicking the real "Add this entry to memories" button does nothing.

One can think of other examples, possibly more dangerous. Other methods of identification (by tag name, by class, by CSS selector as proposed recently) are also vulnerable.

This kind of attack is hard to prevent with HTML cleaners because id="addtomemories" looks like an innocent attribute, such as an anchor for navigation.
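How much the outcome of the example above depends on the browser's duplicate-ID policy can be sketched without a real DOM. The following is only an illustration under stated assumptions: the tree model and the helpers `el`, `findAllById`, `findFirstById` and `findLastById` are hypothetical stand-ins, not browser APIs, and the two policies shown are simply two plausible choices, since the behavior is exactly what the question above asks the spec to define.

```javascript
// Hypothetical stand-in for a DOM element: a tag, an id, and child nodes.
function el(tag, id, children = []) {
  return { tag, id, children };
}

// Depth-first walk in document order, collecting every element with the id.
function findAllById(node, id, out = []) {
  if (node.id === id) out.push(node);
  for (const child of node.children) findAllById(child, id, out);
  return out;
}

// Two plausible duplicate-ID policies (assumptions, not specified behavior).
function findFirstById(root, id) {
  const matches = findAllById(root, id);
  return matches.length > 0 ? matches[0] : null;
}

function findLastById(root, id) {
  const matches = findAllById(root, id);
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

// The page from the example: the user-supplied entry precedes the real button.
const page = el('body', null, [
  el('div', null, [el('a', 'addtomemories')]), // spoofed link in the blog entry
  el('button', 'addtomemories'),               // the site's real button
]);

const firstMatch = findFirstById(page, 'addtomemories');
const lastMatch = findLastById(page, 'addtomemories');

console.log('first-match policy picks:', firstMatch.tag);
console.log('last-match policy picks:', lastMatch.tag);
```

Under the first-match policy the listener would land on the spoofed link and the real button would be left without one, which corresponds to both effects listed above. Under the last-match policy this particular layout would be safe, but only because the button happens to come after the user content.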
Preventing such attacks with an HTML cleaner would require either maintaining a full list of all "forbidden" IDs, class names, etc., or imposing Draconian rules on user-supplied content that completely disallow such useful attributes as id and class.

How to address this security issue is an open question. Always using carefully constructed XPath expressions to find the nodes may be a solution, because an XPath expression can specify the whole path starting from the root, like /html/body/button[@id="addtomemories"] (though careless XPath expressions like //*[@id="addtomemories"] can be vulnerable as well). Another solution may be to define functions like getElementById(), getElementsByTagName(), etc. so that they don't cross sandbox boundaries during their recursive search, at least by default. (If the sandbox proposal makes it into the spec, of course.)

Ideas are welcome.

-- 
Opera M2 9.0 TP2 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station at SW-Soft, Inc. [ICQ: 115226275] <alexey at feldgendler.ru>
Received on Thursday, 9 March 2006 22:45:29 UTC