- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 31 May 2008 16:45:55 +0100
- To: HTML WG <public-html@w3.org>
Consider a document like: a</li>b</li>c</li>d... The spec says that should be a single text node "abcd...". IE 6, Firefox 3 and Safari 3 do that; Opera 9.5 instead splits it into lots of text nodes. Also consider: a<table>b<td></td>c<td></td>d... The spec says that should be a single text node "abcd...", then a table. Firefox 3 and Safari 3 don't do that, and instead split it into lots of adjacent text nodes. (IE and Opera don't do table foster parenting like this at all.) In many implementations (particularly when strings are immutable, e.g. in Python), constructing the string "abcd..." by concatenating lots of individual characters one at a time has cost O(n^2) in the length of the string. That problem can (I think) be avoided in the first case without too much trouble: the parser's "append text node to current node" method can build up a list of strings, and then they can be flushed (concatenate all strings, append result to current node) once the parser calls a different method ("append element to current node" etc). But I can't see an adequately efficient way to avoid the problem in the table case, particularly if scripts are allowed to observe and modify the DOM as it is being parsed. So this is a DOS vulnerability, since an attacker can do 'n' work sending a document to someone and cause them to do O(n^2) work parsing it (even without scripting). Rather than relying on implementors to notice and fix this problem in incompatible ways, the spec should be changed to require coalescence only in the cases where the parser is consecutively appending text nodes to the current node and not in any other cases. Hence you would get parse trees like: a</li>b</li>c</li>d... | <html> | <head> | <body> | "abcd..." a<table>b<td></td>c<td></td>d... | <html> | <head> | <body> | "a" | "b" | "c" | "d..." | <table> etc a<script id=s> var s = document.getElementById('s'); s.parentNode.removeChild(s); </script>b | <html> | <head> | <body> | "a" | "b" -- Philip Taylor pjt47@cam.ac.uk
Received on Saturday, 31 May 2008 15:46:36 UTC