- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 31 May 2008 16:45:55 +0100
- To: HTML WG <public-html@w3.org>
Consider a document like:

a</li>b</li>c</li>d...

The spec says that should be a single text node "abcd...". IE 6, Firefox
3 and Safari 3 do that; Opera 9.5 instead splits it into lots of text nodes.

Also consider:

a<table>b<td></td>c<td></td>d...

The spec says that should be a single text node "abcd...", then a table.
Firefox 3 and Safari 3 don't do that, and instead split it into lots
of adjacent text nodes. (IE and Opera don't do table foster parenting
like this at all.)

In many implementations (particularly when strings are immutable, e.g.
in Python), constructing the string "abcd..." by concatenating lots of
individual characters one at a time has cost O(n^2) in the length of the
string, because each concatenation copies the entire string built so far.
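
To make the cost concrete, here is a small Python model (the function names are hypothetical) that counts how many characters get copied when n one-character pieces are coalesced by repeated immutable concatenation, versus collecting them in a list and joining once. It counts copies rather than measuring time, because CPython can sometimes perform `s = s + t` in place and so hide the quadratic behaviour in simple benchmarks; the count models strictly immutable strings.

```python
def coalesce_by_concatenation(pieces):
    """Append each piece to an immutable string, as a naive text node would."""
    text = ""
    copied = 0
    for piece in pieces:
        text = text + piece  # conceptually builds a brand-new string
        copied += len(text)  # and copies every character written so far
    return text, copied


def coalesce_by_join(pieces):
    """Collect the pieces first, then concatenate them once."""
    buffer = list(pieces)
    text = "".join(buffer)
    return text, len(text)  # each character is copied exactly once


pieces = ["x"] * 1000  # e.g. 1000 one-character text runs
print(coalesce_by_concatenation(pieces)[1])  # 500500, i.e. n(n+1)/2
print(coalesce_by_join(pieces)[1])           # 1000
```

Doubling n roughly quadruples the first count but only doubles the second.
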
That problem can (I think) be avoided in the first case without too much
trouble: the parser's "append text node to current node" method can
build up a list of strings, and then they can be flushed (concatenate
all strings, append result to current node) once the parser calls a
different method ("append element to current node" etc). But I can't see
an adequately efficient way to avoid the problem in the table case,
particularly if scripts are allowed to observe and modify the DOM as it
is being parsed.
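
A minimal sketch of that buffering idea in Python, assuming a hypothetical tree builder whose only methods are `append_text` and `append_element` (real parsers such as html5lib have richer interfaces, and these names are illustrative). Consecutive text appends to the same parent are collected in a list and joined once when any other builder method is called; if the parent already ends in a text node, the flushed run is merged into it, so the result still matches the spec's single-node tree.

```python
class Node:
    """Hypothetical minimal DOM node (not any real library's API)."""

    def __init__(self, name=None, data=None):
        self.name = name  # element name, or None for a text node
        self.data = data  # text content, for text nodes only
        self.children = []


class BufferedTreeBuilder:
    """Buffers consecutive text appends to one parent; flushes on any other call."""

    def __init__(self):
        self.pending_parent = None  # node the buffered text will be appended to
        self.pending = []           # buffered text pieces, not yet concatenated

    def append_text(self, parent, text):
        if self.pending_parent is not parent:
            self.flush()
            self.pending_parent = parent
        self.pending.append(text)  # amortised O(1): no string copying here

    def append_element(self, parent, name):
        self.flush()  # a different builder call ends the text run
        element = Node(name=name)
        parent.children.append(element)
        return element

    def flush(self):
        if self.pending:
            text = "".join(self.pending)  # one linear-time concatenation per run
            kids = self.pending_parent.children
            if kids and kids[-1].name is None:
                kids[-1].data += text  # merge into an existing trailing text node
            else:
                kids.append(Node(data=text))
        self.pending_parent = None
        self.pending = []


# Usage for a</li>b</li>c</li>d: the stray </li> end tags are parse errors
# that are ignored, so the builder only ever sees consecutive text appends.
body = Node(name="body")
builder = BufferedTreeBuilder()
for ch in "abcd":
    builder.append_text(body, ch)
builder.flush()
print([child.data for child in body.children])  # ['abcd']
```
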
So this is a DoS (denial-of-service) vulnerability: an attacker can
spend O(n) work sending a document to someone and force them to do
O(n^2) work parsing it (even without scripting). Rather than relying on
implementors to notice and fix this problem in incompatible ways, the
spec should be changed to require coalescence only where the parser is
consecutively appending text nodes to the current node, and in no other
cases.

Hence you would get parse trees like:

a</li>b</li>c</li>d...
| <html>
|   <head>
|   <body>
|     "abcd..."

a<table>b<td></td>c<td></td>d...
| <html>
|   <head>
|   <body>
|     "a"
|     "b"
|     "c"
|     "d..."
|     <table> etc

a<script id=s>
var s = document.getElementById('s');
s.parentNode.removeChild(s);
</script>b
| <html>
|   <head>
|   <body>
|     "a"
|     "b"
--
Philip Taylor
pjt47@cam.ac.uk
Received on Saturday, 31 May 2008 15:46:36 UTC