Re: Tree construction: Coalescing text nodes

On Sat, 31 May 2008, Philip Taylor wrote:
> 
> Consider a document like:
> 
>   a</li>b</li>c</li>d...
> 
> The spec says that should be a single text node "abcd...". IE 6, Firefox 3 and
> Safari 3 do that; Opera 9.5 instead splits it into lots of text nodes.
> 
> Also consider:
> 
>   a<table>b<td></td>c<td></td>d...
> 
> The spec says that should be a single text node "abcd...", then a table.
> Firefox 3 and Safari 3 don't do that, and instead split it into lots of
> adjacent text nodes. (IE and Opera don't do table foster parenting like this
> at all.)
> 
> In many implementations (particularly when strings are immutable, e.g. in
> Python), constructing the string "abcd..." by concatenating lots of individual
> characters one at a time has cost O(n^2) in the length of the string.
> 
> That problem can (I think) be avoided in the first case without too much
> trouble: the parser's "append text node to current node" method can build up a
> list of strings, and then they can be flushed (concatenate all strings, append
> result to current node) once the parser calls a different method ("append
> element to current node" etc). But I can't see an adequately efficient way to
> avoid the problem in the table case, particularly if scripts are allowed to
> observe and modify the DOM as it is being parsed.
> 
> So this is a DOS vulnerability, since an attacker can do 'n' work sending a
> document to someone and cause them to do O(n^2) work parsing it (even without
> scripting).
>
> Rather than relying on implementors to notice and fix this problem in 
> incompatible ways, the spec should be changed to require coalescence 
> only in the cases where the parser is consecutively appending text nodes 
> to the current node and not in any other cases.

I've tried to fix this. Please let me know if the fix (r2124) works for 
you. I'm not sure exactly what the implications of my fix are.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 29 August 2008 09:26:29 UTC