Re: Tree construction: Coalescing text nodes

From: Geoffrey Sneddon <gsneddon@opera.com>
Date: Fri, 13 Nov 2009 11:06:08 +0100
Message-ID: <4AFD2F90.8030705@opera.com>
To: Boris Zbarsky <bzbarsky@MIT.EDU>
CC: Ian Hickson <ian@hixie.ch>, public-html@w3.org, pjt47@cam.ac.uk

Boris Zbarsky wrote:
> On 11/11/09 7:12 AM, Geoffrey Sneddon wrote:
>> However, given an implementation like that of all three ports of
>> html5lib (Python, PHP, Ruby), the following would equally lead to O(n^2)
>> behaviour given the above conditions (i.e., an immutable string type like
>> that of Python):
>> a</x>a</x>a</x>a</x>a</x>a</x>a</x>a</x>a</x>a</x>a</x>a</x>a etc.
> There seems to be an assumption here that a Text node has to store all 
> its text as a single string in the underlying implementation language, 
> right.  That doesn't seem like a hard requirement to me, quite apart 
> from the other points you raised.

Indeed, although this is probably only practical for implementations 
where you control the entire toolchain (e.g., html5lib has pluggable 
tree builders, and it would only be possible to change the internal 
representation for those that store the tree in Python structures, 
not for those that store it in C structures within extension code).
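To make the idea of a Text node not holding a single string concrete, here is a minimal Python sketch. The class `ChunkedText` and its methods are hypothetical, not part of html5lib or any DOM API. It keeps the node's data as a list of chunks, so each append is amortized O(1) and the chunks are joined only when the data is read. That avoids rebuilding an immutable `str` on every append, which copies the whole accumulated text each time and gives the O(n^2) behaviour described above.

```python
class ChunkedText:
    """Hypothetical Text node whose data is backed by a list of strings.

    Appending pushes onto a list (amortized O(1)) instead of creating a new
    immutable str on every append (an O(len) copy each time, O(n^2) overall).
    """

    def __init__(self, data=""):
        self._chunks = [data] if data else []
        self._cached = None  # joined string; invalidated on every append

    def append_data(self, s):
        self._chunks.append(s)
        self._cached = None

    @property
    def data(self):
        # Join lazily, then collapse to one chunk so repeated reads are cheap.
        if self._cached is None:
            self._cached = "".join(self._chunks)
            self._chunks = [self._cached] if self._cached else []
        return self._cached


node = ChunkedText()
for _ in range(5):
    node.append_data("a")
print(node.data)  # aaaaa
```

The saving applies to an append-only phase such as parsing. If a script read `data` after every append, each read would copy the accumulated text again.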

However, I think that the structure of the DOM created probably matters 
more for exactly those implementations (because they are more likely to 
support scripting), so it seems silly to produce anything other than a 
single text node in all cases, especially as such implementations can 
likely back a single text node with multiple strings internally. Other 
implementations that don't control the internal storage of the data 
will have to do something ugly whatever the spec says, so I don't see 
anything gained by requiring adjacent text nodes.
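The single-text-node behaviour argued for here amounts to the tree builder merging each run of characters into the preceding Text node when there is one. The following is a simplified Python sketch. The `Element`/`Text` classes, `insert_text`, and the `(type, value)` token tuples are hypothetical and are not html5lib's tree-builder API. Only the stray-end-tag case from the `a</x>a</x>...` example is modelled: an end tag with no matching open element is ignored, so it never splits the text run.

```python
class Text:
    def __init__(self, data):
        self.chunks = [data]  # chunked storage, so merging is a cheap append

    @property
    def data(self):
        return "".join(self.chunks)


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []


def insert_text(parent, s):
    """Coalesce: extend the last child if it is a Text node, else add one."""
    if parent.children and isinstance(parent.children[-1], Text):
        parent.children[-1].chunks.append(s)
    else:
        parent.children.append(Text(s))


# Hypothetical token stream for "a</x>a</x>a</x>a".
tokens = [("Characters", "a"), ("EndTag", "x")] * 3 + [("Characters", "a")]

body = Element("body")
for kind, value in tokens:
    if kind == "Characters":
        insert_text(body, value)
    # Simplification: no </x> element is open, so the end tag is dropped.

print(len(body.children), body.children[0].data)  # 1 aaaa
```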

Geoffrey Sneddon  Opera Software
Received on Friday, 13 November 2009 10:06:55 UTC