Re: Tree construction: Coalescing text nodes from Henri Sivonen on 2009-11-18 (public-html@w3.org from November 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 18 Nov 2009 11:27:38 +0200
To: Geoffrey Sneddon <gsneddon@opera.com>
Cc: Ian Hickson <ian@hixie.ch>, public-html@w3.org, pjt47@cam.ac.uk
Message-Id: <3DA856AB-85AD-4B23-A577-CE31E4C7B1F9@iki.fi>

On Nov 13, 2009, at 14:15, Geoffrey Sneddon wrote:

> Henri Sivonen wrote:
>> On Nov 13, 2009, at 12:06, Geoffrey Sneddon wrote:
>>> However, I think that such implementations are probably more important in terms of the structure of the DOM created (because they are more likely to support scripting), and as such it seems silly to have anything apart from a single text node in all cases, especially when such implementations can likely have a single text node backed by multiple strings internally.
>> It's not necessarily silly not to require browsers to coalesce in all cases. Would you make parser-inserted text nodes coalesce into script-created text nodes or parser-created older-than-previous text nodes that a script has moved around?
> 
> No, but I would expect the parser (without executing any script) to always create a DOM with no adjacent text nodes. If you start manually manipulating the DOM via scripting I'd expect to end up with the DOM I created (e.g., if I appendChild a text node I would expect a text node to be appended, I wouldn't expect, ever, to get a single text node if there was already a text node as the last child).

That wasn't quite the case I was asking about. Concretely, I was asking about the following (illustrated here with document.write, but I'm also asking about the case where the document.write boundaries are network buffer boundaries instead):
document.write("<div id=thediv>");
document.getElementById('thediv').appendChild(document.createTextNode("foo"));
document.write("bar");

One text node with data "foobar" or two text nodes: "foo" followed by "bar"? Does it matter?

document.write("<div id=thediv>");
document.write("foo");
document.write("bar");

One text node with data "foobar" or two text nodes: "foo" followed by "bar"? Does it matter?

In non-foster-parenting cases, to make the distinction, it's sufficient for the parser's DOM builder to remember the most recent text node it has inserted into the tree and append more text to it when the run of text is split either by a document.write() boundary or by a network buffer boundary (if the implementation opts to flush at the end of network buffer boundaries to support use cases like Mozilla tinderbox logs that have a gigantic text node arriving over a period of multiple seconds over the network).
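
To make that bookkeeping concrete, here is a rough sketch, not any browser's actual code. The TreeBuilder class, its startTag/characters methods, and the TextNode/ElementNode stand-ins are all made up for illustration so that it runs outside a browser. It assumes the builder forgets its remembered text node whenever it sees a non-text token.

```javascript
// Minimal stand-ins for DOM nodes (hypothetical, for illustration only).
class TextNode {
  constructor(data) { this.data = data; }
}
class ElementNode {
  constructor(name) { this.name = name; this.childNodes = []; }
  appendChild(node) { this.childNodes.push(node); return node; }
}

// Hypothetical DOM builder that remembers the text node it inserted last.
class TreeBuilder {
  constructor(root) {
    this.current = root;   // node that receives new children
    this.lastText = null;  // most recent parser-inserted text node, if the run is still open
  }
  startTag(name) {
    this.lastText = null;  // a non-text token ends the current run of text
    this.current = this.current.appendChild(new ElementNode(name));
    return this.current;
  }
  // Called once per chunk of text, e.g. per document.write() or network buffer.
  characters(text) {
    if (this.lastText) {
      this.lastText.data += text;  // same run: extend the remembered node
    } else {
      this.lastText = this.current.appendChild(new TextNode(text));
    }
  }
}

// First case above: a script appends "foo" between the two writes.
function scriptCase() {
  const builder = new TreeBuilder(new ElementNode("body"));
  const div = builder.startTag("div");
  div.appendChild(new TextNode("foo"));  // script-created, unknown to the builder
  builder.characters("bar");
  return div.childNodes.map((n) => n.data);
}

// Second case above: both chunks come from the parser.
function parserOnlyCase() {
  const builder = new TreeBuilder(new ElementNode("body"));
  const div = builder.startTag("div");
  builder.characters("foo");
  builder.characters("bar");
  return div.childNodes.map((n) => n.data);
}
```

Under these assumptions scriptCase() leaves two text nodes ("foo", "bar") and parserOnlyCase() leaves one ("foobar"). The sketch never looks at the DOM itself, so it does not cover a script moving the remembered node elsewhere.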

In foster-parenting cases, that's not enough. Consider: <table><tr>f<td>c</td>f

Here, when the second 'f' is foster-parented, the cell content 'c' is the text node the parser inserted last. Now, if foster-parenting examines the DOM to see whether the table already has a text node as its previous sibling (in order to merely extend that text node), that previous sibling could be script-created.
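
A rough sketch of the DOM-examining variant, again with made-up stand-ins: fosterParentText, the createdBy tag, and the node classes are hypothetical and only there to show who ends up owning the text. It assumes the table is a child of the node passed as fosterParent.

```javascript
// Minimal stand-ins for DOM nodes (hypothetical, for illustration only).
class TextNode {
  constructor(data, createdBy) { this.data = data; this.createdBy = createdBy; }
}
class ElementNode {
  constructor(name) { this.name = name; this.childNodes = []; }
  insertBefore(node, ref) {
    this.childNodes.splice(this.childNodes.indexOf(ref), 0, node);
    return node;
  }
}

// Foster-parent text by looking at what sits just before the table.
// The check cannot tell parser-inserted text nodes from script-created ones.
function fosterParentText(fosterParent, table, text) {
  const index = fosterParent.childNodes.indexOf(table);
  const previous = index > 0 ? fosterParent.childNodes[index - 1] : null;
  if (previous instanceof TextNode) {
    previous.data += text;  // extends whatever text node is there
    return previous;
  }
  return fosterParent.insertBefore(new TextNode(text, "parser"), table);
}

// A script has put a text node before the table; then 'f' is foster-parented.
function fosterDemo() {
  const body = new ElementNode("body");
  const table = new ElementNode("table");
  body.childNodes.push(table);
  body.insertBefore(new TextNode("s", "script"), table);
  fosterParentText(body, table, "f");
  return body.childNodes
    .filter((n) => n instanceof TextNode)
    .map((n) => [n.data, n.createdBy]);
}
```

Here fosterDemo() ends with a single text node whose data is "sf" and which was created by the script, which is exactly the situation the questions below are about.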

Does specifying whether foster-parented text coalesces or not really matter for interop? (I believe coalescing all non-foster-parented parser-inserted text does matter for interop.) Is it really bad for the parser to extend script-created text nodes? If it is bad, is it really bad for foster-parenting to create adjacent parser-inserted text nodes?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 18 November 2009 09:28:21 UTC