- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Fri, 08 Feb 2008 14:45:58 +0000
- To: HTML WG <public-html@w3.org>
http://html5lib.googlecode.com/svn/trunk/testdata/tree-construction/tests1.dat
has the following test case:
#data
<b>Test</i>Test
#errors
Line: 1 Col: 3 Unexpected start tag (b). Expected DOCTYPE.
Line: 1 Col: 11 End tag (i) violates step 1, paragraph 1 of the adoption
agency algorithm.
Line: 1 Col: 15 Expected closing tag. Unexpected end of file.
#document
| <html>
| <head>
| <body>
| <b>
| "TestTest"
The text-node coalescence is defined in
http://www.w3.org/html/wg/html5/#append as:
"When the steps below require the UA to append a character to a
node, the UA must collect it and all subsequent consecutive characters
that would be appended to that node, and insert one Text node whose data
is the concatenation of all those characters."
The tokeniser produces tokens [<b>, "T", "e", "s", "t", </i>, "T", "e",
"s", "t"]. As I read the spec, the "T" will trigger the "append a
character" step, so it will collect the three subsequent consecutive
character tokens and append one Text node "Test". Then it will ignore
the end tag, and then do "append a character" again and append a new
Text node, so the output should be
| <html>
| <head>
| <body>
| <b>
| "Test"
| "Test"
But I could also read the spec as meaning that once "append a character"
is first run, "estTest" are the characters that will subsequently be
appended consecutively to the <b> node, which will give the output as in
tests1.dat. So it would be nice to know what is correct.
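The two readings can be contrasted directly on the token stream listed above. This is a minimal Python sketch, not the spec's algorithm. It assumes tokens are plain strings (tags start with "<", anything else is a single character token), and that every character here is appended to the <b> element. `coalesce_by_run` and `coalesce_by_node` are hypothetical names.

```python
from __future__ import annotations

# Tokens for "<b>Test</i>Test", as listed above. All character tokens
# are appended to the <b> element, so only its Text children are modelled.
TOKENS = ["<b>", "T", "e", "s", "t", "</i>", "T", "e", "s", "t"]


def is_char(token: str) -> bool:
    return not token.startswith("<")


def coalesce_by_run(tokens: list[str]) -> list[str]:
    """First reading: one Text node per run of consecutive character
    tokens; any other token (even the ignored </i>) ends the run."""
    nodes: list[str] = []
    run: list[str] = []
    for token in tokens:
        if is_char(token):
            run.append(token)
        elif run:
            nodes.append("".join(run))
            run = []
    if run:
        nodes.append("".join(run))
    return nodes


def coalesce_by_node(tokens: list[str], ignored: set[str]) -> list[str]:
    """Second reading: an ignored token appends nothing, so the characters
    on either side of it are still consecutive appends to the same node."""
    return coalesce_by_run([t for t in tokens if t not in ignored])
```

Here `coalesce_by_run(TOKENS)` yields `["Test", "Test"]`, while `coalesce_by_node(TOKENS, {"</i>"})` yields `["TestTest"]`, the output in tests1.dat.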
Also, what should happen with:
<b>Test<script id=s>var s=document.getElementById('s');
s.parentNode.removeChild(s)</script>Test
? I'm not sure how this could be implemented differently to the
"<b>Test</i>Test" case while following the general pattern of the HTML5
parser algorithm, so it should be parsed the same (whichever way that is).
Firefox 2, Opera 9.5 and Safari 3 create two adjacent text nodes in the
<script> case, and IE6 can't be tested since it doesn't delete the <script>.
In the </i> case, Firefox produces one text node, Opera and Safari
produce two, and IE can't be tested since it makes an element named "/I".
Using "<b>Test</li>Test" instead, IE6 produces one text node, and the
others behave the same as with </i>.
Also, are UAs allowed to insert a Text node before they have received all
the characters, and append new characters to it later (e.g. for
incremental display of a long plain-text element)? I assume that should be
permitted. But the spec says the node must be inserted after all the
characters have been collected, and I expect UAs ought not to render
text that isn't (yet) in the Document.
So, I think it should be defined either like:
"When the steps below require the UA to append a character to a
node: If the last child of the node is a Text node, then the UA must
append the character to that Text node; otherwise it must create a new
Text node whose data is the character and append it to the node."
(which would always give "TestTest"), or like
"When the steps below require the UA to append a character to a
node, the UA must create one Text node whose data is the character and
append it to the node. While the next token is a character token that
would be appended in the same insertion mode, that character must
instead be appended to this Text node."
(which would always give "Test","Test").
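The two proposed wordings can be compared in a toy Python model. This is a sketch, not the HTML5 tree builder. Only the <b> element's children are tracked, and the token kinds "char", "ignore" and "script" are hypothetical stand-ins for a character token, the ignored </i>, and the self-removing <script> from the example above. The rule "last_child" follows the first wording and "token_run" follows the second.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Text:
    data: str


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


def parse(tokens: list[tuple], rule: str) -> list[str]:
    """Toy tree builder in which every token targets one <b> element."""
    b = Element("b")
    open_text: Text | None = None  # Text node still accepting chars ("token_run")
    for token in tokens:
        kind = token[0]
        if kind == "char":
            ch = token[1]
            if rule == "last_child":
                # First wording: extend the last child if it is a Text node.
                last = b.children[-1] if b.children else None
                if isinstance(last, Text):
                    last.data += ch
                else:
                    b.children.append(Text(ch))
            elif open_text is None:
                # Second wording: start a new Text node for this run...
                open_text = Text(ch)
                b.children.append(open_text)
            else:
                # ...and extend it while character tokens keep coming.
                open_text.data += ch
            continue
        open_text = None  # any non-character token ends the run
        if kind == "script":
            script = Element("script")
            b.children.append(script)
            # The script runs s.parentNode.removeChild(s).
            b.children.remove(script)
    return [child.data for child in b.children]


def chars(s: str) -> list[tuple]:
    return [("char", c) for c in s]


END_I = chars("Test") + [("ignore",)] + chars("Test")   # <b>Test</i>Test
SCRIPT = chars("Test") + [("script",)] + chars("Test")  # <b>Test<script ...>Test
```

In this model both inputs give `["TestTest"]` under "last_child" and `["Test", "Test"]` under "token_run". Because the removed <script> leaves the two text runs adjacent, each wording treats the </i> and <script> cases identically.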
--
Philip Taylor
pjt47@cam.ac.uk
Received on Friday, 8 February 2008 14:46:41 UTC