- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 29 Aug 2005 22:29:08 +0300
What kind of approach to tag inference can HTML5 be expected to take?

For an SGML validator that is parsing HTML 4, the set of possible element names is finite. However, a browser needs to deal with an infinite set of potential element names. Therefore, it makes a difference whether end tag inference is based on what is allowed as a child of an element or on what elements are not allowed.

Example:
<p><foo>

Is 'foo' an element that is not allowed as a child of 'p' and, therefore, implicitly closes the 'p'? Or is 'foo' not on the list of elements that close 'p' and, therefore, does not implicitly close it? Which way are the inference rules going to be defined?

As far as I can tell, there are four kinds of inference needed when parsing *conforming* documents (i.e. no second stack for residual style):
1) Element end causes the end of the element that is on the top of the stack*.
2) End of the data stream causes the end of the element that is on the top of the stack.
3) Element start causes the end of the element that is on the top of the stack.
4) Element start causes another element start before itself.

Is this list complete?

I am assuming that for *conforming* documents, the above-mentioned inference decisions can be taken by observing the top of the stack and the element name associated with the current end or start element event. Correct? (I am assuming the rules may be applied repeatedly. I.e. null on stack and start 'title' implies 'html' start. 'html' on stack and start 'title' implies 'head' start. 'head' on stack and start 'title' implies nothing and the start 'title' goes through.)

It seems to me that #3 is the tricky case in terms of interaction with unknown element names. #1 and #2 require a list of elements whose end tag is optional. #4 requires a map of (top of stack, current start) pairs to inferred start tags.
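To make the question about #3 concrete, here is a rough sketch of how the two readings could differ for an unknown element name, together with the repeated application of #4 and a simple form of #1. The table contents, the policy names "allow-list" and "close-list", and the function names are made up for illustration; they are assumptions of mine, not rules from any specification.

```python
# Hypothetical tables for illustration only; not taken from any spec.

# Rule 4: (top of stack, current start) -> inferred start tag.
IMPLIED_START = {
    (None, "title"): "html",
    ("html", "title"): "head",
}

# Rule 3, reading A: anything NOT listed as an allowed child closes the parent.
ALLOWED_CHILDREN = {"p": {"em", "strong", "a"}}

# Rule 3, reading B: only elements explicitly listed as closers close the parent.
CLOSERS = {"p": {"p", "div", "ul"}}

# Rules 1 and 2: elements whose end tag is optional (illustrative subset).
OPTIONAL_END = {"html", "head", "body", "p", "li"}


def start_closes_top(top, name, policy):
    """Decide whether a start tag `name` implicitly ends the element `top`."""
    if policy == "allow-list":
        allowed = ALLOWED_CHILDREN.get(top)
        # No table entry for `top` means no inference is made for it.
        return allowed is not None and name not in allowed
    return name in CLOSERS.get(top, set())


def handle_start(stack, name, policy):
    """Apply rules 3 and 4 repeatedly, then let the start tag through."""
    events = []
    while True:
        top = stack[-1] if stack else None
        implied = IMPLIED_START.get((top, name))
        if implied is not None:
            stack.append(implied)            # rule 4: inferred start
            events.append(("start", implied))
        elif top is not None and start_closes_top(top, name, policy):
            stack.pop()                      # rule 3: inferred end
            events.append(("end", top))
        else:
            break
    stack.append(name)
    events.append(("start", name))
    return events


def handle_end(stack, name):
    """Rule 1: an end tag first ends open elements whose end tag is optional."""
    events = []
    while stack and stack[-1] != name and stack[-1] in OPTIONAL_END:
        events.append(("end", stack.pop()))
    if stack and stack[-1] == name:
        events.append(("end", stack.pop()))
    return events
```

With 'p' on the stack and a start tag for the unknown 'foo', the "allow-list" policy ends the 'p' first, whereas the "close-list" policy leaves 'foo' as a child of 'p'. That difference is exactly what the question above is about.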
* I am assuming an implementation maintains a stack of open elements or works directly on a parse tree, in which case the path from the current node to the root has the same role as the stack.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Monday, 29 August 2005 12:29:08 UTC