[whatwg] Potentially avoidable tokeniser/treebuilder dependency

On Wed, 23 Sep 2009, Øistein E. Andersen wrote:
>
> The major obstacle for an independent tokeniser seems to be that the 
> content model flag is set to RCDATA, RAWTEXT or PLAINTEXT by the 
> treebuilder and not by the tokeniser. In most cases, the new content 
> model flag is entirely predictable from the start tag (and 
> RCDATA/RAWTEXT element names are known to the tokeniser already).  The 
> only exceptions I have found so far concern start tags within <select> 
> and <frameset>, which are dropped by the treebuilder and therefore do 
> not cause the content model flag to change.  Even these cases could 
> perhaps have been handled by the tokeniser without too much trouble (and 
> without changing the spec) if it were not for the "in select in table" 
> insertion mode, where a missing </select> end tag may be inferred 
> depending on the stack of open elements.
> 
> It seems unfortunate to abandon the possibility of an independent 
> tokeniser just to handle what appears to be a corner case of a corner 
> case, viz., unclosed RCDATA/RAWTEXT elements inside an unclosed <select> 
> element in a table.  The easiest solution would be to switch the content 
> model flag upon seeing an RCDATA/RAWTEXT/PLAINTEXT start tag 
> irrespective of insertion mode, i.e., also within <select> and 
> <frameset>, which would allow the tokeniser to take care of this without 
> added complexity.  Other solutions might be worth considering if this is 
> found to be too incompatible with existing pages.  (I could have a look 
> at the http://www.dotnetdotcom.org/ dataset if that would be of any 
> use.)

I don't feel comfortable changing this without a _really_ good reason, 
given the high risk of compatibility problems. Having the tokeniser be 
separable was never a design goal; that it is possible to get even as 
close as it is today is frankly quite surprising to me.
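
For reference, the proposal quoted above (switching the content model 
flag on the start tag alone, irrespective of insertion mode) could be 
sketched roughly as follows. This is only an illustration of the idea, 
not spec text: the function name, the flag strings and the exact element 
lists are assumptions made for the example, and <script> is deliberately 
left out.

```python
# Illustrative sketch only; names and element lists are assumptions,
# not taken from the spec text or from this thread.
RCDATA_TAGS = frozenset({"title", "textarea"})
RAWTEXT_TAGS = frozenset({"style", "xmp", "iframe", "noembed", "noframes"})
PLAINTEXT_TAGS = frozenset({"plaintext"})


def content_model_after_start_tag(tag_name, current="PCDATA"):
    """Return the content model flag a tokeniser would switch to after
    emitting a start tag, looking at the tag name only.

    The insertion mode is never consulted, so the flag changes even for
    start tags that the treebuilder would drop inside <select> or
    <frameset>. That is the behavioural change under discussion.
    """
    name = tag_name.lower()
    if name in RCDATA_TAGS:
        return "RCDATA"
    if name in RAWTEXT_TAGS:
        return "RAWTEXT"
    if name in PLAINTEXT_TAGS:
        return "PLAINTEXT"
    # Any other start tag leaves the flag unchanged.
    return current
```

The compatibility risk is exactly the case where this function and the 
current treebuilder-driven behaviour disagree, e.g. a <textarea> start 
tag that is dropped inside <select>.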

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 5 October 2009 18:05:18 UTC