- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 6 Oct 2009 01:05:18 +0000 (UTC)
On Wed, 23 Sep 2009, Øistein E. Andersen wrote:
>
> The major obstacle for an independent tokeniser seems to be that the
> content model flag is set to RCDATA, RAWTEXT or PLAINTEXT by the
> treebuilder and not by the tokeniser. In most cases, the new content
> model flag is entirely predictable from the start tag (and
> RCDATA/RAWTEXT element names are known to the tokeniser already). The
> only exceptions I have found so far concern start tags within <select>
> and <frameset>, which are dropped by the treebuilder and therefore do
> not cause the content model flag to change. Even these cases could
> perhaps have been handled by the tokeniser without too much trouble (and
> without changing the spec) if it were not for the "in select in table"
> insertion mode, where a missing </select> end tag may be inferred
> depending on the stack of open elements.
>
> It seems unfortunate to abandon the possibility of an independent
> tokeniser just to handle what appears to be a corner case of a corner
> case, viz., unclosed RCDATA/RAWTEXT elements inside an unclosed <select>
> element in a table. The easiest solution would be to switch the content
> model flag upon seeing an RCDATA/RAWTEXT/PLAINTEXT start tag
> irrespective of insertion mode, i.e., also within <select> and
> <frameset>, which would allow the tokeniser to take care of this without
> added complexity. Other solutions might be worth considering if this is
> found to be too incompatible with existing pages. (I could have a look
> at the http://www.dotnetdotcom.org/ dataset if that would be of any
> use.)

I don't feel comfortable changing this without a _really_ good reason,
given the high risk of compatibility problems. Having the tokeniser be
separable was never a design goal; that it is possible to get even as
close as it is today is frankly quite surprising to me.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
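
For readers following the thread, the disagreement can be made concrete with a rough sketch. It contrasts the proposal (the tokeniser picks the content model flag from the start tag name alone) with a deliberately crude model of the current behaviour as the quoted message describes it (start tags dropped inside <select> or <frameset> leave the flag unchanged). The element sets, the function names and the "PCDATA" default are assumptions made for illustration, not text from the spec, and the real tree builder has more cases than this models.

```python
# Hypothetical element sets, assumed for this sketch rather than quoted from the spec.
RCDATA_ELEMENTS = frozenset({"title", "textarea"})
RAWTEXT_ELEMENTS = frozenset({"style", "xmp", "iframe", "noembed", "noframes"})
PLAINTEXT_ELEMENTS = frozenset({"plaintext"})

# Insertion modes in which, per the quoted message, such start tags are dropped.
DROPPING_MODES = frozenset({"in select", "in select in table", "in frameset"})


def predict_content_model(tag_name: str) -> str:
    """Proposed behaviour: choose the flag from the start tag name alone."""
    name = tag_name.lower()
    if name in RCDATA_ELEMENTS:
        return "RCDATA"
    if name in RAWTEXT_ELEMENTS:
        return "RAWTEXT"
    if name in PLAINTEXT_ELEMENTS:
        return "PLAINTEXT"
    return "PCDATA"  # assumed name for the default (unchanged) flag


def treebuilder_content_model(tag_name: str, insertion_mode: str) -> str:
    """Crude model of current behaviour: a dropped start tag never changes the flag."""
    if insertion_mode in DROPPING_MODES:
        return "PCDATA"
    return predict_content_model(tag_name)


def diverges(tag_name: str, insertion_mode: str) -> bool:
    """True where an independent tokeniser would disagree with the tree builder."""
    return predict_content_model(tag_name) != treebuilder_content_model(
        tag_name, insertion_mode
    )
```

In this model the two only disagree for RCDATA/RAWTEXT/PLAINTEXT start tags seen in one of the dropping modes, which is the corner case the proposal would change and the source of the compatibility risk raised in the reply.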
Received on Monday, 5 October 2009 18:05:18 UTC