Re: Input on the agenda

On Mon, Mar 9, 2009 at 5:15 PM, Sam Ruby <rubys@intertwingly.net> wrote:
> Jonas Sicking wrote:
>>
>> Personally I would like to see something that is even more HTMLy than
>> Hixie's current proposal. I don't like at all that we have to use a
>> different tokenizer in "HTML mode" and in "foreign content mode". This
>> is both confusing to web developers and painful for end users (as
>> performance and code complexity suffer).
>
> Do you (or Henri) have a concrete proposal to offer?

The cases where I can see the parser state affecting the tokenizer
state are the following:

CDATA handling. <![CDATA[]]> is currently only allowed in foreign
content. It would be great if we could allow <![CDATA[]]> consistently
throughout the markup. It sounds like Opera has done some
experimentation in this area.
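
To make the asymmetry concrete, here is a rough Python sketch. It is not
taken from the spec or from any browser; scan_cdata and its
allow_everywhere flag are made-up names. It assumes that outside foreign
content a <![CDATA[ opener is not treated as a CDATA section (as far as I
understand it ends up as a bogus comment), and it shows what consistent
handling could look like.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def scan_cdata(markup, pos, in_foreign_content, allow_everywhere=False):
    """Return (token_type, text, next_pos) for a '<![CDATA[' opener at pos.

    Hypothetical helper: in_foreign_content stands in for the parser state,
    allow_everywhere models the proposed mode-independent handling.
    """
    if not markup.startswith(CDATA_OPEN, pos):
        raise ValueError("no CDATA opener at this position")

    if in_foreign_content or allow_everywhere:
        # CDATA section: everything up to ']]>' is plain text.
        start = pos + len(CDATA_OPEN)
        end = markup.find(CDATA_CLOSE, start)
        if end == -1:
            # Unterminated section: take everything up to end of input.
            return ("text", markup[start:], len(markup))
        return ("text", markup[start:end], end + len(CDATA_CLOSE))

    # Assumed HTML-mode behaviour: a bogus comment running to the next '>'.
    end = markup.find(">", pos)
    if end == -1:
        return ("comment", markup[pos + 2:], len(markup))
    return ("comment", markup[pos + 2:end], end + 1)
```

With in_foreign_content=False the same input comes back as a comment
token, and with allow_everywhere=True it comes back as text in either
mode, which is the consistency I am after.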

In HTML mode, there is a set of elements that change the tokenizer's
'content model flag':
The following elements switch the tokenizer to CDATA state: noscript,
noframes, style, xmp, iframe, script
The following elements switch the tokenizer to RCDATA state: title, textarea
The following elements switch the tokenizer to PLAINTEXT state: plaintext

It would be great if we could allow the same set of tags to affect the
parser the same way in both HTML mode and in foreign content mode. The
only two tags that seem troublesome here are <script> and <style>. It
sounds like there might be agreement that it would be possible to parse
<script> as CDATA, which would leave <style> as the only remaining
controversial tag.
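
As a minimal sketch of the difference, the element list above can be
written as a lookup table in Python. The function names and the
in_foreign_content parameter are hypothetical, tag names are assumed to
be lowercased already, and PCDATA is assumed to be the flag for every
element not listed. flag_today models the behaviour as I described it
above (no switching in foreign content); flag_proposed ignores the mode.

```python
# Element name -> tokenizer content model flag, per the list above.
CONTENT_MODEL_BY_ELEMENT = {
    "noscript": "CDATA",
    "noframes": "CDATA",
    "style": "CDATA",
    "xmp": "CDATA",
    "iframe": "CDATA",
    "script": "CDATA",
    "title": "RCDATA",
    "textarea": "RCDATA",
    "plaintext": "PLAINTEXT",
}

# Assumption: elements not in the table leave the flag at PCDATA.
DEFAULT_FLAG = "PCDATA"


def flag_today(tag_name, in_foreign_content):
    """Flag after a start tag when the parser mode gates the switch."""
    if in_foreign_content:
        return DEFAULT_FLAG
    return CONTENT_MODEL_BY_ELEMENT.get(tag_name, DEFAULT_FLAG)


def flag_proposed(tag_name, in_foreign_content):
    """Flag after a start tag when the same table applies in both modes."""
    # in_foreign_content is accepted but deliberately unused.
    return CONTENT_MODEL_BY_ELEMENT.get(tag_name, DEFAULT_FLAG)
```

With the second version the tokenizer never has to ask the tree builder
which mode it is in before deciding how to read the contents of <style>
or <script>.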

If we made these changes, I think there would be some optimizations
that we could do on the implementation side. More importantly, though,
I think the consistency would be much appreciated by authors.

/ Jonas

Received on Wednesday, 18 March 2009 08:24:56 UTC