Re: Input on the agenda from Jonas Sicking on 2009-03-24 (public-html@w3.org from March 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Mon, 23 Mar 2009 17:20:30 -0700
To: Sam Ruby <rubys@intertwingly.net>
Cc: Maciej Stachowiak <mjs@apple.com>, Doug Schepers <schepers@w3.org>, Ian Hickson <ian@hixie.ch>, public-html@w3.org, www-svg <www-svg@w3.org>
Message-ID: <63df84f0903231720h586f74b9oed2f901e4ee2a226@mail.gmail.com>

On Wed, Mar 18, 2009 at 1:24 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> On Mon, Mar 9, 2009 at 5:15 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>> Jonas Sicking wrote:
>>>
>>> Personally I would like to see something that is even more HTMLy than
>>> Hixie's current proposal. I don't like at all that we have to use a
>>> different tokenizer in "HTML mode" and in "foreign content mode". This
>>> is both confusing to web developers and painful for end users (as
>>> performance and code complexity suffer).
>>
>> Do you (or Henri) have a concrete proposal to offer?
>
> The cases where I can see the parser state affecting the
> tokenizer state are the following:
>
> CDATA handling. <![CDATA[]]> is currently only allowed in foreign
> content. It would be great if we could allow <![CDATA[]]> consistently
> throughout the markup. It sounds like Opera has done some
> experimentation in this area.
>
> In HTML mode, there is a set of elements that change the tokenizer's
> 'content model flag':
> The following elements switch the tokenizer to CDATA state: noscript,
> noframes, style, xmp, iframe, script
> The following elements switch the tokenizer to RCDATA state: title, textarea
> The following elements switch the tokenizer to PLAINTEXT state: plaintext
>
> It would be great if we could allow the same set of tags to affect the
> parser the same way in both HTML mode and in foreign content mode. The
> only two tags that seem troublesome here are <script> and <style>. It
> sounds like there might be agreement that it would be
> possible to parse <script> as CDATA, which would leave <style> as the
> only remaining controversial tag.
>
> If we made these changes, I think there would be some optimizations
> that we could do on the implementation side. However, more importantly,
> I think the consistency would be much appreciated by authors.
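
As a rough illustration of the element lists quoted above, here is a minimal
Python sketch of a single lookup from tag name to tokenizer state. The names
CONTENT_MODEL_FLAGS and content_model_for are hypothetical, and "PCDATA" as
the default state is an assumption not stated in the message. The function
deliberately takes no "mode" argument, which is the point of the proposal:
the same tags would switch the tokenizer the same way in HTML mode and in
foreign content mode.

```python
# Hypothetical table built from the element lists in the quoted message.
CONTENT_MODEL_FLAGS = {
    "CDATA": {"noscript", "noframes", "style", "xmp", "iframe", "script"},
    "RCDATA": {"title", "textarea"},
    "PLAINTEXT": {"plaintext"},
}


def content_model_for(tag_name: str) -> str:
    """Return the tokenizer state a start tag would switch to.

    Falls back to "PCDATA" (assumed default) when the tag is not listed.
    """
    name = tag_name.lower()
    for state, tags in CONTENT_MODEL_FLAGS.items():
        if name in tags:
            return state
    return "PCDATA"
```

Under this sketch, content_model_for("script") and content_model_for("style")
both give "CDATA" regardless of where the tag appears, which is exactly the
behaviour that is still controversial for <style>.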

Just realized there was one more thing that I forgot about.

This isn't a case where the tokenizer is directly dependent on the
parser; however, it's nonetheless a case that I think will be confusing
for authors.

Currently, in foreign content mode, the 'empty XML element' syntax is
supported. So you can write

<circle cx="42" cy="4711" />

This is IMHO a good thing. However, this syntax does not work in HTML
mode. So, for example,

<div id="output" />

does not create an empty div; it is instead treated as an ordinary
start tag, so the div stays open.
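
The difference between the two modes can be sketched in a few lines of
Python. This is a simplified illustration of the behaviour described above,
not parser code from any implementation; the function name stays_open and
its parameters are hypothetical, and elements that never take an end tag in
HTML are deliberately left out of the sketch.

```python
def stays_open(self_closing: bool, foreign_content: bool) -> bool:
    """Return True if the element is still open after its start tag."""
    if foreign_content:
        # In foreign content mode a trailing "/" closes the element at once.
        return not self_closing
    # In HTML mode the trailing "/" is ignored: the tag is a plain start tag.
    return True


# <circle ... /> in foreign content: closed immediately.
circle_open = stays_open(self_closing=True, foreign_content=True)

# <div id="output" /> in HTML mode: still open, awaiting </div>.
div_open = stays_open(self_closing=True, foreign_content=False)
```

Here circle_open is False and div_open is True, which is the inconsistency
authors are likely to trip over.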

This would be an easy problem to solve if it weren't for web
compatibility concerns. However, I'd still like to explore what could
be done in this area. For example, there could be a short list of tags
for which we wouldn't support the empty-element syntax, or we could
make empty-element syntax work only in standards mode (I'm not excited
about either option).

/ Jonas

Received on Tuesday, 24 March 2009 00:21:21 UTC