Re: Input on the agenda from Sam Ruby on 2009-03-25 (www-svg@w3.org from March 2009)

From: Sam Ruby <rubys@intertwingly.net>
Date: Wed, 25 Mar 2009 01:51:41 -0400
To: Jonas Sicking <jonas@sicking.cc>
CC: Maciej Stachowiak <mjs@apple.com>, Doug Schepers <schepers@w3.org>, Ian Hickson <ian@hixie.ch>, public-html@w3.org, www-svg <www-svg@w3.org>
Message-ID: <49C9C66D.8040909@intertwingly.net>

Jonas Sicking wrote:
> On Wed, Mar 18, 2009 at 1:24 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Mon, Mar 9, 2009 at 5:15 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>>> Jonas Sicking wrote:
>>>> Personally I would like to see something that is even more HTMLy than
>>>> Hixie's current proposal. I don't like at all that we have to use a
>>>> different tokenizer in "HTML mode" and in "foreign content mode". This
>>>> is both confusing to web developers and painful for end users (as
>>>> performance and code complexity suffer).
>>> Do you (or Henri) have a concrete proposal to offer?
>> The cases where I can see that the parser state is affecting the
>> tokenizer state are the following:
>>
>> CDATA handling. <![CDATA[]]> is currently only allowed in foreign
>> content. It would be great if we could allow <![CDATA[]]> consistently
>> throughout the markup. It sounds like Opera has done some
>> experimentation in this area.
>>
>> In HTML mode, there is a set of elements that change the tokenizer's
>> 'content model flag':
>> The following elements switch the tokenizer to CDATA state: noscript,
>> noframes, style, xmp, iframe, script
>> The following elements switch the tokenizer to RCDATA state: title, textarea
>> The following elements switch the tokenizer to PLAINTEXT state: plaintext
>>
>> It would be great if we could allow the same set of tags to affect the
>> parser the same way in both HTML mode and in foreign content mode. The
>> only two tags that seem troublesome here are <script> and <style>. It
>> sounds like there might be agreement that it would be possible to
>> parse <script> as CDATA, which would leave <style> as the only
>> remaining controversial tag.
>>
>> If we made these changes I think there would be some optimizations
>> that we could do on the implementation side. However, more importantly,
>> I think the consistency would be much appreciated by authors.
> 
> Just realized there was one more thing that I forgot about.
> 
> This isn't a case where the tokenizer is directly dependent on the
> parser, however it's nonetheless a case that I think will be confusing
> for authors.
> 
> Currently in foreign content mode, the 'empty XML element' syntax is
> supported. So you can write
> 
> <circle x="42" y="4711" />
> 
> This is IMHO a good thing. However this syntax does not work in HTML
> mode. So for example
> 
> <div id="output" />
> 
> does not create an empty div, but is rather treated as a start tag.
> 
> This would be an easy problem to solve if it weren't for web
> compatibility concerns. However I'd still like to explore what could
> be done in this area. For example, if there is a short list of tags for
> which we wouldn't support the empty element syntax, or if we could
> make empty-element syntax only work in standards mode (I'm not excited
> about either option).

I'm actually OK with this being the list of always-empty tags, as
reflected in the current spec.  <br></br> is an extreme case of one you
couldn't fix even if you wanted to, and <script src=""/> is an example of
one that would be really nice to fix, but alas it can't be fixed either.
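
A rough sketch of the rule being discussed, in Python. This is an
illustration only, not the spec's parsing algorithm: the function name
closes_immediately and the VOID_ELEMENTS set are hypothetical, and the set
is a partial sample of the always-empty tags rather than the spec's full
list.

```python
# Illustrative subset of the always-empty (void) HTML tags.
# Assumption: the authoritative list is the one in the HTML spec.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


def closes_immediately(tag: str, self_closing: bool, foreign: bool) -> bool:
    """Return True if a start tag produces an element that cannot have children.

    tag          -- the tag name as written in the markup
    self_closing -- True if the start tag ended with "/>"
    foreign      -- True if the tag appears in foreign (SVG/MathML) content
    """
    if foreign:
        # In foreign content the empty-element syntax is honored,
        # so <circle ... /> is an empty element.
        return self_closing
    # In HTML content the trailing slash has no effect: only the
    # always-empty tags close immediately, whether or not "/>" was written.
    return tag.lower() in VOID_ELEMENTS


if __name__ == "__main__":
    print(closes_immediately("circle", True, True))    # foreign content
    print(closes_immediately("div", True, False))      # <div id="output" />
    print(closes_immediately("script", True, False))   # <script src=""/>
    print(closes_immediately("br", False, False))      # <br>
```

Under these assumptions, <div id="output" /> and <script src=""/> are both
treated as plain start tags in HTML content, which is the behavior that web
compatibility keeps us from changing.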

> / Jonas

- Sam Ruby

Received on Wednesday, 25 March 2009 05:52:35 UTC