Re: CDATA, Script, and Style from Jonas Sicking on 2009-04-01 (public-html@w3.org from April 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 1 Apr 2009 00:37:34 -0700
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Simon Pieters <simonp@opera.com>, Doug Schepers <schepers@w3.org>, HTML WG <public-html@w3.org>, "www-svg@w3.org" <www-svg@w3.org>
Message-Id: <E3C14A1F-1011-4246-9442-D521E988D190@sicking.cc>

On Apr 1, 2009, at 0:17, Henri Sivonen <hsivonen@iki.fi> wrote:
>> How do you
>> feel about my proposal in
>>
>> http://lists.w3.org/Archives/Public/public-html/2009Mar/0634.html
>>
>> It would result in a graded surprise where there's some change in
>> HTML <script> parsing between HTML4 and HTML5, and some surprise at
>> the boundary between SVG-in-HTML and SVG-in-XML.
>
> If this happened in the parser, it would result in <![CDATA[ ... ]]>
> in text/html parsing differently from both XML and previous
> text/html behavior. I think that could be confusing to authors who try to
> form a coherent mental model of the languages they are working with.

No, no changes are intended on the parser side for HTML.

> However, if <![CDATA[ ... ]]> remains in the DOM and is only  
> stripped from the data in the JavaScript parser or the CSS parser, I  
> suppose that model could count as coherent with the current <!-- -->  
> treatment model for script and style in text/html.

Yes, that is the idea.

>>>>> Problems with 2:
>>>>> Just stripping a leading and trailing "<![CDATA[" / "]]>" would break
>>>>> markup like:
>>>>> <style>
>>>>> <![CDATA[
>>>>> rect { fill: yellow; }
>>>>> ]]>
>>>>> <![CDATA[
>>>>> circle { fill: blue; }
>>>>> ]]>
>>>>> </style>
>>>>>
>>>>> which probably happens occasionally due to copy-n-pasting.
>>>
>>> I don't like this, because it requires going back and modifying buffers
>>> that had already been built instead of just tweaking forward-only
>>> tokenizer state transitions, and it doesn't even work in the case where
>>> there are multiple CDATA sections as shown above. If we end up doing
>>> something other than what's currently in the draft, I'd much rather have
>>> what Simon proposes as #4.
>>
>> The stripping doesn't happen at a tokenizer stage. It happens after
>> all parsing is done when the inline data is taken from the DOM and
>> passed to the serializer.
>
> Do you mean passed to the script engine?

Yes, thanks.

> So the string "<![CDATA[" would appear in the content of the text  
> node in the DOM?

Yes

> I initially thought you meant removing "<![CDATA[" and "]]>" in the  
> tree builder.

No
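
A minimal sketch of the model discussed above, in which the "<![CDATA["
and "]]>" strings stay in the DOM text node and are only removed when the
inline data is handed to the script engine or CSS parser. The function
names are hypothetical and the exact stripping rule is an assumption; the
thread does not spell it out. The first function shows the
"leading and trailing only" variant, the second a variant that removes
every marker.

```python
# Hypothetical helpers; neither name comes from any browser or spec.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_outer_cdata(text: str) -> str:
    """Remove one leading '<![CDATA[' and one trailing ']]>' when both
    are present (ignoring surrounding whitespace); otherwise return the
    text unchanged."""
    body = text.strip()
    if body.startswith(CDATA_OPEN) and body.endswith(CDATA_CLOSE):
        return body[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return text


def strip_all_cdata_markers(text: str) -> str:
    """Remove every '<![CDATA[' and ']]>' substring.

    Assumption: the data never contains ']]>' for another reason.
    Script such as a[b[0]]>1 would be damaged by this variant."""
    return text.replace(CDATA_OPEN, "").replace(CDATA_CLOSE, "")


# Text node content of the <style> example quoted earlier in the thread.
dom_text = (
    "\n<![CDATA[\nrect { fill: yellow; }\n]]>\n"
    "<![CDATA[\ncircle { fill: blue; }\n]]>\n"
)

outer_only = strip_outer_cdata(dom_text)
all_removed = strip_all_cdata_markers(dom_text)
```

With the two-section example, the outer-only variant still leaves an inner
"]]>" and "<![CDATA[" in the data passed on, which is the breakage
described under "Problems with 2". The DOM text itself is not modified in
either case.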

> What about <![CDATA[ in SVG subtrees outside <script> and <style>?  
> It's useful for graceful degradation but still involves feedback to  
> the tokenizer unless supported anywhere outside foreign content as  
> well.

I think that is mostly an orthogonal issue. But I would like
<![CDATA[ ]]> to be parsed as in XML both in foreign content mode
and in normal mode, to keep things consistent.

Opera has experimented with supporting <![CDATA[ ]]> in HTML, and
it seems it does not "break the web".

/ Jonas

Received on Wednesday, 1 April 2009 07:37:58 UTC