Re: CDATA, Script, and Style

On Apr 1, 2009, at 10:37, Jonas Sicking wrote:

>>>>>> Problems with 2:
>>>>>> Just stripping a leading and trailing "<![CDATA[" / "]]>" would
>>>>>> break markup like:
>>>>>> <style>
>>>>>> <![CDATA[
>>>>>> rect { fill: yellow; }
>>>>>> ]]>
>>>>>> <![CDATA[
>>>>>> circle { fill: blue; }
>>>>>> ]]>
>>>>>> </style>
>>>>>>
>>>>>> which probably happens occasionally due to copy-n-pasting.
>>>>
>>>> I don't like this, because it requires going back and modifying
>>>> buffers that had already been built instead of just tweaking
>>>> forward-only tokenizer state transitions, and it doesn't even work
>>>> in the case where there are multiple CDATA sections as shown above.
>>>> If we end up doing something other than what's currently in the
>>>> draft, I'd much rather have what Simon proposes as #4.
>>>
>>> The stripping doesn't happen at a tokenizer stage. It happens after
>>> all parsing is done when the inline data is taken from the DOM and
>>> passed to the serializer.
>>
>> Do you mean passed to the script engine?
>
> Yes, thanks.
>
>> So the string "<![CDATA[" would appear in the content of the text  
>> node in the DOM?
>
> Yes

If "<![CDATA[" ends up in the DOM, I think the end result could be  
made more robust if the operation of handing DOM data to the CSS or JS  
parser didn't try to drop "<![CDATA[" and "]]>" but instead the JS and  
CSS parsers were changed to treat those strings as comments, i.e. like  
"/* */". This way, they wouldn't be dropped from within any string  
literals that happen to contain them.
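
To illustrate the difference, here is a minimal sketch in Python. It is not how any real JS or CSS parser works: the function names are hypothetical, and the scanner only knows about quoted string literals (no regex literals, no existing comments, no expressions such as a[b[0]]>c that legitimately contain "]]>"). It only shows that blind stripping alters a string literal while comment-like handling leaves it alone.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_everywhere(source: str) -> str:
    """Naive approach: drop the markers wherever they occur."""
    return source.replace(CDATA_OPEN, "").replace(CDATA_CLOSE, "")


def skip_outside_strings(source: str) -> str:
    """Ignore the markers only outside quoted string literals.

    Hypothetical stand-in for a lexer that treats the markers
    the way it would treat an empty comment.
    """
    out = []
    i = 0
    quote = None  # current string delimiter, or None outside a literal
    while i < len(source):
        ch = source[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Keep the escaped character and do not inspect it.
                out.append(source[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in "'\"":
            quote = ch
            out.append(ch)
            i += 1
        elif source.startswith(CDATA_OPEN, i):
            i += len(CDATA_OPEN)
        elif source.startswith(CDATA_CLOSE, i):
            i += len(CDATA_CLOSE)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


script = '<![CDATA[\nvar s = "a]]>b";\n]]>'
print(repr(strip_everywhere(script)))      # string literal becomes "ab"
print(repr(skip_outside_strings(script)))  # string literal stays "a]]>b"
```

With the naive version the script sees the string "ab"; with the comment-like version it still sees "a]]>b".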

This approach would cause notable leakage of the SVG-in-text/html  
feature into other parts of a browser engine, though, which isn't very  
nice.

Also, I'm a bit concerned that letting "<![CDATA[" and "]]>" reach the  
DOM would result in those strings being escaped as "&lt;![CDATA[" and  
"]]&gt;" if serialized to XML, so going back and forth a couple of  
times through a real serializer and via copying and pasting would  
result in some ugly cruft.
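
A rough illustration of the cruft, using Python's xml.sax.saxutils.escape as a stand-in for an XML serializer's text escaping. The second call is an assumption on my part: it models someone pasting the already serialized source back in as plain text and serializing again.

```python
from xml.sax.saxutils import escape

# Text node content as it would sit in the DOM if the markers got through.
text_node = "<![CDATA[\nrect { fill: yellow; }\n]]>"

# First trip through the serializer: "<" and ">" are escaped.
once = escape(text_node)

# Assumed second trip: the escaped source is pasted as text and
# serialized again, so the "&" characters are escaped too.
twice = escape(once)

print(once)
print(twice)
```

After one pass the markers read "&lt;![CDATA[" and "]]&gt;"; after the assumed second pass they read "&amp;lt;![CDATA[" and "]]&amp;gt;".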

>> What about <![CDATA[ in SVG subtrees outside <script> and <style>?  
>> It's useful for graceful degradation but still involves feedback to  
>> the tokenizer unless supported anywhere outside foreign content as  
>> well.
>
> I think that is mostly an orthogonal issue. But I would like
> <![CDATA[ ]]> to be parsed as in XML both in foreign content mode
> and in normal mode. To keep things consistent.

I think it's relevant in two ways:

1) If the syntax behaves as in XML outside <script> and <style> but  
not as in XML inside <script> and <style>, the result may be confusing.

2) Having CDATA sections that behave like XML CDATA sections in HTML5  
parsers but like bogus comments in earlier browsers is useful for  
hiding SVG text from old browsers for graceful degradation. However,  
if this syntax causes feedback from the tree builder to the tokenizer,  
we haven't managed to completely eliminate the (non-trivial) feedback  
to the tokenizer, meaning the other efforts to do so wouldn't be very  
useful.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 6 April 2009 18:55:38 UTC