Re: CDATA, Script, and Style

On Tue, Mar 31, 2009 at 2:30 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Mar 25, 2009, at 16:24, Simon Pieters wrote:
>
>> On Thu, 19 Mar 2009 18:52:25 +0100, Jonas Sicking <jonas@sicking.cc>
>> wrote:
>>
>>> My feelings on 1 vs. 2 is:
>>>
>>> Problems with 1:
>>> Parsing <![CDATA[]]> inside a CDATA element "feels" weird.
>
> I agree that it feels weird.
>
> I think the biggest problem with this entire issue is that the difference
> between HTML <script> and <script> in XML is surprising and unintuitive, so
> we will have a surprise boundary somewhere no matter what. It seems on the
> general level we have the following options:
>
>  1) Have the surprise boundary between text/html and XML. (The situation
> before SVG-in-text/html)
>
>  2) Have the surprise boundary between HTML <script> in text/html and
> everything else. (The situation with SVG-in-text/html as drafted)
>
>  3) Have graded surprises with two boundaries:
>    a) Have a surprise boundary between HTML <script> and SVG-in-text/html
> <script> and another between SVG-in-text/html <script> and XML.
>    b) Have a surprise boundary between pre-HTML5 <script> and HTML5
> text/html <script>s and another between text/html and XML.
>
> I'm worried about escaping surprises in general having seen the RSS <title>
> epic fail.

I'm a little unclear as to what the behaviors in 3 are. I.e. which
parsing/processing algorithms would lead to the two scenarios you
describe?

I'm also unclear as to what behavior you are proposing. How do you
feel about my proposal in

http://lists.w3.org/Archives/Public/public-html/2009Mar/0634.html

It would result in a graded surprise: some change in HTML <script>
parsing between HTML4 and HTML5, and some surprise at the boundary
between SVG-in-HTML and SVG-in-XML.

>>> Problems with 2:
>>> Just stripping a leading and trailing "<![CDATA[" / "]]>" would break
>>> markup like:
>>> <style>
>>> <![CDATA[
>>> rect { fill: yellow; }
>>> ]]>
>>> <![CDATA[
>>> circle { fill: blue; }
>>> ]]>
>>> </style>
>>>
>>> which probably happens occasionally due to copy-n-pasting.
>
> I don't like this, because it requires going back and modifying buffers that
> had been already built instead of just tweaking forward-only tokenizer state
> transitions, and it doesn't even work in the case where there are multiple
> CDATA sections as shown above. If we end up doing something other than
> what's currently in the draft, I'd much rather have what Simon proposes
> as #4.

The stripping doesn't happen at a tokenizer stage. It happens after
all parsing is done when the inline data is taken from the DOM and
passed to the serializer. See the details in the link above.
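
As a rough illustration of the stripping discussed above, here is a
minimal Python sketch. It is not taken from the linked proposal: the
helper name strip_cdata_wrapper and the choice to ignore surrounding
whitespace are assumptions made only for this example. It operates on
the text content already extracted from the DOM, not on tokenizer state.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_cdata_wrapper(text: str) -> str:
    """Hypothetical helper: drop one leading '<![CDATA[' and one
    trailing ']]>' from text that was already read out of the DOM."""
    # Assumption: whitespace around the markers is ignored. Note that
    # this also trims whitespace when no markers are present.
    body = text.strip()
    if body.startswith(CDATA_OPEN):
        body = body[len(CDATA_OPEN):]
    if body.endswith(CDATA_CLOSE):
        body = body[:-len(CDATA_CLOSE)]
    return body


# One CDATA section: the wrapper is removed cleanly.
single = "\n<![CDATA[\nrect { fill: yellow; }\n]]>\n"

# Two CDATA sections, as in the copy-n-paste example quoted above.
double = (
    "\n<![CDATA[\nrect { fill: yellow; }\n]]>\n"
    "<![CDATA[\ncircle { fill: blue; }\n]]>\n"
)

single_out = strip_cdata_wrapper(single)
double_out = strip_cdata_wrapper(double)

print(repr(single_out))
print(repr(double_out))
```

With the two-section input, only the outermost markers are removed, so
the inner "]]>" and "<![CDATA[" stay in the text that gets handed on,
which is the breakage listed under "Problems with 2".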

/ Jonas

Received on Tuesday, 31 March 2009 22:09:57 UTC