Re: CDATA, Script, and Style

On Wed, Mar 18, 2009 at 5:21 PM, Doug Schepers <schepers@w3.org> wrote:
> Hi, Jonas-
>
> Jonas Sicking wrote (on 3/18/09 6:56 PM):
>>
>> On Wed, Mar 18, 2009 at 2:26 PM, Doug Schepers <schepers@w3.org> wrote:
>>
>> [snipped Doug dissing Robin's SVG skills and calling his ]
>
> FWIW, Robin's SVG skills make *you* look like a chump.

I come from a long proud line of chumps!

It helps me in conversations with other chumps :)

>> <svg width="10cm" height="5cm" viewBox="0 0 1000 500"
>>      xmlns="http://www.w3.org/2000/svg" version="1.1">
>>   <defs>
>>     <style type="text/css"><![CDATA[
>>       svg > rect {
>>         fill: red;
>>       }
>>     ]]></style>
>>   </defs>
>>   <rect x="200" y="100" width="600" height="300"/>
>> </svg>
>>
>> This should work fine when copied into text/html, as it would if the
>> style tag instead looked like:
>>
>> <style type="text/css">
>>   svg > rect {
>>     fill: red;
>>   }
>> </style>
>>
>> However, if the style was
>>
>> <style type="text/css">
>>   svg &gt; rect {
>>     fill: red;
>>   }
>> </style>
>>
>> then this would work as expected in XML-SVG, but there could be
>> problems when the markup is copied into a text/html document. In
>> HTML, the contents of <style> are parsed as CDATA, which means that no
>> entities are resolved. So the above style tag would contain invalid CSS:
>> the "&gt;" would not be turned into a ">", so the selector would be
>> invalid and not match anything.
>
> And your clear explanation makes *me* look like a chump.  Wow, chumped by a
> chump, how humiliating.
>
> Now, couldn't we define a set of common (in script and style) entities that
> *would* be resolved, just to cover our bases?  Or would that likely break
> content?  I suppose it's likely that somebody out there is using a &gt;
> entity in a string in script or even CSS that they want to stay that way...
>
> Plus, it does seem like special casing in a way we might want to avoid.

Indeed. It's something to consider though. There are no perfect
solutions here. (hey, we're talking about HTML. If there was anything
that was perfect we'd probably all be in a state of shock for weeks).
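
To make the trade-off concrete, here is a rough sketch of what
"resolve a few common entities" could do to raw text. The helper
resolveCommonEntities and its three-entry table are made up for this
mail; nothing like it exists in any spec or proposal that I know of.

```javascript
// Hypothetical helper: resolve a small fixed set of entities in raw text.
// The name and the entity table are assumptions, for illustration only.
function resolveCommonEntities(rawText) {
  const table = { "&lt;": "<", "&gt;": ">", "&amp;": "&" };
  return rawText.replace(/&(?:lt|gt|amp);/g, (entity) => table[entity]);
}

// The case the idea is meant to help: escaped CSS copied from XML-SVG.
const css = "svg &gt; rect { fill: red; }";
const fixedCss = resolveCommonEntities(css);

// The case it could break: a script that wants the entity left alone.
const script = 'var label = "a &gt; b";';
const changedScript = resolveCommonEntities(script);
```

The CSS selector comes out usable, but the script's string literal is
silently changed from "a &gt; b" to "a > b", which is exactly the
kind of existing content that could break.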

>> So the question is, how common do we think this is? We're looking for
>> how common it is that:
>> 1) An SVG file contains inline <style>, *and*
>> 2) That style does not use <![CDATA[]]> for the contents of the element,
>> *and*
>> 3) The contents use entities.
>
> Yes, this is a case I think we could safely lose.

That would be great IMHO.
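
If someone wants a rough number for how common that combination is,
something like the following could be run over a pile of SVG files.
This is only a sketch: hasAtRiskStyle is a made-up name, and the
regex matching is crude (it will misjudge things like a self-closing
<style/>), so a real survey would want a proper XML parser.

```javascript
// Crude check for the three conditions quoted above. Hypothetical
// helper, not an existing tool; regex-based, so only an approximation.
function hasAtRiskStyle(svgSource) {
  // Condition 1: the file contains an inline <style> element.
  const styleElement = /<style\b[^>]*>([\s\S]*?)<\/style\s*>/gi;
  for (const match of svgSource.matchAll(styleElement)) {
    const contents = match[1];
    // Condition 2: the contents are not wrapped in a CDATA section.
    const usesCdata = contents.includes("<![CDATA[");
    // Condition 3: the contents use an entity or character reference.
    const usesEntities = /&(?:#\d+|#x[0-9a-f]+|\w+);/i.test(contents);
    if (!usesCdata && usesEntities) return true;
  }
  return false;
}

const escapedSample =
  '<svg><style type="text/css">svg &gt; rect { fill: red; }</style></svg>';
const cdataSample =
  "<svg><style><![CDATA[svg > rect { fill: red; }]]></style></svg>";
const plainSample = "<svg><style>rect { fill: red; }</style></svg>";
```

Only the first sample is flagged; the CDATA-wrapped one and the one
without entities are not.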

>> For markup that uses <![CDATA[]]> I believe we can ensure that the
>> markup will work correctly even in text/html, as detailed in my
>> proposal at [1].
>
> Okay, rereading that, it does make sense.  I thought we were talking about
> breaking all instances of CDATA in <style>. Phew!
>
> I'm fine with your proposal, then, favoring solution 1, "Support
> <![CDATA[]]> while tokenizing CDATA."

Excellent.

My feelings on 1 vs. 2 are:

Problems with 1:
Parsing <![CDATA[]]> inside a CDATA element "feels" weird. Parsing for
CDATA has remained largely the same since the dawn of humankind
(well, the particular branch of humankind that supports SGML). But
the bigger problem with supporting <![CDATA[]]> inside <script> is that
it'd break existing HTML content like:
<script>
x = "<res><![CDATA[if a < b < c then they are sorted]]></res>";
var parser = new DOMParser();
var doc = parser.parseFromString(x, "text/xml");
xhr = new XMLHttpRequest();
xhr.open("POST", uri);
xhr.send(doc);
</script>
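
A rough model of why that breaks, assuming solution 1 simply drops
the CDATA markers and keeps what is between them. The function name
tokenizeWithCdataSupport is invented for this sketch, and a real
tokenizer would of course be a state machine rather than a regex.

```javascript
// Hypothetical model of solution 1: treat <![CDATA[ ... ]]> inside a raw
// text element as a CDATA section and keep only the text between markers.
function tokenizeWithCdataSupport(rawText) {
  return rawText.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1");
}

// The text content of the <script> element above, first statement only.
const scriptText =
  'x = "<res><![CDATA[if a < b < c then they are sorted]]></res>";';

// What the script engine would be handed under this model.
const seenByScriptEngine = tokenizeWithCdataSupport(scriptText);
```

The string the script builds no longer contains a CDATA section, so
the bare "<" characters in "a < b < c" end up inside <res> and the
text is no longer well-formed XML when it is handed to the parser.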

Problems with 2:
Just stripping a leading and trailing "<![CDATA[" / "]]>" would break
markup like:
<style>
<![CDATA[
rect { fill: yellow; }
]]>
<![CDATA[
circle { fill: blue; }
]]>
</style>

which probably happens occasionally due to copy-n-pasting.
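
Again as a rough model only: assume solution 2 strips one leading
"<![CDATA[" and one trailing "]]>", ignoring surrounding whitespace,
and touches nothing else. stripOuterCdata is a name made up for this
sketch.

```javascript
// Hypothetical model of solution 2: strip one leading "<![CDATA[" and one
// trailing "]]>" (ignoring surrounding whitespace), leave the rest alone.
function stripOuterCdata(rawText) {
  const match = /^\s*<!\[CDATA\[([\s\S]*)\]\]>\s*$/.exec(rawText);
  return match ? match[1] : rawText;
}

// The contents of the <style> element above.
const styleText = [
  "<![CDATA[",
  "rect { fill: yellow; }",
  "]]>",
  "<![CDATA[",
  "circle { fill: blue; }",
  "]]>",
].join("\n");

// What the CSS parser would be handed under this model.
const seenByCssParser = stripOuterCdata(styleText);
```

The inner "]]>" and "<![CDATA[" pair is left in the middle of the
stylesheet, so the CSS parser sees junk between the two rules.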


So neither solution is perfect. Though I'm thinking that 2 will
probably cause less trouble with existing content.

/ Jonas

Received on Thursday, 19 March 2009 17:53:03 UTC