Re: HTML 5 from Eduard Pascual on 2010-04-07 (public-html-comments@w3.org from April 2010)

From: Eduard Pascual <herenvardo@gmail.com>
Date: Wed, 7 Apr 2010 20:06:10 +0200
To: "T.J. Crowder" <tj@crowdersoftware.com>
Cc: gesteehr@googlemail.com, public-html-comments@w3.org
Message-ID: <x2u6ea53251004071106g1362ff92peb1a7b663201d1b2@mail.gmail.com>

On Wed, Apr 7, 2010 at 5:49 PM, T.J. Crowder <tj@crowdersoftware.com> wrote:
>> > What makes ]]> easier to defend against than </code>?
>>
>> As I said, with <![CDATA[ ... ]]> you only need to care about the
>> exact sequence "]]>": if it's found within an input, get rid of it or
>> somehow fix it (string replacement "]]>" => "]]>]]&gt;<![CDATA[" gets
>> the job done safely). With </code> (or even with Arthur's <cdata>
>> suggestion, to some degree), things are quite more complex:
>
> I don't understand. Any sanitizer has to escape < and &.

Not every sanitizer: for content that will go inside <![CDATA[ ... ]]>,
there is no need at all to care about < and &. That's the whole point of
CDATA. In other words, a < inside a CDATA block is exactly equivalent to
a &lt; outside of it: it has no special meaning and is simply rendered
as "<". The same holds for & and &amp;, and for > and &gt;.
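A minimal sketch of the two approaches, in Python. The helper names `wrap_cdata` and `escape_text` are hypothetical, and the standard library's XML parser (`xml.etree.ElementTree`) stands in for a real consumer, since the CDATA rules shown are XML's (an HTML 5 text/html parser only honours CDATA sections inside foreign content such as SVG and MathML). The CDATA wrapper touches nothing but "]]>", using the replacement quoted above, while the escaping route must handle &, < and >; both should parse back to the original text.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap arbitrary text in a CDATA section (hypothetical helper).

    "]]>" is the only sequence that can end a CDATA section early, so each
    occurrence is split out: close the section, emit "]]&gt;" as ordinary
    escaped text, then reopen a new section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]>]]&gt;<![CDATA[") + "]]>"


def escape_text(text: str) -> str:
    """Entity-escape &, < and > for use outside CDATA (hypothetical helper)."""
    # "&" must be replaced first so the entities added below are not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    # Sample input containing <, &, >, a literal "]]>" and a "</code>" tag.
    sample = "if (a < b && c > d) { x = y[z[0]]>1; } </code>"
    via_cdata = ET.fromstring("<code>" + wrap_cdata(sample) + "</code>").text
    via_escape = ET.fromstring("<code>" + escape_text(sample) + "</code>").text
    print(via_cdata == sample, via_escape == sample)  # prints: True True
```

Note that the embedded "</code>" needs no special treatment in the CDATA version; only "]]>" does.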

For programmatically generated content, it makes little difference
whether one uses CDATA or escapes the special characters. For manually
authored content, CDATA saves a lot of authoring pain (I'm assuming this
is the case Georg had in mind when starting this thread). If used with
user-provided content, Georg's proposal would open the door to injection
attacks, requiring the spec, implementations, and server-side scripts to
do a good deal of non-trivial fool-proofing. CDATA addresses the use
case without so many nasty side effects.

Regards,
Eduard Pascual

Received on Wednesday, 7 April 2010 18:07:02 UTC