Re: HTML 5

> > For content generated programmatically, it's quite indifferent to use
> > CDATA or to escape stuff.
>
> No, there's a very large difference. Currently, if I have user-generated
> content, I just escape it. All of it. Completely. With the same rules.
> Whereas if we start saying "you escape this, but not this" or "you
> escape this one way, and you escape this other thing a different way,"
> you dramatically complicate the rules any sanitizer must follow. Which
> inevitably leads to sanitizers that fail, and successful injection
> attacks.

Hmmm...or *not*. Because if I run the <![CDATA[ blah blah blah ]]> through
my sanitizer for user-generated input, it *won't be* a CDATA anymore. It'll
be a &lt;![CDATA[ blah blah blah ]]&gt;. So that may not be a problem.
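
A minimal sketch of that point, in Python purely for illustration. The
`sanitize` helper and the sample payload are hypothetical; the only real API
used is the standard library's `html.escape`, which escapes &, < and > (plus
quotes when `quote=True`) with one uniform rule:

```python
import html


def sanitize(user_input: str) -> str:
    """Hypothetical sanitizer: escape everything with the same rules."""
    return html.escape(user_input, quote=True)


# A user tries to smuggle a CDATA section into the page.
payload = "<![CDATA[ <script>alert(1)</script> ]]>"
escaped = sanitize(payload)

# No raw "<" and no "]]>" terminator survive, so the browser never sees
# a CDATA section at all, only inert text.
print(escaped)
```

Under those assumptions the CDATA markers come out as ordinary escaped text,
so a sanitizer that escapes everything does not need a special case for them.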

Have to think about that. Might be a problem for sanitizers that ignore >
(which a good one shouldn't IMHO, but I don't think I'm in the majority
there), since an unescaped ]]> in the input could prematurely end a CDATA
section it was embedded in, but...hmmm...
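
To make that worry concrete, here is a rough sketch (again Python, again
hypothetical names: `lax_sanitize` stands in for a sanitizer that escapes only
& and <, and `strict_sanitize` for one that also escapes >). It only checks
where the first "]]>" lands when the sanitized text is wrapped in a CDATA
section; it does not model a real parser:

```python
def lax_sanitize(text: str) -> str:
    # Hypothetical sanitizer that escapes & and < but leaves > alone.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def strict_sanitize(text: str) -> str:
    # Same rules, plus > is escaped as well.
    return lax_sanitize(text).replace(">", "&gt;")


def wrap_in_cdata(text: str) -> str:
    return "<![CDATA[" + text + "]]>"


user_input = "harmless ]]> trailing text"
lax_doc = wrap_in_cdata(lax_sanitize(user_input))
strict_doc = wrap_in_cdata(strict_sanitize(user_input))

# The section ends at the first "]]>". If that is not the final three
# characters, the user's input closed the section early.
lax_ends_early = lax_doc.index("]]>") < len(lax_doc) - 3
strict_ends_early = strict_doc.index("]]>") < len(strict_doc) - 3
print(lax_ends_early, strict_ends_early)
```

In this sketch the lax variant lets the user's "]]>" through, so everything
after it falls outside the section, while the strict variant turns it into
"]]&gt;" and the only terminator left is the one the wrapper added.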
--
T.J. Crowder
Independent Software Consultant
tj / crowder software / com
www.crowdersoftware.com



On 7 April 2010 21:45, T.J. Crowder <tj@crowdersoftware.com> wrote:

> For content generated programmatically, it's quite indifferent to use
> CDATA or to escape stuff.
>
>
> No, there's a very large difference. Currently, if I have user-generated
> content, I just escape it. All of it. Completely. With the same rules.
> Whereas if we start saying "you escape this, but not this" or "you escape
> this *one* way, and you escape this other thing a *different* way," you
> dramatically complicate the rules any sanitizer must follow. Which
> inevitably leads to sanitizers that fail, and successful injection attacks.
>
> I'm all for making things easier for people, but in this case I'm thinking
> the way to do that is to use a proper tool for authoring, not to make life a
> lot more difficult for people dealing with user-generated content.
>
> (I'm assuming this is the case Georg had in mind when starting this thread)
>
>
> Until or unless Georg comes back to the thread, FWIW I don't think we
> should assume anything on his behalf.
> --
> T.J. Crowder
> Independent Software Consultant
> tj / crowder software / com
> www.crowdersoftware.com
>
>
>
> On 7 April 2010 19:06, Eduard Pascual <herenvardo@gmail.com> wrote:
>
>> On Wed, Apr 7, 2010 at 5:49 PM, T.J. Crowder <tj@crowdersoftware.com>
>> wrote:
>> >> > What makes ]]> easier to defend against than </code>?
>> >>
>> >> As I said, with <![CDATA[ ... ]]> you only need to care about the
>> >> exact sequence "]]>": if it's found within an input, get rid of it or
>> >> somehow fix it (string replacement "]]>" => "]]>]]&gt;<![CDATA[" gets
>> >> the job done safely). With </code> (or even with Arthur's <cdata>
>> >> suggestion, to some degree), things are quite more complex:
>> >
>> > I don't understand. Any sanitizer has to escape < and &.
>> Not any: for content that will go inside <![CDATA[ ... ]]> there is no
>> need at all to care about < and &. That's the whole point of CDATA. In
>> other words, a < inside a CDATA block is exactly equivalent to a &lt;
>> outside of it: it will have no special meaning and just render as "<".
>> The same holds for & and &amp;, and also for > and &gt;.
>>
>> For content generated programmatically, it's quite indifferent to use
>> CDATA or to escape stuff. For manually authored content, CDATA saves a
>> lot of authoring pain (I'm assuming this is the case Georg had in mind
>> when starting this thread). If used with user-provided content,
>> Georg's proposal would open up a potential for injection attacks that
>> require the spec, implementations, and server-side scripts to do a
>> good deal of non-trivial fool-proofing. CDATA addresses the use-case,
>> without so many nasty side effects.
>>
>> Regards,
>> Eduard Pascual
>>
>
>

Received on Wednesday, 7 April 2010 21:14:50 UTC