Re: HTML 5

From: T.J. Crowder <tj@crowdersoftware.com>
Date: Wed, 7 Apr 2010 21:45:42 +0100
Message-ID: <n2qc95470a1004071345sd848f22cj3734c18f7ffc3d3@mail.gmail.com>
To: Eduard Pascual <herenvardo@gmail.com>
Cc: gesteehr@googlemail.com, public-html-comments@w3.org
> For content generated programmatically, it's quite indifferent to use
> CDATA or to escape stuff.

No, there's a very large difference. Currently, if I have user-generated
content, I just escape it. All of it. Completely. With the same rules.
Whereas if we start saying "you escape this, but not this" or "you escape
this *one* way, and you escape this other thing a *different* way," you
dramatically complicate the rules any sanitizer must follow. Which
inevitably leads to sanitizers that fail, and successful injection attacks.
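Here is a short Python sketch of the uniform approach described above. The thread itself contains no code, so the language and the hypothetical `escape_user_content` name are assumptions. Every piece of user-generated content goes through the same replacements, with no per-context special cases. The standard library's `html.escape` performs the same five replacements.

```python
# A minimal sketch, not a production sanitizer: all user-generated
# content is escaped with one fixed set of rules, everywhere.

def escape_user_content(text: str) -> str:
    """Hypothetical helper: escape every markup-significant character."""
    # "&" must be replaced first so the entities introduced by the
    # later replacements are not themselves re-escaped.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#x27;"),
    ]
    for raw, entity in replacements:
        text = text.replace(raw, entity)
    return text


comment = 'Try </code><script>alert("hi")</script> & see'
print(escape_user_content(comment))
# → Try &lt;/code&gt;&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; see
```

The same function applies whether the content ends up in a paragraph, a `<code>` element, or an attribute value. That is the property a "you escape this one way, and that another way" rule would lose.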

I'm all for making things easier for people, but in this case I'm thinking
the way to do that is to use a proper tool for authoring, not to make life a
lot more difficult for people dealing with user-generated content.

> (I'm assuming this is the case Georg had in mind when starting this thread)


Until or unless Georg comes back to the thread, FWIW I don't think we should
assume anything on his behalf.
--
T.J. Crowder
Independent Software Consultant
tj / crowder software / com
www.crowdersoftware.com



On 7 April 2010 19:06, Eduard Pascual <herenvardo@gmail.com> wrote:

> On Wed, Apr 7, 2010 at 5:49 PM, T.J. Crowder <tj@crowdersoftware.com>
> wrote:
> >> > What makes ]]> easier to defend against than </code>?
> >>
> >> As I said, with <![CDATA[ ... ]]> you only need to care about the
> >> exact sequence "]]>": if it's found within an input, get rid of it or
> >> somehow fix it (string replacement "]]>" => "]]>]]&gt;<![CDATA[" gets
> >> the job done safely). With </code> (or even with Arthur's <cdata>
> >> suggestion, to some degree), things are much more complex:
> >
> > I don't understand. Any sanitizer has to escape < and &.
> Not any: for content that will go inside <![CDATA[ ... ]]> there is no
> need at all to care about < and &. That's the whole point of CDATA. In
> other words, a < inside a CDATA block is exactly equivalent to a &lt;
> outside of it: it will have no special meaning and just render as "<".
> The same holds for & and &amp;, and also for > and &gt;.
>
> For content generated programmatically, it's quite indifferent to use
> CDATA or to escape stuff. For manually authored content, CDATA saves a
> lot of authoring pain (I'm assuming this is the case Georg had in mind
> when starting this thread). If used with user-provided content,
> Georg's proposal would open up a potential for injection attacks that
> require the spec, implementations, and server-side scripts to do a
> good deal of non-trivial fool-proofing. CDATA addresses the use-case,
> without so many nasty side effects.
>
> Regards,
> Eduard Pascual
>
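A short round trip can check the string-replacement trick quoted above. The sketch below is in Python and assumes the markup is read by an XML parser, here the standard library's `xml.etree.ElementTree`. The HTML syntax recognizes CDATA sections only inside foreign content such as SVG and MathML. The helper name `wrap_in_cdata` and the sample payload are hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Wrap arbitrary text in a CDATA section (hypothetical helper).

    Inside <![CDATA[ ... ]]> only the sequence "]]>" is special, so it is
    the only thing that needs fixing: end the current section, emit those
    characters as ordinary escaped text ("]]&gt;"), then open a new section.
    """
    safe = text.replace("]]>", "]]>]]&gt;<![CDATA[")
    return "<![CDATA[" + safe + "]]>"


# Sample input containing "<", "&&", a "]]>" sequence and a closing tag.
payload = "if (a < b && c) { y = z[i[0]]> 1; } </code><script>"
doc = "<code>" + wrap_in_cdata(payload) + "</code>"

# Assumption: the result is parsed as XML, not as text/html.
parsed = ET.fromstring(doc)
print(parsed.text == payload)  # → True
```

The only input-dependent step is the single `replace` call on one fixed sequence. The escaping approach instead rewrites several individual characters.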
Received on Wednesday, 7 April 2010 20:46:35 GMT