- From: Michael A. Peters <mpeters@mac.com>
- Date: Wed, 07 Apr 2010 10:45:55 -0700
- To: Eduard Pascual <herenvardo@gmail.com>
- Cc: "T.J. Crowder" <tj@crowdersoftware.com>, gesteehr@googlemail.com, public-html-comments@w3.org
Eduard Pascual wrote:
> (Note: I made a recurrent typo on my previous e-mails: XML's CDATA tag
> is spelled <![CDATA[ ... ]]> rather than <[CDATA[ ... ]]>. The "<!"
> sequence is a legacy from SGML's obscure features. My apologies if
> those mistakes caused any issue; although I hope the idea behind my
> posts was clear enough.)
>
> On Wed, Apr 7, 2010 at 7:49 AM, T.J. Crowder <tj@crowdersoftware.com> wrote:
>>> <[CDATA[ ... ]]>. This is far easier to
>>> sanitize (you just need to ensure that the input doesn't include the
>>> "]]>" sequence), thus being more usable on user-provided content.
>> What makes ]]> easier to defend against than </code>?
> As I said, with <![CDATA[ ... ]]> you only need to care about the
> exact sequence "]]>": if it's found within an input, get rid of it or
> somehow fix it (string replacement "]]>" => "]]>]]><![CDATA[" gets
> the job done safely). With </code> (or even with Arthur's <cdata>
> suggestion, to some degree), things are quite more complex:
> 1) an instance of the "</code>" string may be legitimate within the
> content (if it closes a matching <code ...> within the content).
> 2) due to HTML5's error-handling rules, something other than "</code>"
> may end up closing the initial <code ...>, so a sanitizer would have
> to implement the error-handling rules and play really smart to handle
> those cases. I don't know the rules down to the detail, but IIRC
> something like this: <div> <code> </div> would have the <code> element
> implicitly closed just before the </div>.

That's why I just use DOMDocument (libxml2) for all dynamically generated code. I don't have to worry about that kind of thing. User input where markup is allowed is sent through a filter first (html tidy in xml mode followed by HTML Purifier) that fixes it for xml sanity and then it is imported into a DOM of its own before the node is imported into the DOM that is served to the requesting client. Code injection is a non issue for me.
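To make the DOM-based approach concrete, here is a rough sketch. It is a Python analogue using the standard library's xml.dom.minidom, not the PHP DOMDocument setup described above, and it assumes the markup has already been through the filtering step (Tidy and HTML Purifier are not reproduced here). The function name build_page and its parameters are made up for illustration.

```python
from xml.dom.minidom import getDOMImplementation, parseString


def build_page(filtered_markup: str, plain_text: str) -> str:
    """Assemble a small page from filtered user markup and untrusted plain text."""
    # Parse the already-filtered markup into a DOM of its own.
    # parseString raises an ExpatError if the markup is not well-formed,
    # so something like "<div><code></div>" never reaches the served document.
    fragment_doc = parseString(filtered_markup)

    # Build the document that will actually be served.
    page = getDOMImplementation().createDocument(None, "html", None)
    body = page.createElement("body")
    page.documentElement.appendChild(body)

    # Import the user's root node (deep copy) into the served document.
    imported = page.importNode(fragment_doc.documentElement, True)
    body.appendChild(imported)

    # Untrusted plain text is added as a text node, never as markup;
    # the serializer escapes "<", ">" and "&" when writing it out.
    note = page.createElement("p")
    note.appendChild(page.createTextNode(plain_text))
    body.appendChild(note)

    return page.documentElement.toxml()
```

With this shape, a stray "</code>" or "<script>" in the plain text comes out as escaped character data, and unbalanced markup is rejected at parse time rather than being repaired by the browser's error handling.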
It's a little slower, but you can cache the result once it has been built that way, so performance is only an issue the first time it is assembled or modified.
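For comparison, the CDATA string replacement discussed in the quoted text can be sketched as follows. This uses the commonly seen split form, where "]]" ends one CDATA section and ">" begins the next, which is not character-for-character the replacement string shown in the quote. The helper names wrap_in_cdata and round_trip are hypothetical, and Python's xml.dom.minidom is used only to check the result.

```python
from xml.dom.minidom import parseString

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    # "]]" closes the current section, then a new section starts with ">".
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def round_trip(text: str) -> str:
    """Parse the wrapped text inside a <code> element and read it back."""
    doc = parseString("<code>" + wrap_in_cdata(text) + "</code>")
    # Each CDATA section becomes its own node; join their data.
    return "".join(node.data for node in doc.documentElement.childNodes)


print(wrap_in_cdata("a ]]> b"))  # <![CDATA[a ]]]]><![CDATA[> b]]>
```

The round trip returns the original text unchanged, including any "</code>" it contains, which is the property the quoted argument relies on.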
Received on Wednesday, 7 April 2010 17:46:37 UTC