
Re: <iframe doc="">

From: Lars Gunther <gunther@keryx.se>
Date: Mon, 25 Jan 2010 11:03:46 +0100
Message-ID: <4B5D6C82.6020402@keryx.se>
To: "public-html@w3.org WG" <public-html@w3.org>
2010-01-24 18:14, Tab Atkins Jr. wrote:
> Indeed, there are nearly as many html-sanitizers as there are CMSes.
> And they're pretty uniformly bad.  Most of them are built on fragile
> regexps, if you're lucky.  They might just be a handful of string
> replaces that address whatever problems the CMS author could think of
> at the time.  The best of them address *currently known attack
> vectors* decently enough, but are usually weak to *new* attacks.

There are also whitelist approaches that one can use, and that indeed 
are being used. I know of, and have written, a few myself.

Using XHTML syntax and XML tools makes this stuff easier to implement, 
in the absence of a "full HTML parser/tokenizer"!
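
A minimal sketch of such a whitelist filter, in Python with the standard 
library XML parser. The set of allowed elements and attributes, and the 
names sanitize, ALLOWED and _attr_ok, are assumptions made for 
illustration only; a real CMS would choose its own rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical whitelist: element name -> attribute names allowed on it.
ALLOWED = {"p": set(), "em": set(), "strong": set(), "a": {"href"}}


def _attr_ok(name, value):
    # Only plain http(s) links survive; javascript: and similar do not.
    if name == "href":
        return value.strip().lower().startswith(("http://", "https://"))
    return True


def _render(elem):
    """Serialize the whitelisted children of elem; drop everything else."""
    parts = [escape(elem.text or "")]
    for child in elem:
        if child.tag in ALLOWED:
            attrs = "".join(
                f" {name}={quoteattr(value)}"
                for name, value in child.attrib.items()
                if name in ALLOWED[child.tag] and _attr_ok(name, value)
            )
            parts.append(f"<{child.tag}{attrs}>{_render(child)}</{child.tag}>")
        # A disallowed element is dropped together with its content,
        # but the text that follows it is kept.
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def sanitize(fragment):
    try:
        # Wrap the fragment so that it has a single root element.
        root = ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError:
        # Not well-formed XML: treat the whole comment as plain text.
        return escape(fragment)
    return _render(root)
```

Anything not named in the whitelist is dropped, so attributes such as 
target and style, and elements such as script or div, never reach the 
output, whether or not they belong to a known attack.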

I am unconvinced about the usefulness of MOVING security to the browser. 
First of all, it cannot be relied on, since we do not know for sure that 
all user agents implement it correctly. It will also take many years 
until 99% of all agents support this, and in the meantime we have to 
continue to do server-side checks anyway.

This could work if seen as an extra layer of security. Defence in 
depth is always a good thing! But if it is marketed as something you 
do INSTEAD of server-side checks, it will actually be harmful to 
security on the web.

Besides, you will probably want to stop a lot of other things as well, 
like target="_blank" and <div style="display: none">Lots of links I use 
for black hat SEO here</div> even if it is inside an iframe, sandboxed 
or not.

There is nothing stopping a CMS from enforcing a stricter rule set 
about what markup is allowed in comments, i.e., you can stop the edge 
cases by disallowing them altogether.

User-generated content does not need full HTML support, only a subset.

A combination of Tidy, XSLT, striptags and regexp can be used to filter 
the content quite well. Add escaping of output at the other end and you 
are as secure as you can be.
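
For the escaping step, a small sketch in Python, assuming the value in 
question (here an author name) is meant to be shown as plain text. The 
function name render_byline and the surrounding markup are hypothetical.

```python
import html


def render_byline(author):
    # html.escape turns &, <, > and both quote characters into entities,
    # so the value cannot break out of the surrounding markup.
    return '<span class="author">{}</span>'.format(html.escape(author))
```

Escaping happens at output time, independently of whatever filtering 
was done when the data came in.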

Summary: If this technology is about "offloading" security to the 
browser, it will be harmful to web security! If it is about adding an 
extra layer, and will be marketed only as such, it is OK.


-- 
Lars Gunther
http://keryx.se/
http://twitter.com/itpastorn/
http://itpastorn.blogspot.com/
Received on Monday, 25 January 2010 10:04:21 GMT
