- From: Shelley Powers <shelley.just@gmail.com>
- Date: Mon, 25 Jan 2010 08:34:07 -0600
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: Lars Gunther <gunther@keryx.se>, "public-html@w3.org WG" <public-html@w3.org>
- Message-ID: <643cc0271001250634i366c8d11p5af2df80b4c68d20@mail.gmail.com>
On Mon, Jan 25, 2010 at 4:16 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>
> On Jan 25, 2010, at 2:03 AM, Lars Gunther wrote:
>
>> 2010-01-24 18:14, Tab Atkins Jr. wrote:
>>> Indeed, there are nearly as many html-sanitizers as there are CMSes. And they're pretty uniformly bad. Most of them are built on fragile regexps, if you're lucky. They might just be a handful of string replaces that address whatever problems the CMS author could think of at the time. The best of them address *currently known attack vectors* decently enough, but are usually weak to *new* attacks.
>>
>> There are whitelist approaches as well that one can use, and indeed that are being used. I know of and have written a few myself.
>>
>> Using XHTML syntax and XML tools makes this stuff easier to implement, in the absence of a "full HTML parser/tokenizer"!
>>
>> I am unconvinced about the usefulness of MOVING security to the browser. First of all it cannot be relied on, since we do not know for sure that all user agents implement it correctly. And it will take many years until 99% of all agents support this, and in the meantime we have to continue to do server-side checks anyway.
>>
>> This thing could work if seen as an extra layer of security. Defence in depth is always a good thing! But if it is marketed as something you'll do INSTEAD of server-side checks, it will actually be harmful to security on the web.
>
> The goal for sandboxed <iframe> is to promote and deploy it for defense in depth. It is not intended to be used as the sole security mechanism, since it will take years until browsers that do not support it are gone.
>
>> Besides, you will probably want to stop a lot of other things as well, like target="_blank" and <div style="display: none">Lots of links I use for black hat SEO here</div> even if it is inside an iframe, sandboxed or not.
>
> Sandboxed iframes will help you with targeted links. Check out the "sandboxed navigation browsing context flag" here: <http://dev.w3.org/html5/spec/Overview.html#attr-iframe-sandbox>. It will not help you strip out "display: none" content, but search engines will be able to make their own judgment based on the fact that it is sandboxed.
>
>> Summary: If this technology is about "offloading" security to the browser, it will be harmful to web security! If it is about adding an extra layer, and will be marketed only as such, it is OK.
>
> My understanding is that the purpose is solely defense in depth, at least until it is widely enough deployed that it can be relied on. Even then, it's probably best to combine it with a whitelist filter. Certainly the security experts I've talked to would promote its use as an additional mechanism, and if anyone asked me for advice on deploying sandboxed iframes, I would tell them the same.
>
> (To clarify the relation to the thread topic, this doesn't necessarily depend on doc/srcdoc, just on sandboxed iframes in general. As far as I'm aware, srcdoc is just there to make sandboxed iframes easier to use; it is not required to get the security benefits.)
>
> Regards,
> Maciej

I'm glad you noted that this approach won't really be worthwhile, or stand alone, for years because of existing browsers. But let's go further on this. Since I found out that the primary use case for this change, and in fact this whole sandbox issue, is comments, let's talk about comments. (As far as I can tell, the mechanism being described amounts to something like the sketch below.)
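Just so we're all looking at the same thing: the way I read it, the server escapes the untrusted comment and drops it into the srcdoc attribute of a sandboxed iframe. This is only my own minimal sketch of that, in Python -- the function name, the bare sandbox attribute, and the use of html.escape are my choices for illustration, not anything the spec or any CMS prescribes.

```python
# Minimal sketch only -- my own illustration of the mechanism being
# discussed, not code from the spec or from any CMS.
from html import escape

def wrap_comment_in_sandbox(comment_html):
    # The untrusted comment markup has to be attribute-escaped so it
    # can sit inside the srcdoc="" attribute.
    escaped = escape(comment_html, quote=True)
    # A bare sandbox attribute (no tokens) means scripts, forms,
    # plugins, and top-level navigation are all disabled in the frame.
    return f'<iframe sandbox srcdoc="{escaped}"></iframe>'

print(wrap_comment_in_sandbox('<p>Nice post! <script>alert(1)</script></p>'))
```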
I've had a weblog from various vendors for close to a decade now. Started out with one of Dave Winer's defunct hosting systems, went to Blogger, went to Radio, back to Blogger, to WordPress, to my own tool at one point, tried out ExpressionEngine, a couple of others I can't even remember, and now, Drupal. And I've had comments, off and on, since my early Blogger days. So, I know comments.

This is an approach to protect against things like, I believe, people inserting JavaScript into their comments? Does it also protect against SQL injection, which has been the primary problem in the past? I'm not sure how throwing all of this on the browsers is going to protect against SQL injection, but I'll make an assumption that yes, it protects against SQL injection.

So, let's talk about SQL injection. Or rather, let's talk about...viagra. Yes, viagra. The reason I want to talk about viagra is because the primary problem most people have had with things like comments in the last five years or so is less about XSS or SQL injection, because we got most of that under control a while back. I mean, the people that provide the most popular comment protection systems live and breathe this stuff, and are probably the most expert people in the world on comment security, so most of us aren't overly concerned about the script kiddies. Well, no more than most, since any tool that generates any comment is vulnerable to hackers -- including browsers.

No, the problem people have had in the last several years is spammers. Spammers, coming in with their links to viagra, or spammerisgood.com. So, will the browsers also protect us from spammers? I read the IRC, and I get the impression from this "fix" that you all don't think highly of the tools we're using now, and that we should trust the browsers to provide these protections in the future. But the input protection tools not only protect us against inserted JS or SQL injection, they also protect us from spammers. When we turn over management of comment security to the browser companies, will they/you also build in support against spammers? So, can we stop using Akismet?

We've talked about spammers and viagra, so let's talk about something else: whitelisted HTML elements. Right now, most of us only allow certain elements in our comments. We don't allow script, of course, but many of us don't like commenters inserting img elements either. You remember goatse? Many of us stopped allowing img insertion when people started inserting goatse images, or their like. Or images too big for the comment area, or inappropriate, or any number of things. So, when we turn our comments over to the browser companies, will they also provide the ability to whitelist which elements we'll allow or not? (What I mean by a whitelist filter is something like the bare-bones sketch below.)

Now, you mentioned allowing SVG in HTML. Of course, I'm assuming you won't allow script in the HTML. Or script in HTML that is added as a foreign object in the SVG. But you don't necessarily need script to cause problems with something like SVG. SVG has declarative animations. Even if they can't harm a site, they can still cause problems. Flash enough color fast enough, and you'll throw some folks into seizures. In addition, I have an SVG file that can pretty much take any browser and computer system down to a slow crawl because it has so many complex paths in it. I have another, much simpler one that not only crashed the first version of Chrome when it came out, it crashed my operating system because Chrome handled the rendering so poorly. (No worries, Chrome fixed that particular problem quite quickly.)
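To be concrete about what I mean by a whitelist filter, here is a bare-bones sketch -- my own illustration only. The allowed-element list and the class name are mine, and a real filter such as htmLawed does far more than this (attribute checking, URL scheme checking, tag balancing, and so on).

```python
# Bare-bones sketch of a whitelist filter -- my own illustration, far
# simpler than anything like htmLawed. Unknown elements (img, svg,
# script, and everything else not listed) are dropped.
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "em", "strong", "a", "blockquote", "code", "pre", "ul", "ol", "li"}

class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_data = 0  # depth inside script/style content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_data += 1
        elif tag in ALLOWED:
            # Keep href on links only; a real filter would also check
            # the URL scheme here (no javascript: links, for example).
            kept = [(k, v) for k, v in attrs if tag == "a" and k == "href" and v]
            attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_data = max(0, self.skip_data - 1)
        elif tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_data:
            self.out.append(escape(data))  # re-escape plain text content

def sanitize(comment_html):
    f = WhitelistFilter()
    f.feed(comment_html)
    f.close()
    return "".join(f.out)

print(sanitize('<p>Hi <img src="goatse.jpg"><script>alert(1)</script></p>'))
```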
I'm a huge supporter of SVG, but I never allowed SVG elements in my comments. And my comment protection software allowed me to pick and choose who received what, all based on their user setting, because my software, Drupal, allows me to create any number of user types and then assign different protection levels accordingly. So, can we count on browsers providing this level of support?

Speaking of which, I want to talk about Blogher, the site. That's http://blogher.com. Blogher is a Drupal site, where anyone can sign up for an account. Not only can you comment, but you can also write your own posts. Now, the posts won't show up on the front page, not until you get enough people following you. But you can, more or less, post on your page right from the start. Really helped Blogher build community. Of course, this required another level of input control, but this time on the weblog posts, not the comments. But that's OK, because the same software that scrubs the input for comments can also be used with the input for weblog posts, even though the template that serves the posts is different than the template that serves comments. Again, we can set different levels of security for different levels of weblog editors, as well as commenters. Will the browsers provide this level of support? How will we handle templates, then, if the template that serves up weblog posts for trusted users is the same template that serves up posts for not-yet-trusted users? Does this mean that we have to convert all of the weblog posts to iframes with escaped X/HTML in attributes?

Speaking of escaping...I serve my pages up as XHTML, that's with application/xhtml+xml. In fact, it's the default for all my pages at my sites. Yeah, it's a bit of work, and glitches come through, but it's a surefire way of ensuring my pages are pretty clean. But it makes it difficult to serve comments sometimes -- especially when you have readers like Sam Ruby and Jacques Distler. As soon as I would open comments and proclaim them safe, typically one or the other would come along and type in a character such as U+FFFE. This is OK if you're serving your pages up as HTML, but it will cause the YSOD if you serve your pages up as XHTML. This was particularly tricky, too, because for a time CMSes weren't particularly concerned about protecting your comments in an XHTML environment. When I switched to Drupal, though, the issue became less of a problem, because the Drupal folks seem to have a stronger interest in pages being proper XHTML. They may serve the pages as HTML, but they still want them to be proper XHTML.

I had a choice of several modules to use to protect my comments -- not only against script kiddies and SQL injection, spammers, and elements that can be abused, but also against these non-printing characters that play havoc in XHTML pages. I use htmLawed, and the one time stuff did get through that threw up a YSOD, the actual creator of the module contacted me to go over the particulars so he could ensure it wouldn't happen again. The folks that provide these plug-ins and modules -- you may not like the code, but this is their heart and soul. This is all they do: discover ways that comments and such can be broken, and work around the problems. Will the browser companies be as responsive?

So I have to escape my markup in the attribute because my pages are XHTML, but will the browsers take care of characters like U+FFFE? They haven't before this time. Seems to me that Firefox is happy to throw up a YSOD when it encounters this type of character. I don't have Firefox developers contacting me, asking how they can prevent such from happening. (The kind of scrubbing I'm talking about is something like the sketch below.)
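And for what it's worth, the kind of character scrubbing I'm describing looks roughly like this: strip the code points that XML 1.0 does not allow, U+FFFE among them, before a comment ever lands on an application/xhtml+xml page. The regex and function name are my own illustration, not how htmLawed actually does it.

```python
# Sketch of scrubbing characters that are not legal in XML 1.0 (such
# as U+FFFE) out of comment text before it lands on an
# application/xhtml+xml page. My own illustration only.
import re

# Everything outside the XML 1.0 Char production: most C0 controls,
# the surrogate range, and U+FFFE/U+FFFF.
XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def scrub_for_xhtml(text):
    return XML_ILLEGAL.sub("", text)

print(repr(scrub_for_xhtml("harmless comment\ufffe with a non-character")))
```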
The point I'm trying to laboriously make is that by the time I have to apply all of the filters that I'm pretty sure the browser companies won't be concerned about, using something like an iframe with escaped markup in an attribute not only won't add any additional security, it may just open up new security problems we're not aware of, because we're now moving security handling from these experienced tools to less experienced browser developers. Yes, less experienced. Browser developers have a different focus than security for comments. There are other areas of security these developers have to focus on, such as the security of the browser's own code -- such as what was used in the recent China/Google events. But the security of the page content has always been our concern -- we the authors, the tool makers, the CMS developers. And we've managed over the years. We've learned. And though you may not like the code for htmLawed, the thing works in all of the ways we need it to work.

So who is the customer for this change? I was surprised when I saw that the use case for this change was primarily weblog comments. My first question was: who asked for this? I don't know about others who use tools such as Drupal, but I'm certainly not going to code my templates to use iframes with escaped attribute text, rather than elements. I haven't a clue what this will do to search engines, much less the other bots that crawl my site, since I also serve up RDFa, and there are bots only interested in that (which they probably won't discover, since everything is escaped attribute text now). As for trusting the browser companies -- well, no offense, but you all do put out a lot of security releases. And I'm glad you do, but I would think that the browser companies have enough to worry about with their own code, much less now having to ensure the security of my page contents.

Naturally, if I can't convince you to remove this functionality from the spec, I will file a bug. But in the meantime, I really would like to know how you see all this working in today's systems, such as the use cases I just mentioned, since the point seems to be replacing tools like Akismet and htmLawed. And I really would like to know: who is the customer for this change?

Shelley

(Long email, probably many typos, sorry. Speaking of which, this is being typed into GMail -- will GMail use iframes with escaped content in attributes?)
Received on Monday, 25 January 2010 14:34:46 UTC