Re: "cleaning HTML for security" from Joshua Cranmer on 2014-11-10 (public-htmail@w3.org from November 2014)

From: Joshua Cranmer <Pidgeot18@verizon.net>
Date: Mon, 10 Nov 2014 12:21:42 -0600
To: public-htmail@w3.org
Message-id: <54610236.3010102@verizon.net>

On 11/10/2014 6:45 AM, chaals@yandex-team.ru wrote:
> One of the things they want to do before finishing it is describe how HTML gets cleaned up for security before pasting into a random page. This may or may not be similar to the things that are removed from mail when it is e.g. presented in Webmail for security reasons.
>
> I don't expect to get a copy of everyone's security policies in detail, but I think it would be useful to at least list common things that are "removed" for security purposes, along with some explanation of the reason.

I would presume that HTML sanitization is usually implemented on a
whitelist basis, particularly in email (where clients tend to be far
more conservative).
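
To make "whitelist basis" concrete, here is a minimal sketch in Python
using the standard library's html.parser. The tag, attribute, and URL
lists are assumptions chosen for illustration, not any client's real
policy, and the names (WhitelistSanitizer, sanitize) are hypothetical.
Anything not explicitly allowed is dropped, and the contents of
<script> and <style> are dropped along with the tags.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; a real policy would be longer.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "br",
                "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # drop the text inside these as well
VOID_TAGS = {"br"}                  # allowed tags that take no end tag


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # e.g. onclick, style, src
            url = value.strip().lower()
            if name == "href" and not url.startswith(SAFE_URL_PREFIXES):
                continue  # e.g. javascript: or data: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

The design point is that the sanitizer never has to recognize what is
dangerous; it only has to recognize the small set of things it has
decided are safe.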

> For example I presume that more or less everyone takes out javascript "eval" statements, because there is no way to automatically check that they will do no harm.

The client I work on (Thunderbird) disabled even the ability to enable 
JavaScript several years ago when we stopped trusting the sandboxing of 
JS execution [1]. I am unaware of any other client that ever attempted 
to support JavaScript in email in the first place (which is why we 
dropped support instead of trying to fix sandboxing or even let the user 
shoot themselves in the foot).

In general, JavaScript cannot be statically sanitized with any degree of
precision. I can think of at least three distinct ways to get something
akin to eval, and the ability to access x.foo via x['foo'] makes a
precise analysis equivalent to solving the halting problem. On top of
that, there are ways to dynamically inject more JavaScript, which makes
static sanitization without dynamic sandboxing treacherous.
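
As a rough illustration of why pattern-based filtering falls short, the
Python sketch below runs a deliberately naive check (a hypothetical
looks_dangerous function that only searches for a literal "eval(")
against a few JavaScript snippets held as strings. The snippets are
well-known eval-like constructs picked for the example; they are not
necessarily the three ways alluded to above.

```python
import re


def looks_dangerous(js_source):
    """Hypothetical naive filter: flag only a literal call to eval."""
    return re.search(r"\beval\s*\(", js_source) is not None


# JavaScript fragments, kept as plain strings; nothing here is executed.
SNIPPETS = {
    "direct eval": "eval(payload)",
    "Function constructor": "new Function(payload)()",
    "string timer": "setTimeout(payload, 0)",
    "computed property": "window['ev' + 'al'](payload)",
}

# Names of the snippets that slip past the filter.
missed = [name for name, src in SNIPPETS.items() if not looks_dangerous(src)]
```

Only the first snippet is caught. The last one shows the x['foo']
problem: the property name is computed at run time, so no fixed pattern
over the source text can be relied on to find it.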

> Would it be good to have a page to collect this in our wiki, or are people prepared to send at least some of the stuff to the mailing list (and a volunteer - I see one in the mirror - could start to gather them in a wiki)?

I will note that Thunderbird primarily relies on sandboxing rather than 
sanitization. So features like SVG, MathML, even <audio> and <video> 
already work with no extra effort on our part! The sandbox 
unconditionally disables JavaScript and plugin execution; forms won't 
submit (but we erroneously render them); remote content loads (e.g., 
images, videos) are disabled by default, but the user can enable them on 
a per-message or per-sender basis. We do have an option to sanitize HTML 
prior to display for paranoia purposes (and an option to enable that 
sanitization for spam messages, but I don't know if it's enabled by 
default).
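
The remote-content rule described above (blocked by default, with
per-message or per-sender exceptions) can be modelled in a few lines.
This is a sketch under stated assumptions, not Thunderbird's actual
code: the class and method names are hypothetical, and senders are
matched by case-insensitive address only.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteContentPolicy:
    """Hypothetical model: remote loads are denied unless an exception applies."""
    allow_by_default: bool = False
    allowed_senders: set = field(default_factory=set)
    allowed_messages: set = field(default_factory=set)

    def allow_sender(self, sender):
        # Store addresses lowercased so lookups are case-insensitive.
        self.allowed_senders.add(sender.lower())

    def allow_message(self, message_id):
        self.allowed_messages.add(message_id)

    def allows(self, sender, message_id):
        return (self.allow_by_default
                or sender.lower() in self.allowed_senders
                or message_id in self.allowed_messages)
```

A renderer would consult allows() before fetching any image or video
URL; JavaScript and plugins would not be covered by such a policy, since
they are disabled unconditionally.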

[1] Coarse-grained details: all JS access to the DOM used to go through 
a single dispatch point, where generic sandboxing policies could be 
easily applied. Since the single dispatch was slow as crap, the DOM 
accesses were rerouted, and the rerouting opted to not support a generic 
sandbox policy.

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Received on Monday, 10 November 2014 18:22:26 UTC