Re: In-browser sanitization first, "Safe Node" later?

> Take the <a> as an example, imagine a forum with a WYSIWYG (/me shudders)... some forums won't like this at all (SEO spamming), some may consider this safe if it has a rel=nofollow... but many will forget the href="javascript:...", which is a valid attribute on a valid node, but getting a click event can cause inline JavaScript to run (assuming no CSP that blocks unsafe-inline).
> If you know how to solve this (both as a Sanitiser or under a Safe Node), then I'll be very happy.

Two things here:

1) Does the sanitizer allow anchors (and equivalent constructs) through?
   This tends to be configurable in the sanitizer.  Good sanitizers
   need various configuration options to cover all the standard use
   cases.
2) Does the sanitizer adequately remove script, even script that
   requires a click to fire?
   Good sanitizers tend to handle this well by default, keeping
   javascript: URLs from passing sanitization.
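
To make point 2 concrete, here is a minimal sketch of a protocol allowlist check for href values. This is not taken from jSanity or any other sanitizer; the function name, the default allowlist and the placeholder base URL are all my own assumptions for illustration. It leans on the standard URL parser so that mixed case and embedded tabs or newlines are normalized before the scheme is compared.

```javascript
// Hypothetical helper for illustration only; not part of jSanity or any
// browser API. The allowlist below is an assumed default.
const DEFAULT_PROTOCOLS = ["http:", "https:", "mailto:"];

function isSafeHref(href, allowedProtocols = DEFAULT_PROTOCOLS) {
  let parsed;
  try {
    // The base is a placeholder so that relative links still parse.
    parsed = new URL(href, "https://base.invalid/");
  } catch (e) {
    // Anything the URL parser rejects is treated as unsafe.
    return false;
  }
  // The parser lowercases the scheme and strips tabs and newlines,
  // so "JaVaScRiPt:" is reported as "javascript:".
  return allowedProtocols.includes(parsed.protocol);
}
```

A sanitizer would call something like this on every URL-bearing attribute it lets through, and drop the attribute when the check fails.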


> I also prefer The Simplest Thing That Could Possibly Work. To me that would seem to be the string in/string out interface, or a string in/tree of DOM Nodes interface (then the caller could do something like: e.appendChild(purify(bad_string)) ). Or both.

I think the complexity involved in writing a sanitizer is >= the
complexity of Safe Node.  The advantage of Safe Node is that the logic
is integrated and consistent with the DOM itself.  For example,
here's some code from the jSanity sanitizer that ensures URLs
referenced in CSS are identified and handled by a user-provided
callback*:

output = child.style.getPropertyValue(childStyle);

if (output.substring(0, 4) === "url(") {
  // Deal with external content
  if (itemOptions.externalContentCallback !== null) {
    output = itemOptions.externalContentCallback(
      "CSSURL", childStyle, output, knownProtocols);
    modifiedProperty = true;
  }
}

You'll also see the same sort of logic when jSanity handles SRC
attributes on IMG tags and the like.  This logic could be centralized
in a Safe Node implementation, without the sort of layering violation
you see above, where external content logic depends on knowledge of
CSS syntax.  And there is a greater chance that the Safe Node
implementation would be automatically secure when a completely new
external content source is introduced as a feature.
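
As a rough illustration of what "centralized" could look like, here is a sketch in which one decision function is consulted for every external reference, and the syntax-specific parts (CSS url() values, plain attributes) are thin adapters around it. None of these names come from jSanity or from any Safe Node proposal; they are hypothetical, and the url() matching is deliberately simplistic.

```javascript
// Hypothetical sketch; all names are assumptions made for illustration.
// decide(url) returns a (possibly rewritten) URL string, or null to drop it.
function createExternalContentGate(decide) {
  return {
    // Adapter for a single CSS value such as url("http://host/a.png").
    fromCssValue(value) {
      const m = /^url\(\s*(['"]?)(.*?)\1\s*\)$/i.exec(value.trim());
      if (!m) {
        return value; // not an external reference, pass through unchanged
      }
      const out = decide(m[2]);
      if (out === null) {
        return "none";
      }
      // Escape quotes and backslashes before re-wrapping the URL.
      return 'url("' + out.replace(/["\\]/g, "\\$&") + '")';
    },
    // Adapter for URL-valued attributes such as SRC on IMG.
    fromAttribute(value) {
      return decide(value);
    },
  };
}
```

The policy itself lives only in decide(); a new kind of external content source would need a new adapter, not a new copy of the policy.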

IMO, if we bake sanitization into the browser it might ultimately just
be an exercise in moving code around.  Safe Node attempts to maximize
the benefit we might get out of the browser integration.

* - By providing a callback for external content, the sanitizer can
handle the HTML e-mail use case where automatic external content
download is a privacy issue.
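
For the e-mail case, a callback along these lines is what I have in mind. It is only a sketch: the four-argument shape mirrors the call in the snippet above, but the return-value contract (the replacement value is returned) and the helper name are my assumptions, not documented jSanity behavior.

```javascript
// Hypothetical factory; the callback shape mirrors the snippet above,
// and the return-value contract is an assumption.
function makeBlockingCallback(blocked) {
  return function externalContentCallback(kind, name, value, _knownProtocols) {
    // Remember what was suppressed so a "load remote content" action
    // could restore it later if the user opts in.
    blocked.push({ kind: kind, name: name, value: value });
    // Replace CSS URLs with "none"; blank out anything else.
    return kind === "CSSURL" ? "none" : "";
  };
}
```

The mail client would pass the result of makeBlockingCallback(list) as the externalContentCallback option and inspect list afterwards.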

Dave


On Mon, Feb 8, 2016 at 11:35 PM, Frederik Braun <fbraun@mozilla.com> wrote:
>
> On 08.02.2016 23:03, Craig Francis wrote:
> > I've not had a proper look at this yet (reading on a mobile phone), but by sheer coincidence a very similar discussion/presentation covers this as well:
> …
>
> No coincidence, Mario Heiderich's presentation was mentioned in my
> original post ;-)
>

Received on Thursday, 11 February 2016 19:38:16 UTC