In-browser sanitization vs. a “Safe Node” in the DOM from David Ross on 2016-01-21 (public-webappsec@w3.org from January 2016)

From: David Ross <drx@google.com>
Date: Thu, 21 Jan 2016 14:52:37 -0800
To: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAMM+ux6pTnd+R5OFoQfzfUoH5LozRmfgZPmfY0yga1-pjKZpDg@mail.gmail.com>
Mike West, Mario Heiderich, and I had a conversation recently
just beginning to explore what sanitization baked into the browser
might look like.  Of course this is not a new concept, with Internet
Explorer’s ".toStaticHTML()" method having come and gone.  Perhaps it
was ahead of its time.  =)

I think we’re beginning to achieve consensus that browser-native
client-side sanitization would be a boon for web apps.  Thinking about
that got me wondering if maybe there’s a way to achieve the desired
effect without actually implementing all the complexity of markup
sanitization.  I started to imagine a "Safe Node" in the DOM tree that
logically enforces various policies on the nodes beneath it.
Unsurprisingly I’m not the first person to have thought of this, as
Michal Zalewski had the same idea a number of years back.  Perhaps
this is an idea whose time has finally come.

So here is a strawman proposal for a Safe Node in the DOM.  Please
don’t focus on syntax, but rather take a look at the idea overall and
try to identify any fatal flaws or areas for improvement.


Example Safe Node Usage

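// Create the container and mark it as a Safe Node using the proposed
// (strawman) "safety" attribute.
var safeDiv = document.createElement("DIV");
var safeAttribute = document.createAttribute("safety");
safeAttribute.value = "Enabled: true; DownloadExternalContent: false; ...";
safeDiv.setAttributeNode(safeAttribute);
// The policy is already in place when the untrusted markup is parsed.
safeDiv.innerHTML = untrustedMarkup;
document.body.appendChild(safeDiv);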

Insert safeDiv into the DOM tree as shown and it will safely contain
untrusted markup.

Policy enforcement configuration is set on an attribute of the Safe
Node ("DownloadExternalContent" in the example).  If you check the
markup of the previously created DIV like this:

var outputMarkup = safeDiv.outerHTML;

...at this point outputMarkup might look something like this:

<div safety="Enabled: true; DownloadExternalContent: false; ...">[untrusted markup]</div>
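
To make the attribute-based configuration concrete, here is a minimal sketch of how a "Name: value; Name: value" policy string like the one above could be read into a lookup table.  The function name parseSafetyPolicy and the parsing rules (semicolon-separated entries, "true"/"false" mapped to booleans) are assumptions made for illustration only; the proposal deliberately leaves syntax undefined.

```javascript
// Hypothetical helper: parse a strawman policy string into a Map.
// A Map is used instead of a plain object so that an entry named
// "__proto__" cannot interfere with the lookup table.
function parseSafetyPolicy(text) {
  const policy = new Map();
  for (const entry of text.split(";")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue; // tolerate a trailing ";"
    const colon = trimmed.indexOf(":");
    if (colon === -1) continue; // skip anything not shaped like "Name: value"
    const name = trimmed.slice(0, colon).trim();
    const value = trimmed.slice(colon + 1).trim();
    if (value === "true") {
      policy.set(name, true);
    } else if (value === "false") {
      policy.set(name, false);
    } else {
      policy.set(name, value); // keep non-boolean values as strings
    }
  }
  return policy;
}

const examplePolicy = parseSafetyPolicy(
  "Enabled: true; DownloadExternalContent: false"
);
console.log(examplePolicy.get("DownloadExternalContent")); // false
```

A real implementation would also need to decide what to do with unknown or malformed entries; failing closed (treating them as the most restrictive setting) would seem the safer choice.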

It is possible to integrate markup from various sources that will
ultimately be rendered later.  Or in applications that aren’t as
complex, it’s easy to simply output untrusted markup into Safe Nodes
that are immediately added into the document.

Safety is enforced by the fact that the untrusted markup is contained
within a Safe Node.  Breakout is prevented by the design pattern shown
above.  (e.g.: Setting innerHTML will inherently never allow breaking
out of the containing node.)


Policies capable of being enforced

Policies would match those that a sanitizer would also be capable of
enforcing against content that may or may not be malicious.

Policies set to be enabled by default:

* Disablement of script / active content
* Disablement of frames
* No support for FORM elements (to prevent phishing)
 + Input elements such as INPUT, BUTTON, etc. still allowed
* Disablement of link targeting
* Supported protocols limited to https://
* Safe CSS
 + Prevent anti-XSRF nonce theft via CSS
 + Prevent UI overlay
 + Prevent any identified abuse of existing styles on the page
 + Prevent styles defined within the Safe Node from affecting the
   surrounding page

Optional policies:

* Max width / height
 + To prevent outside UI from being pushed out of the way
* Allow links
* List of protocols to allow in URLs, beyond https://
* Flag to regulate use of relative URLs
* Flag to regulate use of multimedia (e.g.: AUDIO and VIDEO elements)
* Flag to regulate use of external content
 + Callback for handling external content

This list was derived from Michal Zalewski’s previous work and my own
experience with implementation of client-side sanitization.
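
As an illustration of the protocol and relative-URL policies listed above, the following is a rough sketch of the kind of check a Safe Node might apply to each URL it encounters.  The function isUrlAllowed and its option names (allowedProtocols, allowRelative, baseUrl) are hypothetical; only the standard URL constructor is assumed to exist.

```javascript
// Hypothetical check: is this URL acceptable under the Safe Node policy?
// By default only https: is allowed and relative URLs are rejected.
function isUrlAllowed(rawUrl, options = {}) {
  const allowedProtocols = options.allowedProtocols || ["https:"];
  const allowRelative = options.allowRelative === true;

  let parsed;
  try {
    // With no base, the URL constructor throws for relative URLs.
    parsed = new URL(rawUrl);
  } catch (err) {
    if (!allowRelative) return false;
    try {
      // Resolve against the host page, then check the resulting protocol.
      parsed = new URL(rawUrl, options.baseUrl);
    } catch (err2) {
      return false;
    }
  }
  // Compare the parsed protocol rather than matching on the raw string,
  // so the URL parser's own normalization is what gets checked.
  return allowedProtocols.includes(parsed.protocol);
}

console.log(isUrlAllowed("https://example.com/a.png")); // true
console.log(isUrlAllowed("javascript:alert(1)"));       // false
console.log(isUrlAllowed("/images/a.png"));             // false
```

Relying on the URL parser instead of string prefixes is the main design point here: the value being checked is then the same one the browser would act on.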

Right now the list above covers the policies that would make sense to
regulate; however, it does not specify syntax.  When the syntax is
ultimately defined, it would seem to make the most sense to adopt
existing conventions if possible (e.g.: Maybe the FORM policy maps
well to frame sandbox "allow-forms"?).


Pros (relative to sanitization)

1) Elimination of sanitization complexity.

It’s much easier to implement a policy such as "disable script under
this node" than it is to implement sanitizer logic to optimally
achieve the same result.  Since we are in the browser, it's possible
to avoid creating a DOM and then walking through it as would be
required for sanitization.

E.g.: How do you properly sanitize SVG?  This is difficult for a
traditional client-side sanitizer to get right.  Answer: Mostly we
don’t care, we just enforce that script is disabled below a given
node, clipping is enforced, etc. as per configuration of the Safe
Node.

2) It's only natural for enforcement of policy to be integrated with
the actual implementation of the code on which the policy is being
enforced.

3) It’s easier for a Safe Node to safely handle CSS than it would be
for a client-side sanitizer.

A client-side sanitizer has no visibility into externally downloaded
stylesheets.  (Though this may not be an issue with a sanitizer built
into the browser, given that it could effectively regulate downloaded
stylesheets.)

It’s also difficult for client-side sanitizers to correctly handle
inline STYLE elements as there is no real DOM for a STYLE element.
It’s not easy for a client-side sanitizer to effectively constrain
unsafe styles to within a given element.


Cons (relative to sanitization)

1) Being able to get sanitized markup is a feature that could have
non-niche use cases that have yet to be identified.


FAQ

Q: How is this different from IFRAMEs?  Seamless IFRAMEs?
A: IFRAMEs are clumsy in that they contain a different document, CSS
doesn’t apply, and they are rectangular.  Seamless IFRAMEs take care
of two of those problems, but they seem to have been abandoned as a
proposal for standardization.

Q: If you have some markup with a Safe Node in it, is that safe?
A: Best practice: Always output unsafe markup into a Safe Node that
you (the host) have created.  If you do need to manipulate markup
containing a Safe Node and then output that markup directly onto the
page, remember to treat the Safe Node string as an atomic unit.
Untrusted markup injected into the Safe Node markup could prematurely
close the Safe Node.
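
The premature-close problem can be shown with plain strings.  This is only a string-level sketch: wrapUnsafely and markupAfterFirstClose are hypothetical helpers, and the "safety" attribute value is the strawman syntax from the example above.

```javascript
// Unsafe pattern: building Safe Node markup by string concatenation.
function wrapUnsafely(untrustedMarkup) {
  return '<div safety="Enabled: true">' + untrustedMarkup + "</div>";
}

// Return whatever follows the first closing DIV tag in the markup.
// For the payload below, that is the part that lands outside the Safe Node.
function markupAfterFirstClose(markup) {
  const closeTag = "</div>";
  const firstClose = markup.indexOf(closeTag);
  return firstClose === -1 ? "" : markup.slice(firstClose + closeTag.length);
}

// The payload closes the Safe Node itself, then adds active content.
const payload = '</div><img src="x" onerror="alert(1)">';
console.log(markupAfterFirstClose(wrapUnsafely(payload)));
// <img src="x" onerror="alert(1)"></div>
```

Assigning the untrusted markup to innerHTML of a Safe Node the host created, as in the usage example, avoids this because the markup is parsed as children of that node.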

Q: Can you manipulate the DOM underneath a Safe Node?
A: Sure!  If a SCRIPT node is created or moved to within the Safe
Node, for example, it simply does not execute script.  Of course it
would not be secure to pull nodes out from within a Safe Node and move
them elsewhere in the DOM, outside of a Safe Node.
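
One way to picture this behavior: the decision to run a script depends on where the node sits at execution time, not on how it got there.  The sketch below models that with plain objects rather than real DOM nodes; makeNode, appendChild, isInsideSafeNode and mayExecuteScript are all hypothetical names, and the presence of a "safety" attribute stands in for whatever the final syntax would be.

```javascript
// Minimal stand-in for a DOM node (not the real DOM API).
function makeNode(name, attributes = {}) {
  return { name, attributes, parent: null, children: [] };
}

// Attach child to parent, detaching it from any previous parent first.
function appendChild(parent, child) {
  if (child.parent !== null) {
    const siblings = child.parent.children;
    siblings.splice(siblings.indexOf(child), 1);
  }
  child.parent = parent;
  parent.children.push(child);
  return child;
}

// True if any ancestor carries the strawman "safety" attribute.
function isInsideSafeNode(node) {
  for (let current = node.parent; current !== null; current = current.parent) {
    if ("safety" in current.attributes) return true;
  }
  return false;
}

// A SCRIPT node may run only while it is outside every Safe Node.
function mayExecuteScript(node) {
  return node.name === "SCRIPT" && !isInsideSafeNode(node);
}

const body = makeNode("BODY");
const safeDiv = appendChild(body, makeNode("DIV", { safety: "Enabled: true" }));
const script = appendChild(body, makeNode("SCRIPT"));
console.log(mayExecuteScript(script)); // true
appendChild(safeDiv, script);          // move the script under the Safe Node
console.log(mayExecuteScript(script)); // false
```

The same check run in the other direction shows why moving nodes out of a Safe Node is unsafe: once the ancestor is gone, nothing restricts the node any longer.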

Q: Why not implement a "safe innerHTML" instead of a Safe Node?
A: The Safe Node paradigm makes it easy to store configuration
parameters in an attribute on the element.  Also, apparently
.innerHTML is not available in SVG.


Todo

Is there a convenient and safe way to enable script from outside the
Safe Node to set an event handler on markup existing within the Safe
Node?  It should be possible to do this safely in some fashion.  We
would certainly need to consider the possibility of DOM clobbering
though and ensure best practice is immune to that.

Thoughts?  The key question: Is this proposal better or worse than
more traditional client-side sanitization baked into a browser API?


References

Michal Zalewski has previously proposed a very similar idea.

The list of policies for enforcement was inspired by Michal’s work
and also by my own work on the jSanity client-side sanitizer.


Dave
Received on Thursday, 21 January 2016 22:53:28 UTC