Re: In-browser sanitization vs. a “Safe Node” in the DOM

David:

I very much appreciate this suggestion, and it most certainly helps a
lot. What makes it a little less useful is the need to pre-screen the
data that ends up within the safeDiv anyway because, as I understand
it, "... untrusted markup injected into the Safe Node markup could
prematurely close the Safe Node."

Michaela



On 01/21/2016 04:52 PM, David Ross wrote:
> Mike West, Mario Heiderich, and I had a conversation recently
> just beginning to explore what sanitization baked into the browser
> might look like.  Of course this is not a new concept, with Internet
> Explorer’s ".toStaticHTML()" method having come and gone.  Perhaps it
> was ahead of its time.  =)
> 
> I think we’re beginning to achieve consensus that browser-native
> client-side sanitization would be a boon for web apps.  Thinking about
> that got me wondering if maybe there’s a way to achieve the desired
> effect without actually implementing all the complexity of markup
> sanitization.  I started to imagine a "Safe Node" in the DOM tree that
> logically enforces various policies on the nodes beneath it.
> Unsurprisingly I’m not the first person to have thought of this, as
> Michal Zalewski had the same idea a number of years back.  Perhaps
> this is an idea whose time has finally come.
> 
> So here is a strawman proposal for a Safe Node in the DOM.  Please
> don’t focus on syntax, but rather take a look at the idea overall and
> try to identify any fatal flaws or areas for improvement.
> 
> 
> Example Safe Node Usage
> 
> var safeDiv = document.createElement("DIV");
> var safeAttribute = document.createAttribute("safety");
> safeAttribute.value = "Enabled: true; DownloadExternalContent: false; ...";
> safeDiv.setAttributeNode(safeAttribute);
> safeDiv.innerHTML = untrustedMarkup;
> document.body.appendChild(safeDiv);
> 
> Insert safeDiv into the DOM tree as shown and it will safely contain
> untrusted markup.
> 
> Policy enforcement configuration is set on an attribute of the Safe
> Node ("DownloadExternalContent" in the example).  If you check the
> markup of the previously created DIV like this:
> 
> outputMarkup = safeDiv.outerHTML;
> 
> ...at this point outputMarkup might look something like this:
> 
> <div safety="Enabled: true; DownloadExternalContent: false;
> ...">[untrusted markup]</div>
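
A minimal sketch of how a policy string like the one above might be read, assuming the strawman's "Key: value; Key: value" layout. The parseSafetyPolicy name and the treatment of "true"/"false" as booleans are assumptions for illustration only; the proposal explicitly leaves syntax open, and the elided "..." entry is skipped rather than guessed.

```javascript
// Hypothetical helper: parses a strawman "safety" attribute value into an object.
// The "Key: value; ..." layout is taken from the example; this is not a real browser API.
function parseSafetyPolicy(attributeValue) {
  const policy = {};
  for (const entry of attributeValue.split(";")) {
    const separator = entry.indexOf(":");
    if (separator === -1) continue; // skips empty entries and the elided "..."
    const key = entry.slice(0, separator).trim();
    const raw = entry.slice(separator + 1).trim();
    if (key === "") continue;
    // Assumption: "true"/"false" are booleans; anything else stays a string.
    policy[key] = raw === "true" ? true : raw === "false" ? false : raw;
  }
  return policy;
}

const policy = parseSafetyPolicy("Enabled: true; DownloadExternalContent: false; ...");
console.log(policy);
```
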
> 
> It is possible to integrate markup from various sources that will
> ultimately be rendered later.  Or in applications that aren’t as
> complex, it’s easy to simply output untrusted markup into Safe Nodes
> that are immediately added into the document.
> 
> Safety is enforced by the fact that the untrusted markup is contained
> within a Safe Node.  Breakout is prevented by the design pattern shown
> above.  (e.g.: Setting innerHTML will inherently never allow breaking
> out of the containing node.)
> 
> 
> Policies capable of being enforced
> 
> Policies would match those that a sanitizer could enforce to block
> potentially malicious content.
> 
> Policies set to be enabled by default:
> 
> * Disablement of script / active content
> * Disablement of frames
> * No support for FORM elements (to prevent phishing)
>  + Input elements such as INPUT, BUTTON, etc. still allowed
> * Disablement of link targeting
> * Supported protocols limited to https://
> * Safe CSS
>  + Prevent anti-XSRF nonce theft via CSS
>  + Prevent UI overlay
>  + Prevent any identified abuse of existing styles on the page
>  + Prevent styles defined within the Safe Node from affecting the
> surrounding page
> 
> Optional policies:
> 
> * Max width / height
>  + To prevent outside UI from being pushed out of the way
> * Allow links
> * List of protocols to allow in URLs, beyond https://
> * Flag to regulate use of relative URLs
> * Flag to regulate use of multimedia (e.g.: AUDIO and VIDEO elements)
> * Flag to regulate use of external content
>  + Callback for handling external content
> 
> This list was derived from Michal Zalewski’s previous work and my own
> experience with implementation of client-side sanitization.
> 
> Right now the list above covers the policies that would make sense to
> regulate; however, it does not specify syntax.  When the syntax is
> ultimately defined, it would seem to make the most sense to adopt
> existing conventions if possible (e.g.: Maybe the FORM policy maps
> well to frame sandbox "allow-forms"?).
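
The protocol and relative-URL policies listed above could be checked roughly as follows. This is a sketch only: isUrlAllowed, allowedProtocols, and allowRelative are hypothetical names, not part of the proposal. Anything that does not parse as an absolute URL with the standard URL constructor is treated as relative here, including protocol-relative "//host/path" forms, which a real implementation would need to handle separately.

```javascript
// Hypothetical policy check; option names are illustrative, not part of the proposal.
function isUrlAllowed(url, { allowedProtocols = ["https:"], allowRelative = false } = {}) {
  let parsed;
  try {
    parsed = new URL(url); // throws when url is not an absolute URL
  } catch {
    return allowRelative; // not absolute: governed by the relative-URL flag
  }
  // URL normalizes the scheme to lowercase and includes the trailing colon.
  return allowedProtocols.includes(parsed.protocol);
}

console.log(isUrlAllowed("https://example.com/a.png"));
console.log(isUrlAllowed("javascript:alert(1)"));
console.log(isUrlAllowed("/images/a.png", { allowRelative: true }));
```
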
> 
> 
> Pros (relative to sanitization)
> 
> 1) Elimination of sanitization complexity.
> 
> It’s much easier to implement a policy such as "disable script under
> this node" than it is to implement sanitizer logic to optimally
> achieve the same result.  Since we are in the browser, it's possible
> to avoid creating a DOM and then walking through it as would be
> required for sanitization.
> 
> E.g.: How do you properly sanitize SVG?  This is difficult for a
> traditional client-side sanitizer to get right.  Answer: Mostly we
> don’t care, we just enforce that script is disabled below a given
> node, clipping is enforced, etc. as per configuration of the Safe
> Node.
> 
> 2) It's only natural for enforcement of policy to be integrated with
> the actual implementation of the code on which the policy is being
> enforced.
> 
> 3) It’s easier for a Safe Node to safely handle CSS than it would be
> for a client-side sanitizer.
> 
> A client-side sanitizer has no visibility into externally downloaded
> stylesheets.  (Though this may not be an issue with a sanitizer built
> into the browser, given that it could effectively regulate downloaded
> stylesheets.)
> 
> It’s also difficult for client-side sanitizers to correctly handle
> inline STYLE elements as there is no real DOM for a STYLE element.
> It’s not easy for a client-side sanitizer to effectively constrain
> unsafe styles to within a given element.
> 
> 
> Cons (relative to sanitization)
> 
> 1) Being able to get sanitized markup is a feature that could have
> non-niche use cases that have yet to be identified.
> 
> 
> FAQ
> 
> Q: How is this different from IFRAMEs?  Seamless IFRAMEs?
> A: IFRAMEs are clumsy in that they contain a different document, CSS
> doesn’t apply, and they are rectangular.  Seamless IFRAMEs take care
> of two of those problems, but they seem to have been abandoned as a
> proposal for standardization.
> 
> Q: If you have some markup with a Safe Node in it, is that safe?
> A: Best practice: Always output unsafe markup into a Safe Node that
> you (the host) have created.  If you do need to manipulate markup
> containing a Safe Node and then output that markup directly onto the
> page, remember to treat the Safe Node string as an atomic unit.
> Untrusted markup injected into the Safe Node markup could prematurely
> close the Safe Node.
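
To illustrate the caveat above: when Safe Node markup is assembled by string concatenation, untrusted input containing a closing tag ends the Safe Node early. The sketch below is plain string handling with hypothetical values (the steal() handler is made up); it does not model real HTML parsing, only where the first closing tag falls.

```javascript
// Illustration only: plain string handling, no real HTML parsing.
const untrustedMarkup = '</div><img src="x" onerror="steal()">';

// Unsafe pattern: splicing untrusted markup into Safe Node markup as a string.
const openTag = '<div safety="Enabled: true">';
const assembled = openTag + untrustedMarkup + "</div>";

// The first </div> comes from the untrusted input, so the Safe Node is closed
// before the <img>, and the <img> would land outside of it.
const firstClose = assembled.indexOf("</div>");
const outsideSafeNode = assembled.slice(firstClose + "</div>".length);
console.log(outsideSafeNode);
```

Setting innerHTML on a Safe Node element that the host created, as in the usage example, avoids this because the untrusted string is parsed inside an already existing node.
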
> 
> Q: Can you manipulate the DOM underneath a Safe Node?
> A: Sure!  If a SCRIPT node is created or moved to within the Safe
> Node, for example, it simply does not execute script.  Of course it
> would not be secure to pull nodes out from within a safe tree and move
> them elsewhere in the DOM, outside of a Safe Node.
> 
> Q: Why not implement a "safe innerHTML" instead of a Safe Node?
> A: The Safe Node paradigm makes it easy to store configuration
> parameters in an attribute on the element.  Also, apparently
> .innerHTML is not available in SVG.
> 
> 
> Todo
> 
> Is there a convenient and safe way to enable script from outside the
> Safe Node to set an event handler on markup existing within the Safe
> Node?  It should be possible to do this safely in some fashion.  We
> would certainly need to consider the possibility of DOM clobbering
> though and ensure best practice is immune to that.
> 
> Thoughts?  The key question: Is this proposal better or worse than
> more traditional client-side sanitization baked into a browser API?
> 
> 
> References
> 
> Michal Zalewski has previously proposed a very similar idea.
> 
> The list of policies for enforcement was inspired by Michal’s work on
> and also by my own work on the jSanity client-side sanitizer.
> 
> 
> Dave
> 

Received on Friday, 22 January 2016 08:33:24 UTC