- From: David Ross <drx@google.com>
- Date: Thu, 21 Jan 2016 14:52:37 -0800
- To: "public-webappsec@w3.org" <public-webappsec@w3.org>
Mike West, Mario Heiderich, and I recently had a conversation just beginning to explore what sanitization baked into the browser might look like. Of course this is not a new concept, with Internet Explorer’s ".toStaticHTML()" method having come and gone. Perhaps it was ahead of its time. =) I think we’re beginning to achieve consensus that browser-native client-side sanitization would be a boon for web apps.

Thinking about that got me wondering if maybe there’s a way to achieve the desired effect without actually implementing all the complexity of markup sanitization. I started to imagine a "Safe Node" in the DOM tree that logically enforces various policies on the nodes beneath it. Unsurprisingly I’m not the first person to have thought of this, as Michal Zalewski had the same idea a number of years back. Perhaps this is an idea whose time has finally come.

So here is a strawman proposal for a Safe Node in the DOM. Please don’t focus on syntax, but rather take a look at the idea overall and try to identify any fatal flaws or areas for improvement.

Example Safe Node Usage

    // Strawman only: the "safety" attribute and its value syntax are placeholders.
    var safeDiv = document.createElement("DIV");
    var safeAttribute = document.createAttribute("safety");
    safeAttribute.value = "Enabled: true; DownloadExternalContent: false; ...";
    // Attach the policy before any untrusted markup is parsed into the node.
    safeDiv.setAttributeNode(safeAttribute);
    safeDiv.innerHTML = untrustedMarkup;
    document.body.appendChild(safeDiv);

Insert safeDiv into the DOM tree as shown and it will safely contain untrusted markup. Policy enforcement configuration is set on an attribute of the Safe Node ("DownloadExternalContent" in the example).

If you check the markup of the previously created DIV like this:

    var outputMarkup = safeDiv.outerHTML;

...at this point outputMarkup might look something like this:

    <div safety="Enabled: true; DownloadExternalContent: false; ...">[untrusted markup]</div>

Because the policy is serialized along with the node, it is possible to integrate markup from various sources that will be rendered later.
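
To make the configuration idea a bit more concrete, here is a rough sketch of how a policy string like the one in the example might be read into a structured form. This is only an illustration: the function name parseSafetyPolicy, the "Name: value; Name: value" syntax, and the handling of "true"/"false" are all assumptions taken from the placeholder example, since the proposal deliberately leaves the real syntax undefined.

```javascript
// Hypothetical helper, not part of the proposal: parse a "safety" attribute
// value of the form "Name: value; Name: value" into a plain policy object.
function parseSafetyPolicy(attributeValue) {
  // Use a prototype-less object so policy names cannot collide with
  // inherited properties such as "constructor".
  const policy = Object.create(null);
  for (const entry of attributeValue.split(";")) {
    const separator = entry.indexOf(":");
    if (separator === -1) continue; // skip empty or malformed entries
    const name = entry.slice(0, separator).trim();
    const rawValue = entry.slice(separator + 1).trim();
    if (name === "") continue;
    // Assumption: the literal strings "true" and "false" are booleans;
    // any other value is kept as text.
    if (rawValue === "true") policy[name] = true;
    else if (rawValue === "false") policy[name] = false;
    else policy[name] = rawValue;
  }
  return policy;
}

const policy = parseSafetyPolicy("Enabled: true; DownloadExternalContent: false");
console.log(policy.Enabled, policy.DownloadExternalContent); // prints: true false
```

Splitting on the first colon only means a value that itself contains a colon (a URL scheme, say) stays intact. A real implementation would live inside the browser and would presumably reject unknown policy names rather than store them.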
In simpler applications, it’s easy to output untrusted markup directly into Safe Nodes that are immediately added to the document.

Safety is enforced by the fact that the untrusted markup is contained within a Safe Node. Breakout is prevented by the design pattern shown above (e.g., setting innerHTML will inherently never allow breaking out of the containing node).

Policies capable of being enforced

Policies would match those that a sanitizer could also enforce on content that may or may not be malicious.

Policies set to be enabled by default:

* Disablement of script / active content
* Disablement of frames
* No support for FORM elements (to prevent phishing)
  + Input elements such as INPUT, BUTTON, etc. still allowed
* Disablement of link targeting
* Supported protocols limited to https://
* Safe CSS
  + Prevent anti-XSRF nonce theft via CSS
  + Prevent UI overlay
  + Prevent any identified abuse of existing styles on the page
  + Prevent styles defined within the Safe Node from affecting the surrounding page

Optional policies:

* Max width / height
  + To prevent outside UI from being pushed out of the way
* Allow links
* List of protocols to allow in URLs, beyond https://
* Flag to regulate use of relative URLs
* Flag to regulate use of multimedia (e.g., AUDIO and VIDEO elements)
* Flag to regulate use of external content
  + Callback for handling external content

This list was derived from Michal Zalewski’s previous work and my own experience with implementation of client-side sanitization. Right now the list above covers the policies that would make sense to regulate; however, it does not specify syntax. When the syntax is ultimately defined, it would seem to make the most sense to adopt existing conventions if possible (e.g., maybe the FORM policy maps well to the frame sandbox "allow-forms"?).

Pros (relative to sanitization)

1) Elimination of sanitization complexity.
It’s much easier to implement a policy such as "disable script under this node" than it is to implement sanitizer logic to optimally achieve the same result. Since we are in the browser, it’s possible to avoid creating a DOM and then walking through it as would be required for sanitization. E.g.: How do you properly sanitize SVG? This is difficult for a traditional client-side sanitizer to get right. Answer: Mostly we don’t care; we just enforce that script is disabled below a given node, that clipping is enforced, etc., as per the configuration of the Safe Node.

2) It’s only natural for enforcement of policy to be integrated with the actual implementation of the code on which the policy is being enforced.

3) It’s easier for a Safe Node to safely handle CSS than it would be for a client-side sanitizer. A client-side sanitizer has no visibility into externally downloaded stylesheets. (Though this may not be an issue with a sanitizer built into the browser, given that it could effectively regulate downloaded stylesheets.) It’s also difficult for client-side sanitizers to correctly handle inline STYLE elements, as there is no real DOM for a STYLE element. It’s not easy for a client-side sanitizer to effectively constrain unsafe styles to within a given element.

Cons (relative to sanitization)

1) Being able to get sanitized markup is a feature that could have non-niche use cases that have yet to be identified.

FAQ

Q: How is this different from IFRAMEs? Seamless IFRAMEs?
A: IFRAMEs are clumsy in that they contain a different document, the parent page’s CSS doesn’t apply to them, and they are rectangular. Seamless IFRAMEs take care of two of those problems, but they seem to have been abandoned as a proposal for standardization.

Q: If you have some markup with a Safe Node in it, is that safe?
A: Best practice: Always output unsafe markup into a Safe Node that you (the host) have created.
If you do need to manipulate markup containing a Safe Node and then output that markup directly onto the page, remember to treat the Safe Node string as an atomic unit. Untrusted markup injected into the Safe Node markup could prematurely close the Safe Node.

Q: Can you manipulate the DOM underneath a Safe Node?
A: Sure! If a SCRIPT node is created or moved to within the Safe Node, for example, it simply does not execute script. Of course it would not be secure to pull nodes out from within a safe tree and move them elsewhere in the DOM, outside of a Safe Node.

Q: Why not implement a "safe innerHTML" instead of a Safe Node?
A: The Safe Node paradigm makes it easy to store configuration parameters in an attribute on the element. Also, apparently .innerHTML is not available in SVG.

Todo

Is there a convenient and safe way to enable script from outside the Safe Node to set an event handler on markup existing within the Safe Node? It should be possible to do this safely in some fashion. We would certainly need to consider the possibility of DOM clobbering, though, and ensure that best practice is immune to it.

Thoughts?

The key question: Is this proposal better or worse than more traditional client-side sanitization baked into a browser API?

References

Michal Zalewski has previously proposed a very similar idea. The list of policies for enforcement was inspired by Michal’s work and also by my own work on the jSanity client-side sanitizer.

Dave
Received on Thursday, 21 January 2016 22:53:28 UTC