Re: In-browser sanitization vs. a “Safe Node” in the DOM from David Ross on 2016-01-21 (public-webappsec@w3.org from January 2016)

From: David Ross <drx@google.com>
Date: Thu, 21 Jan 2016 15:22:37 -0800
To: Conrad Irwin <conrad.irwin@gmail.com>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAMM+ux5omjxB8c70RjhokdY2qCE6-StEDsTm=sWMdd4fy_fVLA@mail.gmail.com>
Thanks!

Regarding re-use of sandbox / CSP syntax -- absolutely, we should not
re-invent the wheel here!  I've also heard the same feedback from
others.

> a specific tag that implements this would be awesome
Just to clarify -- I'm not suggesting a new HTML element, but rather
an attribute that can be present on any element.  If this turns out to
be problematic we could consider limiting the "safety" attribute to
working only on DIVs, or just a handful of specific known-good
elements.

Dave



On Thu, Jan 21, 2016 at 3:14 PM, Conrad Irwin <conrad.irwin@gmail.com> wrote:
> I like this idea a lot. Right now I use https://github.com/cure53/DOMPurify
> for sanitization ahead of time, then an iframe for CSS isolation (and a
> Content-Security-Policy for extra help). It has exactly the problem you
> discuss of not fixing URLs in stylesheets.
>
> On a general note: I'd like to see re-use/extension of the existing
> "sandbox" and "content-security-policy" work, instead of a new set of
> orthogonal differently named restrictions.
>
> I think the right way of doing something like this is an iframe (sad to hear
> seamless iframes are going nowhere), but in the absence of that a specific
> tag that implements this would be awesome. (Another advantage of trying to
> re-purpose iframes for this is that Chromium's work on out-of-process
> iframes (OOPIFs) would then allow even better isolation of untrusted content.)
>
> Conrad
>
>
>
> On Thu, Jan 21, 2016 at 2:55 PM, David Ross <drx@google.com> wrote:
>>
>> Mike West, Mario Heiderich, and I recently had a conversation beginning to
>> explore what sanitization baked into the browser might look like. Of
>> course this is not a new concept: Internet Explorer’s ".toStaticHTML()"
>> method has come and gone. Perhaps it was ahead of its time. =)
>>
>> I think we’re beginning to achieve consensus that browser-native
>> client-side sanitization would be a boon for web apps. Thinking about that
>> got me wondering if maybe there’s a way to achieve the desired effect
>> without actually implementing all the complexity of markup sanitization. I
>> started to imagine a "Safe Node" in the DOM tree that logically enforces
>> various policies on the nodes beneath it. Unsurprisingly, I’m not the first
>> person to have thought of this: Michal Zalewski had the same idea a
>> number of years back. Perhaps this is an idea whose time has finally come.
>>
>> So here is a strawman proposal for a Safe Node in the DOM. Please don’t
>> focus on syntax, but rather take a look at the idea overall and try to
>> identify any fatal flaws or areas for improvement.
>>
>> Example Safe Node Usage
>>
>> var safeDiv = document.createElement("DIV");
>> var safeAttribute = document.createAttribute("safety");
>> safeAttribute.value = "Enabled: true; DownloadExternalContent: false; ...";
>> safeDiv.setAttributeNode(safeAttribute);
>> safeDiv.innerHTML = untrustedMarkup;
>> document.body.appendChild(safeDiv);
>>
>> Insert safeDiv into the DOM tree as shown and it will safely contain
>> untrusted markup.
>>
>> Policy enforcement configuration is set on an attribute of the Safe Node
>> ("DownloadExternalContent" in the example). If you check the markup of the
>> previously created DIV like this:
>>
>> var outputMarkup = safeDiv.outerHTML;
>>
>> ...at this point outputMarkup might look something like this:
>>
>> <div safety="Enabled: true; DownloadExternalContent: false; ...">[untrusted markup]</div>
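The proposal deliberately leaves the attribute syntax open, but a value in the "Name: value; ..." shape used in the example could be read into a policy object roughly as follows. This is a minimal sketch; parseSafetyAttribute and its parsing rules are hypothetical and not part of the proposal.

```javascript
// Hypothetical parser for a "safety" attribute value such as
// "Enabled: true; DownloadExternalContent: false; ..."
// Entries are separated by ";" and each entry has the form "Name: value".
// Entries without a colon (such as the "..." placeholder) are skipped.
function parseSafetyAttribute(value) {
  // Null-prototype object so a name like "__proto__" cannot pollute prototypes.
  const policy = Object.create(null);
  for (const entry of value.split(";")) {
    const colon = entry.indexOf(":");
    if (colon === -1) continue; // not a "Name: value" pair
    const name = entry.slice(0, colon).trim();
    const raw = entry.slice(colon + 1).trim();
    if (name === "") continue;
    // Literal "true"/"false" become booleans; anything else stays a string.
    if (raw === "true") policy[name] = true;
    else if (raw === "false") policy[name] = false;
    else policy[name] = raw;
  }
  return policy;
}

// policy.Enabled is true and policy.DownloadExternalContent is false.
const policy = parseSafetyAttribute("Enabled: true; DownloadExternalContent: false; ...");
```

Using a null-prototype object is a small but relevant choice here, since the attribute value may itself end up being influenced by less trusted code paths.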
>>
>> This makes it possible to integrate markup from various sources that will
>> ultimately be rendered later. In less complex applications, it’s easy to
>> simply write untrusted markup into Safe Nodes that are immediately added
>> to the document.
>>
>> Safety is enforced by the fact that the untrusted markup is contained
>> within a Safe Node. Breakout is prevented by the design pattern shown above:
>> setting innerHTML on the Safe Node can never produce markup that escapes
>> the containing node.
>>
>> Policies capable of being enforced
>>
>> The policies would match those a sanitizer could also enforce to block
>> content that may be malicious.
>>
>> Policies set to be enabled by default:
>>
>> * Disablement of script / active content
>> * Disablement of frames
>> * No support for FORM elements (to prevent phishing)
>> + Input elements such as INPUT, BUTTON, etc. still allowed
>> * Disablement of link targeting
>> * Supported protocols limited to https://
>> * Safe CSS
>> + Prevent anti-XSRF nonce theft via CSS
>> + Prevent UI overlay
>> + Prevent any identified abuse of existing styles on the page
>> + Prevent styles defined within the Safe Node from affecting the
>> surrounding page
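A minimal sketch of the element-level defaults above, assuming a hypothetical policy object with allowScript, allowFrames and allowForms flags and a hypothetical isElementAllowed check. Tag-level checks alone would not cover inline event handlers or javascript: URLs; under the proposal those would fall to the engine's own script-disablement policy.

```javascript
// Hypothetical default Safe Node policy, mirroring the default-enabled list.
const DEFAULT_POLICY = Object.freeze({
  allowScript: false, // script / active content disabled
  allowFrames: false, // frames disabled
  allowForms: false, // FORM elements unsupported (anti-phishing)
});

// Tags treated as script or other active content in this sketch.
const ACTIVE_CONTENT_TAGS = new Set(["SCRIPT", "OBJECT", "EMBED", "APPLET"]);
const FRAME_TAGS = new Set(["IFRAME", "FRAME", "FRAMESET"]);

// Returns true if an element with this tag name may render beneath a Safe Node.
// INPUT, BUTTON, etc. fall through to `true`; only FORM itself is blocked.
function isElementAllowed(tagName, policy = DEFAULT_POLICY) {
  const tag = tagName.toUpperCase();
  if (ACTIVE_CONTENT_TAGS.has(tag)) return policy.allowScript;
  if (FRAME_TAGS.has(tag)) return policy.allowFrames;
  if (tag === "FORM") return policy.allowForms;
  return true;
}
```

For example, isElementAllowed("button") is true and isElementAllowed("form") is false, while isElementAllowed("form", { ...DEFAULT_POLICY, allowForms: true }) is true.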
>>
>> Optional policies:
>>
>> * Max width / height
>> + To prevent outside UI from being pushed out of the way
>> * Allow links
>> * List of protocols to allow in URLs, beyond https://
>> * Flag to regulate use of relative URLs
>> * Flag to regulate use of multimedia (e.g.: AUDIO and VIDEO elements)
>> * Flag to regulate use of external content
>> + Callback for handling external content
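The protocol-related policies (https:// by default, an optional list of extra protocols, and the relative-URL flag) could be checked with the standard URL parser. In this sketch isUrlAllowed, allowedProtocols and allowRelativeUrls are hypothetical names, and relative URLs are resolved against an assumed host document URL.

```javascript
// Hypothetical URL policy check for content beneath a Safe Node.
//   policy.allowedProtocols  - extra protocols beyond https:, e.g. ["mailto:"]
//   policy.allowRelativeUrls - flag regulating relative URLs
//   baseUrl                  - assumed URL of the host document
function isUrlAllowed(rawUrl, baseUrl, policy = {}) {
  const allowed = new Set(["https:", ...(policy.allowedProtocols || [])]);

  // new URL() without a base throws for relative references.
  let isRelative = false;
  try {
    new URL(rawUrl);
  } catch {
    isRelative = true;
  }
  if (isRelative && !policy.allowRelativeUrls) return false;

  // Resolve against the host document so relative URLs get a real protocol.
  let resolved;
  try {
    resolved = new URL(rawUrl, baseUrl);
  } catch {
    return false; // not a parseable URL at all
  }
  // URL.protocol is lowercased, so "JavaScript:" is caught as "javascript:".
  return allowed.has(resolved.protocol);
}
```

Resolving before checking matters: a relative reference inherits the base URL's protocol, so even with allowRelativeUrls set, a relative URL on an http: page is still rejected.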
>>
>> This list was derived from Michal Zalewski’s previous work and my own
>> experience with implementation of client-side sanitization.
>>
>> The list above covers the policies that would make sense to regulate, but
>> it does not specify syntax. When the syntax is ultimately defined, it would
>> seem to make the most sense to adopt existing conventions where possible
>> (e.g.: maybe the FORM policy maps well to the frame sandbox "allow-forms"
>> keyword?).
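Following the suggestion to reuse existing conventions, one option is to accept iframe sandbox keywords as Safe Node options. This sketch maps only "allow-forms" (suggested above) and, as an assumption, "allow-popups" for link targeting; policyFromSandboxTokens and the flag names are hypothetical.

```javascript
// Hypothetical: express Safe Node options with existing iframe sandbox
// keywords. As with <iframe sandbox>, everything starts disabled and each
// recognized keyword relaxes one restriction.
const SANDBOX_TOKEN_TO_POLICY = new Map([
  ["allow-forms", "allowForms"],
  ["allow-popups", "allowLinkTargeting"], // assumption: closest existing keyword
]);

function policyFromSandboxTokens(tokenString) {
  const policy = { allowForms: false, allowLinkTargeting: false };
  for (const token of tokenString.trim().split(/\s+/)) {
    // Sandbox keywords are compared case-insensitively; unknown ones are ignored.
    const key = SANDBOX_TOKEN_TO_POLICY.get(token.toLowerCase());
    if (key) policy[key] = true;
  }
  return policy;
}
```

Ignoring unknown keywords mirrors how browsers treat unrecognized sandbox tokens, so new Safe Node options could be added later without breaking older content.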
>>
>> Pros (relative to sanitization)
>>
>> 1) Elimination of sanitization complexity.
>>
>> It’s much easier to implement a policy such as "disable script under this
>> node" than it is to implement sanitizer logic to optimally achieve the same
>> result. Since we are in the browser, it's possible to avoid creating a DOM
>> and then walking through it as would be required for sanitization.
>>
>> E.g.: How do you properly sanitize SVG? This is difficult for a
>> traditional client-side sanitizer to get right. Answer: Mostly we don’t
>> care, we just enforce that script is disabled below a given node, clipping
>> is enforced, etc. as per configuration of the Safe Node.
>>
>> 2) It's only natural for enforcement of policy to be integrated with the
>> actual implementation of the code on which the policy is being enforced.
>>
>> 3) It’s easier for a Safe Node to safely handle CSS than it would be for a
>> client-side sanitizer.
>>
>> A client-side sanitizer has no visibility into externally downloaded
>> stylesheets. (Though this may not be an issue with a sanitizer built into
>> the browser, given that it could effectively regulate downloaded
>> stylesheets.)
>>
>> It’s also difficult for client-side sanitizers to correctly handle inline
>> STYLE elements, as a STYLE element’s contents are raw CSS text rather than
>> DOM nodes that can be walked. It’s not easy for a client-side sanitizer to
>> effectively constrain unsafe styles to within a given element.
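To illustrate why constraining styles is hard for a sanitizer, here is a deliberately naive approach that prefixes every selector with a scope selector so rules only match inside a container. It splits on every comma, so selectors such as :is(h1, h2) are mangled, and it does nothing about declarations like position: fixed that enable UI overlay. scopeSelectorList and the [data-safe-root] scope selector are hypothetical.

```javascript
// Deliberately naive: prefix each selector in a comma-separated list with a
// scope selector so the rule should only match inside the container.
// Known gaps: commas inside :is()/:not()/attribute values, @-rules, and
// declarations such as position: fixed are not handled at all.
function scopeSelectorList(selectorList, scopeSelector) {
  return selectorList
    .split(",")
    .map((selector) => selector.trim())
    .filter((selector) => selector.length > 0)
    .map((selector) => `${scopeSelector} ${selector}`)
    .join(", ");
}
```

scopeSelectorList("p, .ad > a", "[data-safe-root]") gives "[data-safe-root] p, [data-safe-root] .ad > a", but scopeSelectorList(":is(h1, h2)", "#x") gives the invalid "#x :is(h1, #x h2)", which is the kind of gap a policy enforced inside the engine would avoid.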
>>
>> Cons (relative to sanitization)
>>
>> 1) A sanitizer can return sanitized markup, and that capability could have
>> non-niche use cases that have yet to be identified.
>>
>> FAQ
>>
>> Q: How is this different from IFRAMEs? Seamless IFRAMEs?
>> A: IFRAMEs are clumsy in that they contain a different document, CSS
>> doesn’t apply, and they are rectangular. Seamless IFRAMEs take care of two
>> of those problems, but they seem to have been abandoned as a proposal for
>> standardization.
>>
>> Q: If you have some markup with a Safe Node in it, is that safe?
>> A: Best practice: always output unsafe markup into a Safe Node that you
>> (the host) have created. If you do need to manipulate markup containing a
>> Safe Node and then output that markup directly onto the page, remember to
>> treat the Safe Node string as an atomic unit. Untrusted markup injected
>> into the Safe Node markup could prematurely close the Safe Node.
>>
>> Q: Can you manipulate the DOM underneath a Safe Node?
>> A: Sure! If a SCRIPT node is created or moved to within the Safe Node, for
>> example, it simply does not execute. Of course, it would not be secure to
>> pull nodes out from within a safe tree and move them elsewhere in the DOM,
>> outside of a Safe Node.
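The ancestor rule in this answer can be modelled with plain objects standing in for DOM nodes: a SCRIPT beneath a Safe Node does not run, and a host could refuse moves that take nodes out of a safe tree. This is a sketch under stated assumptions (a node counts as a Safe Node whenever it has a safety attribute, regardless of its value); isInsideSafeNode, mayExecuteScript and isMoveAllowed are hypothetical.

```javascript
// Minimal stand-in for DOM nodes: { tagName, attributes, parent }.
// Assumption: any node carrying a "safety" attribute is a Safe Node.
function isSafeNode(node) {
  return node.attributes != null &&
    Object.prototype.hasOwnProperty.call(node.attributes, "safety");
}

// Walk up the ancestor chain (excluding the node itself) looking for a Safe Node.
function isInsideSafeNode(node) {
  for (let n = node.parent; n != null; n = n.parent) {
    if (isSafeNode(n)) return true;
  }
  return false;
}

// A SCRIPT created or moved beneath a Safe Node simply does not run.
function mayExecuteScript(scriptNode) {
  return !isInsideSafeNode(scriptNode);
}

// Moving a node out of a safe tree to a place not covered by any Safe Node
// would strip its protection, so a host-side check could refuse that move.
function isMoveAllowed(node, newParent) {
  const destinationProtected = isSafeNode(newParent) || isInsideSafeNode(newParent);
  return !isInsideSafeNode(node) || destinationProtected;
}
```

Moving the Safe Node itself is still allowed in this model, since its contents remain beneath it wherever it goes.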
>>
>> Q: Why not implement a "safe innerHTML" instead of a Safe Node?
>> A: The Safe Node paradigm makes it easy to store configuration parameters
>> in an attribute on the element. Also, apparently .innerHTML is not
>> available in SVG.
>>
>> Todo
>>
>> Is there a convenient and safe way to enable script from outside the Safe
>> Node to set an event handler on markup existing within the Safe Node? It
>> should be possible to do this safely in some fashion. We would certainly
>> need to consider the possibility of DOM clobbering, though, and ensure that
>> best practice is immune to it.
>>
>> Thoughts? The key question: Is this proposal better or worse than more
>> traditional client-side sanitization baked into a browser API?
>>
>> References
>>
>> Michal Zalewski has previously proposed a very similar idea.
>>
>> The list of policies for enforcement was inspired by Michal’s work and
>> also by my own work on the jSanity client-side sanitizer.
>>
>> Dave
>
>
Received on Thursday, 21 January 2016 23:23:30 UTC