Re: In-browser sanitization vs. a “Safe Node” in the DOM from Jim Manico on 2016-01-23 (public-webappsec@w3.org from January 2016)

From: Jim Manico <jim.manico@owasp.org>
Date: Fri, 22 Jan 2016 20:14:52 -0500
To: David Ross <drx@google.com>
Cc: Michal Zalewski <lcamtuf@coredump.cx>, Chris Palmer <palmer@google.com>, Crispin Cowan <crispin@microsoft.com>, Craig Francis <craig.francis@gmail.com>, Conrad Irwin <conrad.irwin@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <56A2D40C.6000008@owasp.org>
 > Can you get a little more specific about what you're suggesting?

Something along the lines of....

sanitize(rawHTML, policy);

Which would be called like the following but with a better policy mechanism.

coolwidget.innerHTML = sanitize(rawHTML, "<b>, <i>, <a>");
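
To make that a bit more concrete, here is a minimal sketch of the
whitelist idea. Everything in it is an assumption for illustration only:
the comma-separated policy string format, the helper names parsePolicy
and escapeText, and the choice to drop every attribute. It uses a simple
regular expression rather than a real HTML parser, so it is not a
production sanitizer and not the API of any existing library.

```javascript
// Hypothetical sketch of sanitize(rawHTML, policy); not a real browser API.

// Assumed policy format: "<b>, <i>, <a>" -> Set {"b", "i", "a"}.
function parsePolicy(policy) {
  const names = policy.match(/[a-z][a-z0-9]*/gi) || [];
  return new Set(names.map((name) => name.toLowerCase()));
}

// Escape characters that are significant in HTML text.
// Note: input that is already entity-encoded gets double-escaped.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function sanitize(rawHTML, policy) {
  const allowed = parsePolicy(policy);
  // Matches an opening or closing tag: "<", optional "/", a tag name,
  // then anything up to the next ">".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)(?=[\s\/>])[^>]*>/g;
  let out = "";
  let last = 0;
  let match;
  while ((match = tagPattern.exec(rawHTML)) !== null) {
    // Everything between tags is treated as text and escaped.
    out += escapeText(rawHTML.slice(last, match.index));
    const name = match[2].toLowerCase();
    if (allowed.has(name)) {
      // Re-emit a bare tag; all attributes are dropped in this sketch.
      out += "<" + match[1] + name + ">";
    }
    // Tags that are not in the policy are removed entirely.
    last = tagPattern.lastIndex;
  }
  out += escapeText(rawHTML.slice(last));
  return out;
}

const dirty = '<b>hi</b><script>alert(1)</script><a href="javascript:x">l</a>';
console.log(sanitize(dirty, "<b>, <i>, <a>"));
// prints: <b>hi</b>alert(1)<a>l</a>
```

The point is that the code never needs to know what the "bad" markup
looks like. Anything the policy does not name is removed or escaped, so a
new browser feature stays blocked until somebody adds it to a policy.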

"One of your own" created an HTML Sanitizer with a much more fully
featured policy rule mechanism, which you can check out here:
https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project#tab=Creating_a_HTML_Policy

My conjecture is that once this is working properly (which is hard to
get right) it will require a lot less maintenance when new markup
features are added.

But make no mistake, one of the reasons I support doing both is that
the simplicity of what you are doing for developers is compelling. But I
think stricter validation like I am suggesting is valuable as well.

Aloha,
Jim


On 1/22/16 8:03 PM, David Ross wrote:
>> Is my concern that your policy-sandbox would need constant
>> updating as new browser features were added a fair concern?
> Any sanitizer needs some ongoing level of maintenance already today.  A
> lot of that is just to add support for (whitelist) new browser
> features, and then to backtrack a bit if that turns out not to have
> been such a good idea.  =)  When you've got a sanitizer written in C++
> and baked into a browser, updating that sanitizer in this way might be
> even more burdensome.
>
> In the case of Safe Node, we would _not_ generally make one-off
> changes to tweak the code to add or remove support for new elements,
> attributes, etc.  Adding any new feature, the question would be this:
> Walking down the list of Safe Node enforced policies, would the new
> feature subvert any of them?  If so _and_ the new feature doesn't
> leverage existing building blocks that are already regulated by
> policy, _then_ there needs to be additional policy enforcement put in
> place.  So I think that an implementation of Safe Node would require
> less ongoing maintenance than a sanitizer baked into the browser.
>
>> Do you think supporting some kind of HTML policy engine like
>> I'm suggesting is valid at all?
> Can you get a little more specific about what you're suggesting?
>
> Dave
>
> On Fri, Jan 22, 2016 at 4:43 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>> and certainly it's no more blacklist-based than a sanitizer
>> Hmmm. My thinking was "David's proposal is going to disable certain features.
>> HTML sanitizers only try to enforce good tags without needing any knowledge
>> of the bad stuff". That is why I think of your work as "blacklist" and HTML
>> sanitizers as "whitelist".
>>
>> Anyhow, it sure was an Edge-ie case! Thank you for catching my lame pun. I
>> know this is going to hurt you to hear it, but IE and Edge matter. I'm glad
>> to know your proposal would have caught this.
>>
>> Is my concern that your policy-sandbox would need constant updating as new
>> browser features were added a fair concern?
>>
>> Do you think supporting some kind of HTML policy engine like I'm suggesting
>> is valid at all?
>>
>> Aloha,
>> Jim
>>
>>
>>
>>
>> On 1/22/16 6:35 PM, David Ross wrote:
>>
>> I would not characterize it as blacklist-based, and certainly it's no more
>> blacklist-based than a sanitizer.
>>
>>> What about CSS expressions and other edge cases not
>>> described in http://lcamtuf.coredump.cx/postxss/ ?
>> It's covered by this policy:
>> * Disablement of script / active content
>>
>> Also, was that a pun?  Because CSS expressions are an Edge case.  =)
>>
>>
>> On Fri, Jan 22, 2016 at 3:28 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>> Again, I am reading your proposal right now, but this looks a little
>>> blacklist-ish to me. What about CSS expressions and other edge cases not
>>> described in http://lcamtuf.coredump.cx/postxss/ ? There are more out
>>> there, per my understanding.
>>>
>>> This is why I prefer more programmatic sanitization: it's a whitelist,
>>> which tends to be a stronger control. Once a good sanitization API
>>> is built, it will stand the test of time as new browser features are added.
>>>
>>> An approach just banning bad things will be way more fragile as new
>>> browser features get added over time.
>>>
>>> - Jim
>>>
>>>
>>> On 1/22/16 6:17 PM, David Ross wrote:
>>>>> There is a handful of examples where the rigidity basically
>>>>> ruled out adoption (e.g., MSIE's old <iframe> sandbox).
>>>> This: https://msdn.microsoft.com/en-us/library/ms534622(v=vs.85).aspx
>>>> It came in for Hotmail, but it was never put to use AFAIK, exactly for
>>>> the reason you describe.
>>>>
>>>> There is a finite list of "unsafe" things that markup / CSS can do
>>>> when rendered on a page.  (Essential reference, of course:
>>>> http://lcamtuf.coredump.cx/postxss/)  It is possible there are a
>>>> couple things missing from the initial list of Safe Node policies
>>>> requiring enforcement.  (E.g.: Link targeting is covered but we
>>>> probably also need a way to regulate navigation more generally.)  But
>>>> the problem is tractable.  And I don't think that sanitization baked
>>>> into the browser provides a better approach in this regard.
>>>>
>>>> Another key thing here is that with either a sanitizer or Safe Node,
>>>> it's important to pick a good set of secure defaults.  That way the
>>>> policy problems Michal described are less likely to occur as custom
>>>> configuration tends to be minimal.  With the sandbox attribute for
>>>> frames, I think the use cases vary to such an extent that it would
>>>> have been hard to set secure defaults.  E.g.: allow-scripts and
>>>> allow-same-origin are OK independently, but not when combined.
>>>> There's no safe default there because there are many use cases for
>>>> either approach.  I don't see that Safe Node policies interfere with
>>>> each other in this way and so we probably dodged this bullet.
>>>>
>>>> Jim said:
>>>>> I have an aversion to different policy packages not being
>>>>> flexible enough to be useful.
>>>> FWIW, as per earlier in the thread, the Safe Node approach addresses
>>>> scenarios around CSS where _sanitization_ is inflexible.  (Caveat: If
>>>> a sanitizer is baked into the browser, all of a sudden it can pursue
>>>> the same approach.)
>>>>
>>>>> Perhaps support both of these approaches? HTML
>>>>> Programmatic sanitization and several pre-built policies?
>>>>> That would provide both ease of use for some, and deep
>>>>> flexibility for others. Win win win, and win?
>>>> My argument is that Safe Node has advantages relative to sanitization
>>>> baked into the browser.  If you can identify a legit use case that
>>>> Safe Node can't support cleanly, but browser-based sanitization does,
>>>> I'd probably jump right back on the sanitization bandwagon.  I wrote a
>>>> client-side sanitizer not that long ago and I enjoy working on them.
>>>> =)
>>>>
>>>> Dave
>>>>
>>>> On Fri, Jan 22, 2016 at 2:40 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>> Thank you Michal. I'll give David's proposal a closer read and comment
>>>>> shortly.
>>>>>
>>>>> I remember Microsoft and their AntiXSS library providing an HTML
>>>>> Sanitizer API for untrusted HTML input. It was one of the first in any
>>>>> major language or framework. The first version was very permissive and
>>>>> useful but unfortunately was vulnerable to HTML hacking and of course
>>>>> XSS. The latest incarnation was fixed to be very secure, but
>>>>> unfortunately was not at all useful because it was so restrictive. And
>>>>> MS is now deprecating it with no commitment to maintain it.
>>>>>
>>>>> I have an aversion to different policy packages not being flexible
>>>>> enough to be useful. But I will give David's proposal a deeper read and
>>>>> provide comments more specific to his proposal.
>>>>>
>>>>> Perhaps support both of these approaches? HTML Programmatic sanitization
>>>>> and several pre-built policies? That would provide both ease of use for
>>>>> some, and deep flexibility for others. Win win win, and win?
>>>>>
>>>>> Aloha,
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>> On 1/22/16 5:29 PM, Michal Zalewski wrote:
>>>>>>> The need to inject untrusted markup into the DOM comes up all the time
>>>>>>> and is critical (WYSIWYG editors, etc.). But any "safe node" that
>>>>>>> limits what can render and execute will limit innovation. Each
>>>>>>> developer needs to support a different markup subset for their app,
>>>>>>> which is why policy-based sanitization is so critical to this use
>>>>>>> case.
>>>>>>>
>>>>>>> Take a look at CAJA JS's sanitizer, Angular's $sanitize, and other
>>>>>>> JS-centric HTML sanitizers. They all allow the developer to set a
>>>>>>> policy of what tags and attributes should be supported, and all other
>>>>>>> markup gets stripped out.
>>>>>>>
>>>>>>> This is the kind of native defensive pattern we need in JavaScript,
>>>>>>> IMO!
>>>>>> I think there are interesting trade-offs, and I wouldn't be too quick
>>>>>> to praise one approach over the other. If you design use-centric
>>>>>> "policy packages" (akin to what's captured in David's proposal), you
>>>>>> offer safe and consistent choices to developers. The big unknown is
>>>>>> whether the policies will be sufficiently flexible and future-proof -
>>>>>> for example, will there be some next-gen communication app that
>>>>>> requires a paradigm completely different from discussion forums or
>>>>>> e-mail?
>>>>>>
>>>>>> There is a handful of examples where the rigidity basically ruled out
>>>>>> adoption (e.g., MSIE's old <iframe> sandbox).
>>>>>>
>>>>>> The other alternative is the Lego-style policy building approach taken
>>>>>> with CSP. Out of the countless number of CSP policies you can create,
>>>>>> most will have inconsistent or self-defeating security properties, and
>>>>>> building watertight ones requires a fair amount of expertise. Indeed,
>>>>>> most CSP deployments we see today probably don't provide much in terms
>>>>>> of security. But CSP is certainly a lot more flexible and future-proof
>>>>>> than the prepackaged approach.
>>>>>>
>>>>>> At the same time treating flexibility as a goal in itself can lead to
>>>>>> absurd outcomes, too: a logical conclusion is to just provide
>>>>>> programmatic hooks for flexible, dynamic filtering of markup, instead
>>>>>> of any static, declarative policies. One frequently-cited approach
>>>>>> here was Microsoft's Mutation-Event Transforms [1], and I don't think
>>>>>> it was a step in the right direction (perhaps except as a finicky
>>>>>> building block for more developer-friendly sanitizers).
>>>>>>
>>>>>> [1] http://research.microsoft.com/en-us/um/people/livshits/papers/pdf/hotos07.pdf
>>>>>
>>
Received on Saturday, 23 January 2016 01:15:24 UTC