Re: In-browser sanitization vs. a “Safe Node” in the DOM from David Ross on 2016-01-23 (public-webappsec@w3.org from January 2016)

From: David Ross <drx@google.com>
Date: Sat, 23 Jan 2016 00:27:54 -0800
To: Jim Manico <jim.manico@owasp.org>
Cc: Michal Zalewski <lcamtuf@coredump.cx>, Chris Palmer <palmer@google.com>, Crispin Cowan <crispin@microsoft.com>, Craig Francis <craig.francis@gmail.com>, Conrad Irwin <conrad.irwin@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAMM+ux6b4aqLZMB4PKZrLmKA-gmZuOXBh9wcnDm821ykwBkJDw@mail.gmail.com>

Ok, I see that you're saying there would be less maintenance because
the big list of hundreds of known-good tags, attributes, and CSS would
be punted from sanitizer defaults to instead being specified in
configuration.  But this doesn't mean that there is no list to manage
_somewhere_.  It just changes the party responsible for managing the
list.  I'd argue that the sanitizer itself is in the best position to
get this right, which is why jSanity maintains a list of known-good
tags, attributes, and CSS properties.

I also believe you're saying that this provides stricter validation
because the consumer of the sanitizer would supply just a short list
of tags to whitelist.  I think in practice sanitizer consumers very
often require a baseline configuration that allows a broad set of
tags, attributes, and CSS properties that are incontrovertibly safe.
That can be a big list, and again it's something that the sanitizer
would be in the best position to manage properly.
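
As a rough illustration of that split of responsibility (a minimal
sketch, not jSanity's actual code): the sanitizer ships a maintained
baseline, and the consumer only supplies small adjustments.  The names
BASELINE_TAGS and buildPolicy are hypothetical, and the tag list is a
tiny illustrative sample rather than a vetted list.

```javascript
// Hypothetical sketch: the sanitizer owns a maintained baseline allowlist,
// and a consumer only narrows or extends it.
// The tag names below are an illustrative sample, not a vetted list.
const BASELINE_TAGS = Object.freeze(['a', 'b', 'i', 'p', 'ul', 'ol', 'li']);

// Build the effective allowlist from the baseline plus small consumer tweaks.
// Denials are applied last, so a denied tag is removed even if also allowed.
function buildPolicy({ allow = [], deny = [] } = {}) {
  const tags = new Set(BASELINE_TAGS);
  for (const t of allow) tags.add(t.toLowerCase());
  for (const t of deny) tags.delete(t.toLowerCase());
  return tags;
}

// Example: a consumer that wants the baseline minus links.
const policy = buildPolicy({ deny: ['a'] });
console.log([...policy].join(','));
```

With this shape the per-site configuration stays a short delta, and the
long list remains in one place where it can be reviewed and updated.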

Can you imagine stumbling across one of these sanitizer configurations
in a pentest?  Every configuration would be different, and surely some
would have gotten bloated with various tags and attributes over time.
What a goldmine for bugs!

I agree that if users required a basic sanitizer that only let a few
things through, you could take this approach and avoid hardcoding big
lists that require maintenance.  But then I think that type of
sanitizer would only have the relative advantages of low maintenance
and strict validation in the use cases that don't require robust
markup.  In other cases it would tend to create more of a problem than
it would solve.  I also don't see that it would be advantageous to
build this type of sanitizer into the browser -- a tiny JavaScript
library should work fine.
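
To give a sense of how small such a library could be, here is a sketch
under stated assumptions, not a production sanitizer.  It filters a
plain object tree ({ tag, children } or { text }) instead of real DOM
nodes so that it runs outside a browser; the function names parsePolicy
and filterTree are hypothetical, and the policy string format follows
the "<b>, <i>, <a>" example quoted below.

```javascript
// Hypothetical sketch of a strict, tag-only allowlist filter.

// Turn a policy string such as "<b>, <i>, <a>" into a Set of tag names.
function parsePolicy(policy) {
  const names = policy.match(/[a-z][a-z0-9]*/gi) || [];
  return new Set(names.map((n) => n.toLowerCase()));
}

// Return a filtered copy of the tree. Elements whose tag is not allowed are
// dropped together with their subtree; all attributes are discarded.
function filterTree(node, allowed) {
  if (typeof node.text === 'string') return { text: node.text };
  if (typeof node.tag !== 'string') return null;
  const tag = node.tag.toLowerCase();
  if (!allowed.has(tag)) return null;
  const children = (node.children || [])
    .map((child) => filterTree(child, allowed))
    .filter((child) => child !== null);
  return { tag, children };
}

// Example input: the onclick attribute and the <script> subtree are removed.
const allowed = parsePolicy('<b>, <i>, <a>');
const input = {
  tag: 'b',
  attrs: { onclick: 'evil()' },
  children: [
    { text: 'hi ' },
    { tag: 'script', children: [{ text: 'evil()' }] },
    { tag: 'i', children: [{ text: 'there' }] },
  ],
};
console.log(JSON.stringify(filterTree(input, allowed)));
```

Because every attribute is discarded, an allowed <a> comes out without
an href; supporting links properly would need an attribute and URL
policy as well, which is where the lists start to grow.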

Dave


On Fri, Jan 22, 2016 at 5:14 PM, Jim Manico <jim.manico@owasp.org> wrote:
>> Can you get a little more specific about what you're suggesting?
>
> Something along the lines of....
>
> sanitize(rawHTML, policy);
>
> Which would be called like the following but with a better policy mechanism.
>
> coolwidget.innerHTML = sanitize(rawHTML, "<b>, <i>, <a>");
>
> "One of your own" created an HTML Sanitizer that has a much more fully
> featured policy rule mechanism that you can check out here.
> https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project#tab=Creating_a_HTML_Policy
>
> My conjecture is that once this is working properly (which is rough) it will
> require a lot less maintenance when new markup features are added.
>
> But make no mistake, one of the reasons I support doing both is because the
> simplicity of what you are doing for developers is compelling. But I think
> stricter validation like I am suggesting is valuable as well.
>
> Aloha,
> Jim
>
>
> On 1/22/16 8:03 PM, David Ross wrote:
>>>
>>> Is my concern that your policy-sandbox would need constant
>>> updating as new browser features were added a fair concern?
>>
>> Any sanitizer needs some level of ongoing maintenance already today.  A
>> lot of that is just to add support for (whitelist) new browser
>> features, and then to backtrack a bit if that turns out not to have
>> been such a good idea.  =)  When you've got a sanitizer written in C++
>> and baked into a browser, updating that sanitizer in this way might be
>> even more burdensome.
>>
>> In the case of Safe Node, we would _not_ generally make one-off
>> changes to tweak the code to add or remove support for new elements,
>> attributes, etc.  Adding any new feature, the question would be this:
>> Walking down the list of Safe Node enforced policies, would the new
>> feature subvert any of them?  If so _and_ the new feature doesn't
>> leverage existing building blocks that are already regulated by
>> policy, _then_ there needs to be additional policy enforcement put in
>> place.  So I think that an implementation of Safe Node would require
>> less ongoing maintenance than a sanitizer baked into the browser.
>>
>>> Do you think supporting some kind of HTML policy engine like
>>> I'm suggesting is valid at all?
>>
>> Can you get a little more specific about what you're suggesting?
>>
>> Dave
>>
>> On Fri, Jan 22, 2016 at 4:43 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>
>>>> and certainly it's no more blacklist-based than a sanitizer
>>>
>>> Hmmm. My thinking was "David's proposal is going to disable certain
>>> features. HTML sanitizers only try to enforce good tags without needing
>>> any knowledge of the bad stuff". That is why I think of your work as
>>> "blacklist" and HTML sanitizers as "whitelist".
>>>
>>> Anyhow, it sure was an Edge-ie case! Thank you for catching my lame pun.
>>> I know this is going to hurt you to hear it, but IE and Edge matter. I'm
>>> glad to know your proposal would have caught this.
>>>
>>> Is my concern that your policy-sandbox would need constant updating as
>>> new browser features were added a fair concern?
>>>
>>> Do you think supporting some kind of HTML policy engine like I'm
>>> suggesting is valid at all?
>>>
>>> Aloha,
>>> Jim
>>>
>>>
>>>
>>>
>>> On 1/22/16 6:35 PM, David Ross wrote:
>>>
>>> I would not characterize it as blacklist-based, and certainly it's no
>>> more blacklist-based than a sanitizer.
>>>
>>>> What about CSS expressions and other edge cases not
>>>> described in http://lcamtuf.coredump.cx/postxss/ ?
>>>
>>> It's covered by this policy:
>>> * Disablement of script / active content
>>>
>>> Also, was that a pun?  Because CSS expressions are an Edge case.  =)
>>>
>>>
>>> On Fri, Jan 22, 2016 at 3:28 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>
>>>> Again, I am reading your proposal right now, but this looks a little
>>>> blacklist-ish to me. What about CSS expressions and other edge cases not
>>>> described in http://lcamtuf.coredump.cx/postxss/ ? There are more out
>>>> there, per my understanding....
>>>>
>>>> This is why I prefer more programmatic sanitization: it's a whitelist,
>>>> which tends to be a stronger control. Once a good sanitization API is
>>>> built, it will stand the test of time as new browser features are added.
>>>>
>>>> An approach just banning bad things will be way more fragile as new
>>>> browser features get added over time.
>>>>
>>>> - Jim
>>>>
>>>>
>>>> On 1/22/16 6:17 PM, David Ross wrote:
>>>>>>
>>>>>> There is a handful of examples where the rigidity basically
>>>>>> ruled out adoption (e.g., MSIE's old <iframe> sandbox).
>>>>>
>>>>> This: https://msdn.microsoft.com/en-us/library/ms534622(v=vs.85).aspx
>>>>> It came in for Hotmail, but it was never put to use AFAIK, exactly for
>>>>> the reason you describe.
>>>>>
>>>>> There is a finite list of "unsafe" things that markup / CSS can do
>>>>> when rendered on a page.  (Essential reference, of course:
>>>>> http://lcamtuf.coredump.cx/postxss/)  It is possible there are a
>>>>> couple things missing from the initial list of Safe Node policies
>>>>> requiring enforcement.  (E.g.: Link targeting is covered but we
>>>>> probably also need a way to regulate navigation more generally.)  But
>>>>> the problem is tractable.  And I don't think that sanitization baked
>>>>> into the browser provides a better approach in this regard.
>>>>>
>>>>> Another key thing here is that with either a sanitizer or Safe Node,
>>>>> it's important to pick a good set of secure defaults.  That way the
>>>>> policy problems Michal described are less likely to occur as custom
>>>>> configuration tends to be minimal.  With the sandbox attribute for
>>>>> frames, I think the use cases vary to such an extent that it would
>>>>> have been hard to set secure defaults.  E.g.: allow-scripts and
>>>>> allow-same-origin are OK independently, but not when combined.
>>>>> There's no safe default there because there are many use cases for
>>>>> either approach.  I don't see that Safe Node policies interfere with
>>>>> each other in this way and so we probably dodged this bullet.
>>>>>
>>>>> Jim said:
>>>>>>
>>>>>> I have an aversion to different policy packages not being
>>>>>> flexible enough to be useful.
>>>>>
>>>>> FWIW, as per earlier in the thread, the Safe Node approach addresses
>>>>> scenarios around CSS where _sanitization_ is inflexible.  (Caveat: If
>>>>> a sanitizer is baked into the browser, all of a sudden it can pursue
>>>>> the same approach.)
>>>>>
>>>>>> Perhaps support both of these approaches? HTML
>>>>>> Programmatic sanitization and several pre-built policies?
>>>>>> That would provide both ease of use for some, and deep
>>>>>> flexibility for others. Win win win, and win?
>>>>>
>>>>> My argument is that Safe Node has advantages relative to sanitization
>>>>> baked into the browser.  If you can identify a legit use case that
>>>>> Safe Node can't support cleanly, but browser-based sanitization does,
>>>>> I'd probably jump right back on the sanitization bandwagon.  I wrote a
>>>>> client-side sanitizer not that long ago and I enjoy working on them.
>>>>> =)
>>>>>
>>>>> Dave
>>>>>
>>>>> On Fri, Jan 22, 2016 at 2:40 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>>>
>>>>>> Thank you Michal. I'll give David's proposal a closer read and comment
>>>>>> shortly.
>>>>>>
>>>>>> I remember Microsoft and their AntiXSS library providing an HTML
>>>>>> Sanitizer API for untrusted HTML input. It was one of the first in any
>>>>>> major language or framework. The first version was very permissive and
>>>>>> useful but unfortunately was vulnerable to HTML hacking and of course
>>>>>> XSS. The latest incarnation was fixed to be very secure, but
>>>>>> unfortunately was not at all useful because it was so restrictive. And
>>>>>> MS is now deprecating it with no commitment to maintain it.
>>>>>>
>>>>>> I have an aversion to different policy packages not being flexible
>>>>>> enough to be useful. But I will give David's proposal a deeper read and
>>>>>> provide comments more specific to his proposal.
>>>>>>
>>>>>> Perhaps support both of these approaches? HTML Programmatic
>>>>>> sanitization and several pre-built policies? That would provide both
>>>>>> ease of use for some, and deep flexibility for others. Win win win, and
>>>>>> win?
>>>>>>
>>>>>> Aloha,
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/22/16 5:29 PM, Michal Zalewski wrote:
>>>>>>>>
>>>>>>>> The need to inject untrusted markup into the DOM comes up all the
>>>>>>>> time and is critical (WYSIWYG editors, etc.). But any "safe node"
>>>>>>>> that limits what can render and execute will limit innovation. Each
>>>>>>>> developer needs to support a different markup subset for their app,
>>>>>>>> which is why policy-based sanitization is so critical to this use
>>>>>>>> case.
>>>>>>>>
>>>>>>>> Take a look at CAJA JS's sanitizer, Angular's $sanitize, and other
>>>>>>>> JS-centric HTML sanitizers. They all allow the developer to set a
>>>>>>>> policy of what tags and attributes should be supported, and all other
>>>>>>>> markup gets stripped out.
>>>>>>>>
>>>>>>>> This is the kind of native defensive pattern we need in JavaScript,
>>>>>>>> IMO!
>>>>>>>
>>>>>>> I think there are interesting trade-offs, and I wouldn't be too quick
>>>>>>> to praise one approach over the other. If you design use-centric
>>>>>>> "policy packages" (akin to what's captured in David's proposal), you
>>>>>>> offer safe and consistent choices to developers. The big unknown is
>>>>>>> whether the policies will be sufficiently flexible and future-proof -
>>>>>>> for example, will there be some next-gen communication app that
>>>>>>> requires a paradigm completely different from discussion forums or
>>>>>>> e-mail?
>>>>>>>
>>>>>>> There is a handful of examples where the rigidity basically ruled out
>>>>>>> adoption (e.g., MSIE's old <iframe> sandbox).
>>>>>>>
>>>>>>> The other alternative is the Lego-style policy building approach
>>>>>>> taken with CSP. Out of the countless number of CSP policies you can
>>>>>>> create, most will have inconsistent or self-defeating security
>>>>>>> properties, and building watertight ones requires a fair amount of
>>>>>>> expertise. Indeed, most CSP deployments we see today probably don't
>>>>>>> provide much in terms of security. But CSP is certainly a lot more
>>>>>>> flexible and future-proof than the prepackaged approach.
>>>>>>>
>>>>>>> At the same time treating flexibility as a goal in itself can lead to
>>>>>>> absurd outcomes, too: a logical conclusion is to just provide
>>>>>>> programmatic hooks for flexible, dynamic filtering of markup, instead
>>>>>>> of any static, declarative policies. One frequently-cited approach
>>>>>>> here was Microsoft's Mutation-Event Transforms [1], and I don't think
>>>>>>> it was a step in the right direction (perhaps except as a finicky
>>>>>>> building block for more developer-friendly sanitizers).
>>>>>>>
>>>>>>> [1] http://research.microsoft.com/en-us/um/people/livshits/papers/pdf/hotos07.pdf
>>>>>>
>>>>>>
>>>
>
Received on Saturday, 23 January 2016 08:28:45 UTC