Re: In-browser sanitization vs. a “Safe Node” in the DOM

From: David Ross <drx@google.com>
Date: Sat, 23 Jan 2016 00:43:42 -0800
Message-ID: <CAMM+ux5ibJr1=fhLhGHsxjX0qtw8Wy6q-WPuP8PzxsfXy87oDg@mail.gmail.com>
To: Jim Manico <jim.manico@owasp.org>
Cc: Michal Zalewski <lcamtuf@coredump.cx>, Chris Palmer <palmer@google.com>, Crispin Cowan <crispin@microsoft.com>, Craig Francis <craig.francis@gmail.com>, Conrad Irwin <conrad.irwin@gmail.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>

Ack, I have to admit my first sentence somewhat mischaracterizes Jim's
position.  Just to clarify: I understand that Jim isn't advocating for
the use of a big list of tags, etc. in the sanitizer or its
configuration.  Sorry about that!

Dave

On Sat, Jan 23, 2016 at 12:27 AM, David Ross <drx@google.com> wrote:
> Ok, I see that you're saying there would be less maintenance because
> the big list of hundreds of known-good tags, attributes, and CSS would
> be punted from sanitizer defaults to instead being specified in
> configuration.  But this doesn't mean that there is no list to manage
> _somewhere_.  It just changes the party responsible for managing the
> list.  I'd argue that the sanitizer itself is in the best position to
> get this right, which is why jSanity maintains a list of known-good
> tags, attributes, and CSS properties.
>
> I also believe you're saying that this provides stricter validation
> because the consumer of the sanitizer would supply just a short list
> of tags to whitelist.  I think in practice sanitizer consumers very
> often require a baseline configuration that allows a broad set of
> tags, attributes, and CSS properties that are incontrovertibly safe.
> That can be a big list, and again it's something that the sanitizer
> would be in the best position to manage properly.
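A minimal sketch of the division of labor Dave describes, written in JavaScript to match the sanitize(...) snippet elsewhere in this thread. Assumptions: KNOWN_GOOD_TAGS and buildAllowlist are hypothetical names, the tag list is a tiny illustrative subset (not jSanity's actual list), and the allowOnly / disallow options are invented for illustration. The sanitizer owns the baseline; a consumer can only narrow it.

```javascript
// Hypothetical sketch: the sanitizer owns the known-good list, and a
// consumer's configuration can only narrow it, never extend it.
// KNOWN_GOOD_TAGS is a tiny illustrative subset, not jSanity's actual list.
const KNOWN_GOOD_TAGS = Object.freeze([
  "a", "b", "blockquote", "em", "i", "li", "ol", "p", "strong", "ul",
]);

// config.allowOnly: optional subset of tags the consumer wants (hypothetical)
// config.disallow:  optional tags the consumer wants removed (hypothetical)
function buildAllowlist(config = {}) {
  const baseline = new Set(KNOWN_GOOD_TAGS);
  const lower = (tags) => tags.map((t) => t.toLowerCase());

  // Requested tags that are not already known-good are ignored, so a
  // configuration cannot bloat the list over time.
  const requested = config.allowOnly
    ? lower(config.allowOnly).filter((t) => baseline.has(t))
    : [...baseline];

  const denied = new Set(lower(config.disallow ?? []));
  return new Set(requested.filter((t) => !denied.has(t)));
}
```

For example, buildAllowlist({ allowOnly: ["b", "i", "script"] }) yields only b and i, because script was never on the sanitizer's list in the first place.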
>
> Can you imagine stumbling across one of these sanitizer configurations
> in a pentest?  Every configuration would be different, and surely some
> would have gotten bloated with various tags and attributes over time.
> What a goldmine for bugs!
>
> I agree that if users required a basic sanitizer that only let a few
> things through, you could take this approach and avoid hardcoding big
> lists that require maintenance.  But then I think that type of
> sanitizer would only have the relative advantages of low maintenance
> and strict validation in the use cases that don't require robust
> markup.  In other cases it would tend to create more of a problem than
> it would solve.  I also don't see that it would be advantageous to
> build this type of sanitizer into the browser -- a tiny JavaScript
> library should work fine.
>
> Dave
>
>
> On Fri, Jan 22, 2016 at 5:14 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>> Can you get a little more specific about what you're suggesting?
>>
>> Something along the lines of...
>>
>> sanitize(rawHTML, policy);
>>
>> It would be called like this, but with a better policy mechanism:
>>
>> coolwidget.innerHTML = sanitize(rawHTML, "<b>, <i>, <a>");
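A minimal sketch of what an allowlist policy like "<b>, <i>, <a>" might do, in JavaScript. Assumptions: the markup has already been parsed into a simple tree of { tag, attrs, children } objects and strings (a browser version would walk real DOM nodes instead); sanitizeTree, the Map-based policy format, and the http(s)-only URL rule are hypothetical, not Jim's proposed API or the OWASP sanitizer's.

```javascript
// Hypothetical sketch of an allowlist sanitizer. Input is an already-parsed
// tree: strings are text nodes, objects are { tag, attrs, children }.
// Policy is a Map from allowed tag name to its allowed attribute names.

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// URL-valued attributes are kept only for absolute http(s) URLs, which
// rules out javascript: and data: URLs (an assumption of this sketch).
const URL_ATTRIBUTES = new Set(["href", "src"]);
const isSafeUrl = (value) => /^https?:\/\//i.test(String(value).trim());

function sanitizeNode(node, policy) {
  if (typeof node === "string") return escapeHtml(node); // text is escaped
  const tag = String(node.tag).toLowerCase();
  const allowedAttrs = policy.get(tag);
  if (!allowedAttrs) return ""; // not allowlisted: drop element and subtree

  let out = "<" + tag;
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    const attr = name.toLowerCase();
    if (!allowedAttrs.includes(attr)) continue; // e.g. drops onclick
    if (URL_ATTRIBUTES.has(attr) && !isSafeUrl(value)) continue;
    out += " " + attr + '="' + escapeHtml(value) + '"';
  }
  out += ">";
  for (const child of node.children ?? []) out += sanitizeNode(child, policy);
  return out + "</" + tag + ">";
}

function sanitizeTree(nodes, policy) {
  return nodes.map((node) => sanitizeNode(node, policy)).join("");
}

// Roughly the "<b>, <i>, <a>" policy from the example, with href on <a>.
const policy = new Map([["b", []], ["i", []], ["a", ["href"]]]);
const html = sanitizeTree(
  [
    { tag: "b", children: ["hi"] },
    { tag: "a", attrs: { href: "javascript:alert(1)", onclick: "x()" }, children: ["link"] },
    { tag: "script", children: ["alert(1)"] },
  ],
  policy
);
// html === "<b>hi</b><a>link</a>"
```

Dropping a disallowed element together with its children is the strictest choice; many sanitizers instead unwrap harmless containers and keep their text.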
>>
>> "One of your own" created an HTML Sanitizer that has a much more fully
>> featured policy rule mechanism that you can check out here.
>> https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project#tab=Creating_a_HTML_Policy
>>
>> My conjecture is that once this is working properly (which is rough) it will
>> require a lot less maintenance when new markup features are added.
>>
>> But make no mistake, one of the reasons I support doing both is that the
>> simplicity of what you are doing for developers is compelling. But I think
>> stricter validation like I am suggesting is valuable as well.
>>
>> Aloha,
>> Jim
>>
>>
>> On 1/22/16 8:03 PM, David Ross wrote:
>>>>
>>>> Is my concern that your policy-sandbox would need constant
>>>> updating as new browser features were added a fair concern?
>>>
>>> Any sanitizer already needs some level of ongoing maintenance today.  A
>>> lot of that is just to add support for (whitelist) new browser
>>> features, and then to backtrack a bit if that turns out not to have
>>> been such a good idea.  =)  When you've got a sanitizer written in C++
>>> and baked into a browser, updating that sanitizer in this way might be
>>> even more burdensome.
>>>
>>> In the case of Safe Node, we would _not_ generally make one-off
>>> changes to tweak the code to add or remove support for new elements,
>>> attributes, etc.  When adding any new feature, the question would be this:
>>> Walking down the list of Safe Node enforced policies, would the new
>>> feature subvert any of them?  If so _and_ the new feature doesn't
>>> leverage existing building blocks that are already regulated by
>>> policy, _then_ there needs to be additional policy enforcement put in
>>> place.  So I think that an implementation of Safe Node would require
>>> less ongoing maintenance than a sanitizer baked into the browser.
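The review question Dave describes reduces to a two-part test. A hypothetical JavaScript sketch: a new feature is modeled by the Safe Node policies it could subvert and the existing building blocks it is built on. The building-block names are invented for illustration; only the policy name "Disablement of script / active content" comes from this thread.

```javascript
// Hypothetical model of the review step: new enforcement is needed only if
// the feature subverts some Safe Node policy AND is not built entirely from
// building blocks that existing policies already regulate.
function needsNewEnforcement(feature, regulatedBlocks) {
  const subvertsSomePolicy = feature.subverts.length > 0;
  const coveredByExistingPolicy =
    feature.builtOn.length > 0 &&
    feature.builtOn.every((block) => regulatedBlocks.has(block));
  return subvertsSomePolicy && !coveredByExistingPolicy;
}

// Invented building-block names, for illustration only.
const regulatedBlocks = new Set(["script execution", "navigation"]);

const reusesScript = {
  subverts: ["Disablement of script / active content"],
  builtOn: ["script execution"],
};
const brandNewMechanism = {
  subverts: ["Disablement of script / active content"],
  builtOn: ["new scriptable embed"],
};

needsNewEnforcement(reusesScript, regulatedBlocks);      // false
needsNewEnforcement(brandNewMechanism, regulatedBlocks); // true
```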
>>>
>>>> Do you think supporting some kind of HTML policy engine like
>>>> I'm suggesting is valid at all?
>>>
>>> Can you get a little more specific about what you're suggesting?
>>>
>>> Dave
>>>
>>> On Fri, Jan 22, 2016 at 4:43 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>>
>>>>> and certainly it's no more blacklist-based than a sanitizer
>>>>
>>>> Hmmm. My thinking was "David's proposal is going to disable certain
>>>> features. HTML sanitizers only try to enforce good tags without needing
>>>> any knowledge of the bad stuff". That is why I think of your work as
>>>> "blacklist" and HTML sanitizers as "whitelist".
>>>>
>>>> Anyhow, it sure was an Edge-ie case! Thank you for catching my lame pun.
>>>> I know this is going to hurt you to hear it, but IE and Edge matter.
>>>> I'm glad to know your proposal would have caught this.
>>>>
>>>> Is my concern that your policy-sandbox would need constant updating as
>>>> new browser features were added a fair concern?
>>>>
>>>> Do you think supporting some kind of HTML policy engine like I'm
>>>> suggesting is valid at all?
>>>>
>>>> Aloha,
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>>>> On 1/22/16 6:35 PM, David Ross wrote:
>>>>
>>>> I would not characterize it as blacklist-based, and certainly it's no
>>>> more blacklist-based than a sanitizer.
>>>>
>>>>> What about CSS expressions and other edge cases not
>>>>> described in http://lcamtuf.coredump.cx/postxss/ ?
>>>>
>>>> It's covered by this policy:
>>>> * Disablement of script / active content
>>>>
>>>> Also, was that a pun?  Because CSS expressions are an Edge case.  =)
>>>>
>>>>
>>>> On Fri, Jan 22, 2016 at 3:28 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>>
>>>>> Again, I am reading your proposal right now, but this looks a little
>>>>> blacklist-ish to me. What about CSS expressions and other edge cases not
>>>>> described in http://lcamtuf.coredump.cx/postxss/ ? There are more out
>>>>> there, per my understanding...
>>>>>
>>>>> I prefer more programmatic sanitization because it's a whitelist,
>>>>> which tends to be a stronger control. Once a good sanitization API is
>>>>> built, it will stand the test of time as new browser features are
>>>>> added.
>>>>>
>>>>> An approach just banning bad things will be way more fragile as new
>>>>> browser features get added over time.
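Jim's fragility argument can be illustrated with a small JavaScript sketch. Both tag lists are illustrative assumptions, and "futureembed" stands in for a hypothetical element a browser might ship later. (Dave disputes that Safe Node is blacklist-based; this only illustrates the general point.)

```javascript
// Illustrative only: a denylist ("ban bad things") vs. an allowlist
// ("permit known-good things") when a new element appears that neither list
// anticipated. "futureembed" is a made-up tag name.
const DENYLIST = new Set(["script", "iframe", "object", "embed", "style"]);
const ALLOWLIST = new Set(["a", "b", "em", "i", "p", "strong"]);

const passesDenylist = (tag) => !DENYLIST.has(tag.toLowerCase());
const passesAllowlist = (tag) => ALLOWLIST.has(tag.toLowerCase());

const incoming = ["b", "script", "futureembed"];
const viaDenylist = incoming.filter(passesDenylist);   // ["b", "futureembed"]
const viaAllowlist = incoming.filter(passesAllowlist); // ["b"]
```

The denylist silently admits the unknown element until someone updates it; the allowlist drops it until someone decides it is safe.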
>>>>>
>>>>> - Jim
>>>>>
>>>>>
>>>>> On 1/22/16 6:17 PM, David Ross wrote:
>>>>>>>
>>>>>>> There is a handful of examples where the rigidity basically
>>>>>>> ruled out adoption (e.g., MSIE's old <iframe> sandbox).
>>>>>>
>>>>>> This: https://msdn.microsoft.com/en-us/library/ms534622(v=vs.85).aspx
>>>>>> It came in for Hotmail, but it was never put to use AFAIK, exactly for
>>>>>> the reason you describe.
>>>>>>
>>>>>> There is a finite list of "unsafe" things that markup / CSS can do
>>>>>> when rendered on a page.  (Essential reference, of course:
>>>>>> http://lcamtuf.coredump.cx/postxss/)  It is possible there are a
>>>>>> couple of things missing from the initial list of Safe Node policies
>>>>>> requiring enforcement.  (E.g.: Link targeting is covered but we
>>>>>> probably also need a way to regulate navigation more generally.)  But
>>>>>> the problem is tractable.  And I don't think that sanitization baked
>>>>>> into the browser provides a better approach in this regard.
>>>>>>
>>>>>> Another key thing here is that with either a sanitizer or Safe Node,
>>>>>> it's important to pick a good set of secure defaults.  That way the
>>>>>> policy problems Michal described are less likely to occur as custom
>>>>>> configuration tends to be minimal.  With the sandbox attribute for
>>>>>> frames, I think the use cases vary to such an extent that it would
>>>>>> have been hard to set secure defaults.  E.g.: allow-scripts and
>>>>>> allow-same-origin are OK independently, but not when combined.
>>>>>> There's no safe default there because there are many use cases for
>>>>>> either approach.  I don't see that Safe Node policies interfere with
>>>>>> each other in this way and so we probably dodged this bullet.
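A minimal JavaScript sketch of the sandbox interaction Dave mentions; the function name is hypothetical. It reflects the HTML specification's warning that when framed content is same-origin with its embedder, allow-scripts together with allow-same-origin lets it remove its own sandbox attribute and reload, escaping the sandbox.

```javascript
// Flags the one combination of iframe sandbox tokens discussed above.
// Each token is fine alone; together, same-origin framed content can reach
// its parent's DOM, strip its own sandbox attribute, and reload unsandboxed.
// Sandbox tokens are ASCII case-insensitive, hence toLowerCase().
function hasUnsafeSandboxCombination(sandboxValue) {
  const tokens = new Set(sandboxValue.trim().toLowerCase().split(/\s+/));
  return tokens.has("allow-scripts") && tokens.has("allow-same-origin");
}

hasUnsafeSandboxCombination("allow-scripts");                               // false
hasUnsafeSandboxCombination("allow-same-origin allow-forms");               // false
hasUnsafeSandboxCombination("allow-forms allow-scripts allow-same-origin"); // true
```

In a browser, the same check could read the iframe's live token list instead, e.g. iframe.sandbox.contains("allow-scripts").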
>>>>>>
>>>>>> Jim said:
>>>>>>>
>>>>>>> I have an aversion to different policy packages not being
>>>>>>> flexible enough to be useful.
>>>>>>
>>>>>> FWIW, as per earlier in the thread, the Safe Node approach addresses
>>>>>> scenarios around CSS where _sanitization_ is inflexible.  (Caveat: If
>>>>>> a sanitizer is baked into the browser, all of a sudden it can pursue
>>>>>> the same approach.)
>>>>>>
>>>>>>> Perhaps support both of these approaches? HTML
>>>>>>> Programmatic sanitization and several pre-built policies?
>>>>>>> That would provide both ease of use for some, and deep
>>>>>>> flexibility for others. Win win win, and win?
>>>>>>
>>>>>> My argument is that Safe Node has advantages relative to sanitization
>>>>>> baked into the browser.  If you can identify a legit use case that
>>>>>> Safe Node can't support cleanly, but browser-based sanitization does,
>>>>>> I'd probably jump right back on the sanitization bandwagon.  I wrote a
>>>>>> client-side sanitizer not that long ago and I enjoy working on them.
>>>>>> =)
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>> On Fri, Jan 22, 2016 at 2:40 PM, Jim Manico <jim.manico@owasp.org> wrote:
>>>>>>>
>>>>>>> Thank you, Michal. I'll give David's proposal a closer read and comment
>>>>>>> shortly.
>>>>>>>
>>>>>>> I remember Microsoft and their AntiXSS library providing an HTML
>>>>>>> Sanitizer API for untrusted HTML input. It was one of the first in any
>>>>>>> major language or framework. The first version was very permissive and
>>>>>>> useful, but unfortunately was vulnerable to HTML hacking and of course
>>>>>>> XSS. The latest incarnation was fixed to be very secure, but
>>>>>>> unfortunately was not at all useful because it was so restrictive. And
>>>>>>> MS is now deprecating it with no commitment to maintain it.
>>>>>>>
>>>>>>> I have an aversion to different policy packages not being flexible
>>>>>>> enough to be useful. But I will give David's proposal a deeper read and
>>>>>>> provide comments more specific to his proposal.
>>>>>>>
>>>>>>> Perhaps support both of these approaches? HTML Programmatic
>>>>>>> sanitization and several pre-built policies? That would provide both
>>>>>>> ease of use for some, and deep flexibility for others. Win win win,
>>>>>>> and win?
>>>>>>>
>>>>>>> Aloha,
>>>>>>> Jim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/22/16 5:29 PM, Michal Zalewski wrote:
>>>>>>>>>
>>>>>>>>> The need to inject untrusted markup into the DOM comes up all the
>>>>>>>>> time and is critical (WYSIWYG editors, etc.). But any "safe node"
>>>>>>>>> that limits what can render and execute will limit innovation. Each
>>>>>>>>> developer needs to support a different markup subset for their app,
>>>>>>>>> which is why policy-based sanitization is so critical to this use
>>>>>>>>> case.
>>>>>>>>>
>>>>>>>>> Take a look at Caja's JS sanitizer, Angular's $sanitize, and other
>>>>>>>>> JS-centric HTML sanitizers. They all allow the developer to set a
>>>>>>>>> policy of what tags and attributes should be supported, and all other
>>>>>>>>> markup gets stripped out.
>>>>>>>>>
>>>>>>>>> This is the kind of native defensive pattern we need in JavaScript,
>>>>>>>>> IMO!
>>>>>>>>
>>>>>>>> I think there are interesting trade-offs, and I wouldn't be too quick
>>>>>>>> to praise one approach over the other. If you design use-centric
>>>>>>>> "policy packages" (akin to what's captured in David's proposal), you
>>>>>>>> offer safe and consistent choices to developers. The big unknown is
>>>>>>>> whether the policies will be sufficiently flexible and future-proof -
>>>>>>>> for example, will there be some next-gen communication app that
>>>>>>>> requires a paradigm completely different from discussion forums or
>>>>>>>> e-mail?
>>>>>>>>
>>>>>>>> There is a handful of examples where the rigidity basically ruled out
>>>>>>>> adoption (e.g., MSIE's old <iframe> sandbox).
>>>>>>>>
>>>>>>>> The other alternative is the Lego-style policy building approach
>>>>>>>> taken with CSP. Out of the countless number of CSP policies you can
>>>>>>>> create, most will have inconsistent or self-defeating security
>>>>>>>> properties, and building watertight ones requires a fair amount of
>>>>>>>> expertise. Indeed, most CSP deployments we see today probably don't
>>>>>>>> provide much in terms of security. But CSP is certainly a lot more
>>>>>>>> flexible and future-proof than the prepackaged approach.
>>>>>>>>
>>>>>>>> At the same time treating flexibility as a goal in itself can lead to
>>>>>>>> absurd outcomes, too: a logical conclusion is to just provide
>>>>>>>> programmatic hooks for flexible, dynamic filtering of markup, instead
>>>>>>>> of any static, declarative policies. One frequently-cited approach
>>>>>>>> here was Microsoft's Mutation-Event Transforms [1], and I don't think
>>>>>>>> it was a step in the right direction (perhaps except as a finicky
>>>>>>>> building block for more developer-friendly sanitizers).
>>>>>>>>
>>>>>>>> [1] http://research.microsoft.com/en-us/um/people/livshits/papers/pdf/hotos07.pdf
>>>>>>>
>>>>>>>
>>>>
>>
Received on Saturday, 23 January 2016 08:44:34 UTC

This archive was generated by hypermail 2.3.1 : Monday, 23 October 2017 14:54:17 UTC