[whatwg] Fixing two security vulnerabilities in registerProtocolHandler from Tyler Close on 2012-04-06 (public-whatwg-archive@w3.org from April 2012)

From: Tyler Close <tyler.close@gmail.com>
Date: Fri, 6 Apr 2012 15:01:32 -0700
Message-ID: <CAKvcKK=e2id8xERVZebhNuVsPdTzEnBx01T1B74MhbmXpzmV1w@mail.gmail.com>
On Fri, Apr 6, 2012 at 2:35 PM, Ian Hickson <ian at hixie.ch> wrote:
> On Fri, 6 Apr 2012, Tyler Close wrote:
>> On Mon, Apr 2, 2012 at 4:39 PM, Ian Hickson <ian at hixie.ch> wrote:
>> > On Mon, 26 Sep 2011, Tyler Close wrote:
>> >>
>> >> I was recently experimenting with the registerProtocolHandler (RPH)
>> >> API and came across a couple of security gotchas that make it hard to
>> >> safely use the API. One of these is already known, but AFAICT, hasn't
>> >> been fixed yet. I haven't seen the other discussed yet.
>> >>
>> >> The Mozilla blog post that introduces the registerProtocolHandler API
>> >> makes use of window.parent.postMessage to send a response from the
>> >> RPH handler back to the client page.
>> >
>> > I presume it uses this in conjunction with an <a href=""> link with a
>> > target="" attribute to load the handler in an iframe.
>>
>> The client page loads the handler page using an iframe or a
>> window.open(). Either can work.
>>
>> >> In the example code, the targetOrigin for this postMessage invocation
>> >> is '*', while also noting that this is not secure. AFAICT, there is
>> >> no API that the intent handler can reliably use to determine the
>> >> correct targetOrigin for this postMessage invocation.
>> >
>> > How can the origin be anything other than the origin of the page that
>> > triggered the link?
>>
>> Exactly, but we need a way for the handler page to find out what that
>> origin is.
>>
>> A client page on origin A causes a navigation to an RPH URL (iframe or
>> window.open). The browser loads the user-chosen RPH handler, which is
>> another web page from origin B. After the handler page loads, it wants
>> to send a return value back to the client page. How does the handler
>> page know the client page's origin is A? It needs to know this origin
>> string so that it can securely use postMessage to send the return value
>> back. AFAICT, there is no existing API in the browser that lets the
>> handler page determine the client page's origin.
>
> Well if it's an iframe, the parent can't be anything but the original
> origin, as far as I can tell.

What happens if the handler sends the postMessage to "*", then the
parent is navigated? Will the postMessage be delivered or not?
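The question hinges on when postMessage checks targetOrigin. The sketch below is a simplified model, not browser code. It assumes, as the HTML spec and MDN describe, that the check compares targetOrigin against the target document's origin at the time the message is about to be dispatched. It models only that origin comparison and ignores other details of the algorithm. The function name and the origins are hypothetical.

```typescript
// Simplified model (an assumption, not browser code) of the targetOrigin
// check postMessage makes when a queued message is about to be dispatched.
function wouldDeliver(targetOrigin: string, originAtDispatch: string): boolean {
  // "*" disables the check: whatever document is in the target window
  // at dispatch time receives the message.
  if (targetOrigin === "*") {
    return true;
  }
  // Otherwise the target document's origin must match exactly.
  return targetOrigin === originAtDispatch;
}

// Hypothetical origins for illustration.
const clientOrigin = "https://client.example";
const attackerOrigin = "https://attacker.example";

// The client page has been navigated to the attacker before dispatch.
console.log(wouldDeliver("*", attackerOrigin));          // true: value leaks
console.log(wouldDeliver(clientOrigin, attackerOrigin)); // false: dropped
console.log(wouldDeliver(clientOrigin, clientOrigin));   // true
```

With "*" the check is skipped, so if the client page has been navigated to another origin before dispatch, the new document receives the return value. With an explicit origin the message is silently dropped. This is why the handler needs to learn the client's origin rather than fall back to "*".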

> But in general, there's not expected to be any talking back. If you want
> something where the handler talks back to the page that provided the data,
> then you should use Web Intents. registerProtocolHandler() and
> registerContentHandler() are intended for things like mail clients
> (mailto:) or PDF viewers, which do not talk back. Indeed in the common use
> case, you just click the link and the entire browsing context gets
> replaced, so there's nothing to talk back _to_.

I was prompted to write the original email by a Mozilla blog post that
suggested talking back.

It also seems bad for web APIs to break under simple composition like
this, especially when there's an easy fix available.

>> Currently, the handler page can only specify "*" in the postMessage
>> invocation that sends the return value. If the client page is navigated
>> by an attacker, before the postMessage is done, the attacker can
>> intercept the return value. It's the same rationale used every time we
>> advise programmers against using '*' as the targetOrigin for a
>> postMessage() invocation.
>
> That rationale only applies when you're going from window to window, not
> when you're going from iframe to parent.
>
>
>> >> The second problem with RPH is that the handler page doesn't have a
>> >> way of reliably getting the URL of the content to be handled from the
>> >> browser. In order to work in offline scenarios, the RPH handler must
>> >> put the %s placeholder in the fragment of its handler's URL.
>> >
>> > It's not clear to me that it makes sense to have an offline protocol
>> > handler. What kind of protocol do you have in mind?
>>
>> For example, consider an offline web mail program. I click on a mailto:
>> link and want to compose a message in my web mail editor, queuing it to
>> be sent next time I'm online.
>>
>> RPH is a way for a web page to send data to a user-determined
>> application. There will surely be many scenarios where offline
>> functionality is desirable.
>
> For such an example, you can just use a fallback section in the appcache
> manifest. (Or a fragment identifier, indeed.)

Right, the obvious thing to do is use the fragment identifier, but
that's got some security problems. With a small tweak we can make this
safe and easy.
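As a sketch of the fragment approach, assume a handler registered with a hypothetical URL template such as `https://mail.example/compose#%s`. Because the fragment is never sent to the server, this works offline. The browser substitutes the escaped content URL for `%s` (assumed here to be percent-encoded), and the handler has to recover it from its own location. The helper name is hypothetical. Nothing in its result tells the handler whether the browser or some other page produced that fragment, which is the gap described above.

```typescript
// Hypothetical helper: recover the content URL substituted for %s in the
// fragment of a handler URL such as https://mail.example/compose#%s.
// Returns null if there is no non-empty fragment or it cannot be decoded.
function contentUrlFromFragment(handlerLocation: string): string | null {
  const hashIndex = handlerLocation.indexOf("#");
  if (hashIndex === -1 || hashIndex === handlerLocation.length - 1) {
    return null;
  }
  const escaped = handlerLocation.slice(hashIndex + 1);
  try {
    return decodeURIComponent(escaped);
  } catch {
    // Malformed percent-encoding, e.g. a lone "%".
    return null;
  }
}

// Loaded through the browser's dispatch for mailto:bob@example.com ...
console.log(contentUrlFromFragment("https://mail.example/compose#mailto%3Abob%40example.com"));
// → mailto:bob@example.com

// ... or by any page that navigates the handler frame to a fragment of its choosing.
console.log(contentUrlFromFragment("https://mail.example/compose#mailto%3Aattacker%40evil.example"));
// → mailto:attacker@evil.example
```

Both calls look identical to the handler, so it cannot tell a genuine dispatch from a forged one by inspecting its URL alone.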

>> >> Unfortunately, this means that other content in the browser could
>> >> modify the content URL before the handler reads it.
>> >
>> > Well, any content can load any URL, so it doesn't matter whether the
>> > URL is in the fragment identifier or the path or anything else,
>> > surely.
>>
>> It matters if the handler page assumes that the URL came from its parent
>> or opener. The parent and opener then engage in a postMessage
>> conversation where the parent knows it said one thing, but the handler
>> heard it saying something different, something chosen by the attacker.
>
> Why would a mail client talk back to its opener?

It might not, but some RPH handlers will. They've got a postMessage
API; they're going to use it. Let's make sure it's possible to use it
safely.

>> >> The intent handler sees a request coming from the victim page, but
>> >> with a content URL specified by the attacker. A related problem is
>> >> that the intent handler has no way to distinguish whether its URL was
>> >> loaded via the browser's RPH handling, or whether the client page
>> >> directly navigated to the intent handler's URL. Both of these
>> >> problems could be fixed by adding another readonly DOMString to the
>> >> API that contains the %s data for the RPH invocation.
>> >
>> > I don't understand why it matters how the URL was invoked.
>>
>> If the URL was invoked via RPH, then the handler page knows that the
>> user selected it for this action. The handler page also knows that any
>> arguments in the handler's URL (not in the RPH URL) were set by the
>> handler's origin and were not tampered with by the client page.
>>
>> For example, a web mail program might have two registered RPH handlers
>> for mailto: "https://example.org/?from=me at company&q=%s" and
>> "https://example.org/?from=me at personal&q=%s". The user has configured
>> their browser to send mailto links to their personal email editor. A
>> malicious client page could directly open the URL for the company email
>> editor. The web mail editor needs a way to detect when a client page is
>> trying to subvert the user's chosen preferences. So, an RPH handler
>> needs a way to know that it was loaded via the RPH dispatch. Once it
>> knows this, it can also trust that the arguments in the URL, such as
>> "from" in this case, were not tampered with by the client page.
>
> I don't understand the attack scenario. Sure, a Web page can open another
> Web page with arbitrary arguments. Why does it matter here?

Two reasons:
1. An RPH dispatch is different from a direct load because it
communicates a user choice to the RPH handler. I explained above how a
handler might use this information.
2. An RPH dispatch comes from the browser, so URL parameters can be
trusted; whereas they cannot be trusted in a load from another web
page.

With a small change, we can prevent a client page from faking an RPH
dispatch to a handler page.
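To make the proposal concrete, here is a sketch of how a handler might use it. It assumes a HYPOTHETICAL browser-supplied record, called `RphDispatch` here, that is present only when the page was loaded through an RPH dispatch. The record carries the %s data and the client page's origin. No such API exists in registerProtocolHandler, and every name and URL below is illustrative. The handler treats its own URL parameters (like "from") as trustworthy only when the record is present. It takes the content URL from the record rather than from its location, and it uses the client origin as the postMessage targetOrigin instead of "*".

```typescript
// HYPOTHETICAL API: a record the browser would expose to a handler page
// only when that page was loaded by an RPH dispatch (models the proposal).
interface RphDispatch {
  contentUrl: string;   // the value the browser substituted for %s
  clientOrigin: string; // origin of the page that triggered the dispatch
}

interface HandlerDecision {
  trusted: boolean;           // loaded via a genuine RPH dispatch?
  from: string | null;        // "from" parameter from the handler's own URL
  contentUrl: string | null;  // taken from the dispatch, never from the URL
  replyOrigin: string | null; // targetOrigin to use for postMessage replies
}

// Read one query parameter from a URL string (assumes well-formed
// percent-encoding; "+" is not treated as a space).
function queryParam(url: string, key: string): string | null {
  const beforeFragment = url.split("#")[0];
  const q = beforeFragment.indexOf("?");
  if (q === -1) return null;
  for (const pair of beforeFragment.slice(q + 1).split("&")) {
    const eq = pair.indexOf("=");
    const k = eq === -1 ? pair : pair.slice(0, eq);
    if (decodeURIComponent(k) === key) {
      return eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return null;
}

// Hypothetical handler-side policy built on the proposed dispatch record.
function decideHandling(handlerUrl: string, dispatch: RphDispatch | null): HandlerDecision {
  if (dispatch === null) {
    // Loaded directly by some page: parameters such as "from" may have
    // been chosen by that page, and there is no known origin to reply to.
    return { trusted: false, from: null, contentUrl: null, replyOrigin: null };
  }
  return {
    trusted: true,
    from: queryParam(handlerUrl, "from"),
    contentUrl: dispatch.contentUrl,
    replyOrigin: dispatch.clientOrigin,
  };
}

// A page opening the "company" handler URL directly gets no dispatch record.
console.log(decideHandling("https://mail.example/?from=company&q=x", null).trusted); // false

// The browser dispatching to the handler the user actually chose.
const decision = decideHandling(
  "https://mail.example/?from=personal&q=mailto%3Abob%40example.com",
  { contentUrl: "mailto:bob@example.com", clientOrigin: "https://client.example" },
);
console.log(decision.from, decision.replyOrigin); // personal https://client.example
```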

--Tyler
Received on Friday, 6 April 2012 15:01:32 UTC