[whatwg] register*Handler and Web Intents from Greg Billock on 2012-04-20 (public-whatwg-archive@w3.org from April 2012)

From: Greg Billock <gbillock@google.com>
Date: Fri, 20 Apr 2012 10:35:40 -0700
Message-ID: <CAAxVY9e4mhCEihXomUiD9hRcvWWH20Thh68fSaDb08iPx-iqYQ@mail.gmail.com>
Ian,

As you can tell by the delay, we've (James Hawkins, Paul Kinlan,
myself, others working on web intents for Chromium) been carefully
reading your email and talking about the issues you bring up.

I think we agree on most things, except for some small but important points.

Considering RPH (registerProtocolHandler), RCH (registerContentHandler),
and web intents as parts of the same feature is a good plan. Even if the
APIs are different (but parallel), having
users able to think of them the same way is the right track. That is,
the UA presentation of the features should be indistinguishable so
that users can leverage familiarity with UI models of permission
grants, manipulating defaults and installed options, and make correct
attribution judgments easily when the features are used.

On Mon, Apr 2, 2012 at 4:23 PM, Ian Hickson <ian at hixie.ch> wrote:
> Looking at the three features, it seems they break down as follows:
>
> * a handler registered using registerContentHandler() triggers when a URL
>   with a particular type is opened, and results in the URL being passed
>   to another URL that is opened.
>
> * a handler registered using registerProtocolHandler() triggers when a
>   URL with a particular scheme is opened, and results in the URL being
>   passed to another URL that is opened.
>
> * a handler registered using Web Intents triggers when a method is
>   invoked on another page, and results in a URL being opened and its
>   JavaScript context being given the information passed to the method.

For RPH, agreed that passing the URL is pretty much the only possible
approach. For RCH, web intents allows us to do better than this: we
can pass the content directly, in a Blob, rather than passing the URL,
thus decoupling the service from the (possibly sensitive) URL from
which the intent was triggered. That isn't always the right plan --
for feed URLs, passing the URL is an important feature enabler for a
feed reader to deal with the content. Anyway, mostly pointing out that
considering these together is a vehicle for more fine-grained control
of the coupling. If anything, this intertwines them more closely.
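
As a rough sketch of that trade-off, a UA could choose per type whether
a content handler receives the source URL or only the bytes. The names
buildPayload and FEED_TYPES, and the payload shape, are assumptions made
for illustration; nothing here comes from a spec.

```javascript
// Hypothetical helper: decide what a UA hands to a content handler.
// FEED_TYPES and buildPayload are illustrative names, not spec'd APIs.
const FEED_TYPES = new Set(["application/rss+xml", "application/atom+xml"]);

function buildPayload(type, sourceUrl, bytes) {
  if (FEED_TYPES.has(type)) {
    // Feed readers need the URL so they can re-fetch the feed later.
    return { type, kind: "url", data: sourceUrl };
  }
  // Otherwise pass only the content, so the (possibly sensitive)
  // source URL is never disclosed to the service.
  return { type, kind: "blob", data: new Blob([bytes], { type }) };
}
```

With this shape, an image handler would see a Blob and no URL, while a
feed reader would still get the feed address it needs.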

> Thus we reach a point where we can describe all three as a common set of registration features:

Agreed. This seems like a big win -- considering the registrations as
different potential capabilities of the same service feels like a much
simpler scenario for users.


> My suggestion then would be to add an element similar to what you suggest,
> as well as an API similar to the existing one.
>
> The element could be something like:
>
>   <intent
>     action="edit"     intent action, e.g. open or edit, default "share"
>     type="image/png"  MIME type filter [1], default "*/*"
>     scheme="mailto"   Scheme filter [1] [2], default omitted
>     href=""           Handler URL [2], default ""
>     title="Foo"       Handler user-visible name, required attribute
>     disposition=""    "replace", "new", or "overlay", default "overlay"
>   ></intent>
>
> [1] Only one of type="" and scheme="" is allowed.
> [2] scheme="" is only allowed if href="" contains %s.
>
> The API could be something like:
>
>  void registerIntentHandler(DOMString action, DOMString type, DOMString url, DOMString title, DOMString disposition);
>  DOMString isIntentHandlerRegistered(DOMString action, DOMString type, DOMString url);
>  void unregisterIntentHandler(DOMString action, DOMString type, DOMString url);
>
> The disposition of registerContentHandler() and registerProtocolHandler()
> would always be "replace". The /url/ argument of registerProtocolHandler()
> would not be allowed to contain %s.
>
> A handler, once registered, would remain so until it was explicitly
> removed with unregisterIntentHandler() or removed by the user, as now for
> the other handler APIs; or, for registrations done with the declarative
> form, would remain until the user returns to the same page and the page
> returns a 200, 404, or 410 response (at which point it would be
> unregistered until such time as the <intent> element is seen again, which
> could happen that very same page load).

This all sounds good. The argument about unregistration is what really
compels the imperative API, I think. Allowing (same-origin) pages to
unregister handlers imperatively is key to reliably unregistering a
stale URL without first requiring a failed intent dispatch to it.
(Otherwise it'd be too easy to end up compelling the full
registration-checking protocol on basically every page load to see if
the absence of an <intent> tag means deregistration.)
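
To make the same-origin register/unregister idea concrete, here is a
minimal in-memory model. Only the method names and the order of the
action/type/url/title/disposition arguments come from the quoted
proposal; the HandlerRegistry class, the explicit callerOrigin argument
(standing in for the origin of the calling page), and the storage are
assumptions for illustration.

```javascript
// Illustrative in-memory model of the quoted registration API.
// HandlerRegistry and callerOrigin are assumptions, not spec'd names.
class HandlerRegistry {
  constructor() {
    // key -> { action, type, url, title, disposition }
    this.handlers = new Map();
  }

  static key(action, type, url) {
    return JSON.stringify([action, type, url]);
  }

  static assertSameOrigin(callerOrigin, url) {
    if (new URL(url).origin !== callerOrigin) {
      throw new Error("cross-origin handler changes are not allowed");
    }
  }

  registerIntentHandler(callerOrigin, action, type, url, title,
                        disposition = "overlay") {
    HandlerRegistry.assertSameOrigin(callerOrigin, url);
    this.handlers.set(HandlerRegistry.key(action, type, url),
                      { action, type, url, title, disposition });
  }

  // Returns true if a registration was actually removed.
  unregisterIntentHandler(callerOrigin, action, type, url) {
    HandlerRegistry.assertSameOrigin(callerOrigin, url);
    return this.handlers.delete(HandlerRegistry.key(action, type, url));
  }
}
```

A page on the handler's own origin can then retire a stale URL directly,
with no failed dispatch needed to discover that it has gone away.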

Another nicety is that RPH/RCH handlers can be invoked imperatively
with navigator.startActivity.

Our remaining discomfort here is with isIntentHandlerRegistered(), and
for similar reasons to the fingerprinting qualities of
isProtocolHandlerRegistered(). That is, we'd prefer that web content
simply call registerProtocolHandler blindly, similar to what a
declarative registration would do, and let the UA sort out whether the
user ought to be shown any kind of registration UI.

This does, however, impose some burden on clients of these systems.
They'd prefer to know that at least something is on the other side of
a content load or protocol link or web intents invocation. For web
intents, we're proposing that a set of suggested services could be
attached to the intent invocation, so if developers are worried about
this, they can attach suggestions which the UA may then regard as
hints if no qualifying filter matches the intent invocation.
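
One possible reading of that fallback, sketched as a UA-side lookup.
The function name resolveServices, the "suggestions" field on the
intent, and the result shape are all hypothetical; the proposal has not
fixed any of them.

```javascript
// Hypothetical UA-side resolution: prefer registered handlers, and fall
// back to caller-supplied suggestions only when nothing matches.
function resolveServices(registered, intent) {
  const matches = registered.filter(
    (h) => h.action === intent.action && h.type === intent.type
  );
  if (matches.length > 0) {
    return { source: "registered", services: matches.map((h) => h.url) };
  }
  // No qualifying filter matched: treat the suggestions as hints.
  return { source: "suggested", services: [...(intent.suggestions ?? [])] };
}
```

Because the caller never learns which branch was taken, attaching
suggestions gives it some assurance that the invocation can go
somewhere without exposing what the user has installed.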

There are several other compromises that might be appealing that cover
various use cases, but they all come with significant fingerprinting
cost. For instance, restricting isXxxRegistered to same-origin
handlers sounds promising, but there's a (weak) super-cookie lurking
there with faked handlers that include identifying data in the url.
We've considered a "bool isIntentRegistered(action, type)" function
signature, which would simply tell the caller whether anything might
happen when invoking that intent. We think providing a suggested service
(which effectively means the result of such an API would always be
"true") is a better mechanism that eliminates fingerprinting.

> From a purely spec-editorial perspective, it seems to make more sense to
> have all of this in one spec, rather than split across multiple specs. If
> you would like, I'd be willing to spec this all in the HTML spec (which
> would especially make sense if we do add another element); alternatively,
> we should really consider moving the existing register*Handler() stuff to
> the same spec as the intent stuff.

Putting this in the HTML spec sounds like a great plan to us. As you
point out, there's a lot of spec work to be done to really get it well
defined, so we don't want you to end up saddled with a bunch of new
work if you're already quite busy. (Just a theory. :-)) If there's a
good way we can still contribute with that integration done, we're
eager to do so.

[Re: passing URLs to the server in RPH/RCH]
> Both have strong use cases. I think we should support both. In the case of
> data being cloned, it doesn't make much sense to upload it, so naturally
> that would just be provided client-side, as described above. For URLs,
> though, the opposite is the case -- you will usually want to fetch the URL
> somehow, which is almost always going to require work on the server side
> since the client typically won't have access rights to obtain the data
> (for content handlers) or open the connection (for protocol handlers).

Good point. There's a vehicle for client-side passing to get to the
server, but that's a bit inconvenient, especially for RPH. For RCH, as
I said above, I think there's a good rationale to keep the URL
undisclosed in the service in some cases, so it may make more sense to
do it client-side there.

>> [James Hawkins]:
>> Wildcard matching.  R*H does not allow wildcard matching, whereas Web
>> Intents would allow a service to register for image/* in one succinct
>> registration.
>
> I don't think wildcard matching really makes sense. In particular, I'm not
> aware of any service that can honestly say it supports image/*, or indeed
> any other topleveltype/*.

One exception would be for "save" type intents, where pretty much any
type can be dealt with. Another is where the handler is using, say,
<img> or <canvas>, and would like to specify accepted types in an
open-ended way.

At the DAP meeting, we agreed to extend this system to include
string-literal types, which provides a way to do good integration with
microdata types. There we expect to do exact string matching, and it
is true that eliminating wildcard types would bring these two type
namespaces a bit closer. MIME is complex enough that it would still
have to be treated separately, however. (Parameters and all that.)
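
A small sketch of how the two type namespaces might sit side by side:
MIME filters allow "*" in either position, while string-literal types
match exactly. The helper name matchesType is hypothetical, the MIME
token pattern is simplified, and MIME parameters are deliberately not
handled, which is the part that would need separate treatment.

```javascript
// Hypothetical matcher. Simplified: no MIME parameters, and a reduced
// token character set for the top-level type and subtype.
function matchesType(filter, type) {
  const mime = /^([a-z0-9.+-]+|\*)\/([a-z0-9.+-]+|\*)$/i;
  const f = mime.exec(filter);
  const t = mime.exec(type);
  if (f && t) {
    const [, fTop, fSub] = f;
    const [, tTop, tSub] = t;
    const topOk = fTop === "*" || fTop.toLowerCase() === tTop.toLowerCase();
    const subOk = fSub === "*" || fSub.toLowerCase() === tSub.toLowerCase();
    return topOk && subOk;
  }
  // String-literal types (for example microdata type URLs): exact match.
  return filter === type;
}
```

Under these assumptions a filter of image/* accepts image/png, while a
literal such as http://schema.org/Person only matches itself.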

[Re: possible <intent> injection attacks]
> There's also the following somewhat artificial attack scenario:
>
>  1. User is tricked into going to victim.example.com. An injection attack
>     is used to register a custom handler for victim.example.com. Not
>     knowing that the user is at victim.example.com, and given a misleading
>     handler title and so forth, the user adds the handler, thinking it is
>     something else.
>  2. User is tricked into going to some other site that invokes the
>     handler.
>  3. The user, thinking the handler is something else, picks it (it would
>     probably be the only such handler in this scenario, with the action
>     being a unique one for the attack).
>  4. The user, confused as to which site he is visiting, performs an action
>     on the victim site, thinking he is on some other site. Maybe the site
>     is made to appear like another via some injected CSS. (Hey, we're
>     assuming the site is susceptible to injection in the first place.)
>
> A lot has to go wrong for this to really happen, though.

Yes, but this is more of a phishing attack mediated by intents, not
an attack on a third-party service page via intents. Phishing is
difficult, of course. One way we have thought to deal with that is to
make sure the URL bar displays the url, ssl state, etc. of the service
page while it is active. For a "window" disposition in a new tab, this
should happen naturally. While "inline" in an overlapped UI context,
though, this means the url bar would be displaying information
relevant to the overlapped UI. Finding the correct solution to this is
really important, as these overlapped non-spoofable UIs look to be
very attractive to application developers because of their
redress-proof qualities.

In other words, while intents may make some security problems more
tractable by adding an interesting UI element, and interpose a hurdle
of user-mediated installation to some attack vectors, they don't
really get rid of phishing opportunities, or allow app developers to
be less vigilant about XSS type issues.

> Yes, I see no reason to allow cross-origin registration. The existing
> methods do not allow that.

Agreed.

> As part of replying to this e-mail, I also reviewed the existing Web
> Intents spec. Here are some comments on it. I hope they are helpful.

YES! Thanks.

> - Nothing seems to ever actually invoke the structured clone algorithm, so
> it's unclear when that should run. In particular, I don't understand when
> ports get cloned. Is it in the constructor? In startActivity?
>
> - What does it mean for a member of Intent to only be present at certain
> times? (e.g. "Only present when the Intent object is delivered to the
> Service page")
>
> - A lot of the spec seems to be lacking in formal requirements; it just
> describes what happens but doesn't actually require it.

We're busy fleshing out many parts, but there is much that is left up
to the user agent. There are definitely holes with how and where
structured clones are produced, how type matching works, and how
service registration and deregistration works. I'm actively working on
them, and feedback like this is very helpful!

>
> - The spec requires that the interfaces that the Window object (called
> "DOMWindow" in the spec for some reason) implements depend on the markup
> in the page. This makes no sense, since the markup in the page isn't known
> at the time the interfaces are prepared, and even if they were, the page's
> content can change dynamically with elements being added and removed from
> script randomly.

Interface elements can be dynamically created. Our initial reasoning
behind using window.intent for delivery is that it is very simple to
use, is unspoofable, has a semantics that forbids redelivery, making
the relationship of the service page context and a new intent clear,
and that it can be present from the instant JavaScript is run on the
page, and so available to <head> scripts.

Since making that choice, we've been persuaded that redelivery ought
not be ruled out as a use case, and that delivery ought to be gated on
signals on the page that indicate continued willingness to accept the
intent the UA was invoked to deliver. We still like the simplicity of
window.intent, and we're planning a proposal for a redelivery
adaptation that would run a message delivery in the object context of
window.intent (a la window.event/evt).

>
> - Using URLs as intents, especially for the default intents, is overly
> verbose. I highly recommend just having a wiki page be a registry of
> widely used intents, and saying that if people want specialised ones for
> their own communities, they can then use URLs, but otherwise it's fine to
> just use simple identifiers like "edit" or "share", so long as they are
> registered in the wiki. This is what we're doing with rel="" and it seems
> to work fine.

We want the action names to be opaque strings, but getting the
convention right is very important. The attractive thing about URLs is
that they are namespaced well, and can be self-documenting. When
considering type literals, we'd like those to be able to match
microdata type strings for interoperability reasons. Using URLs there
matches microdata types. That feels like pressure on action strings to
adopt the same convention. If what we consider to be the most common
action strings are simple ("edit", "share", "pick"), will they be
confusingly different from namespaced action and type strings?

-Greg Billock
Received on Friday, 20 April 2012 10:35:40 UTC