Re: Cross Origin Web Components: Fixing iframes

On Wed, Dec 4, 2013 at 11:45 AM, Ryosuke Niwa <rniwa@apple.com> wrote:

>
> On Nov 26, 2013, at 10:15 PM, Dominic Cooney <dominicc@google.com> wrote:
>
> On Wed, Nov 27, 2013 at 2:19 PM, Ryosuke Niwa <rniwa@apple.com> wrote:
>
>>
>> On Nov 27, 2013, at 8:57 AM, Dominic Cooney <dominicc@google.com> wrote:
>>
>> On Tue, Nov 26, 2013 at 2:03 PM, Ryosuke Niwa <rniwa@apple.com> wrote:
>>
>>> Hi,
>>>
>>> I have been having informal discussions of our earlier proposal for
>>> cross-origin use cases and declarative syntax for web components, and I
>>> realized there was a lot of confusion about our motivations and design
>>> decisions.  So I wanted to explain why/how we came up with that proposal in
>>> this email.
>>>
>>>
>>> *Problem*: A lot of websites embed SNS widgets, increasing the security
>>> surface of embedders.  The old version of techcrunch.com, for example,
>>> had 5+ social share buttons on each article.  If any one of those SNS
>>> websites gets compromised, then the embedder will also be compromised.
>>>
>>
>> This is a valid problem. Does anyone have related use cases that might be
>> in-scope for this discussion?
>>
>>
>> Comment forms (e.g. DISQUS) are another important use case.
>>
>> *What if we used iframe?*
>>> What if we replaced each such instance with an iframe?  That would give
>>> us a security boundary.
>>>
>>> On the other hand, using an iframe for each social button is very
>>> expensive because each iframe loads a document, creates its own security
>>> origin, JS global object, and so forth. Initializing a new script context
>>> (a.k.a. "VM", "world", "isolate", etc…) for every single SNS widget on a
>>> page is quite expensive.  If we had 10 articles, and each article had 5
>>> social buttons, we'll have 50 iframes, each of which needs to load
>>> megabytes of JavaScript.
>>>
>>> iframe is also heavily restricted in terms of its ability to lay
>>> itself out. Comment widgets (e.g. DISQUS), for example, need to stretch
>>> themselves to the height of their content.
>>>
>>> We also need a better mechanism to pass arguments and communicate with
>>> cross-origin frames than postMessage.
>>>
>>>
>>> *What if we made iframe lighter & used seamless iframe?*
>>> The cost of iframe could be reduced substantially if we cached and
>>> internally shared each page's JavaScript.  However, we would still have to
>>> instantiate a separate script context, document, and window object for
>>> each iframe.
>>>
>>> We can also use seamless iframe to address the comment widget use case.
>>>
>>>
>>> *What if we let each iframe create multiple "views"?*
>>> The problem with using an iframe for a cross-origin widget is that each
>>> iframe creates its own document, window, etc… even if there are multiple
>>> widgets from the same origin.  e.g. if we had a tweet button on 10
>>> different articles, we have to create a separate document, window, etc…
>>> for each tweet button.
>>>
>>> We can reduce this cost if we could share the single frame, and have it
>>> render multiple "views".  Naturally, each such view will be represented as
>>> a separate DOM tree.  In this model, a single iframe owns multiple DOM
>>> trees, each of which will be displayed at different locations in the host
>>> document.  Each such DOM tree is inaccessible from the host document, and
>>> the host document is inaccessible from the iframe.
>>>
>>> This model dramatically reduces the cost of having multiple widgets from
>>> the same origin.  e.g. if we have 10 instances of widgets from 5 different
>>> social networks, then we'll have only 5 iframes (each of which will have 10
>>> "views") as opposed to 50 of them.
>>>
>>>
>>> *What if we provided a declarative syntax to create such a view?*
>>> Providing a better API proved to be challenging.  We could have let page
>>> authors register a custom element for each cross-origin widget but that
>>> would mean that page authors have to write a lot of script just to embed
>>> some third-party widgets.  We need some declarative syntax to let authors
>>> wrap an iframe.
>>>
>>> Furthermore, if we wanted to use the multiple-views-per-iframe model, then
>>> we'll need a mechanism to declare where each instance of such a view is
>>> placed in the host document with arguments/configuration options for each
>>> view.
>>>
>>> A custom element seemed like a natural fit for this task but the
>>> prototype/element object cannot be instantiated in the host document since
>>> the cross-origin widgets' script can't run in the host document and
>>> prototype objects, etc… cannot be shared between the host document and the
>>> shared iframes.  So we'll need some mechanism for the shared iframe to
>>> define custom element names, and have the host document explicitly import
>>> them as needed.
>>>
>>>
>>> At this point, the set of features we needed looked very similar to the
>>> existing custom element and shadow DOM.  Each "view" of the shared iframe
>>> was basically a shadow DOM with a security boundary sitting between the
>>> host element and the shadow root.  The declarative syntax for the "view"
>>> was basically a declarative syntax of a custom element that happens to
>>> instantiate a shadow DOM with a caveat that the shadow host is inaccessible
>>> from the component, and the shadow DOM is inaccessible from the host
>>> document.  It also seemed natural for such a "shared iframe" to be loaded
>>> using HTML imports.
>>>
>>>
>>> You can think of our proposal as breaking iframe down into two pieces:
>>>
>>>    1. Creating a new document/window
>>>    2. Creating a new view
>>>
>>> I think decomposing the problem this way is a good step.
>>
>> Re: creating a new document/window, purely in terms of *mechanics*,
>> IFRAME does this already. Is anything else required?
>>
>>
>> The problem is that iframe does both 1 and 2 but I agree that iframe
>> already provides this mechanism if we set style=display:none.  But it would
>> be really ugly and cumbersome if we had to import various SNS widgets with
>> iframe with style set to display:none.
>>
>
> Right. I think it will help us divide and conquer the problem if we can
> work on (a) API aesthetics, and separately (b) mechanics in terms of as
> much existing stuff as possible (IFRAME, viewport, etc.) I don't
> necessarily mean taking that existing stuff as-is, but maybe pulling chunks
> out of the existing stuff and specing it so it will explain the legacy
> stuff and work in these new combinations for this new use case.
>
>
> Yeah, that makes sense.
>
>  Re: creating a new view, this is really interesting to me. It seems
>> there are a few different parts, I think most of these are needed for the
>> use case above; I've also noted where we might break out and "explain" some
>> existing part of the platform.
>>
>> - Arranging the rendering of a DOM (sub)tree into a "view". IFRAME,
>> ShadowRoot and indeed just "rendering in general" do this.
>> - Arranging the rendering of something else into a "view". Replaced
>> elements like OBJECT and IMG do this. Maybe this is just trivially "arrange
>> the rendering of a DOM containing CANVAS" though.
>> - Communicating or blocking layout across the "view" boundary. Cases
>> where information flows outside-in: the viewport-document relationship;
>> IFRAME. Cases where information flows two ways: seamless IFRAME, Shadow
>> DOM, layout in general.
>> - Something about laying things out/rendering outside the bounds of the
>> "view". Shadow DOM does this (you can rel/abs/fixed position stuff
>> outside of the host element bounds.) This is a tricky one... in scope or
>> does Shadow DOM remain a special case? Would some embedders trust a
>> component enough to let them clickjack them, just not steal their cookies,
>> etc.?
>>
>>
>> Right.  We need to add something like overflow: clip by default to
>> prevent click hijacking.
>>
>
> Is overflow: clip sufficient?
>
> If we're trying to map this to primitives, does this mean that the UA
> stylesheet has a high specificity rule which says "if you're one of these
> elements entangled with a viewport, overflow: clip"?
>
> I note that fb:like has a "flyout". Do you think it is a reasonable use
> case? Should the component author be allowed to detect when their
> view-thing will clip them or not?
>
>>  and providing a mechanism to do 2 without doing 1 (or doing
>>> 2 multiple times after doing 1 once), and making it usable with a
>>> declarative syntax.
>>>
>>
>> This definitely deserves to be bullet 3--usable with declarative syntax.
>>
>> To clarify that I understand--the importance of succinct declarative
>> syntax is so that the embedder doesn't end up including the "shim" script
>> for Foo's widget from foo.com, which means trusting foo.com which was
>> the whole point! Right?
>>
>>
>> Right.  Using Foo widget from foo.com should NOT involve running scripts
>> from foo.com in the host document.
>>
>> It would be nice if we could solve this problem in a layered way. For
>> example, I think the "view" stuff above is a lower-level primitive, and the
>> declarative syntax should be explained in terms of (something for getting a
>> window+document--IFRAME?) plus "view" plus (extremely small alpha that
>> explains how the stuff is wired up.)
>>
>>
>> That makes sense although we haven't come up with use cases where we just
>> want to use the multiple "views" cross-origin without the declarative
>> syntax.
>>
>
> What about the status quo, where the embedder trusts the component being
> embedded, but the component doesn't trust the embedder?
>
>
> Authors can keep using script elements for that use case.  Is there some
> existing problem we want to solve in that use case?
>

The performance problem you mentioned earlier with having multiple heavy
IFRAMEs.

> The component will be running scripts in the embedder's context (perhaps
> the component has a script API built on postMessage to the IFRAME) but for
> efficiency it's desirable to have one IFRAME for the multiple like buttons,
> etc.
>
>>  I guess it is OK if the API is not declarative on the widget side? If
>> we assume the widget enjoys the isolation of an IFRAME, is performance the
>> primary motivator on this side?
>>
>>
>> Being declarative will definitely benefit the performance because preload
>> scanner, etc… could detect what kind of "views" are exposed/implemented in
>> a given "slave" (or "widget") document without running scripts.
>>
>> Also, I'd imagine a lot of widgets would end up using templates so having
>> to manually instantiate those templates would be an annoyance.
>>
>
> Having written some basic apps with Polymer, it's evidently feasible to
> wrap the template stamping up in a library.
>
>> It would be nice if the widget author could get something rendered very
>> quickly.
>>
>>
>> Right.
>>
>>  I think this "declarative" part of the problem breaks down this way:
>>
>> - How the page author "invokes" something in the embedded component. How
>> is it named and how does the author mention the name?
>>
>>
>> So I think a custom element is the natural mechanism. e.g.
>>
>> <import src="http://foo.com/widget.html" customelements="foo-button">
>> <foo-button>Foo this</foo-button>
>>
>
> I see the appeal of Custom Elements, because it has a way to define a name
> (document.register), mention a name (createElement, write markup, etc.) and
> has a model of instantiating elements. But it has baggage you don't want,
> like prototypes and constructors (on the embedder side.) There's also all
> the details of this viewport entangling. Likewise with HTML Imports,
> they've got some things you want (new document) but some things you don't
> (shared window, shared globals) and some things I'm unsure about
> (synchronous versus asynchronous).
>
>
> Right.  Perhaps we could extract the common base of the custom
> elements.
>

Yes. This could also explain how the built-in elements come into existence.
We have:

- Built-in elements provided by the UA.
- Author-defined Custom Elements provided by the page.
- "Broker" elements provided by the UA by processing a definition on the
other side of an isolation boundary.


> Alternatively, we can provide a mechanism to auto-create custom elements
> as a wrapper for cross-origin widgets.  i.e. we want to have the imported
> document create a DOM tree given a name of tag/element, and then securely
> insert it somewhere in the host document as a custom element.
>

Could you explain this in more detail? I don't understand what you mean by
auto-creation.

My intuition about this problem is that element names are a very convenient
way for the page to "invoke" the abstraction. On the other side of the
isolation boundary, we need to have something as the root of the DOM tree
for an instance; probably an element or a ShadowRoot. But I'm not convinced
that the thing on the invoking side, and the thing on the invoked side,
have to be exactly the same object. I think starting with distinct objects
and explaining what information DOES flow between them (hopefully
relatively little) will make it easier to get the security properties we
want.
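
As a rough illustration of the distinct-objects idea, here is a TypeScript
sketch. Every name in it (Broker, HostSideElement, WidgetSideView,
importNames, instantiate) is invented for this example and is not a proposed
API. It only models one possible rule: the host opts in to specific element
names, and the element name is the only thing that crosses the boundary.
Configuration and layout are deliberately left out.

```typescript
// Hypothetical model only: none of these names exist in any spec or browser.

// What the host page holds: just an element with a name.
interface HostSideElement {
  readonly tagName: string;
}

// What the widget document holds: the root of its own tree for one instance.
interface WidgetSideView {
  readonly viewId: number;
  readonly definitionName: string;
}

class Broker {
  private nextViewId = 1;
  private readonly definedNames: string[];
  private readonly imported: string[] = [];

  // definedNames: the element names the widget document offers.
  constructor(definedNames: string[]) {
    this.definedNames = definedNames.slice();
  }

  // The host opts in to specific names; names the widget does not define,
  // and names already imported, are ignored.
  importNames(names: string[]): void {
    for (const name of names) {
      const offered = this.definedNames.indexOf(name) !== -1;
      const alreadyImported = this.imported.indexOf(name) !== -1;
      if (offered && !alreadyImported) {
        this.imported.push(name);
      }
    }
  }

  // Only the element name crosses the boundary. The view never receives a
  // reference to the host-side object, and the host never receives the view.
  instantiate(element: HostSideElement): WidgetSideView | null {
    if (this.imported.indexOf(element.tagName) === -1) {
      return null;
    }
    return { viewId: this.nextViewId++, definitionName: element.tagName };
  }
}

// The widget document offers two names, but the host imports only one.
const broker = new Broker(["foo-button", "foo-tracker"]);
broker.importNames(["foo-button"]);

const first = broker.instantiate({ tagName: "foo-button" });
const second = broker.instantiate({ tagName: "foo-button" });
const refused = broker.instantiate({ tagName: "foo-tracker" });
```

In this sketch the two foo-button instances get separate views from the same
broker, and foo-tracker gets none because the host never imported that name.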


> The downside of this approach is that now authors have to deal with two
> ways of defining widgets/components for same origin and cross origin use
> cases.
>
> I don't immediately have any better ideas so this is the straw man for
> now. As we work through the details we might come up with some tweaks or
> alternatives.
>
> I guess that's another reason to sweat the small stuff--if we had
> prototype implementations of element-view entangling and so on we could
> polyfill some of the high level declarative syntax ideas and bounce
> prototypes off real web developers and use cases.
>
>
>> Note that we can't let the imported "slave" document define an arbitrary
>> set of custom elements by default.
>>
>> is=blah syntax isn't as useful/interesting here because it's unusual to
>> use a cross-origin widget to replace an existing built in HTML element.
>>
>>  - How does the embedding page understand that there's an "instance" of
>> their stuff contributing to the main page now?
>>
>>
>> Again, the custom element's created callback is a very nice mechanism for
>> that.
>>
>> - How does the author configure an instance from the embedded component?
>> Presumably the button needs to know some things from its embedder,
>> like API keys, etc.
>>
>>
>> If we decided that each "view" is a custom element, then a very natural
>> way for it to communicate the information is via data attributes.
>>
>
> I note that fb:like already uses data- so there's precedent for that.
>
>
> Right.
>
> Are there problems with data-?
>
>
> Not that I know of.
>

I suppose there's the risk that a page uses data- without expecting it to
leak to a third-party. On the other hand, the page will have to take some
explicit action to include one of these components, so that seems like an
acceptable risk to me.
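
To make the scope of that leak concrete, here is a small TypeScript sketch of
which attributes would be visible to the component if only data- attributes
crossed the boundary. The helper names (datasetKey, exportedDataset) and the
sample attribute values are made up for illustration. The name conversion
follows the usual dataset rule (strip "data-", then turn each hyphen followed
by a lowercase letter into that letter in uppercase).

```typescript
// Hypothetical helpers: illustrative only, not part of any proposal.
type AttributeMap = Record<string, string>;

// Convert "data-api-key" to "apiKey"; return null for non-data attributes.
function datasetKey(attributeName: string): string | null {
  const prefix = "data-";
  if (attributeName.indexOf(prefix) !== 0) {
    return null;
  }
  return attributeName
    .slice(prefix.length)
    .replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Build the only payload that would be exposed to the component.
// Everything that is not a data- attribute stays on the host side.
function exportedDataset(hostAttributes: AttributeMap): AttributeMap {
  const dataset: AttributeMap = {};
  for (const name of Object.keys(hostAttributes)) {
    const key = datasetKey(name);
    if (key !== null) {
      dataset[key] = hostAttributes[name];
    }
  }
  return dataset;
}

// Sample host element attributes (values are invented).
const visibleToWidget = exportedDataset({
  "id": "share-1",
  "class": "sidebar",
  "data-api-key": "abc123",
  "data-href": "http://example.com/article",
});
```

Here visibleToWidget contains apiKey and href only; id and class are not
exported.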

> Where would the component access them? Can they see updates? Are updates
> one way or bi-directional?
>
>
> So if we used shadow DOM as the security boundary, we could expose
> dataset on the shadow root, and have it synced with data attributes on the
> shadow host.
>

Is there some way we can specify these updates as "best effort"? It would
be nice to (a) keep out-of-process stuff on the table, and (b) discourage
authors from building IPC-over-data-attributes.
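
One possible reading of "best effort", sketched in TypeScript: the host side
only ever records the latest snapshot, and the UA delivers it whenever it
chooses, so intermediate states may never be observed. The class and method
names (BestEffortChannel, publish, flush) are hypothetical, and this is just
one way such semantics could be modelled, not a proposal for how they should
be specified.

```typescript
// Hypothetical model of coalesced, one-way dataset updates.
type Dataset = Record<string, string>;

class BestEffortChannel {
  private pending: Dataset | null = null;

  // Snapshots the component side actually observed, in delivery order.
  readonly delivered: Dataset[] = [];

  // Host side: record the latest state. Any earlier snapshot that has not
  // been delivered yet is overwritten, so it is never seen by the component.
  publish(snapshot: Dataset): void {
    this.pending = { ...snapshot };
  }

  // UA side: deliver at a time of its own choosing (possibly from another
  // process). Delivers at most one snapshot per call.
  flush(): void {
    if (this.pending === null) {
      return;
    }
    this.delivered.push(this.pending);
    this.pending = null;
  }
}

// Three rapid updates followed by a single delivery.
const channel = new BestEffortChannel();
channel.publish({ count: "1" });
channel.publish({ count: "2" });
channel.publish({ count: "3" });
channel.flush();
channel.flush(); // nothing pending, so nothing is delivered
```

Because only the final snapshot arrives, a page cannot rely on each attribute
write being seen, which makes the channel a poor fit for message passing.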

Received on Thursday, 5 December 2013 01:17:12 UTC