Re: [webappsec] ISSUE-53: UISecurity input-protection heuristic for composited rendering from Brad Hill on 2013-10-14 (public-webappsec@w3.org from October 2013)

From: Brad Hill <hillbrad@gmail.com>
Date: Mon, 14 Oct 2013 16:45:03 -0700
To: David Lin-Shung Huang <linshung.huang@sv.cmu.edu>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAEeYn8iePYRbtinQGNW24LptvUNNNVff5nTuRkKvsBF3maWS_A@mail.gmail.com>
On Mon, Oct 14, 2013 at 3:38 PM, Brad Hill <hillbrad@gmail.com> wrote:

> So, there is no way to get the final rendering, even for the compositor
> thread managing the outermost document?  :/   You can't read the pixels
> back from the GPU when you know you have a hit to a protected region?
>

Continuing to explore my own question... does the implementation of the
screen capture facility of getUserMedia() provide a re-usable primitive
that we can point to?



> , 2013 at 6:54 PM, David Lin-Shung Huang <linshung.huang@sv.cmu.edu> wrote:
>
>> Thanks, Brad! For what it's worth, here's my attempt to clarify which
>> parts of the original description in Section 6.2 are affected by
>> compositing:
>>
>> - For "timing attacks countermeasure" (the "Display Change List"),
>> compositing should have no impact. Essentially, we're keeping track of the
>> "damage rects" that already exist in the main thread of WebKit (presumably
>> the browser we're concerned with here).
>>
>> - For "cursor sanity check", compositing has no impact. (Cursors are
>> independent of screenshots.)
>>
>> - For "obstruction check", there are two distinct cases:
>>   (1) When the "user image" is taken using OS-native APIs, compositing
>> should have no impact. (OS can grab the final rendering.)
>>   (2) When the "user image" is taken by the browser, obstruction checks
>> will fail (as Adam pointed out) since each browsing context has no
>> knowledge of the final rendering. [FIXME!]
>>
>> As a side benefit, I suspect that compositing might actually make
>> obstruction checks faster, because the "control image" could be available
>> as a cached layer (or tile), thus eliminating the need to render an
>> off-screen HTML5 canvas element.
>>
>>
>> Thanks,
>> David
>>
>>
>> On Thu, Oct 10, 2013 at 12:48 PM, Brad Hill <hillbrad@gmail.com> wrote:
>>
>>> <hat = editor>
>>>
>>> One of the open issues on UISecurity raised by Adam Barth is that the
>>> input-protection heuristic is not well-suited to browsers that use
>>> compositing to accelerate page rendering with GPUs.  While this heuristic
>>> is non-normative, Adam suggested that we should supply a heuristic for this
>>> model.
>>>
>>> I've attempted to grok these browser internals this week, at least for
>>> Blink and WebKit, and below is a first attempt at such a heuristic.  I
>>> would definitely appreciate review from anyone who feels qualified, or is
>>> willing to forward it to someone who is.  I'm confident that despite my
>>> best efforts I have somewhere confused what happens in layout vs. draw vs.
>>> paint vs. composite vs. render.
>>>
>>> thanks,
>>>
>>> Brad
>>>
>>> *Alternate Input Protection Heuristic for Multi-Layer Compositing*
>>>
>>> Some user agents, in order to improve performance by taking advantage of
>>> specialized graphics hardware, use a strategy for hit testing and
>>> delivering UI events to hardware-composited layers to which the basic
>>> heuristic does not apply well.  This alternative *non-normative* heuristic
>>> describes one possible implementation strategy for the input-protection
>>> directive in this architecture.
>>>
>>> GPU-optimized user agents typically separate the browser UI process from
>>> the process that handles building and displaying the visual representation
>>> of the resource.  (In this context the term "process" refers to any
>>> encapsulated subunit of user-agent functionality that communicates to other
>>> similar subunits through message passing, without implying any particular
>>> implementation details such as locality to a thread, OS-level "process" or
>>> the like.)  It is typical for the browser UI process to receive user events
>>> such as mouse clicks and then marshal these to the render process, where
>>> the event is hit tested through the page's DOM, checking for event handlers
>>> along the way.  As an optimization the render process may communicate hit
>>> test rectangles back to the UI process in advance so that the UI process
>>> can, e.g. immediately respond to a Touch event by scrolling if the event
>>> target falls within coordinates for which there are no other registered
>>> handlers in the DOM.   A similar strategy can be used to create an
>>> implementation of the input protection heuristic in a manner that is
>>> consistent with this multi-process, compositing architecture.
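As a rough illustration of the hit test rectangle idea described above, here is a minimal Python sketch of how a UI process might check an event point against rects it received in advance. The `Rect` type, the `hits_protected_region` helper, and the edge-inclusion rule are all hypothetical and are not taken from Blink or WebKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in the outermost document's coordinates."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Assumption: left/top edges inclusive, right/bottom edges exclusive.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hits_protected_region(px, py, protected_rects):
    """Return True if the event point falls inside any protected rect."""
    return any(r.contains(px, py) for r in protected_rects)


# Rects the render process might have sent to the UI process in advance.
rects = [Rect(10, 10, 100, 40), Rect(200, 300, 50, 50)]
```

Only events for which `hits_protected_region` returns True would need to continue through the rest of the heuristic; all other events could be processed as normal.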
>>>
>>> If a resource is being loaded in a frame, iframe, object, embed or
>>> applet context and specifies an input-protection directive, apply the
>>> following steps:
>>>
>>> 1.       *Tracking hit test rects: * Hook the creation of event
>>> handlers for protected events and elements and add the DOM nodes with any
>>> such handler to a collection. After a layout occurs, or when an event
>>> handler is added or removed, iterate across all DOM nodes to generate a
>>> vector of rectangles to which such events need to be marshaled.  If the
>>> input-protection applies to the DOMWindow or Document node, avoid this
>>> expensive process of walking the renderers and simply use the view's
>>> bounds, as they're guaranteed to be inclusive.
>>>
>>> 2.       *(Optionally) Put the protected areas into a backing store /
>>> composited layer: *To avoid the expense of having to re-layout and
>>> re-paint protected regions during the *obstruction check*, it may make
>>> sense to designate and place protected regions into their own backing store
>>> or composited layer which can serve as a cached *control image*. Such a
>>> backing store should paint the entire content of the protected region for
>>> this purpose, even if it is clipped by the viewport.
>>>
>>> 3.       *Hit testing in the compositor: *When an event is received,
>>> check whether it is on any layer and then walk the layer hierarchy checking
>>> the protected regions on every layer.  If there is a hit, continue the
>>> heuristic.  Otherwise, exit this heuristic and event processing proceeds as
>>> normal.
>>>
>>> 4.       *Cursor sanity check:* By querying computed-style with the
>>> ":hover" pseudo-class on the element (if the target is plugin content) or
>>> on the host frame element and its ancestors (if the target is a nested
>>> document), check whether the cursor has been hidden or changed to a
>>> possibly attacker-provided bitmap: if it has, proceed to *Violation
>>> management*. This provides protection against "Phantom cursor" attacks,
>>> also known as "Cursorjacking".
>>>
>>> 5.       *Timing check: [ I need some help here ] *Conceptually, we
>>> would like to track whenever a protected region must be *redrawn
>>> (--show-property-changed-rects, I think, is the Blink concept) *AND when
>>> the cause of that redraw originated from a different document context.  We
>>> want to trigger the heuristic if an enclosing frame overlapped the
>>> protected region within the specified time interval, but we don't want to
>>> trigger if redraws originate from within the same document context as the
>>> protected area (e.g. if the button itself has a mouseover animation).  I'm
>>> really not sure how this part works or how to describe it in generic terms
>>> here.  Can we propagate the source and timing of redraw triggers to the
>>> protected hit test rects or our collection of DOM nodes?
>>>
>>> 6.       *Obstruction check: *Compare two sets of pixels: the *control
>>> image *is the protected area as if it was rendered alone, unobstructed
>>> by pixels originating from any other document context.  (If step 2
>>> optimizations were performed, this should be readily available in its own
>>> composited layer.)  The *user image *represents the same area as the *control
>>> image* in the outermost document's coordinate system and contains the
>>> final set of common pixels for the fully rendered page. These images are
>>> compared, and if the differences are below the *tolerance* threshold
>>> associated with the input-protection directive, proceed to deliver the
>>> event normally, otherwise proceed to *Violation management*. If
>>> portions of the *control image *are clipped by the root viewport in
>>> the outermost document's coordinate system, all such pixels must be
>>> considered not to match.  [*I don't know enough to say whether the
>>> comparison can be done on the GPU without marshaling the pixels of the
>>> control image backing store back to software, or if this is even worth
>>> mentioning here…]*
>>>
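The obstruction check in step 6 could be sketched roughly as follows, in CPU-side Python. This assumes that both images are already available as flat sequences of RGB tuples, that `tolerance` means the maximum fraction of mismatching pixels, and that a difference equal to the tolerance still passes; none of these choices is confirmed by the draft text, and the function name is hypothetical.

```python
def obstruction_check(control, user, clipped, tolerance):
    """Return True if the event may be delivered, False on a violation.

    control, user: equal-length sequences of (r, g, b) tuples.
    clipped: set of pixel indices clipped by the root viewport.
    tolerance: assumed maximum fraction of mismatching pixels (0.0 to 1.0).
    """
    if len(control) != len(user):
        raise ValueError("images must cover the same area")
    if not control:
        # Assumption: with nothing to compare, treat the check as failed.
        return False
    # Clipped pixels always count as mismatches, as step 6 requires.
    mismatches = sum(
        1 for i, (c, u) in enumerate(zip(control, user))
        if i in clipped or c != u
    )
    return mismatches / len(control) <= tolerance
```

A False result would correspond to proceeding to violation management. Whether the same comparison can run on the GPU without reading pixels back remains the open question raised in step 6.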
>>>
>>>
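For the timing check in step 5, one possible shape is a log of redraws tagged with the document context that caused them. The Python sketch below assumes such tagging is available, which is exactly the open question in that step. The `Damage` record, the string context identifiers, and the millisecond timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Damage:
    """One recorded redraw: where, when, and which document caused it."""
    rect: tuple          # (x, y, width, height)
    source: str          # hypothetical document-context identifier
    timestamp_ms: float


def intersects(a, b):
    """Return True if two (x, y, width, height) rects overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def timing_check(log, protected_rect, protected_source, now_ms, interval_ms):
    """Return True if no foreign redraw touched the region recently."""
    for d in log:
        recent = now_ms - d.timestamp_ms <= interval_ms
        foreign = d.source != protected_source
        if recent and foreign and intersects(d.rect, protected_rect):
            return False
    return True
```

Redraws that originate in the protected document itself, such as a button's own mouseover animation, never fail the check in this sketch; only recent, overlapping redraws from another context do.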
>>> Notes: applying protections to the entire document (if it itself would
>>> consist of multiple composited layers) or using the input-protection-clip
>>> property may make many of these optimizations impossible, and may impose
>>> performance penalties on the page, perhaps forcing it to fall back to
>>> all-software rendering.  Should we have text warning authors of this, or
>>> should we simply remove those options from the spec and require
>>> input-protection-selectors only, to better match the internal strategies of
>>> modern browser rendering?
>>>
>>>
>>>
>>
>>
>
Received on Monday, 14 October 2013 23:45:32 UTC