[webappsec] ISSUE-53: UISecurity input-protection heuristic for composited rendering from Brad Hill on 2013-10-10 (public-webappsec@w3.org from October 2013)

From: Brad Hill <hillbrad@gmail.com>
Date: Thu, 10 Oct 2013 14:48:18 -0500
To: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAEeYn8i+BBs4bpXC1Fj5RD7xcJfGB2vvKBCHyemPuzMdRaS6TQ@mail.gmail.com>
<hat = editor>

One of the open issues on UISecurity raised by Adam Barth is that the
input-protection heuristic is not well-suited to browsers that use
compositing to accelerate page rendering with GPUs.  While this heuristic
is non-normative, Adam suggested that we should supply a heuristic for this
model.

I've attempted to grok these browser internals this week, at least for
Blink and WebKit, and below is a first attempt at such a heuristic.  I
would definitely appreciate review from anyone who feels qualified, or is
willing to forward it to someone who is.  I'm confident that despite my
best efforts I have somewhere confused what happens in layout vs. draw vs.
paint vs. composite vs. render.

thanks,

Brad

*Alternate Input Protection Heuristic for Multi-Layer Compositing*

Some user agents, in order to improve performance by taking advantage of
specialized graphics hardware, use a strategy for hit testing and
delivering UI events to hardware-composited layers to which the basic
heuristic does not apply well.  This alternative *non-normative* heuristic
describes one possible implementation strategy for the input-protection
directive in this architecture.

GPU-optimized user agents typically separate the browser UI process from
the process that handles building and displaying the visual representation
of the resource.  (In this context the term "process" refers to any
encapsulated subunit of user-agent functionality that communicates to other
similar subunits through message passing, without implying any particular
implementation details such as locality to a thread, OS-level "process" or
the like.)  It is typical for the browser UI process to receive user events
such as mouse clicks and then marshal these to the render process, where
the event is hit tested through the page's DOM, checking for event handlers
along the way.  As an optimization the render process may communicate hit
test rectangles back to the UI process in advance so that the UI process
can, e.g., immediately respond to a Touch event by scrolling if the event
target falls within coordinates for which there are no other registered
handlers in the DOM.  A similar strategy can be used to create an
implementation of the input protection heuristic in a manner that is
consistent with this multi-process, compositing architecture.
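
As a rough illustration of the hit test rectangle optimization described
above, here is a minimal Python sketch.  It is not taken from Blink or
WebKit: the names Rect, contains and needs_input_protection are
hypothetical, and the rectangles are assumed to already be in the same
coordinate space as the incoming event.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a hypothetical type for illustration only."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds: left/top edges are inside, right/bottom are not.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def needs_input_protection(rects: Iterable[Rect], px: float, py: float) -> bool:
    """Return True if the event point falls inside any protected rectangle.

    In the architecture sketched above, the UI process would hold `rects`
    (sent ahead of time by the render process) and only run the rest of
    the heuristic when this returns True.
    """
    return any(r.contains(px, py) for r in rects)
```

With a single protected rectangle Rect(10, 10, 100, 40), a click at
(50, 20) would continue into the heuristic, while a click at (5, 5)
would be processed as normal.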

If a resource is being loaded in a frame, iframe, object, embed or applet
context and specifies an input-protection directive, apply the following
steps:

1.       *Tracking hit test rects: * Hook the creation of event handlers
for protected events and elements and add the DOM nodes with any such
handler to a collection. After a layout occurs, or when an event handler is
added or removed, iterate across all DOM nodes to generate a vector of
rectangles to which such events need to be marshaled.  If the
input-protection applies to the DOMWindow or Document node, avoid this
expensive process of walking the renderers and simply use the view's
bounds, as they're guaranteed to be inclusive.

2.       *(Optionally) Put the protected areas into a backing store /
composited layer: *To avoid the expense of having to re-layout and re-paint
protected regions during the *obstruction check*, it may make sense to
designate and place protected regions into their own backing store or
composited layer which can serve as a cached *control image*. Such a
backing store should paint the entire content of the protected region for
this purpose, even if it is clipped by the viewport.

3.       *Hit testing in the compositor: *When an event is received, check
whether it is on any layer and then walk the layer hierarchy checking the
protected regions on every layer.  If there is a hit, continue the
heuristic.  Otherwise, exit this heuristic and event processing proceeds as
normal.

4.       *Cursor sanity check:* By querying computed-style with the
":hover" pseudo-class on the element (if the target is plugin content) or
on the host frame element and its ancestors (if the target is a nested
document), check whether the cursor has been hidden or changed to a
possibly attacker-provided bitmap: if it has, proceed to *Violation
management*. This provides protection against "Phantom cursor" attacks,
also known as "Cursorjacking".

5.       *Timing check: [ I need some help here ] *Conceptually, we would
like to track whenever a protected region must be *redrawn
(--show-property-changed-rects, I think, is the Blink concept) *AND when the
cause of that redraw originated from a different document context.  We want
to trigger the heuristic if an enclosing frame overlapped the protected
region within the specified time interval, but we don't want to trigger if
redraws originate from within the same document context as the protected
area (e.g. if the button itself has a mouseover animation).  I'm really not
sure how this part works or how to describe it in generic terms here.  Can
we propagate the source and timing of redraw triggers to the protected hit
test rects or our collection of DOM nodes?

6.       *Obstruction check: *Compare two sets of pixels: the *control image
*is the protected area as if it were rendered alone, unobstructed by pixels
originating from any other document context.  (If step 2 optimizations were
performed, this should be readily available in its own composited layer.)
The *user image *represents the same area as the *control image* in the
outermost document's coordinate system and contains the final set of common
pixels for the fully rendered page. These images are compared, and if the
differences are below the *tolerance* threshold associated with the
input-protection directive, proceed to deliver the event normally,
otherwise proceed to *Violation management*. If portions of the *control
image *are clipped by the root viewport in the outermost document's
coordinate system, all such pixels must be considered not to match.  [*I
don't know enough to say whether the comparison can be done on the GPU
without marshaling the pixels of the control image backing store back to
software, or if this is even worth mentioning here…]*
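
To make the obstruction check in step 6 more concrete, here is a minimal
software-only Python sketch.  It rests on several assumptions that are not
in the text above: images are modeled as equal-sized lists of rows of RGB
tuples, a pixel of the user image that is clipped by the root viewport is
represented as None, the tolerance is interpreted as the allowed fraction
of mismatched pixels, and the comparison is inclusive so that a tolerance
of 0 demands an exact match.  The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
ControlImage = List[List[Pixel]]
UserImage = List[List[Optional[Pixel]]]  # None marks a clipped pixel (assumption)


def mismatch_fraction(control: ControlImage, user: UserImage) -> float:
    """Fraction of pixels where the user image differs from the control image.

    Clipped pixels (None) are always counted as mismatches.
    """
    if len(control) != len(user):
        raise ValueError("images must have the same number of rows")
    total = 0
    mismatched = 0
    for control_row, user_row in zip(control, user):
        if len(control_row) != len(user_row):
            raise ValueError("rows must have the same length")
        for c, u in zip(control_row, user_row):
            total += 1
            if u is None or u != c:
                mismatched += 1
    return mismatched / total if total else 0.0


def passes_obstruction_check(control: ControlImage, user: UserImage,
                             tolerance: float) -> bool:
    """True means deliver the event; False means go to violation management.

    Assumption: tolerance is a fraction in [0, 1], compared inclusively.
    """
    return mismatch_fraction(control, user) <= tolerance
```

A real implementation would presumably compare against the cached layer
from step 2 rather than copying pixels into lists; the sketch only shows
the comparison rule, including the treatment of clipped pixels.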



Notes: applying protections to the entire document (if it itself would
consist of multiple composited layers) or using the input-protection-clip
property may make many of these optimizations impossible, and may impose
performance penalties on the page, perhaps forcing it to fall back to
all-software rendering.  Should we have text warning authors of this, or
should we simply remove those options from the spec and require
input-protection-selectors only, to better match the internal strategies of
modern browser rendering?
Received on Thursday, 10 October 2013 19:48:46 UTC