Re: [webappsec] ISSUE-53: UISecurity input-protection heuristic for composited rendering from David Lin-Shung Huang on 2013-10-15 (public-webappsec@w3.org from October 2013)

From: David Lin-Shung Huang <linshung.huang@sv.cmu.edu>
Date: Tue, 15 Oct 2013 15:11:53 -0700
To: Brad Hill <hillbrad@gmail.com>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAGiwpwjibJ43+zm_G02bULnfN0FCZ_6y=GU2Suav16iDdUR8hw@mail.gmail.com>

Regarding WebRTC desktop/screen sharing implementations, I see that screen
capturing in Chromium is done with OS-native APIs (BitBlt on Windows).
https://code.google.com/p/webrtc/source/browse/trunk/webrtc/modules/desktop_capture/screen_capturer_win.cc

For Chromium, it looks like the ScreenCapturer interface will be reusable:
https://code.google.com/p/chromium/issues/detail?id=180360



On Mon, Oct 14, 2013 at 4:45 PM, Brad Hill <hillbrad@gmail.com> wrote:

>
>
> On Mon, Oct 14, 2013 at 3:38 PM, Brad Hill <hillbrad@gmail.com> wrote:
>
>> So, there is no way to get the final rendering, even for the compositor
>> thread managing the outermost document?  :/   You can't read the pixels
>> back from the GPU when you know you have a hit to a protected region?
>>
>
> Continuing to explore my own question: does the implementation of the
> screen capture facility of getUserMedia() provide a reusable primitive
> that we can point to?
>
>
>
>> , 2013 at 6:54 PM, David Lin-Shung Huang <linshung.huang@sv.cmu.edu> wrote:
>>
>>> Thanks, Brad! For what it's worth, here's my attempt to clarify which
>>> parts of the original description in Section 6.2 are affected by
>>> compositing:
>>>
>>> - For "timing attacks countermeasure" (the "Display Change List"),
>>> compositing should have no impact. Essentially, we're keeping track of the
>>> "damage rects" that already exist in the main thread of WebKit (presumably
>>> the browser we're concerned with here).
>>>
>>> - For "cursor sanity check", compositing has no impact. (Cursors are
>>> independent of screenshots.)
>>>
>>> - For "obstruction check", there are two distinct cases:
>>>   (1) When the "user image" is taken using OS-native APIs, compositing
>>> should have no impact. (OS can grab the final rendering.)
>>>   (2) When the "user image" is taken by the browser, obstruction checks
>>> will fail (as Adam pointed out) since each browsing context has no
>>> knowledge of the final rendering. [FIXME!]
>>>
>>> As a side benefit, I suspect that compositing might actually make
>>> obstruction checks faster, because the "control image" could be available
>>> as a cached layer (or tile), thus eliminating the need to render an
>>> off-screen HTML5 canvas element.
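A minimal Python sketch of the obstruction comparison discussed here (spelled out as step 6 of Brad's proposal quoted below), for illustration only. It assumes both images are already available as equally sized grids of RGBA tuples, that pixels of the user image clipped by the root viewport are marked with `None`, and that `tolerance` is the maximum allowed fraction of mismatched pixels. These representations and the function names are hypothetical. Where the control image comes from (a cached composited layer, as suggested above, or an off-screen canvas) is left to the caller.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA


def obstruction_ratio(control: List[List[Pixel]],
                      user: List[List[Optional[Pixel]]]) -> float:
    """Fraction of control-image pixels that differ from the user image.

    A `None` entry in `user` marks a pixel clipped by the root viewport;
    such pixels always count as mismatches.
    """
    if len(control) != len(user) or any(
            len(c) != len(u) for c, u in zip(control, user)):
        raise ValueError("control and user images must have the same size")
    total = 0
    mismatched = 0
    for control_row, user_row in zip(control, user):
        for c, u in zip(control_row, user_row):
            total += 1
            if u is None or u != c:
                mismatched += 1
    return mismatched / total if total else 0.0


def passes_obstruction_check(control: List[List[Pixel]],
                             user: List[List[Optional[Pixel]]],
                             tolerance: float) -> bool:
    """Deliver the event only if the mismatch fraction is within tolerance."""
    return obstruction_ratio(control, user) <= tolerance
```

Treating "within tolerance" as `<=` is an assumption; it lets a tolerance of 0 accept a pixel-identical rendering.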
>>>
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On Thu, Oct 10, 2013 at 12:48 PM, Brad Hill <hillbrad@gmail.com> wrote:
>>>
>>>> <hat = editor>
>>>>
>>>> One of the open issues on UISecurity raised by Adam Barth is that the
>>>> input-protection heuristic is not well-suited to browsers that use
>>>> compositing to accelerate page rendering with GPUs.  While this heuristic
>>>> is non-normative, Adam suggested that we should supply a heuristic for this
>>>> model.
>>>>
>>>> I've attempted to grok these browser internals this week, at least for
>>>> Blink and WebKit, and below is a first attempt at such a heuristic.  I
>>>> would definitely appreciate review from anyone who feels qualified, or is
>>>> willing to forward it to someone who is.  I'm confident that despite my
>>>> best efforts I have somewhere confused what happens in layout vs. draw vs.
>>>> paint vs. composite vs. render.
>>>>
>>>> thanks,
>>>>
>>>> Brad
>>>>
>>>> *Alternate Input Protection Heuristic for Multi-Layer Compositing*
>>>>
>>>> Some user agents, in order to improve performance by taking advantage
>>>> of specialized graphics hardware, use a strategy for hit testing and
>>>> delivering UI events to hardware-composited layers to which the basic
>>>> heuristic does not apply well.  This alternative *non-normative* heuristic
>>>> describes one possible implementation strategy for the input-protection
>>>> directive in this architecture.
>>>>
>>>> GPU-optimized user agents typically separate the browser UI process
>>>> from the process that handles building and displaying the visual
>>>> representation of the resource.  (In this context the term "process" refers
>>>> to any encapsulated subunit of user-agent functionality that communicates
>>>> to other similar subunits through message passing, without implying any
>>>> particular implementation details such as locality to a thread, OS-level
>>>> "process" or the like.)  It is typical for the browser UI process to
>>>> receive user events such as mouse clicks and then marshal these to the
>>>> render process, where the event is hit tested through the page's DOM,
>>>> checking for event handlers along the way.  As an optimization the render
>>>> process may communicate hit test rectangles back to the UI process in
>>>> advance so that the UI process can, e.g., immediately respond to a Touch
>>>> event by scrolling if the event target falls within coordinates for which
>>>> there are no other registered handlers in the DOM.   A similar strategy can
>>>> be used to create an implementation of the input protection heuristic in a
>>>> manner that is consistent with this multi-process, compositing architecture.
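The "send hit test rects ahead of time" pattern described above might look roughly like the following Python sketch. The class, the `on_rects_update` message handler, and the assumption that rects arrive already mapped into the root view's coordinates are all hypothetical, not Blink or WebKit APIs. The UI process only consults the cache to decide whether an event can take the fast path or must go through the input-protection heuristic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class UIProcessHitCache:
    """Protected hit test rects pushed in advance by each render process."""
    rects_by_frame: Dict[int, List[Rect]] = field(default_factory=dict)

    def on_rects_update(self, frame_id: int, rects: List[Rect]) -> None:
        # Hypothetical IPC handler: each update replaces the frame's previous rects.
        self.rects_by_frame[frame_id] = list(rects)

    def needs_protection_check(self, px: int, py: int) -> bool:
        # Fast path: events outside every protected rect are delivered normally.
        return any(r.contains(px, py)
                   for rects in self.rects_by_frame.values()
                   for r in rects)
```

Replacing a frame's rects wholesale on each update keeps the UI-side state trivially consistent with the render process's last layout, at the cost of resending unchanged rects.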
>>>>
>>>> If a resource is being loaded in a frame, iframe, object, embed or
>>>> applet context and specifies an input-protection directive, apply the
>>>> following steps:
>>>>
>>>> 1.       *Tracking hit test rects:* Hook the creation of event
>>>> handlers for protected events and elements and add the DOM nodes with any
>>>> such handler to a collection. After a layout occurs, or when an event
>>>> handler is added or removed, iterate across all DOM nodes to generate a
>>>> vector of rectangles to which such events need to be marshaled.  If the
>>>> input-protection applies to the DOMWindow or Document node, avoid this
>>>> expensive process of walking the renderers and simply use the view's
>>>> bounds, as they're guaranteed to be inclusive.
>>>>
>>>> 2.       *(Optionally) Put the protected areas into a backing store /
>>>> composited layer:* To avoid the expense of having to re-layout and
>>>> re-paint protected regions during the *obstruction check*, it may make
>>>> sense to designate and place protected regions into their own backing store
>>>> or composited layer which can serve as a cached *control image*. Such
>>>> a backing store should paint the entire content of the protected region for
>>>> this purpose, even if it is clipped by the viewport.
>>>>
>>>> 3.       *Hit testing in the compositor:* When an event is received,
>>>> check whether it is on any layer and then walk the layer hierarchy checking
>>>> the protected regions on every layer.  If there is a hit, continue the
>>>> heuristic.  Otherwise, exit this heuristic and event processing proceeds as
>>>> normal.
>>>>
>>>> 4.       *Cursor sanity check:* By querying computed-style with the
>>>> ":hover" pseudo-class on the element (if the target is plugin content) or
>>>> on the host frame element and its ancestors (if the target is a nested
>>>> document), check whether the cursor has been hidden or changed to a
>>>> possibly attacker-provided bitmap: if it has, proceed to *Violation
>>>> management*. This provides protection against "Phantom cursor"
>>>> attacks, also known as "Cursorjacking".
>>>>
>>>> 5.       *Timing check: [I need some help here]* Conceptually, we
>>>> would like to track whenever a protected region must be *redrawn*
>>>> (--show-property-changed-rects, I think, is the Blink concept) AND
>>>> when the cause of that redraw originated from a different document
>>>> context.  We want to trigger the heuristic if an enclosing frame overlapped
>>>> the protected region within the specified time interval, but we don't want
>>>> to trigger if redraws originate from within the same document context as
>>>> the protected area. (e.g. if the button itself has a mouseover animation)
>>>> I'm really not sure how this part works or how to describe it in generic
>>>> terms here.  Can we propagate the source and timing of redraw triggers to
>>>> the protected hit test rects or our collection of DOM nodes?
>>>>
>>>> 6.       *Obstruction check:* Compare two sets of pixels: the *control
>>>> image* is the protected area as if it were rendered alone, unobstructed
>>>> by pixels originating from any other document context.  (If step 2
>>>> optimizations were performed, this should be readily available in its own
>>>> composited layer.)  The *user image* represents the same area as the *control
>>>> image* in the outermost document's coordinate system and contains the
>>>> final set of common pixels for the fully rendered page. These images are
>>>> compared, and if the differences are below the *tolerance* threshold
>>>> associated with the input-protection directive, proceed to deliver the
>>>> event normally; otherwise, proceed to *Violation management*. If
>>>> portions of the *control image* are clipped by the root viewport in
>>>> the outermost document's coordinate system, all such pixels must be
>>>> considered not to match.  *[I don't know enough to say whether the
>>>> comparison can be done on the GPU without marshaling the pixels of the
>>>> control image backing store back to software, or if this is even worth
>>>> mentioning here…]*
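Step 1 of the heuristic above could be bookkept roughly as in this Python sketch. It is an illustration under assumptions, not engine code: nodes are identified by hypothetical integer ids, the caller passes a `layout` map from node id to border box in view coordinates, and nodes without a layout box are simply skipped. It iterates over the collected nodes rather than the whole DOM.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in view coordinates


class ProtectedRectTracker:
    """Hypothetical per-document bookkeeping for step 1."""

    def __init__(self, protected_events: Set[str], view_bounds: Rect,
                 protect_whole_document: bool = False) -> None:
        self.protected_events = protected_events
        self.view_bounds = view_bounds
        self.protect_whole_document = protect_whole_document
        self.handlers: Dict[int, Set[str]] = {}  # node id -> protected event types
        self.rects: List[Rect] = []

    def on_handler_added(self, node_id: int, event_type: str,
                         layout: Dict[int, Rect]) -> None:
        if event_type in self.protected_events:
            self.handlers.setdefault(node_id, set()).add(event_type)
            self.recompute(layout)

    def on_handler_removed(self, node_id: int, event_type: str,
                           layout: Dict[int, Rect]) -> None:
        types = self.handlers.get(node_id)
        if types is not None and event_type in types:
            types.discard(event_type)
            if not types:
                del self.handlers[node_id]
            self.recompute(layout)

    def recompute(self, layout: Dict[int, Rect]) -> None:
        """Call after every layout; `layout` maps node id -> border box."""
        if self.protect_whole_document:
            # Shortcut: the view's bounds are guaranteed to be inclusive.
            self.rects = [self.view_bounds]
        else:
            # Nodes without a layout box (e.g. display: none) are skipped.
            self.rects = [layout[n] for n in sorted(self.handlers) if n in layout]
```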
>>>>
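For step 3 (hit testing in the compositor), a sketch of walking the layer hierarchy and checking protected regions on every layer. As a loudly stated simplification, each hypothetical `Layer` carries only a translation offset relative to its parent; real compositors apply full transforms, clips, and scroll offsets, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in layer-local coordinates


@dataclass
class Layer:
    name: str
    offset: Tuple[int, int]  # translation relative to the parent layer (simplification)
    protected: List[Rect] = field(default_factory=list)
    children: List["Layer"] = field(default_factory=list)


def _contains(r: Rect, x: int, y: int) -> bool:
    rx, ry, rw, rh = r
    return rx <= x < rx + rw and ry <= y < ry + rh


def protected_hits(layer: Layer, x: int, y: int,
                   parent_origin: Tuple[int, int] = (0, 0)) -> List[Tuple[str, Rect]]:
    """Collect every protected rect, on any layer, containing root point (x, y)."""
    ox = parent_origin[0] + layer.offset[0]
    oy = parent_origin[1] + layer.offset[1]
    hits = [(layer.name, r) for r in layer.protected if _contains(r, x - ox, y - oy)]
    for child in layer.children:
        hits.extend(protected_hits(child, x, y, (ox, oy)))
    return hits
```

An empty result means the event exits the heuristic and is processed normally; any hit continues to the cursor, timing, and obstruction checks.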
>>>>
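The decision in step 4 (cursor sanity check) reduces to inspecting computed `cursor` values, as in this Python sketch. How an engine obtains computed style under `:hover` is engine-specific and assumed to happen elsewhere; the function just receives the values as CSS strings. Treating only `none` and `url(...)` as suspicious is an assumption (other image-valued forms are not considered).

```python
from typing import Iterable


def cursor_is_suspicious(hover_cursor_values: Iterable[str]) -> bool:
    """Return True if any computed `cursor` value hides the cursor or names an image.

    `hover_cursor_values` holds the computed `cursor` property under :hover for the
    target element (plugin case) or for the host frame element and its ancestors
    (nested-document case), as plain CSS strings.
    """
    for value in hover_cursor_values:
        v = value.strip().lower()
        if v == "none" or "url(" in v:  # hidden cursor or custom bitmap
            return True
    return False
```

A `True` result corresponds to proceeding to *Violation management*.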
>>>>
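One way to picture the timing check in step 5 is a "display change list" of damage records, sketched below in Python. It assumes, hypothetically, that each damage rect can be tagged with the document whose change caused the redraw. That is exactly the open question raised in step 5, so the sketch only shows the bookkeeping if such tagging is possible. Redraws originating in the protected document itself (e.g. a button's own mouseover animation) are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in root coordinates


@dataclass(frozen=True)
class DamageRecord:
    timestamp_ms: float
    rect: Rect
    origin_document: str  # hypothetical id of the document whose change caused the redraw


def intersects(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


class DisplayChangeList:
    def __init__(self, max_age_ms: float) -> None:
        self.max_age_ms = max_age_ms  # bounds memory; must be >= any interval queried
        self.records: List[DamageRecord] = []

    def record(self, rec: DamageRecord) -> None:
        self.records.append(rec)

    def prune(self, now_ms: float) -> None:
        self.records = [r for r in self.records
                        if now_ms - r.timestamp_ms <= self.max_age_ms]

    def recently_disturbed(self, protected: Rect, protected_document: str,
                           now_ms: float, interval_ms: float) -> bool:
        """True if another document repainted over `protected` within the interval."""
        self.prune(now_ms)
        return any(r.origin_document != protected_document
                   and now_ms - r.timestamp_ms <= interval_ms
                   and intersects(r.rect, protected)
                   for r in self.records)
```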
>>>> Notes: applying protections to the entire document (if it itself would
>>>> consist of multiple composited layers) or using the input-protection-clip
>>>> property may make many of these optimizations impossible, and may impose
>>>> performance penalties on the page, perhaps forcing it to fall back to
>>>> all-software rendering.  Should we add text warning authors of this, or
>>>> should we simply remove those options from the spec and require
>>>> input-protection-selectors only, to better match the internal strategies of
>>>> modern browser rendering?
>>>>
>>>>
>>>>
>>>
>>>
>>
>
Received on Tuesday, 15 October 2013 22:12:23 UTC