Re: [mediacapture-surface-control] Is gesture forwarding tied to capture controller or to MediaStreamTrack or to DOM objects? (#45) from Jan-Ivar Bruaroey via GitHub on 2025-03-17 (public-webrtc-logs@w3.org from March 2025)

From: Jan-Ivar Bruaroey via GitHub <sysbot+gh@w3.org>
Date: Mon, 17 Mar 2025 21:46:18 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-2731008017-1742247976-sysbot+gh@w3.org>

> After the discussion in the WG meeting, I see that this is Google's position:
> 
>  1. The use case concerns 3 entities:
>     
>     * An element that events are forwarded from (usually called overlay element in the discussions)
>     * A captured tab where events are forwarded to
>     * Some element used for rendering the capture (typically a video element or canvas). In the very simplest cases this element can be the same as the overlay, but in general it is not (core use cases like annotations generally require them to be separate).

I think we need to approach this from first principles. Once we do, it seems obvious that calling the _"element that events are forwarded from"_ the _"overlay"_ already bakes in implementation assumptions. It's a limited way of looking at things that
- presumes UAs can only capture input from one element and its children, which in turn
- requires websites to lay out and place this "overlay" so that it fully encompasses the rendering, to maintain the illusion that the user is interacting with the rendering in another element, even though this is unnecessary and breaks down at the edges
- presumes websites cannot implement annotations any other way
- locks us out of considering other solutions to this problem

This is troubling: no use case has been presented that requires detaching rendering from input forwarding in this way, and that detachment is a significant security concern from my point of view.

From my position (which I claim is from first principles), the 3 entities that comprise the use case are slightly different:

1. The use case concerns 3 classes of entities:

    * One or more (typically video or canvas) elements that render the capture in a manner the user can interact with
    * A captured tab where input is forwarded to
    * One or more elements overlapping the capture-rendering elements to implement annotations or scrollable stickies

As a next step to unblock this, maybe the WG should vote on which explanation describes the use case best?

Last month, from these first principles, I [presented](https://docs.google.com/presentation/d/1XHPL-hiWXlra2bWDJHHkZmSGrx3Y97xGcEv43mnwIpU/edit#slide=id.g3349a480247_1_398) what seems to me the most logical API. 

To make progress, I'd love for folks to engage with my proposal and show me where I'm wrong. But it seems a lot of the benefits you say you're willing to compromise on fall out naturally from my proposed API, whereas they don't from yours.

-- 
GitHub Notification of comment by jan-ivar
Please view or discuss this issue at https://github.com/w3c/mediacapture-surface-control/issues/45#issuecomment-2731008017 using your GitHub account

Received on Monday, 17 March 2025 21:46:19 UTC