[mediacapture-surface-control] Tab capture control (#49) from youennf via GitHub on 2024-11-13 (public-webrtc@w3.org from November 2024)

From: youennf via GitHub <sysbot+gh@w3.org>
Date: Wed, 13 Nov 2024 08:35:20 +0000
To: public-webrtc@w3.org
Message-ID: <issues.opened-2654670070-1729255326-sysbot+gh@w3.org>

youennf has just created a new issue for https://github.com/w3c/mediacapture-surface-control:

== Tab capture control ==
This issue relates to the https://screen-share.github.io/captured-surface-control/ proposal. It is based on my reading of the spec and on experimenting with https://captured-surface-control.glitch.me/ to control Google Slides and Google Maps.

First, the use case is fine. However, the current state of the prototype and spec does not seem sufficient for a consistent user experience. This casts some doubt on the approach (or at least we should understand what needs to happen next to make the user experience good).

The spec/implementation is focusing on specific inputs (zoom and scrolling). Focusing on a more general principle may be beneficial. Some thoughts that I hope can help:
- Web pages want to opt in to a mode where the captured preview can be interacted with, somewhat like an embedded iframe, but security principles should restrict the level of interaction that is possible.
- One principle could be that the UA is responsible for forwarding some of the user gestures from the preview to the captured web page. The decision heuristics may be UA-specific.
- The list of forwardable gestures can start small and grow as progress is made, without having to introduce new APIs. The current design seems to introduce one API per gesture. It is not clear to me that this level of flexibility is beneficial; this is worth discussing.
- The current list is scrolling (and, to some extent, zoom level). I could see pinching (missing for Google Maps) and keyboard zoom commands (missing for Google Slides) being added, and maybe keyboard arrows as well. It seems worth exploring what can be done here before bikeshedding the API.
- Clicks and other events might be useful in the long run but are definitely risky security-wise. Maybe some opt-in from the captured application could solve that issue; I am not sure. In any case, this sounds like a captured page API, which makes the current principle of a capturer API to forward gestures somewhat future-proof.
- It would be nice for the capturer to state its interest in forwarding user gestures, although the UA could do this even without the capturer asking for it.
- If so, there should be a way for the capturer to state that it is no longer interested in forwarding user gestures.
- The feature is currently gated by a specific prompt/permission policy. The policy seems excessive, and we should investigate whether a prompt/dedicated permission is what we want or whether UA heuristics could be good enough. This might influence the API shape.
- There is most probably an intent that these events do not allow a web page to know that it is being captured. We should discuss this. This probably means there is something missing when capturing a tab. For instance, a captured tab's visibility state should probably be reported as visible even when the tab is hidden. I do not see that in the screen share spec; this might be worth filing an issue on the screen share spec.
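
To make the "start small, grow later" idea concrete, here is a rough TypeScript sketch of how a UA-side forwarding decision could be modelled. Everything in it is hypothetical: the `Gesture` names, `ForwardingPolicy`, `ForwardingContext` and `shouldForward` are illustrative and come from no spec, and the sketch assumes the capturer must have stated its interest before anything is forwarded.

```typescript
// Hypothetical model of a UA forwarding heuristic; none of these names are
// defined by any spec or implementation.
type Gesture = "wheel" | "pinch" | "keyboard-zoom" | "arrow-key" | "click";

interface ForwardingPolicy {
  // Gestures the UA is willing to forward at all.
  forwardable: ReadonlySet<Gesture>;
  // Riskier gestures that additionally need an opt-in from the captured page.
  needsCapturedPageOptIn: ReadonlySet<Gesture>;
}

interface ForwardingContext {
  capturerEnabled: boolean;     // the capturer stated its interest (assumption)
  capturedPageOptedIn: boolean; // the captured page opted in to riskier gestures
}

function shouldForward(
  gesture: Gesture,
  policy: ForwardingPolicy,
  ctx: ForwardingContext
): boolean {
  if (!ctx.capturerEnabled) return false;
  if (!policy.forwardable.has(gesture)) return false;
  if (policy.needsCapturedPageOptIn.has(gesture) && !ctx.capturedPageOptedIn) {
    return false;
  }
  return true;
}

// The list can start small...
const initialPolicy: ForwardingPolicy = {
  forwardable: new Set<Gesture>(["wheel"]),
  needsCapturedPageOptIn: new Set<Gesture>(),
};

// ...and grow later without any new page-facing API.
const extendedPolicy: ForwardingPolicy = {
  forwardable: new Set<Gesture>(["wheel", "pinch", "keyboard-zoom", "arrow-key", "click"]),
  needsCapturedPageOptIn: new Set<Gesture>(["click"]),
};
```

The point of the sketch is that growing `forwardable` changes UA behaviour only; the capturing page keeps using the same single switch.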

A few additional thoughts:
- It seems interesting to concentrate on designing an API whose scope is to allow web pages to express their interest in forwarding user gestures. The `captureWheel` API is in scope, and `resize` events as well AIUI (I think that is what is used for zoom-in/zoom-out).
- I would leave `setZoomLevel` aside for now, until we understand what it solves that the gesture-forwarding approach cannot.
- The `captureWheel` name is not great if we plan to extend the set of user gestures that can be forwarded.
- `captureWheel(HTMLMediaElement e)` is probably not right. `captureWheel(HTMLMediaElement? e)` might be better, to allow unsetting (which seems like a MUST have). And `captureWheel(HTMLVideoElement? v)` might be even better from a type perspective.
- The restriction to one video element is a bit artificial. I am not sure why we cannot allow a web page to show the same captured media in two elements (one cropped and the other uncropped, for instance) and allow both of them to forward user gestures.
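
As a rough illustration of the nullable-argument point, the TypeScript mock below models only the set/unset semantics. `VideoLike`, `MockCaptureController` and `forwardingTarget` are hypothetical names for this sketch; the real controller is a browser object whose method may be asynchronous, whereas the mock is synchronous for brevity.

```typescript
// Hypothetical stand-in for a video element; only its identity matters here.
interface VideoLike {
  readonly id: string;
}

// Mock of a capturer-side controller. It models nothing beyond which element,
// if any, is currently registered to forward wheel events.
class MockCaptureController {
  private target: VideoLike | null = null;

  // Passing an element registers it; passing null unsets forwarding.
  captureWheel(v: VideoLike | null): void {
    this.target = v;
  }

  // Inspection helper for this sketch only (not a proposed API).
  get forwardingTarget(): VideoLike | null {
    return this.target;
  }
}

const controller = new MockCaptureController();
const preview: VideoLike = { id: "preview" };
const croppedPreview: VideoLike = { id: "cropped-preview" };

controller.captureWheel(preview);        // start forwarding from one element
controller.captureWheel(croppedPreview); // a second call replaces the first
controller.captureWheel(null);           // explicit unset
```

Note that a single nullable setter of this shape still tracks at most one element at a time, so it does not by itself address showing the same capture in two elements that both forward gestures.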

Based on this, I would look at an API along these lines:
```
partial interface HTMLVideoElement {
    attribute boolean enableGestureForwarding;
};
```
And maybe a secondary, optional API to let the web page know what is going on:
```
partial interface HTMLVideoElement {
    readonly attribute boolean gestureForwarding;
    attribute EventHandler ongestureforwardingchange;
};
```
This kind of API shape adds some flexibility in how much the UA chooses to forward user gestures (say the user enables forwarding and then disables it in the middle of the call), and it unties the API from permissions.
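
A minimal TypeScript sketch of how a page might use these attributes follows. It does not depend on DOM typings: `GestureForwardingElement` restates the proposed attributes locally, and `MockVideo` with its `setUserAllows` toggle is a hypothetical stand-in for the video element plus the UA logic behind it. The mock fires its handler synchronously and without an event object, which is a simplification.

```typescript
// Local restatement of the proposed attributes; not a shipped API.
interface GestureForwardingElement {
  enableGestureForwarding: boolean;    // set by the page
  readonly gestureForwarding: boolean; // decided by the UA
  ongestureforwardingchange: (() => void) | null;
}

// Hypothetical mock standing in for an HTMLVideoElement and the UA behind it.
class MockVideo implements GestureForwardingElement {
  ongestureforwardingchange: (() => void) | null = null;
  private requested = false;
  private userAllows = true; // hypothetical UA-side user toggle
  private active = false;

  get enableGestureForwarding(): boolean {
    return this.requested;
  }
  set enableGestureForwarding(value: boolean) {
    this.requested = value;
    this.update();
  }
  get gestureForwarding(): boolean {
    return this.active;
  }

  // Simulates the user turning forwarding on or off in browser UI.
  setUserAllows(value: boolean): void {
    this.userAllows = value;
    this.update();
  }

  // Forwarding is active only if the page asked for it and the user allows it.
  private update(): void {
    const next = this.requested && this.userAllows;
    if (next !== this.active) {
      this.active = next;
      if (this.ongestureforwardingchange) this.ongestureforwardingchange();
    }
  }
}

// Page-side usage: state interest, then react to whatever the UA decides.
const video = new MockVideo();
const observed: boolean[] = [];
video.ongestureforwardingchange = () => {
  observed.push(video.gestureForwarding);
};
video.enableGestureForwarding = true;  // page states its interest
video.setUserAllows(false);            // user disables forwarding mid-call
video.enableGestureForwarding = false; // page withdraws; state is already off
```

The page never asks for a permission in this model; it only states interest and observes the resulting state, which is the decoupling described above.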

Please view or discuss this issue at https://github.com/w3c/mediacapture-surface-control/issues/49 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Wednesday, 13 November 2024 08:35:21 UTC