Re: [mediacapture-screen-share-extensions] Tab capture control (#13) from Elad Alon via GitHub on 2024-10-18 (public-webrtc-logs@w3.org from October 2024)

From: Elad Alon via GitHub <sysbot+gh@w3.org>
Date: Fri, 18 Oct 2024 15:17:54 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-2422712549-1729264672-sysbot+gh@w3.org>
Thank you for this in-depth feedback!

> - web pages want to opt-in to a mode where the captured preview could be interacted sort of like an embedded iframe but there are security principles that should restrict the level of interaction that is possible.

Just a quick mention that this interpretation *might* be correct for scroll-forwarding, but is incorrect for zoom-controls. More on this below.

> - One principle could be that UA is responsible to forward some of the user gestures from the preview to the captured web page. The decision heuristics may be UA specific.

I am supportive of empowering user agents to employ heuristics if they wish; if you want explicit text in the spec, I can add it. But we should retain an explicit way for applications to request the behaviors introduced in this specification. I cannot imagine a perfect heuristic; they are always liable to occasionally (a) fire when undesired and (b) fail to fire when desired.

> - The current design seems to introduce one API per gesture. It is not clear to me that this level of flexibility is beneficial, this is worth discussing.

If you want to reshape the API as follows, for future-proofing...

```webidl
dictionary ForwardedGestures {
  // Dictionary members are optional by default; no `optional` keyword.
  boolean wheel = false;
  // Future-proof for pinch etc.
};

partial interface CaptureController {
  Promise<undefined> forwardGestures(
      HTMLVideoElement element,
      optional ForwardedGestures gestures = {});
};
```

...then I am **supportive**. What do you say?

(Spoiler alert - this shape also solves other issues you have pointed out, and which I address further below.)

> - I could see pinching (missing for google map)

Currently, not requested by Web developers, so not a priority for me. But if you want to add that, I won't oppose. Please see the comment I have left in the WebIDL above.

> - and keyboard zoom commands (missing for google slides)

I am not familiar with "keyboard zoom commands". I am, however, familiar with previous/next and page-down/page-up. These could be discussed as a possible later extension; at the moment, I am hesitant about the security properties here. Let's start small.

> - maybe keyboard arrows as well

These make me very nervous. Let's not have this in the MVP.

> - Clicks and other events might be useful in the long run but are definitely risky security wise.

I am absolutely opposed to forwarding clicks. Such actions require a completely different model - one which I have presented in the past ("Video Portal"). But it is not in scope for us right now, and it's not mutually-exclusive with the current model. (That is, even if the Video Portal model were alive right now, it'd only serve some applications, and I'd argue for the introduction of Captured Surface Control APIs for other types of applications.)

> - Maybe some opt-in from the captured application could solve that issue

Any model that requires opt-in from the captured app does not solve the problem, because the majority of web pages will not opt in, and therefore the user will not be properly served.

> - It would be nice for capturer to state its interest to forward user gestures, UA can do this even without capturer asking for it.
> - If so, there should be a way for capturer to state that it is no longer interested to forward user gestures.

If you want the spec to also explicitly say that user agents MAY offer gesture-forwarding even when the application does not opt-in, I am happy to add it. Is this the case?

> - The feature is currently gated by a specific prompt/permission policy. The policy seems too much, and we should investigate whether a prompt/dedicated permission is what we want or whether UA heuristics could be good enough. This might influence API shape.

Chrome Security wanted a permission policy, and I agree with their reasoning. However, user agents that wish to impute permission without a prompt may do so while remaining trivially compliant with the spec. If you'd like the Captured Surface Control spec to state as much explicitly, then I'd gladly add that. Should I?

> - There is most probably an intent that these events do not allow a web page to know that it is being captured. We should discuss this. This probably means there is something missing when capturing a tab. For instance a captured tab visibility should probably be visible even when it is hidden. I do not see that in the screen share spec, this might be worth filing an issue on the screen share spec.

Sorry, no, there is no such intent at the moment.
* If the capturing application believes it important to remain undetected, it can simply avoid these new APIs.
* If Web developers request the ability to forward gestures undetected, we can discuss it then. At the moment, there is no such request.
* If user agents like Safari wish to implement UA-driven forwarding, and these user agents wish to conceal this forwarding, then they can do so while remaining spec-compliant. (And if not, I will support spec changes that would empower you to do so.)

> - I would leave setZoomLevel on the side for now until we understand what it solves that forwarding user gesture approach cannot.

Web developers have asked for the ability to control zoom, and I think anyone who has ever shared a tab during a video call can immediately understand what this solves.

I am not aware of any gesture that supports this behavior. As I have [shown here](https://github.com/screen-share/captured-surface-control/issues/24), pinch-controls are NOT an alternative.

Also, even if pinching were identical to zoom - which it is not - we'd still need to serve users without touchscreens, and applications that want to show zoom in/out buttons.

Further yet, the **read-access** provided by `getZoomLevel()` is useful for applications that want to show the current zoom level to the user - and that **really** helps the Web application communicate to the user what the zoom in/out buttons do. (See mock in the explainer, and implementation by Meet.)

Zoom-control is totally in-scope for me, and irreplaceable by pinch-controls.
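
To illustrate why read-access matters for zoom in/out buttons, here is a minimal sketch of the button logic an application might write. Everything in it is a stand-in: `makeFakeController()` is a hypothetical in-memory object that only borrows the `getZoomLevel()` / `setZoomLevel()` names, the list of supported levels is made up, and zoom levels are assumed to be percentages such as `100`. It is not the specified API surface.

```js
// Sketch only. The fake controller and the level list are assumptions made
// for illustration; they do not reflect any browser's actual behavior.

// Pick the next level up (direction > 0) or down (direction <= 0).
// Returns the current level unchanged when already at the limit.
function nextZoomLevel(supportedLevels, current, direction) {
  const sorted = [...supportedLevels].sort((a, b) => a - b);
  if (direction > 0) {
    const higher = sorted.find((level) => level > current);
    return higher === undefined ? current : higher;
  }
  const lower = sorted.filter((level) => level < current).pop();
  return lower === undefined ? current : lower;
}

// Text shown next to the buttons, so the user knows what they control.
function zoomLabel(level) {
  return `${level}%`;
}

// Hypothetical stand-in for a controller with read and write access to zoom.
function makeFakeController(initialLevel) {
  let level = initialLevel;
  return {
    getZoomLevel: () => level,
    setZoomLevel: async (next) => {
      level = next;
    },
  };
}

// Handler for a "zoom in" or "zoom out" button; returns the new label.
async function onZoomButton(controller, supportedLevels, direction) {
  const target = nextZoomLevel(supportedLevels, controller.getZoomLevel(), direction);
  await controller.setZoomLevel(target);
  return zoomLabel(controller.getZoomLevel());
}
```

Without the read side, the application could still offer the buttons, but it could not label them with the current level or disable them at the limits.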

> - captureWheel(HTMLMediaElement e) is probably not right. captureWheel(HTMLMediaElement? e) might be better to allow unsetting (seems like a MUST have). And probably captureWheel(HTMLVideoElement? v) might be even better from a type perspective.

Although I disagree, and although I'd prefer to retain `HTMLElement` as the type here, I want us to reach a mutually-satisfactory compromise. Let's assume for now that I have changed `forwardGestures()` to take an `HTMLVideoElement`; I will think about it for a bit and then make the change if it still seems fine.

> - The restriction to one video element is a bit artificial. I am not sure why we cannot allow a web page to show the same captured media in two elements (one cropped and the other one uncropped for instance) and allow both of them to forward user gestures.

Fine by me, although I am not aware of any Web developers requesting this. The API shape I have proposed earlier in this comment allows this neatly, because developers can do the following:
```js
  // Start for e1 and e2.
  controller.forwardGestures(e1, {wheel: true});
  controller.forwardGestures(e2, {wheel: true});

  // Stop e1 only.
  controller.forwardGestures(e1, {wheel: false});
```

(You will note that I have made `forwardGestures()` accept `HTMLVideoElement` rather than `HTMLVideoElement?`. This is the reason.)
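
The per-element bookkeeping this shape implies can be sketched with a small mock. This is an illustration under assumptions, not an implementation: `MockCaptureController` and `isForwarding()` are hypothetical names, plain objects stand in for video elements, and the rule that each call replaces the previous state for that element is my reading of the proposal rather than spec text. No real gestures or permission checks are involved.

```js
// Hypothetical mock of the proposed forwardGestures() shape.
// It models only which elements forward which gestures.
class MockCaptureController {
  constructor() {
    // element -> Set of gesture names currently forwarded from that element
    this.forwarding = new Map();
  }

  // Each call sets the forwarding state for exactly one element.
  async forwardGestures(element, gestures = {}) {
    const enabled = Object.keys(gestures).filter((name) => gestures[name] === true);
    if (enabled.length === 0) {
      // Nothing enabled: stop forwarding from this element only.
      this.forwarding.delete(element);
    } else {
      this.forwarding.set(element, new Set(enabled));
    }
  }

  // Test helper (not part of the proposal).
  isForwarding(element, gesture) {
    const enabled = this.forwarding.get(element);
    return enabled !== undefined && enabled.has(gesture);
  }
}

async function demo() {
  const controller = new MockCaptureController();
  const e1 = { id: "cropped-preview" }; // stand-ins for HTMLVideoElement
  const e2 = { id: "full-preview" };

  await controller.forwardGestures(e1, { wheel: true });
  await controller.forwardGestures(e2, { wheel: true });
  await controller.forwardGestures(e1, { wheel: false });

  return {
    e1: controller.isForwarding(e1, "wheel"),
    e2: controller.isForwarding(e2, "wheel"),
  };
}
```

Because the element is the key, stopping `e1` leaves `e2` untouched, which is why a nullable element argument is not needed for unsetting.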

> Based on this, I would look at an API along those lines:
> ... enableGestureForwarding ...

Thank you for this concrete suggestion. Humbly, I disagree with it. :-) I see a few issues, but I think it's enough to name just one - it does not allow for a permission policy, for those user agents that wish to include one.

---

Thanks again for this in-depth feedback. In summary, I believe that:
1. The change to `forwardGestures()` addresses all issues raised and provides the desired properties. (Future-proof for additional gestures, supports multiple target-elements, etc.)
2. The spec allows different security and usability philosophies by different browsers in an interoperable manner. (A permission policy does not mandate a prompt, we make an allowance for UA-driven heuristics, etc.)
3. It has been demonstrated that zoom-control is distinct from gesture-forwarding, and especially, that it cannot be sufficiently emulated using pinch-forwarding.
4. Both read-access and write-access over the zoom-level of the captured surface are necessary.

What do you say?

-- 
GitHub Notification of comment by eladalon1983
Please view or discuss this issue at https://github.com/w3c/mediacapture-screen-share-extensions/issues/13#issuecomment-2422712549 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Friday, 18 October 2024 15:17:55 UTC