Re: [mediacapture-screen-share] Conditional Focus (When Display-Capture Starts) (#190) from Elad Alon via GitHub on 2021-09-27 (public-webrtc-logs@w3.org from September 2021)

From: Elad Alon via GitHub <sysbot+gh@w3.org>
Date: Mon, 27 Sep 2021 10:58:34 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-927751990-1632740312-sysbot+gh@w3.org>
### Global or per-surface controls

> (@youennf:) I also prefer an attribute over an explicit call for each track.

These are two distinct preferences:
1. Global or per-surface. (Discussed in this section.)
2. Attribute or method. (Discussed in the next section.)

> (@jan-ivar:) There's only one user, who can only accept one prompt at a time.

The browser can operate in modes which skip the prompt. Mechanisms to trigger these include extensions, enterprise policies and command-line arguments.

At any rate, if the application fires off two calls to `getDisplayMedia` and wants to focus exactly one of the captured surfaces, it is a lot **more ergonomic** to call `focus()` on the right track than to manipulate a global attribute at just the right time. With a global attribute, the application must ensure that it holds the intended value when the UA reads it for one display-surface, and the other value when the UA reads it for the other. That requires a much more in-depth understanding from the Web developer.
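The timing hazard can be illustrated with a small mock. This is a sketch under stated assumptions, not any browser's behavior: `MockMediaDevices`, `defaultFocusBehavior`, `globalAttributeApproach` and `perTrackApproach` are hypothetical names, tracks are plain objects, and the mock UA is assumed to read the global value when the user finishes the prompt rather than when `getDisplayMedia` is called.

```javascript
// Hypothetical mock UA, for illustration only. `defaultFocusBehavior` stands in for
// a global attribute and `track.focus()` for a per-track method; neither is a real API.
// Assumption: the UA reads the global value when the user finishes the prompt.
class MockMediaDevices {
  constructor() {
    this.defaultFocusBehavior = "no-focus-change";
  }

  // Resolves after `promptDelayMs`, modeling how long the user takes to pick a surface.
  getDisplayMedia(label, promptDelayMs) {
    return new Promise((resolve) => {
      setTimeout(() => {
        resolve({
          label,
          // The global value is read here, at resolution time, not at call time.
          focusBehavior: this.defaultFocusBehavior,
          focus(behavior) {
            this.focusBehavior = behavior; // per-track override
          },
        });
      }, promptDelayMs);
    });
  }
}

// Global attribute: the app sets the value before each call, but both prompts finish
// after the second assignment, so both surfaces see "no-focus-change".
function globalAttributeApproach() {
  const md = new MockMediaDevices();
  md.defaultFocusBehavior = "focus-captured-surface"; // meant for "first"
  const first = md.getDisplayMedia("first", 20);
  md.defaultFocusBehavior = "no-focus-change"; // meant for "second"
  const second = md.getDisplayMedia("second", 10);
  return Promise.all([first, second]);
}

// Per-track method: the app focuses exactly the track it means, as soon as it arrives.
function perTrackApproach() {
  const md = new MockMediaDevices();
  const first = md.getDisplayMedia("first", 20).then((track) => {
    track.focus("focus-captured-surface");
    return track;
  });
  const second = md.getDisplayMedia("second", 10);
  return Promise.all([first, second]);
}
```

In this model the global approach ends up focusing neither surface, because the second assignment lands before either prompt completes; the per-track approach focuses only `first`, regardless of the order in which the prompts finish.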

### Method vs. Attribute

Assume, for the sake of argument, that my previous section convinced you to use per-surface controls. Do we want a method or attribute then?

An application that can read the value may just as well set its own preferred value, so there is little point in exposing a readable per-surface value. An attribute for `focus` would make sense if the control were global, but not if it is per-surface.

**However**, even before calling `getDisplayMedia`, the application might wish to know whether it can influence the decision at all. I would not object to adding a global attribute that reads (and potentially writes?) the default behavior, i.e. the behavior that applies if the per-surface API is not invoked. The presence of this global attribute would also inform the application that the per-surface control will be exposed if the user chooses to share a focusable surface.
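The combination of a global default with a per-surface override could look roughly like this. It is a minimal sketch only: `defaultFocusBehavior`, `supportsFocusControl` and `effectiveFocusBehavior` are hypothetical names, and the rule that a per-surface choice overrides the global default is an assumption drawn from the paragraph above, not from any specification.

```javascript
// Hypothetical names throughout: neither `defaultFocusBehavior` nor this
// resolution rule is specified anywhere; they only illustrate the proposal.

// Feature detection: the presence of the global attribute tells the app, before it
// calls getDisplayMedia(), that it can influence focus at all.
function supportsFocusControl(mediaDevices) {
  return "defaultFocusBehavior" in mediaDevices;
}

// The per-surface choice, if the app made one in time, overrides the global default.
// `undefined` means the per-surface API was not invoked.
function effectiveFocusBehavior(globalDefault, perSurfaceChoice) {
  return perSurfaceChoice ?? globalDefault;
}
```

For example, `effectiveFocusBehavior("focus-captured-surface", undefined)` yields the global default, while passing `"no-focus-change"` as the second argument yields that per-surface choice instead.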

### Subclassing MediaStreamTrack

I think we have seen multiple cases where subclassing MediaStreamTrack would have conferred benefits, but each time a discussion arose over whether that one benefit was enough to justify subclassing. The result of having everything on MediaStreamTrack is suboptimal. Some APIs that would immediately benefit from a decision to subclass are:
* The focus API: only focusable surfaces (browser-surfaces and windows; in the future, potentially applications and/or isolated-browser-surfaces).
* Capture Handle: only browser-surfaces (maybe more in the future).
* Cropping: only self-capture (SelfCaptureMST -> BrowserSurfaceMST -> FocusableMST -> DisplayCaptureMST -> MST).

IMHO, this list is long enough, and the benefits large enough, to justify subclassing. When someone calls `getViewportMedia()`, the result is inherently different from what `getUserMedia()` returns, and it makes sense for the exposed APIs to reflect that.
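The hierarchy named in the list can be sketched with plain classes. This is an illustration under assumptions: `MST` is a hypothetical stand-in for MediaStreamTrack so the code runs outside a browser, and `focus()`, `getCaptureHandle()`, `cropTo()` and `capabilitiesOf()` are hypothetical placeholders with dummy bodies, not the real interfaces. It shows how each API would live only on the subclass whose surfaces support it.

```javascript
// Hypothetical stand-in for MediaStreamTrack, so this runs outside a browser.
class MST {
  constructor(kind) {
    this.kind = kind;
  }
}

// Any display-capture track (returned by getDisplayMedia()).
class DisplayCaptureMST extends MST {}

// Windows and browser-surfaces: the focus API lives only here.
class FocusableMST extends DisplayCaptureMST {
  focus(behavior) {
    this.focusBehavior = behavior; // placeholder body
  }
}

// Browser-surfaces: Capture Handle lives only here.
class BrowserSurfaceMST extends FocusableMST {
  getCaptureHandle() {
    return null; // placeholder body
  }
}

// Self-capture: cropping lives only here.
class SelfCaptureMST extends BrowserSurfaceMST {
  cropTo(target) {
    this.cropTarget = target; // placeholder body
  }
}

// Capabilities follow from the track's type instead of from ad-hoc checks.
function capabilitiesOf(track) {
  return {
    canFocus: track instanceof FocusableMST,
    hasCaptureHandle: track instanceof BrowserSurfaceMST,
    canCrop: track instanceof SelfCaptureMST,
  };
}
```

A plain `DisplayCaptureMST` (e.g. a monitor capture) exposes none of the three APIs at all, so there is no method that exists but fails at runtime for unsupported surfaces.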

### Tasks vs. Microtasks

IIANM, the only argument for tasks is that they are shim-friendly. (Please correct me if I'm wrong.)

An argument against tasks is that, in addition to enabling shimming, they also allow an application to `await somePromise` before deciding. This is an **anti-pattern**, as the results would be flaky: if `somePromise` is already resolved, it would work; if it will only be resolved by a later task, it would not.
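The flakiness can be modeled in a few lines. This is a minimal sketch under loudly stated assumptions: `runFocusWindow`, `awaitsResolved` and `awaitsLaterTask` are hypothetical names, the track is a plain object rather than a real MediaStreamTrack, and the task-based option is approximated by closing the window of opportunity in a task queued with `setTimeout(..., 0)` right after resolution.

```javascript
// Hypothetical model of the task-based option, not the spec: the app's continuation
// runs as a microtask, and the UA closes the focus window in a later task.
function runFocusWindow(appContinuation) {
  let decision = "ua-default"; // what the UA applies if the app says nothing in time
  let windowOpen = true;
  const track = {
    focus(behavior) {
      if (windowOpen) decision = behavior; // late calls are ignored in this model
    },
  };
  // The app's continuation is queued as a microtask after resolution...
  Promise.resolve(track).then(appContinuation);
  // ...and the UA reads the decision in a subsequent task.
  return new Promise((resolve) => {
    setTimeout(() => {
      windowOpen = false;
      resolve(decision);
    }, 0);
  });
}

// Awaits an already-resolved promise: resumes in a microtask, before the window closes.
async function awaitsResolved(track) {
  await Promise.resolve();
  track.focus("focus-captured-surface");
}

// Awaits something settled by a later task: resumes after the window has closed.
async function awaitsLaterTask(track) {
  await new Promise((resolve) => setTimeout(resolve, 0));
  track.focus("focus-captured-surface");
}
```

Both continuations make the same `focus()` call, yet `runFocusWindow(awaitsResolved)` resolves to `"focus-captured-surface"` while `runFocusWindow(awaitsLaterTask)` resolves to `"ua-default"`; the outcome depends only on when the awaited promise happens to settle.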

This trade-off is easy to reason about (IMHO) because **we can have both**. If we use microtasks, shimming is possible with an adapter:
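```js
// Keep a reference to the native implementation before overriding it.
const nativeGDM = window.navigator.mediaDevices.getDisplayMedia;

// Example callback supplied by the application.
function appFocusCallback(stream) {
  // Return "no-focus-change" or "focus-captured-surface".
  return "focus-captured-surface";
}

window.navigator.mediaDevices.getDisplayMedia = async function getDisplayMedia(constraints, focusCallback) {
  // Forward only the argument the native method understands.
  const stream = await nativeGDM.call(this, constraints);
  // This continuation runs as a microtask, so under the microtask proposal
  // the focus decision is made before the window of opportunity closes.
  const [track] = stream.getVideoTracks();
  if (typeof focusCallback === "function" && track && typeof track.focus === "function") {
    const focusBehavior = focusCallback(stream);
    track.focus(focusBehavior);
  }
  return stream;
};

// Usage: window.navigator.mediaDevices.getDisplayMedia({ video: true }, appFocusCallback);
```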

Code outside the shim just plugs in its callback. Note that there are natural limits on what the app can do anyway until the window of opportunity closes, so I expect the code would easily and naturally fit inside a synchronous callback.


-- 
GitHub Notification of comment by eladalon1983
Please view or discuss this issue at https://github.com/w3c/mediacapture-screen-share/issues/190#issuecomment-927751990 using your GitHub account


Received on Monday, 27 September 2021 10:58:36 UTC