[mediacapture-screen-share-extensions] Multi-capture (concurrent capture of multiple surfaces) (#8) from Elad Alon via GitHub on 2024-02-23 (public-webrtc@w3.org from February 2024)

From: Elad Alon via GitHub <sysbot+gh@w3.org>
Date: Fri, 23 Feb 2024 08:51:04 +0000
To: public-webrtc@w3.org
Message-ID: <issues.opened-2150641879-1642100997-sysbot+gh@w3.org>

eladalon1983 has just created a new issue for https://github.com/w3c/mediacapture-screen-share-extensions:

== Multi-capture (concurrent capture of multiple surfaces) ==
It has come to my attention that some applications wish to capture multiple display surfaces at the same time. Some examples include:
* Streamers presenting multiple surfaces. [*]
* Managed devices recording for compliance/training/billing reasons.

Capturing multiple display surfaces is presently achievable using existing APIs - it is possible to call `getDisplayMedia()` multiple times. However, this is not very ergonomic, and creates **serious friction for the user**:
1. The user has to interact with the browser's media-picker multiple times.
2. The user has to interact with the application multiple times, signaling that they want to capture yet another surface, and providing a new transient activation each time.
3. The user is liable to make mistakes when trying to remember which surfaces they've already started capturing, and which surfaces remain for them to capture.
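
To make the friction concrete, here is a minimal sketch of what an application has to do today. The helper name `captureSequentially`, the `waitForUserGesture` callback, and the `StreamLike` type are hypothetical and exist only for illustration; the capture function is passed in so that the sketch does not depend on a browser.

```typescript
// Minimal stand-in for MediaStream, so the sketch needs no DOM typings.
interface StreamLike {
  id: string;
}

type CaptureFn = (constraints: { video: boolean }) => Promise<StreamLike>;

// Captures `count` surfaces the way it has to be done today:
// one user gesture and one media-picker interaction per surface.
async function captureSequentially(
  getDisplayMedia: CaptureFn,
  waitForUserGesture: () => Promise<void>, // hypothetical: resolves on a click
  count: number,
): Promise<StreamLike[]> {
  const streams: StreamLike[] = [];
  for (let i = 0; i < count; i++) {
    // A fresh transient activation is needed before every call.
    await waitForUserGesture();
    // Each call opens the media-picker again.
    streams.push(await getDisplayMedia({ video: true }));
  }
  return streams;
}
```

In a page, the first argument would presumably be something like `(c) => navigator.mediaDevices.getDisplayMedia(c)`. Capturing three surfaces this way costs the user three gestures and three trips through the picker.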

Ideally, a single transient activation could be used for a single API invocation, providing the user with a media-picker with functionality **akin** to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would be allowed to choose all of the display surfaces that they want to capture, then click OK once. It would then be clear from context that these are all of the surfaces the user was aiming to capture, and that no additional calls to `getDisplayMedia()` (gDM) or the like are necessary.

As a straw-man proposal, imagine `getDisplayMedia({video: true, ..., maxSurfaces: N})`. The default value of `maxSurfaces` would be 1, which would trigger the current behavior and resolve with a single `MediaStream`. A higher value would trigger the new behavior and resolve with an array, `[MediaStream]`.

![mock](https://user-images.githubusercontent.com/22117736/154129999-c19e2d07-5fb2-49de-a3f4-af4c8bd87511.jpg)

Finer points off the bat:
* The UA may impose a limit on how many streams may be captured concurrently and prevent the user from choosing more.
* If a `maxSurfaces` greater than 1 is specified, an array will be returned even if the user chooses one surface, to simplify things for the application.
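
As a rough sketch of how an application might consume the straw-man, the helper below always hands back an array, whichever shape the call resolves with. `maxSurfaces` is the hypothetical option from the proposal above and is not part of any shipped API; `captureSurfaces` and `StreamLike` are likewise made up for illustration, and the capture function is injected so the sketch does not depend on a browser.

```typescript
// Minimal stand-in for MediaStream, so the sketch needs no DOM typings.
interface StreamLike {
  id: string;
}

// Hypothetical options shape from the straw-man proposal.
interface MultiCaptureOptions {
  video: boolean;
  maxSurfaces?: number;
}

// Under the straw-man, the call resolves with one stream when maxSurfaces
// is 1 (or omitted), and with an array when maxSurfaces is greater than 1.
type MultiCaptureFn = (
  options: MultiCaptureOptions,
) => Promise<StreamLike | StreamLike[]>;

// Normalizes both result shapes to an array for the caller.
async function captureSurfaces(
  getDisplayMedia: MultiCaptureFn,
  maxSurfaces: number = 1,
): Promise<StreamLike[]> {
  const result = await getDisplayMedia({ video: true, maxSurfaces });
  return Array.isArray(result) ? result : [result];
}
```

Because an array would be returned whenever `maxSurfaces` is greater than 1, an application that always asks for more than one surface would not need this normalization at all; it is only useful for code that has to handle both modes.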

Interesting points to discuss:
* Should the UA limit the user to choosing only one **type** of display-surface, and if so, is that a MUST, a SHOULD, or a MAY? (Without influencing which type.) That is to say, maybe the user can choose any N tabs, any N windows, or any N monitors, but not a combination of K tabs and N-K screens.
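
To illustrate the single-type restriction, here is a small sketch of the check it implies. In practice the UA's picker would presumably enforce this; `isSingleTypeSelection` is a hypothetical helper, and the type names are the values of the existing `displaySurface` track setting.

```typescript
// Values of the `displaySurface` track setting for screen capture.
type DisplaySurface = "monitor" | "window" | "browser";

// Returns true when every selected surface has the same type.
// An empty selection trivially satisfies the restriction.
function isSingleTypeSelection(selection: DisplaySurface[]): boolean {
  return selection.every((surface) => surface === selection[0]);
}
```

Under such a rule, a selection of three tabs (`"browser"` three times) would be allowed, while two tabs plus one monitor would not.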

CC @shangl, whose use-case prompted this.

--
[*] Imagine an instructor streaming multiple tabs, and individual viewers independently choosing which one to focus on. I mention this so as to discourage solutions that involve stitching multiple surfaces together into a single logical surface.

Please view or discuss this issue at https://github.com/w3c/mediacapture-screen-share-extensions/issues/8 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 23 February 2024 08:51:05 UTC