Proposed change to Media Capture and Streams

10.5 Obtaining Screen Based Video

The video source does not have to be a camera; it can be some visible portion of the user's screen. This is useful for various screen-sharing applications. This section describes an API for an application to indicate that it wishes to capture video from a canvas element, browser tab, application, or the whole desktop. Browsers are not required to implement any of the mechanisms described in this section to be WebRTC compliant. There are significant security concerns with capture of the user's screen, as discussed in [RTCWEB-SECURITY-ARCH] and [RTCWEB-SECURITY]. Implementations need to ensure they provide adequate security as discussed later in this section.

Capture of screen video is controlled by a new constraint called "mediaSource", which takes one of the values "browser", "application", "screen", or "camera". This constraint can be passed to the getUserMedia call, which can display a dialog that allows the user to select the correct input and gathers the appropriate permissions from the user. The dialog differs depending on the value of "mediaSource"; for example, it could show a list of thumbnails or names of the available video sources and allow the user to pick the appropriate one.
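As an illustration only, the following sketch shows how an application might build a constraints object carrying "mediaSource". The dictionary shape ({ video: { mediaSource } }), the helper name buildScreenVideoConstraints, and the form of the getUserMedia call in the trailing comment are assumptions made for this example; this proposal does not fix the exact syntax.

```typescript
// Values taken from the MediaSourceEnum defined in this section.
type MediaSourceEnum = "camera" | "browser" | "application" | "screen";

// Assumed constraint shape; the proposal does not define the dictionary layout.
interface ScreenVideoConstraints {
  video: { mediaSource: MediaSourceEnum };
}

// Hypothetical helper: build the object that would be handed to getUserMedia.
function buildScreenVideoConstraints(source: MediaSourceEnum): ScreenVideoConstraints {
  return { video: { mediaSource: source } };
}

const shareWholeScreen = buildScreenVideoConstraints("screen");
console.log(JSON.stringify(shareWholeScreen));

// In a browser that implemented this proposal, the call might look like:
//   navigator.mediaDevices.getUserMedia(shareWholeScreen)
//     .then((stream) => { /* attach the stream to a <video> element */ });
// The browser, not the page, would then show the source-selection dialog.
```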

The video generated MUST come only from the portion of the screen belonging to the item that was authorized, and only from the part of that item that is visible. If an application is being shared but the top right corner of the application is covered by a window from some other application, that top right corner needs to be obscured in the video stream. A typical way to obscure it is to replace the covered area with a grey rectangle. If an application is shared and that application has multiple windows, the video stream is formed by constructing the bounding box around all the windows in that application and capturing that box as a single video stream. Any screen area inside the box that does not belong to that application is obscured.
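The geometry described above can be sketched as follows. This is a minimal illustration, not a required algorithm: the ScreenRect type, the function names, and the sample coordinates are all hypothetical, and a real implementation would obtain window geometry from the platform's window system.

```typescript
// Hypothetical rectangle in screen coordinates (not defined by this proposal).
interface ScreenRect { x: number; y: number; width: number; height: number; }

// Smallest rectangle that contains every window of the shared application.
function boundingBox(windows: ScreenRect[]): ScreenRect {
  if (windows.length === 0) {
    throw new Error("application has no windows to share");
  }
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const w of windows) {
    minX = Math.min(minX, w.x);
    minY = Math.min(minY, w.y);
    maxX = Math.max(maxX, w.x + w.width);
    maxY = Math.max(maxY, w.y + w.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Area where a window from another application covers a shared window;
// this is the region that would be replaced with a grey rectangle.
function overlap(a: ScreenRect, b: ScreenRect): ScreenRect | null {
  const x = Math.max(a.x, b.x);
  const y = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  if (right <= x || bottom <= y) return null; // no covered area
  return { x, y, width: right - x, height: bottom - y };
}

const mainWindow: ScreenRect = { x: 0, y: 0, width: 800, height: 600 };
const toolWindow: ScreenRect = { x: 700, y: 500, width: 400, height: 300 };
// A window from some other application covering the top right corner.
const foreignWindow: ScreenRect = { x: 600, y: 0, width: 300, height: 100 };

const sharedFrame = boundingBox([mainWindow, toolWindow]);
const greyRegion = overlap(mainWindow, foreignWindow);
console.log(sharedFrame, greyRegion);
```

The sketch handles only the covered-window case. Areas inside the bounding box that belong to no window of the application would also need to be obscured.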

Note

Open Issue: Some systems obscure with the image of what was visible before it was obscured. Imagine a use case where Alice is sharing a PowerPoint presentation with Bob. Alice gets an instant message, which pops up a dialog box on top of the PowerPoint application. In some systems, the video sent to Bob will show a grey rectangle, and Bob will know an IM popped up even though he cannot see its contents. In other systems, the obscuring is done using the previously visible bitmap so that, as long as the slide does not change before Alice dismisses the IM dialog box, Bob will not see a big ugly grey rectangle in the middle of the slide he is trying to read. The downside is that if the window being shared shows a video, the obscured rectangle will look frozen and some users will perceive this as a bug in the system, whereas users will generally understand that a grey rectangle is an obscured region. There can also be cases where, as windows in the application are moved around, "old" data that the system is using to generate the obscured region does not get cleared.

Note

Open Issue: On some platforms, the requirement to only show what is visible might be difficult to achieve. This could lead to extended periods of obscured video for application sharing as users interact with their browser.

10.5.1 Screen Based Video Constraints

Note

Open Issue: This is described as a constraint but it may get moved to be a setting.

Property Name Values Notes
mediaSource MediaSourceEnum The source of the video from the user's screen.
enum MediaSourceEnum {
    "camera",
    "browser",
    "application",
    "screen"
};
Enumeration description
camera: The source is a camera. This is the default.
browser: The source is a tab in the same browser. The identifier for the tab to be shared is provided in the "window" property. OPEN ISSUE: What to use as a handle to the tab? In the case of windows that are opened by the application, a window reference might suffice to identify the tab (which would enable the sharing of a single iframe). Alternatively, the site could identify an origin that it wants shared, but that could lead to some interesting information leakage if we are not careful.
application: The source is all the windows for some application. No identifier is supplied.
screen: The source is the whole screen of one of the user's monitors. No identifier is supplied.
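One way an implementation might apply the default and reject unknown values is sketched below. The function name resolveMediaSource and the choice to throw a TypeError for unrecognized values are assumptions for illustration; this proposal does not specify error handling.

```typescript
// The four values of MediaSourceEnum as listed above.
const MEDIA_SOURCES = ["camera", "browser", "application", "screen"] as const;
type MediaSourceEnum = (typeof MEDIA_SOURCES)[number];

// Hypothetical helper: map a possibly missing string to a MediaSourceEnum value.
function resolveMediaSource(value?: string): MediaSourceEnum {
  if (value === undefined) {
    return "camera"; // "camera" is the default per the enumeration description
  }
  for (const candidate of MEDIA_SOURCES) {
    if (candidate === value) {
      return candidate;
    }
  }
  // Assumed behaviour: unknown values are rejected rather than ignored.
  throw new TypeError(`Unknown mediaSource: ${value}`);
}

console.log(resolveMediaSource(), resolveMediaSource("application"));
```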

The choice of source elements presented to the user for selection can be further restricted by the following additional constraints.

Property Name Values Notes
window (Window or WindowProxy)? The window object, or a proxy for that window that is to be shared. If omitted, the user can be prompted to select a browser tab.

10.5.2 Security and Permissions

There are several security issues that need to be considered. One of the most important is the case of an "evil" web page requesting sharing of the browser and then managing to open another web page, such as a banking page, inside that browser. The "evil" web page could then see the information displayed on the banking page. A similar mechanism may be usable for bypassing CSRF protection. These attacks and others are described in more detail in [RTCWEB-SECURITY]. For this reason, this specification only permits sharing of the browser via "browser sharing" and requires a heightened permissions experience for that use.

One approach to securing types of sharing where the web page could affect the content that is being shared is to require that the web page have a persistent permission acquired in some "install"-like user experience. This allows the "install" user experience to be a place to explain the risk to the user. This is referred to as an "Application Permission". Some browsers have a concept of an application store and application install to support such permissions.

Alternatively, a browser could permit screen sharing of a given origin, but replace all content that is inaccessible to that origin with grey rectangles. This reduced functionality might be disconcerting for users, but it ensures that CSRF protections are retained.

For types of sharing where it is not easy for the web page to affect the content, the approach is to show the user, each time getUserMedia is called, a dialog in which they choose the content they wish to share. This is referred to as "User Choice Permission". [[OPEN ISSUE: need a better way to think about and describe this, concentrating perhaps on the fact that rather than a dialog click-through, the user needs to explicitly act to enable types of sharing.]]

Shared Permission
browser Requires Application Permission and, if a specific tabId is not provided, also requires a User Choice Permission to select the tab.
application Requires a User Choice Permission to select the application. The application cannot be the same browser that is doing the sharing.
screen Requires Application Permission as well as a User Choice Permission, which is used to select the screen to share on multi-screen systems and to allow sharing to start on single-screen systems. The browser window must be masked in this case. [[OPEN ISSUE: Consistent with above, but is it OK?]]
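The table above can be read as a lookup from the shared source to the permissions a browser would need to obtain. The sketch below encodes that reading; the type names, the string labels "application" and "user-choice", and the tabIdentified parameter are hypothetical and exist only for this example.

```typescript
type SharedSource = "browser" | "application" | "screen";
// Hypothetical labels for "Application Permission" and "User Choice Permission".
type PermissionKind = "application" | "user-choice";

// Return the permissions the table above requires for a given source.
// tabIdentified is true when the page supplied a specific tab to share.
function requiredPermissions(source: SharedSource, tabIdentified = false): PermissionKind[] {
  switch (source) {
    case "browser":
      // Application Permission always; User Choice only if no tab was given.
      return tabIdentified ? ["application"] : ["application", "user-choice"];
    case "application":
      return ["user-choice"];
    case "screen":
      return ["application", "user-choice"];
  }
}

console.log(requiredPermissions("browser"), requiredPermissions("application"));
```

The sketch does not model the other conditions in the table, such as masking the browser window during screen sharing or refusing to share the browser itself as an application.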

When content is being shared, it is important to consider what user interface can be provided to remind the user which content is being shared. In addition, there MUST be a way for the user to stop sharing. Implementations might be able to handle this in a similar way to how they handle sharing of a camera or microphone, but some forms of sharing could require different forms of user feedback. For instance, many cameras have a small light that indicates that the camera is capturing. Some applications use special framing to provide similar feedback for a shared application.