Re: A Privacy Review of Screen Capture, Working Draft, W3C, 2019-11-19.

Thank you very much, Wendell, for this thoughtful and comprehensive privacy review. PING participants: we will be discussing this on our next call (16 January 2020, UTC 17), but if you have any comments to add or discuss, please feel free to share them on the list.


On 6 Jan 2020, at 12:23 PM, Wendell Baker wrote:

Wendell Baker, Verizon Media

Comments are welcome.

A privacy review of

  Screen Capture, Working Draft, W3C, 2019-11-19.

    Martin Thomson (Mozilla)
    Keith Griffin (Cisco)
    Suhas Nandakumar (Cisco)
    Henrik Boström (Google)
    Jan-Ivar Bruaroey (Mozilla)

The specification is an addition to the specification for Media Capture

  Media Capture and Streams, Candidate Recommendation, W3C, 2019-07-02.

    Daniel Burnett
    Adam Bergkvist
    Cullen Jennings
    Anant Narayanan
    Bernard Aboba
    Jan-Ivar Bruaroey
    Henrik Boström

The Screen Capture specification extends the getUserMedia capability so that the same screen on which the User Agent (browser) is running can be captured as a video and/or audio track.

The specification addresses many of the privacy concerns outlined in
Self-Review Questionnaire: Security and Privacy, W3C Working Group Note, 2019-09-10.

The review here takes the form of highlighting the relevant sections of Screen Capture with respect to the privacy considerations. In summary, the Screen Capture specification does well when it addresses certain aspects of privacy, for example the fingerprintability of APIs and the available entropy.  However, there are other aspects of the privacy considerations which are acknowledged but left to implementors, who must develop the mitigations and controls.  The notion of ensuring that the computer operator is always aware of and in control of which system or application artifacts appear in the video is underspecified and insufficiently addressed.  Specifically, the areas of audio and screen sharing will require more design and human factors contributions to ensure that the system is both operable and also not inadvertently uncontrollable.

The posture in this review is towards finding a way to make the feature operable and "safe" in the sense that the computer operator (a natural person) is always in control of the machine and always able to know "what it is doing now."  Where that is not the case, mitigations must be put in place to return to the safe and operable design.

Part I, Analysis of the Considerations, by Section

The first part of this review is an analysis organized by the section structure of the Screen Capture specification.

Section 1. Introduction

<quote>This feature has significant security implications. Applications that use this API to access information that is displayed to users could access confidential information from other origins if that information is under the control of the application. This includes content that would otherwise be inaccessible due to the protections offered by the user agent sandbox.</quote>

The second statement here should have more explanation.  Where the User Agent (browser) is able to escape the sandbox metaphorical boundary, it becomes like any other application and therefore dangerous.  Much ceremony and visibility should be placed around the entry and exit to this state.  The specification could spend more time standardizing that signalling.  While certain signalling is acknowledged, the viewpoint here is that the specification is insufficient in the sense that reasonable implementations might do it differently or elide it in the name of convenience or administrative mandate.

Section 5 Capturing Display Media

<snip/>except that it acquires media from one display device chosen by the end-user each time.

This is a prudent practice.

The terminology changes from "display surface" (Section 4) to "display device" (Section 5).

Section 5.1 MediaDevices Additions

1. Necessarily prompts the user.
2. As stated, the User Agent is able to acquire the content of any display surface at all, not merely the display surface of the User Agent.
3. A combination of video and audio sources is possible.
4. User grant choices are not persisted [across what]?

Item 1. Being prompted every time with no persistence seems a prudent UX practice.
Item 2. Ensuring that screen capture is always demonstrable is the key issue here.  It is not clear enough how this will be signalled.  For example, some systems use a red "RECORDING" bug in the top or bottom of the screen, or change the framing of the shared window.
Item 3. It is unclear why audio-only sharing is disallowed.  Many of us use audio-only sharing in meetings to preserve our visual privacy, not least to avoid the "nostril cam" effect.
Item 4. It is not clear enough how the short-lived permission grants evolve.  Is it per session, or per User Agent program lifetime?  Many desktop or handheld devices have very long-lived duty cycles (they are never turned off, there is no logout), so a notional "user session" can substantially be the power duty cycle of the device itself (i.e. forever).

The algorithm of Section 5.1 is specified as a MUST and feels sufficient.
I did not examine the algorithm for correctness or corner cases.
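
To make the audio-only point in Item 3 concrete, here is a minimal TypeScript sketch of the up-front check that the getDisplayMedia() algorithm is understood to perform. It is a simplified model, not the specification's algorithm: the names DisplayConstraints, checkDisplayConstraints and describe are hypothetical, the constraints are reduced to plain booleans, and everything after the check (prompting, source selection) is left out.

```typescript
// Hypothetical, simplified stand-in for the constraints passed to
// getDisplayMedia(); the real dictionary also accepts constraint objects.
interface DisplayConstraints {
  video?: boolean; // assumed to default to true when omitted
  audio?: boolean; // assumed to default to false when omitted
}

// Models only the check discussed above: a request with video explicitly
// set to false (that is, audio-only) is rejected with a TypeError before
// the user is ever prompted.
function checkDisplayConstraints(c: DisplayConstraints): "prompt-user" {
  if (c.video === false) {
    throw new TypeError("display capture requires a video track");
  }
  return "prompt-user";
}

// Reports the outcome as a string so it is easy to inspect.
function describe(c: DisplayConstraints): string {
  try {
    return checkDisplayConstraints(c);
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}
```

Under these assumptions, describe({ video: false, audio: true }) yields "TypeError", while a video-only or video-plus-audio request proceeds to the user prompt.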

Section 5.5 Device Identifiers

Peripheral devices are not to be enumerable with the API.
As such the display capture sources cannot be selected by deviceID.
The renumbering and renaming of devices feels like it will be a UX challenge.
It feels unclear how the operating system vendor will have one naming/numbering nomenclature while the User Agent (browser) vendor will have a different naming/numbering nomenclature.

Section 6 Feature Policy Integration

Defaults to "display-capture" "self"
and Ask the User To Choose.
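
As a rough illustration of what a default allowlist of "self" implies, the following TypeScript sketch models the decision for a single level of frame nesting. It is not the Feature Policy algorithm: the names FrameInfo and displayCaptureAllowed are hypothetical, and the model assumes that a cross-origin frame receives the feature only when the embedding page names it in the iframe allow attribute. Any allowlist values written after the feature name are ignored.

```typescript
// Hypothetical description of the document that wants to call the API.
interface FrameInfo {
  topLevelOrigin: string;  // origin of the top-level document
  frameOrigin: string;     // origin of the document calling the API
  allowAttribute?: string; // value of the embedding <iframe allow="...">, if any
}

// Simplified model of a "self" default allowlist for "display-capture":
// same-origin documents get the feature; cross-origin frames get it only
// if the embedder lists it in the allow attribute.
function displayCaptureAllowed(f: FrameInfo): boolean {
  if (f.frameOrigin === f.topLevelOrigin) {
    return true;
  }
  // Take the first token of each ";"-separated entry as the feature name.
  const features = (f.allowAttribute || "")
    .split(";")
    .map((entry) => entry.trim().split(/\s+/)[0]);
  return features.indexOf("display-capture") !== -1;
}
```

In this model an embedded third-party frame cannot even trigger the user prompt unless the top-level page has opted in on its behalf.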

Section 7. Privacy Indicator Requirements

This section generalizes the underlying specification to account for the display surfaces as sources.
The section specifies that the changes in the display surfaces MUST NOT fire a devicechange event.

Section 8. Security and Permissions

Consideration is given towards allowing user control of audio and video as separate dimensions. For some reason, an "audio only" capture is not allowed while "video only" is allowed.

There is a description of how the capture of logical display surfaces outside of the bounding box of the User Agent itself can cause inadvertent and unmanaged privacy or security leakages.  These issues are highlighted with the understanding that the computer operator must know how to mitigate them.  It is not clear that any reasonable system operator will understand how to ensure that inappropriate words and images do not leak into the shared stream.  More could be supplied towards mitigations of the surprise effects here.

A common case of inadvertent sharing occurs frequently in the enterprise setting: someone is sharing a work screen in a public setting, only to have a personal notification about inbound email pop up on screen: "Honey can you pick up half-and-half on the way home?"  The best practice in these things is to shut everything down prior to sharing.  It is not clear that this sort of control is possible here because the User Agent is doing the sharing, performing the multi-window / multi-screen behaviors AND also modulating the event notifications.

In summary, even with the specification of Privacy Indicator Requirements [GETUSERMEDIA], it is not clear that a reasonable computer operator would be able to control the User Agent to keep the relevant parts of the sharable surface out of view.
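
One partial, application-level mitigation can be sketched as follows: once capture has started, the page reads the displaySurface setting of the video track and warns the operator when an entire monitor is being shared, since that is where stray notifications are most likely to appear. This is only a TypeScript sketch under assumptions. The surface names follow the values listed in the draft under review, VideoTrackLike and fakeTrack are hypothetical stand-ins for a real MediaStreamTrack so that the logic can be exercised outside a browser, and the warning texts are invented for illustration. It does not replace indicators provided by the User Agent itself.

```typescript
// Surface kinds as named in the draft under review (assumed here).
type SurfaceKind = "monitor" | "window" | "application" | "browser";

// Hypothetical minimal stand-in for the part of MediaStreamTrack used below.
interface VideoTrackLike {
  getSettings(): { displaySurface?: SurfaceKind };
}

// Returns a warning for the operator, or null when no warning is needed.
function sharingWarning(track: VideoTrackLike): string | null {
  const surface = track.getSettings().displaySurface;
  if (surface === undefined) {
    return "Unknown surface: assume everything on screen may be visible.";
  }
  if (surface === "monitor") {
    return "Entire monitor shared: notifications and other windows are visible.";
  }
  return null; // a single window, application or browser tab
}

// Test double that reports a fixed surface kind (or none at all).
function fakeTrack(surface?: SurfaceKind): VideoTrackLike {
  return { getSettings: () => ({ displaySurface: surface }) };
}
```

A page could show the returned text next to its own "sharing" banner; whether operators would notice or act on it is exactly the human factors question raised above.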

Part II. Responses to the Security & Privacy Questionnaire

The second part of this review develops answers to the Security & Privacy Questionnaire directly.

Following the Questions of the Security & Privacy Questionnaire

Question 2.1 What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
Answer: the feature exposes streams of live video and (optionally) audio of a user's device towards the receiver.

Question 2.2 Is this specification exposing the minimum amount of information necessary to power the feature?
Answer: Steps are taken to minimize the variability of APIs and of answers from system-configurable settings.
Steps are taken to isolate the API down to the minimum amount of information needed to operate the API.

Question 2.3 How does this specification deal with personal information or personally-identifiable information or information derived thereof?
Answer: The feature does not work with personal information directly.
Instead, the feature makes it trivially easy to inadvertently and constantly share personal information without the control or knowledge of the computer operator (a natural person in the operating span of the device).

Question 2.4 How does this specification deal with sensitive information?
Answer: The feature does not work with sensitive information directly.

Question 2.5 Does this specification introduce new state for an origin that persists across browsing sessions?
Answer: no

Question 2.6 What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?
Answer: after consent, the capabilities of the video and audio, as chosen by the computer operator

Question 2.7 Does this specification allow an origin access to sensors on a user’s device?
Answer: no

Question 2.8 What data does this specification expose to an origin?
Please also document what data is identical to data exposed by other features, in the same or different contexts.
Answer: the video & audio capabilities, as chosen by the computer operator

Question 2.9 Does this specification enable new script execution/loading mechanisms?
Answer: no

Question 2.10 Does this specification allow an origin to access other devices?
Answer: no

Question 2.11 Does this specification allow an origin some measure of control over a user agent’s native UI?
Answer: yes

Question 2.12 What temporary identifiers might this specification create or expose to the web?
Answer: "a stable and private id" for the media devices

Question 2.13 How does this specification distinguish between behavior in first-party and third-party contexts?
Answer: no (not applicable)

Question 2.14 How does this specification work in the context of a user agent’s Private Browsing or "incognito" mode?
Answer: not stated, one must assume the feature is insensitive to Private/Incognito mode

Question 2.15 Does this specification have a "Security Considerations" and "Privacy Considerations" section?
Answer: yes

Question 2.16 Does this specification allow downgrading default security characteristics?
Answer: no

Question 2.17 What should this questionnaire have asked?
Answer: How shall the computer operator know what the User Agent is doing without having been the person who granted the consents in the first instance?  For example, consider a kiosk in a meeting room or a tablet in a communal social setting. How does the computer operator discover what it is sharing after the fact?

Answer: one concern with sharing is that in some jurisdictions sharing, and the storing of what is shared (recording), requires more consent from more parties. For example, multi-party consent is required for recording audio and video in certain jurisdictions.  Inadvertent operation of this API could run afoul of the laws governing these sorts of features.  This puts the computer operator at legal risk for untrained or inadvertent operation of the User Agent.


Received on Wednesday, 8 January 2020 18:19:03 UTC