A Privacy Review of Screen Capture, Working Draft, W3C, 2019-11-19. from Wendell Baker on 2020-01-06 (public-privacy@w3.org from January to March 2020)

From: Wendell Baker <wbaker@verizonmedia.com>
Date: Mon, 6 Jan 2020 12:23:09 -0800
To: public-privacy@w3.org
Message-ID: <CAHKZHwgJXZEVnB9pcCfdrLbcs29ONvQS84=MmHf5fKX5rym4eA@mail.gmail.com>
Reviewer:
Wendell Baker, Verizon Media, wbaker@verizonmedia.com

Comments are welcome.

A privacy review of

  Screen Capture, Working Draft, W3C, 2019-11-19.
  https://www.w3.org/TR/2019/WD-screen-capture-20191119/
    Martin Thomson (Mozilla)
    Keith Griffin (Cisco)
    Suhas Nandakumar (Cisco)
    Henrik Boström (Google)
    Jan-Ivar Bruaroey (Mozilla)

The specification is an addition to the specification for Media Capture

  Media Capture and Streams, Candidate Recommendation, W3C, 2019-07-02.
  https://www.w3.org/TR/mediacapture-streams/
    Daniel Burnett
    Adam Bergkvist
    Cullen Jennings
    Anant Narayanan
    Bernard Aboba
    Jan-Ivar Bruaroey
    Henrik Boström
  Related:
    https://github.com/w3c/mediacapture-screen-share/

The Screen Capture specification extends the getUserMedia capability so
that the very screen on which the User Agent (browser) is displayed can be
captured as a video and/or audio track.

The specification addresses many of the privacy concerns outlined in
Self-Review Questionnaire: Security and Privacy, W3C Working Group Note,
2019-09-10.

The review here takes the form of highlighting the relevant sections of
Screen Capture with respect to the privacy considerations. In summary, the
Screen Capture specification does well on certain aspects of privacy, for
example the consideration of fingerprintability of APIs and available
entropy.  However, there are other aspects of the privacy considerations
which are acknowledged but for which implementors are left to develop the
mitigations and controls.  The notion of ensuring that the computer
operator is always aware of, and in control of, which system or application
artifacts appear in the video is underspecified and insufficiently
addressed.
Specifically, the areas of audio and screen sharing will require more
design and human factors contributions to ensure that the system is both
operable and also not inadvertently uncontrollable.

The posture in this review is towards finding a way to make the feature
operable and "safe" in the sense that the computer operator (a natural
person) is always in control of the machine and always able to know "what
it is doing now."  Where that is not the case then mitigations must be put
in place to return to the safe and operable design.

Part I, Analysis of the Considerations, by Section

The first part of this review is an analysis organized by the section
structure of the Screen Capture specification.

Section 1. Introduction

<quote>This feature has significant security implications. Applications
that use this API to access information that is displayed to users could
access confidential information from other origins if that information is
under the control of the application. This includes content that would
otherwise be inaccessible due to the protections offered by the user agent
sandbox.</quote>

The second statement here should have more explanation.  Where the User
Agent (browser) is able to escape the sandbox metaphorical boundary, it
becomes like any other application and therefore dangerous.  Much ceremony
and visibility should be placed around the entry and exit to this state.
The specification could spend more time standardizing that signalling.
While certain signalling is acknowledged, the viewpoint here is that the
specification is insufficient in the sense that reasonable implementations
might do it differently or elide it in the name of convenience or
administrative mandate.

Section 5 Capturing Display Media

<quote>
<snip/>except that it acquires media from one display device chosen by the
end-user each time.
</quote>

This is a prudent practice.

Note that the terminology changes from "display surface" (Section 4) to
"display device" (Section 5).

Section 5.1 MediaDevices Additions

Summarizing...
1. Necessarily prompts the user.
2. As stated, the User Agent is able to acquire the content of any display
surface at all, not merely the display surface of the User Agent.
3. A combination of video and audio sources is possible.
4. User grant choices are not persisted [across what]?

Item 1. Being prompted every time with no persistence seems a prudent UX
practice.
Item 2. Ensuring that screen capture is always demonstrable is the key
issue here.  It is not clear enough how this will be signalled.  For
example, some systems use a red "RECORDING" bug in the top or bottom of the
screen, or change the framing of the shared window.
Item 3. It is unclear why audio-only sharing is disallowed.  Many of us
use audio-only sharing in meetings to preserve our visual privacy, not
least to avoid the "nostril cam" effect.
Item 4. It is not clear enough how short-lived the permission grants are.
Is it per session, or per User Agent program lifetime?  Many desktop or
handheld devices have very long-lived duty cycles (they are never turned
off, there is no logout), so a notional "user session" can substantially be
the power duty cycle of the device itself (i.e. forever).

The algorithm of Section 5.1 is specified at the MUST level and feels
sufficient.
I did not examine the algorithm for correctness or corner cases.
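
The audio-only observation in Item 3 can be made concrete with a small
sketch.  It is not the specification's algorithm; it only models the
acceptance rule as this review reads the draft, namely that a request with
video turned off is refused while video-only and video-plus-audio requests
proceed to the user prompt.  The names DisplayRequest and classifyRequest,
and the defaults applied when a member is omitted, are assumptions made for
illustration.

```typescript
// Hypothetical model of which capture requests reach the user prompt,
// following this review's reading of the draft. Not the normative algorithm.
interface DisplayRequest {
  video?: boolean; // assumed to default to true when omitted
  audio?: boolean; // assumed to default to false when omitted
}

type Outcome = "prompt-user" | "reject";

function classifyRequest(req: DisplayRequest): Outcome {
  const video = req.video === undefined ? true : req.video;
  // A request without video (audio-only or empty) never reaches the prompt.
  if (!video) {
    return "reject";
  }
  // Video-only and video-plus-audio both go on to ask the user.
  return "prompt-user";
}
```

Under this model, classifyRequest({ video: false, audio: true }) is
refused, which is exactly the case Item 3 questions.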

Section 5.5 Device Identifiers

Peripheral devices are not to be enumerable with the API.
As such the display capture sources cannot be selected by deviceID.
The renumbering and renaming of devices feels like it will be a UX
challenge.
It is unclear how this will work when the operating system vendor has one
naming/numbering nomenclature while the User Agent (browser) vendor has a
different one.

Section 6 Feature Policy Integration

Defaults to "display-capture" "self"
and Ask the User To Choose.
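
As a rough illustration of what a default allowlist of "self" means in
practice, the sketch below models the decision in a deliberately simplified
form: the top-level document and same-origin frames may request capture,
and a cross-origin frame may do so only when the embedding page delegates
the feature.  The names FrameContext and mayRequestDisplayCapture are
hypothetical, and the model ignores nested delegation chains and policy
set through response headers.

```typescript
// Simplified, hypothetical model of a "self" default allowlist.
interface FrameContext {
  topLevelOrigin: string;       // origin of the embedding page
  frameOrigin: string;          // origin of the document asking for capture
  delegatedByEmbedder: boolean; // true if the embedder opted this frame in
}

function mayRequestDisplayCapture(ctx: FrameContext): boolean {
  // Same-origin content is covered by the "self" default.
  if (ctx.frameOrigin === ctx.topLevelOrigin) {
    return true;
  }
  // Cross-origin content needs explicit delegation from the embedder.
  return ctx.delegatedByEmbedder;
}
```

In a page, the delegation would typically be expressed on the embedding
element, for example <iframe allow="display-capture">.  Passing this check
only permits the request; the user is still asked to choose.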

Section 7. Privacy Indicator Requirements

This section generalizes the underlying specification to account for the
display surfaces as sources.
The section specifies that the changes in the display surfaces MUST NOT
fire a devicechange event.

Section 8. Security and Permissions

Consideration is given towards allowing user control of audio and video as
separate dimensions. For some reason, an "audio only" capture is not
allowed while "video only" is allowed.

There is a description of how the capture of logical display surfaces
outside of the bounding box of the User Agent itself can cause inadvertent
and unmanaged privacy or security leakages.  These issues are highlighted
with the understanding that the computer operator must know how to mitigate
them.  It is not clear that any reasonable system operator will understand
how to ensure that inappropriate words and images do not leak into the
shared stream.  More could be supplied towards mitigations of the surprise
effects here.
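
One small step an application could take, sketched below, is to look at
the kind of surface the operator actually picked and warn when it may reach
beyond the User Agent.  The sketch assumes the captured video track reports
a displaySurface setting with the values "monitor", "window",
"application", and "browser" as listed in the draft; surfaceRisk and its
wording are hypothetical, and nothing here is a mitigation that the
specification itself defines.

```typescript
// Surface kinds as assumed from the draft's displaySurface setting.
type DisplaySurface = "monitor" | "window" | "application" | "browser";

interface SurfaceRisk {
  mayReachOutsideUserAgent: boolean;
  warning: string;
}

// Hypothetical helper: map the chosen surface to an operator-facing warning.
function surfaceRisk(surface: DisplaySurface): SurfaceRisk {
  switch (surface) {
    case "browser":
      return {
        mayReachOutsideUserAgent: false,
        warning: "Sharing a single browser tab.",
      };
    case "window":
    case "application":
      return {
        mayReachOutsideUserAgent: true,
        warning: "Sharing a window; anything drawn in it is visible.",
      };
    case "monitor":
      return {
        mayReachOutsideUserAgent: true,
        warning: "Sharing an entire screen, including notifications.",
      };
  }
}
```

In a page, the value would come from something like
track.getSettings().displaySurface on the captured video track.  A warning
of this kind informs the operator but does not remove the content from the
stream.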

A common case of inadvertent sharing, which occurs frequently in the
enterprise setting, is when someone is sharing a work screen in a public
setting only to have a personal notification about inbound email pop up on
screen: "Honey can you pick up half-and-half on the way home?"  The best
practice in these things is to shut everything down prior to sharing.  It
is not clear that this sort of control is possible here because the User
Agent is doing the sharing, performing the multi-window / multi-screen
behaviors AND also modulating the event notifications.

In summary, even with the specification of Privacy Indicator Requirements
[GETUSERMEDIA], it is not clear that a reasonable computer operator would
be able to control the User Agent to keep the relevant parts of the
sharable surface out of view.

Part II. Responses to the Security & Privacy Questionnaire

The second part of this review develops answers to the Security & Privacy
Questionnaire directly.

Following the Questions of the Security & Privacy Questionnaire
https://www.w3.org/TR/security-privacy-questionnaire/

Question 2.1 What information might this feature expose to Web sites or
other parties, and for what purposes is that exposure necessary?
Answer: the feature exposes streams of live video and (optionally) audio of
a user's device towards the receiver.

Question 2.2 Is this specification exposing the minimum amount of
information necessary to power the feature?
Answer: Steps are taken to minimize the variability of APIs and of answers
from system-configurable settings.
Steps are taken to isolate the API down to the minimum amount of
information needed to operate the API.

Question 2.3 How does this specification deal with personal information or
personally-identifiable information or information derived thereof?
Answer: The feature does not work with personal information directly.
Instead, the feature makes it trivially easy to inadvertently and
constantly share personal information without control or knowledge of the
computer operator (a natural person in the operating span of the device).

Question 2.4 How does this specification deal with sensitive information?
Answer: The feature does not work with sensitive information directly.

Question 2.5 Does this specification introduce new state for an origin that
persists across browsing sessions?
Answer: no

Question 2.6 What information from the underlying platform, e.g.
configuration data, is exposed by this specification to an origin?
Answer: after consent, the capabilities of the video and audio, as chosen
by the computer operator

Question 2.7 Does this specification allow an origin access to sensors on a
user’s device?
Answer: no

Question 2.8 What data does this specification expose to an origin?
Please also document what data is identical to data exposed by other
features, in the same or different contexts.
Answer: the video & audio capabilities, as chosen by the computer operator

Question 2.9 Does this specification enable new script execution/loading
mechanisms?
Answer: no

Question 2.10 Does this specification allow an origin to access other
devices?
Answer: no

Question 2.11 Does this specification allow an origin some measure of
control over a user agent’s native UI?
Answer: yes

Question 2.12 What temporary identifiers might this specification
create or expose to the web?
Answer: the "stable and private id" for the media devices

Question 2.13 How does this specification distinguish between behavior in
first-party and third-party contexts?
Answer: no (not applicable)

Question 2.14 How does this specification work in the context of a user
agent’s Private Browsing or "incognito" mode?
Answer: not stated, one must assume the feature is insensitive to
Private/Incognito mode

Question 2.15 Does this specification have a "Security Considerations" and
"Privacy Considerations" section?
Answer: yes

Question 2.16 Does this specification allow downgrading default security
characteristics?
Answer: no

Question 2.17 What should this questionnaire have asked?
Answer: How shall the computer operator know what the User Agent is doing
without having been the person to have granted the consents in the first
instance?  For example, consider a kiosk in a meeting room or a tablet in a
communal social setting. How does the computer operator discover what it is
sharing after the fact?

Answer: one concern with sharing is that in some jurisdictions sharing and
storing of sharing (recording) requires more consent from more parties. For
example, multi-party consent is required for recording audio and video in
certain jurisdictions.  Inadvertent operation of this API could run afoul
of the laws governing these sorts of features.  This puts the computer
operator at legal risk for untrained or inadvertent operation of the User
Agent.

END
Received on Monday, 6 January 2020 20:23:26 UTC