Re: Comments/Questions on Media Capture Streams – Privacy and Security Considerations from Harald Alvestrand on 2015-09-21 (public-privacy@w3.org from July to September 2015)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Mon, 21 Sep 2015 13:23:25 +0200
To: Nick Doty <npdoty@w3.org>, public-media-capture@w3.org
Cc: "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-ID: <55FFE8AD.6040602@alvestrand.no>
Apologies for taking so long to make a substantive response - I
seem to have become the "designated driver" for this particular discussion.

On 07/14/2015 04:28 AM, Nick Doty wrote:
> Hi Media Capture Task Force,
>
> The Privacy Interest Group has been discussing the Media Capture and
> Streams Last Call from a privacy perspective. Some discussion has
> already taken place on public-privacy and other lists, but we've tried
> to consolidate feedback here. Please include the public-privacy list
> in followups.
>
> Hope this helps,
> Nick Doty, for W3C Privacy Interest Group (PING)
>
>
> ## Comments/Questions on Media Capture Streams – Privacy and Security
> Considerations
>
> Our input is intended to help the Media Capture Task Force produce a
> more privacy-protecting API.
>
> This feedback is based on discussions within the Privacy Interest
> Group and various email threads, consolidated as much as possible on
> similar topics. Also, some comments were provided in October 2014 on
> Media Stream recording, which may also be relevant:
> https://lists.w3.org/Archives/Public/public-privacy/2014OctDec/0004.html
>
> The Media Capture Task Force might be interested in documents
> (in-progress, and themselves needing review and feedback) to aid in
> identifying and mitigating privacy issues in Web specifications, which
> have been used as part of this review, including:
> * https://w3ctag.github.io/security-questionnaire/
> * https://w3c.github.io/fingerprinting-guidance/
>
> Comments below have been combined into categories.
>
> ### Consent/Permissions
>
> Do permissions carry forward across sessions? Could there be a
> built-in sunset period for permissions? European participants in
> particular have raised this concern as it may be related to legal
> compliance in the EU.
>
> It would be nice if there was a simple, user friendly way to revoke
> consent for a stream (especially audio/webcam streams). As it
> currently stands, once consent is granted there doesn't seem to be a
> simple way to revoke it.

Permissions are given to an origin on a device; there is no place in the
model for permissions on streams. Adding this would make things more
complicated and not more secure, since an origin could simply create a
new stream if permission for a stream was revoked.

Permissions carry forward only if persisted; the decision to persist is
taken by the UA, not by JS. We explicitly forbid persisting permission
for insecure pages.

There is no API function to revoke permissions. We have advice in
section 13 (security and privacy considerations) saying that "it is
important that it is easy to find the list of granted permissions and
revoke permissions that the user wishes to revoke." We haven't found a
reason to give more specific advice to implementors here.

In implementations, we have also found it reasonable to erase all stored
permissions when clearing cookies for that origin; it may be reasonable
to give advice on this in the document.
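
To make the model above concrete, here is a minimal sketch of a UA-side
permission store: grants are keyed by origin, are never persisted for
non-HTTPS pages, and are dropped when site data for the origin is cleared.
The class and method names (PermissionStore, persist, clear_site_data) are
hypothetical and come from no specification or browser, and the check is
deliberately simplified (it looks only at the scheme and does not model
mixed content).

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] origin of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


class PermissionStore:
    """Hypothetical UA-side store of persisted grants, keyed by origin."""

    def __init__(self) -> None:
        self._granted: set[str] = set()

    def persist(self, url: str) -> bool:
        """Persist a grant only for HTTPS pages; report whether it was stored."""
        if urlsplit(url).scheme != "https":
            return False  # insecure pages never get a persisted grant
        self._granted.add(origin_of(url))
        return True

    def is_granted(self, url: str) -> bool:
        """A grant applies to the whole origin, not to a single stream or page."""
        return origin_of(url) in self._granted

    def clear_site_data(self, url: str) -> None:
        """Model "clear cookies for this origin": drop the grant as well."""
        self._granted.discard(origin_of(url))
```

Note that there is nothing per-stream in this store, which is the point made
above: revoking at the stream level would not stop the origin from simply
creating a new stream.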


>
>> "when the page is secure"
>
> "secure" is a word that often gets defined in different ways. Would it
> be more precise to refer to "privileged contexts"?
> http://www.w3.org/TR/powerful-features/#settings-privileged
>
> Not persisting permissions in such settings is a good base-line
> requirement. Section 10.6 states that persistent permissions must be
> served over HTTPS and have no mixed content. It would be nice to
> see the definition of mixed content expanded to include the various
> issues mentioned in Bonneau's recent paper[1]. For example, if a site
> elects to use pinning, it should be considered to have mixed content
> if it loads non-pinned content.
>
> [1] http://www.jbonneau.com/doc/KB15-NDSS-hsts_pinning_survey.pdf
>
> [Note: This last point is perhaps also relevant
> to http://www.w3.org/TR/mixed-content/]

We refer to https://www.w3.org/TR/mixed-content/ - we do not want to
redefine the concept in this document, believing that this would only
cause confusion for implementors.
If mixed-content needs updating, then that is the proper place to fix
the issue.

>
> You've heard from the TAG already about whether use of the API ever
> makes sense in unprivileged contexts. That is, when the user is asked
> for permission to access their camera, do they understand that they're
> granting this permission to all network attackers as well as the site
> they think they're talking to? I suspect this PING email thread is not
> going to change your minds about that already discussed topic.
> However, it would be worthwhile to note this security threat in the
> security considerations section and to note for user agent
> implementers the difficulty for this permission prompt.

This has indeed been extensively discussed. The current text is the
compromise position that was reached - documenting the tradeoffs in this
compromise in section 13 makes sense.

Filed https://github.com/w3c/mediacapture-main/issues/249 .

>
> Best Practice 2 is in a section entitled "Implementation Suggestions",
> but contains a normative MUST statement. If this is an
> interoperability requirement and MUST is defined as in 2119, then I
> think "suggestions" (and indeed, "best practice") is probably
> incorrect terminology.

This is actually a restatement from draft-ietf-rtcweb-security-arch-11
section 5.2. It needs to be described as such (quoting another
document's MUST, not establishing a new one).

Filed https://github.com/w3c/mediacapture-main/issues/250 .

>
> Permissions for getUserMedia seem to be specific to entry script
> origin. Is this what users will expect? For example, if I grant and
> persist permission to callmyfriends.com to use their service and later
> I browse to example.com which has an embedded iframe of
> callmyfriends.com, will users be shocked to see their camera
> turn on and a picture of themselves on the screen? Permission breadth
> may be a flexible option for the user agent ("Optionally, e.g., based
> on a previously-established user preference, for security reasons"),
> but it might be useful for the spec to establish some expectations
> here. Top-level origin/embedded origin pairs, for example, might be a
> useful model, as in some implementations of Geolocation.

The converse issue (example.com has permission, and callmyenemy.com pops
up inside an iframe and inherits the permission) has been discussed,
with the suggestion that the iframe sandbox should strip away the
permission.

I don't think this version of the iframe issue has been discussed. I do
think the "embed a call function" is an important enough use case that
disallowing it will surprise implementors.
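
For discussion purposes, the "top-level origin/embedded origin pairs" model
from the quoted text could be sketched as below. This is only an
illustration of that suggestion, not something the current draft specifies;
the class name and its methods are hypothetical. Under this model an
embedded call function is still possible, but a grant made while visiting
callmyfriends.com directly would not carry over to an iframe on example.com.

```python
class PairedPermissionStore:
    """Hypothetical store keyed by (top-level origin, embedded origin)."""

    def __init__(self) -> None:
        self._granted: set[tuple[str, str]] = set()

    def grant(self, top_level: str, embedded: str) -> None:
        """Record a grant for one specific embedding relationship."""
        self._granted.add((top_level, embedded))

    def is_granted(self, top_level: str, embedded: str) -> bool:
        """A grant is valid only for the exact pair it was given to."""
        return (top_level, embedded) in self._granted
```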

>
> ### Device enumeration
>
> Why is there no requirement for user permission before a platform
> detects how many devices of each class are connected/available? Does
> the specification provide a mechanism to allow a user agent to deny
> access if an application is not in use?

This was the result of an extensive discussion about the need to limit
fingerprinting surface vs the need to present appropriate UI - for
instance, implementors did not want to present a camera choice button if
only one camera was available, or a video-call button if no camera was
available at all.

The amount of fingerprinting exposed by counting devices seemed small
enough to be acceptable, and we did not see any other related risk.

>
> Can we specify the order in which devices should be listed? If this
> will vary, it will make it that much easier to fingerprint the user
> agent, based not only on what kinds of devices they have attached, but
> what order the software happened to list those devices. (For example,
> see our experience with font listing.)

For which value of "we"?
It seems that unless the user-agent string is banned, this has very low
value.

>
> Does this need to be enumerable? Fingerprintability of plugins, for
> example, can be dramatically reduced by changing it to a query model.
> Does the user have a camera attached? Does the user have a microphone
> attached? If so, the site can then ask the user for permission and
> when they do so, they can get deviceIds, kinds and grouping of
> devices, labels, etc. Related: what purpose does the deviceId serve
> prior to granting of permission (as opposed to just knowing the
> kinds/capabilities)?

The purpose is persistence: a site that has previously used the camera
may want to present a different dialog when the user returns, perhaps
opening the camera without comment if it already knows which one to use,
or asking the user to choose the "special camera" if it doesn't.
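
As a rough sketch of that use case, the decision a returning site makes
could look like the following. The function name and the ("open"/"ask")
return values are hypothetical, and the list of available IDs stands in for
whatever device enumeration returns.

```python
from typing import Optional


def choose_camera(saved_device_id: Optional[str],
                  available_device_ids: list[str]) -> tuple[str, Optional[str]]:
    """Decide whether a returning site can reuse its remembered camera.

    Returns ("open", id) when the remembered device is still present,
    otherwise ("ask", None) so the site shows a camera chooser.
    """
    if saved_device_id is not None and saved_device_id in available_device_ids:
        return ("open", saved_device_id)
    return ("ask", None)
```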

> Does the site need to know that I have a microphone and a grouped
> microphone/webcam and a separate webcam *before* asking me for
> permission to access my camera? If the enumeration and identifiers are
> only present after asking for permission, then no additional
> permission prompt is needed and leakage of information can be reduced.
>
> Imagine that you were writing a browser that wanted to reduce
> fingerprinting and was willing to limit functionality but didn't want
> to drop functionality altogether. Is there any compliant way for that
> browser to indicate prior to the permission prompt, "yes, video/audio
> are supported" without enumerating the configuration of devices?

No, this is not supported now.

Note that a recent change
(https://github.com/w3c/mediacapture-main/pull/219) removed the ability
to persist IDs without successfully requesting a device.

> It seems like user agents are given some flexibility on how they
> select constraints for the constrainability pattern, can we provide
> similar flexibility as to how they indicate capabilities rather than
> device enumeration?

I don't understand what this question means, so I'll skip answering it.

> As it is now, it appears that user agents that want to block access to
> a list of attached webcam devices have to completely block use of
> WebRTC, even when there's a permission grant; an unfortunate and
> unnecessary loss of functionality for those who are concerned about
> this source of fingerprintability.

It's a tradeoff (as mentioned above). This is where the group chose to
come down; allowing for even more different models seems to make the
life of app developers harder for very little real gain in privacy.

>
> Finally, can we mark the fingerprintability of the device enumeration
> section?
> http://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting

https://github.com/w3c/mediacapture-main/issues/251

>
> ### Identifiers
>
>> All enumerable devices have an identifier that MUST be unique to the
>> application and persistent across browsing sessions.
>
> Is there a reason why the specification does not go the extra step of
> recommending that platforms not use persistent identifiers? What are
> the use cases for the use of persistent identifiers? Could identifiers
> change between sessions rather than simply treating identifiers as
> other persistent storages (e.g. cookies)?

See above - identification of devices that have been previously used.

>
> To say that such an identifier MUST persist across browsing sessions
> is a guarantee that the requirement won't be satisfied. Many users,
> for example, configure their browsers to delete all cookies on closing
> the browser. How about:
>> "Identifiers MAY be persisted across browsing sessions. Persistent
>> identifiers let the application save, identify the availability of,
>> and directly request specific sources."
>
> Any site that assumes that identifiers will persist will set
> themselves up for failure (for example, when the user clears cookies);
> the spec should not encourage that false assurance.

Clearing IDs when clearing cookies needs to be called out as a
recommended practice.

https://github.com/w3c/mediacapture-main/issues/252

We do want browsers to implement persistence when the user has granted
permissions, for the reasons outlined above.

>
> What protections are in place to ensure against leakage of
> identifiers? "unique to the application" does not seem to be fully or
> clearly defined. Does that mean "unique among all deviceIds available
> to a particular origin"? Or does it mean "different from the deviceId
> presented for the same device to all other origins"? On the mailing
> list, a proposal has been mentioned to double-key these identifiers
> (on the origin of the top-level document and the embedded iframe,
> presumably) but that has not yet been detailed.

The latter is what was intended - so that the ID does not form a
cross-origin cookie.
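
One way a UA might get that property is to derive the exposed deviceId from
an internal hardware identifier and a random per-origin salt, and to discard
the salt when the user clears cookies for the origin. The sketch below is an
assumption about a possible implementation, not what the draft mandates; the
DeviceIdMapper class and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets


class DeviceIdMapper:
    """Hypothetical UA-side mapping from hardware IDs to per-origin deviceIds."""

    def __init__(self) -> None:
        self._salts: dict[str, bytes] = {}

    def _salt(self, origin: str) -> bytes:
        """Return the origin's salt, creating a random one on first use."""
        if origin not in self._salts:
            self._salts[origin] = secrets.token_bytes(32)
        return self._salts[origin]

    def device_id(self, origin: str, hardware_id: str) -> str:
        """Stable for one origin, unrelated across origins."""
        digest = hmac.new(self._salt(origin), hardware_id.encode("utf-8"),
                          hashlib.sha256)
        return digest.hexdigest()

    def clear_site_data(self, origin: str) -> None:
        """Clearing cookies drops the salt, so later IDs are fresh."""
        self._salts.pop(origin, None)
```

With this approach two origins cannot compare IDs to recognize the same
device, and the ID has the same lifetime as the origin's other stored state.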

>
> Per our teleconference conversation about this some months ago, it's
> not entirely clear why a GUID is necessary rather than, say,
> ["1","2","3"]. In any case, specifying exactly what the scope of an
> identifier is and how it differs in different contexts is important.
>
> There are some concerns about access to a persistent deviceId
> identifier, prior to user-granted permission for accessing camera or
> microphone, because of the duplication of cookie-like functionality.
> See: http://www.w3.org/mid/010d01d0b195$4ef90350$eceb09f0$@baycloud.com

Yes. This is addressed (as mentioned above).

>
> ### Indicators
>
> Would it be possible to include mechanisms in the specification to
> display indicators beyond merely that the device is in use? (e.g. that
> a permission is persistent) Or, how is it expected that this will be
> handled?

This is specified under 10.2.1 (getUserMedia methods):

"If the user grants permission to use local recording devices, User
Agents are encouraged to include a prominent indicator that the devices
are "hot" (i.e. an "on-air" or "recording" indicator), as well as a
"device accessible" indicator indicating that the page has been granted
access to the source."

>
> Are indicators provided regarding providing access to
> camera/microphone in insecure or unprivileged contexts? Can we give
> guidance regarding warning users that access to their camera may be
> provided to all network attackers?

I can't see at the moment what that text should say. We already have
indicators of insecure origin as a separate concern; the intersection of
insecure origin and "permission granted" is as far as I think it's
reasonable to go.

If someone has text that they would like to suggest.... welcome!

>
> ### Events
>
>> When a new media input or output device is made available, the user
>> agent MUST queue a task fires a simple event named devicechange at
>> the MediaDevices object.
>
> This event appears to be fired even for web pages that have not
> requested any permissions from the user. Is that intended?

Yes. However, that was not a very deeply discussed decision - so it
might be changeable.

>
> Particularly if this event will be fired before any permission is
> granted, it is important that it not be fired simultaneously in all
> browsing contexts. Sites can use simultaneous firing to correlate
> browsing activity in different tabs, different windows (including
> private windows), different browsers, in a way that may be unexpected
> to the user and undermine other protections they're attempting to
> implement. Some specs have resolved this problem by noting that the
> event should only be fired for the front-most or active browsing context.

I'm not sure how this would work for availability of devices - it would
be strange indeed if my comms client would only notice new devices if it
was the foreground tab.

Playing around with timing might alleviate the problem - but I'm not
clear that the added complexity buys enough defense against a particular
attack to be worth it.
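
To show what "playing around with timing" might mean, here is a minimal
sketch in which the UA picks an independent random delay for each browsing
context before firing devicechange, so that the event does not arrive at the
same instant everywhere. The function name, the context IDs and the
two-second bound are all assumptions for illustration; nothing like this is
in the draft, and jitter reduces rather than eliminates the correlation.

```python
import random
from typing import Iterable, Optional


def devicechange_delays(context_ids: Iterable[str],
                        max_jitter_s: float = 2.0,
                        rng: Optional[random.Random] = None) -> dict[str, float]:
    """Pick an independent random delay (in seconds) per browsing context."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness by default
    return {cid: rng.uniform(0.0, max_jitter_s) for cid in context_ids}
```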

>
> (This may not be a problem for the other events, which are specific to
> a media stream already accessed by a script. Muting could be an event
> that would be fired for all open audio streams at once, which might
> reveal some information, but that seems like a much lesser concern
> since the user would have already granted specific permission to those
> sites to access media from the user.)
>
> ### Privacy considerations
>
> It's a fine model to have the security and privacy considerations
> section be a summary of normative requirements noted elsewhere, rather
> than adding them after the fact, as it were.
>
> However, not all of the comments in this section seem to correspond to
> normative requirements. For example,
>
>> In the case of a case-by-case authorization, it is important that the
>> user be able to say "no" in a way that prevents the UI from blocking
>> user interaction until permission is given - either by offering a way
>> to say a "persistent NO" or by not using a modal permissions dialog.
>
> There are no apparent normative requirements regarding the modality of
> the permissions dialog. If this is important and intended to be an
> interoperability requirement, it should be specified as such in the
> getUserMedia method description.

I think nobody's looked at this text for a while - all the browsers have
implemented non-modal dialogs (door hangers) for case-by-case
authorization. We might want to just delete the text.

>
> The "note" section includes a description of a very serious attack. Is
> there anything that can be done about this beyond a note to website
> implementers, who may never read this section of the specification? Is
> it the case that any site that requests getUserMedia permission that
> subsequently suffers any sort of XSS vulnerability or URL parameter
> failure as you note will silently give live access to the user's
> video/audio to an attacker? As a site developer, am I liable if I use
> getUserMedia in one part of my site, users persist the permission and
> then somewhere else on my site I have a bug that allows for XSS or a
> URL parameter failure?

I'm not sure what the word "liable" means in this context - it's a word
I usually try to avoid using unless I'm sure what I mean by it.

Any site that requests getUserMedia permission and has that permission
persisted will have access to that permission - that's a given. If the
site can be tricked into running other sites' JavaScript - that site has
a problem. I think that's an issue in all contexts, since running other
sites' JavaScript immediately renders all the considerations for "secure
origin" null and void.

I don't know that this is something that Media Capture needs to call out
specifically.

>
> One way to help developers avoid this catastrophic scenario would be
> to allow sites to indicate whether they were confident about opting in
> to persisted permissions. A casual developer who calls getUserMedia()
> wouldn't have to worry about that failure due to some bug on their
> site. A serious developer who is confident about their security
> situation and whose functionality would benefit greatly from
> persistence could call getUserMedia(allowPersist=true).

Hm. I'd like to hear others' opinions on this.

>
> ### Local IP address
>
> Some discussions on the mailing list have touched on broader concerns
> about WebRTC and access to a user's local IP address. We understand
> that to be an ongoing discussion. The Privacy Interest Group is
> maintaining a wiki page on that topic in order to respond to that
> separate request: http://www.w3.org/wiki/Privacy/IPAddresses 

Nice to know that you're keeping track of this.
Received on Monday, 21 September 2015 11:24:01 UTC