Re: Comments/Questions on Media Capture Streams – Privacy and Security Considerations from Nick Doty on 2015-10-24 (public-privacy@w3.org from October to December 2015)

From: Nick Doty <npdoty@w3.org>
Date: Fri, 23 Oct 2015 17:07:00 -0700
To: Harald Alvestrand <harald@alvestrand.no>
Cc: public-media-capture@w3.org, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-Id: <8ADE1FE7-A77F-45C5-AFFD-CD25CC2111AD@w3.org>
And my apologies for getting behind on this thread and not realizing that I might be blocking. Below I've tried to answer questions or provide clarifications. This email doesn't review the issues/changes you've since made or verify that they address the raised issues; if that's also needed, please let me know.

> On Sep 21, 2015, at 4:23 AM, Harald Alvestrand <harald@alvestrand.no> wrote:
> 
> Apologies for taking so long to make a substantive response - I seem to have become the "designated driver" for this particular discussion.
> 
> On 07/14/2015 04:28 AM, Nick Doty wrote:
>> ### Consent/Permissions
>> 
>> Do permissions carry forward across sessions? Could there be a built-in sunset period for permissions? European participants in particular have raised this concern as it may be related to legal compliance in the EU.
>> 
>> It would be nice if there were a simple, user-friendly way to revoke consent for a stream (especially audio/webcam streams). As it currently stands, once consent is granted there doesn't seem to be a simple way to revoke it.
> 
> Permissions are given to an origin on a device; there is no place in the model for permissions on streams. Adding this would make things more complicated and not more secure, since an origin could simply create a new stream if permission for a stream was revoked.

Apologies if our wording was unclear here; I referred to permissions "for a stream", when I believe I meant permissions for camera/microphone access on an origin, not the particular stream itself.

> Permissions carry forward only if persisted; the decision to persist is taken by the UA, not by JS. We explicitly forbid persisting permission for insecure pages.
> 
> There is no API function to revoke permissions. We have advice in section 13 (security and privacy considerations) saying that "it is important that it is easy to find the list of granted permissions and revoke permissions that the user wishes to revoke." We haven't found a reason to give more specific advice to implementors here.

Regarding an API function, it does seem like there is a security and privacy advantage to providing developers with their own ability to revoke permissions.

I was aggregating comments from the Privacy Interest Group on this point; it might be that their comments about revocation were based on testing implementations and finding revocability lacking. A normative requirement that UAs let users revoke persisted permissions might make sense, so that the API is implemented in a minimally privacy-supportive way.

> In implementations, we have also found it reasonable to erase all stored permissions when clearing cookies for that origin; it may be reasonable to give advice on this in the document.

If that's common across implementations and might influence the expectations of developers, then it seems like a potential interoperability issue that should be a normative requirement in the spec.
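To make the UA-side behavior discussed above concrete (persistence only in secure contexts, an optional sunset period, user revocation, and erasing grants along with cookies), here is a minimal, hypothetical sketch. PermissionStore and its methods are invented for illustration; they are not part of the spec or of any browser API, and timestamps are passed in explicitly to keep the model deterministic.

```typescript
// Hypothetical UA-side store for camera/microphone grants; not a spec or browser API.
type DeviceKind = "camera" | "microphone";

interface Grant {
  persisted: boolean;       // survives endSession() only if true
  expiresAt: number | null; // ms timestamp of the sunset; null = no sunset
}

class PermissionStore {
  private readonly grants = new Map<string, Grant>();
  private readonly sunsetMs: number | null;

  constructor(sunsetMs: number | null) {
    this.sunsetMs = sunsetMs; // assumed built-in sunset period for persisted grants
  }

  private static key(origin: string, kind: DeviceKind): string {
    return `${origin}|${kind}`;
  }

  // The UA, not page script, decides whether to persist; grants made in
  // insecure contexts are never persisted.
  grant(origin: string, kind: DeviceKind, secureContext: boolean,
        userChosePersist: boolean, now: number): void {
    const persisted = secureContext && userChosePersist;
    const expiresAt = persisted && this.sunsetMs !== null ? now + this.sunsetMs : null;
    this.grants.set(PermissionStore.key(origin, kind), { persisted, expiresAt });
  }

  isGranted(origin: string, kind: DeviceKind, now: number): boolean {
    const k = PermissionStore.key(origin, kind);
    const g = this.grants.get(k);
    if (g === undefined) return false;
    if (g.expiresAt !== null && now >= g.expiresAt) {
      this.grants.delete(k); // sunset reached: the user must be asked again
      return false;
    }
    return true;
  }

  // Explicit revocation, e.g. from a settings page listing granted permissions.
  revoke(origin: string, kind: DeviceKind): void {
    this.grants.delete(PermissionStore.key(origin, kind));
  }

  // Erase every grant for an origin when the user clears that origin's cookies.
  clearSiteData(origin: string): void {
    for (const k of Array.from(this.grants.keys())) {
      if (k.startsWith(`${origin}|`)) this.grants.delete(k);
    }
  }

  // At the end of a browsing session, drop grants that were not persisted.
  endSession(): void {
    for (const [k, g] of Array.from(this.grants.entries())) {
      if (!g.persisted) this.grants.delete(k);
    }
  }
}
```

In this model, a UA could expose revoke() from the list of granted permissions that section 13 already recommends, and call clearSiteData() from its existing clear-cookies flow.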

>>> "when the page is secure"
>> 
>> "secure" is a word that often gets defined in different ways. Would it be more precise to refer to "privileged contexts"?
>> http://www.w3.org/TR/powerful-features/#settings-privileged
>> 
>> Not persisting permissions outside such settings is a good baseline requirement. Section 10.6 states that persistent permissions must be served over HTTPS and have no mixed content. It would be nice to see the definition of mixed content expanded to include the various issues mentioned in Bonneau's recent paper[1]. For example, if a site elects to use pinning, it should be considered to have mixed content if it loads non-pinned content.
>> 
>> [1] http://www.jbonneau.com/doc/KB15-NDSS-hsts_pinning_survey.pdf
>> 
>> [Note: This last point is perhaps also relevant to http://www.w3.org/TR/mixed-content/]
> 
> We refer to https://www.w3.org/TR/mixed-content/ - we do not want to redefine the concept in this document, believing that this would only cause confusion for implementors.
> If mixed-content needs updating, then that is the proper place to fix the issue.

Joe and Greg, I believe you had identified this particular concern and its connection to the Bonneau paper. Can we check whether the problem needs to be addressed in Mixed Content, in this spec, in both, or in neither?

>> Permissions for getUserMedia seem to be specific to entry script origin. Is this what users will expect? For example, if I grant and persist permission to callmyfriends.com to use their service and later I browse to example.com, which has an embedded iframe of callmyfriends.com, will users be shocked to see their camera turn on and a picture of themselves on the screen? Permission breadth may be a flexible option for the user agent ("Optionally, e.g., based on a previously-established user preference, for security reasons"), but it might be useful for the spec to establish some expectations here. Top-level origin/embedded origin pairs, for example, might be a useful model, as in some implementations of Geolocation.
> 
> The converse issue (example.com has permission, and callmyenemy.com pops up inside an iframe and inherits the permission) has been discussed, with the suggestion that the iframe sandbox should strip away the permission.
> 
> I don't think this version of the iframe issue has been discussed. I do think the "embed a call function" is an important enough use case that disallowing it will surprise implementors.

I believe the change wouldn't require disallowing embedded iframes with Media Capture permissions; it would instead be a distinction in how permissions are requested/granted/persisted.
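A minimal, hypothetical sketch of that distinction: the same persisted grant, keyed either by requesting origin alone or by the (top-level origin, embedded origin) pair. MediaPermissions is an invented name, not a browser or spec API.

```typescript
// Hypothetical model comparing two ways a UA might key persisted grants.
class MediaPermissions {
  private readonly granted = new Set<string>();
  private readonly keyByPair: boolean;

  constructor(keyByPair: boolean) {
    this.keyByPair = keyByPair;
  }

  private key(topLevel: string, requesting: string): string {
    // Origin-only keying ignores which page the requester is embedded in.
    return this.keyByPair ? JSON.stringify([topLevel, requesting]) : requesting;
  }

  persist(topLevel: string, requesting: string): void {
    this.granted.add(this.key(topLevel, requesting));
  }

  has(topLevel: string, requesting: string): boolean {
    return this.granted.has(this.key(topLevel, requesting));
  }
}

const friends = "https://callmyfriends.com";
const host = "https://example.com";

for (const keyByPair of [false, true]) {
  const perms = new MediaPermissions(keyByPair);
  perms.persist(friends, friends); // granted while visiting callmyfriends.com directly
  console.log(keyByPair ? "pair-keyed:" : "origin-keyed:", perms.has(host, friends));
}
// origin-keyed: true   (the embedded iframe could turn the camera on without a prompt)
// pair-keyed: false    (example.com embedding the widget gets its own prompt)
```

With pair keying the embedded call widget still works; it simply has to ask the first time it appears under a new top-level site.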

>> ### Device enumeration
>> 
>> Why is there no requirement for user permission before a platform detects how many devices of each class are connected/available? Does the specification provide a mechanism to allow a user agent to deny access if an application is not in use?
> 
> This was the result of an extensive discussion about the need to limit fingerprinting surface vs the need to present appropriate UI - for instance, implementors did not want to present a camera choice button if only one camera was available, or a video-call button if no camera was available at all.
> 
> The amount of fingerprinting exposed by counting devices seemed small enough to be acceptable, and we did not see any other related risk.
> 
>> 
>> Can we specify the order in which devices should be listed? If this will vary, it will make it that much easier to fingerprint the user agent, based not only on what kinds of devices they have attached, but what order the software happened to list those devices. (For example, see our experience with font listing.)
> 
> For which value of "we"?

I mean the editors and groups publishing the specification; apologies, I thought that was clear. Published specifications are useful places to specify behavior to enable consistency.

> It seems that unless the user-agent string is banned, this has very low value.

I am not usually persuaded by this argument, unless we think the variation/entropy is likely to be identical to and consistent with the user-agent string. In the case of font listings, for example, listing fonts in the order they were installed or the file system order in the OS provides entropy quite distinct from the version of the installed browser.
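As an illustration only, one way a spec could pin the order is to group devices by kind in a fixed sequence and then sort within each kind by deviceId. This assumes deviceIds are per-origin and not guessable, so sorting on them reveals nothing about installation or plug-in order; canonicalOrder and KIND_ORDER are hypothetical names.

```typescript
// Hypothetical canonical ordering for an enumerateDevices()-style list.
type Kind = "audioinput" | "audiooutput" | "videoinput";

interface DeviceInfo {
  kind: Kind;
  deviceId: string; // assumed per-origin, non-guessable identifier
}

const KIND_ORDER: Record<Kind, number> = { audioinput: 0, audiooutput: 1, videoinput: 2 };

function canonicalOrder(devices: readonly DeviceInfo[]): DeviceInfo[] {
  // Copy, then sort by fixed kind order and then by deviceId, so the result
  // does not depend on OS, driver, or plug-in order.
  return [...devices].sort((a, b) =>
    KIND_ORDER[a.kind] - KIND_ORDER[b.kind] ||
    (a.deviceId < b.deviceId ? -1 : a.deviceId > b.deviceId ? 1 : 0));
}

const osOrderA: DeviceInfo[] = [
  { kind: "videoinput", deviceId: "k2" },
  { kind: "audioinput", deviceId: "m9" },
  { kind: "audioinput", deviceId: "a4" },
];
const osOrderB = [...osOrderA].reverse();
console.log(canonicalOrder(osOrderA).map(d => d.deviceId).join(",")); // a4,m9,k2
console.log(canonicalOrder(osOrderB).map(d => d.deviceId).join(",")); // a4,m9,k2
```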

>> Does this need to be enumerable? Fingerprintability of plugins, for example, can be dramatically reduced by changing it to a query model. Does the user have a camera attached? Does the user have a microphone attached? If so, the site can then ask the user for permission and when they do so, they can get deviceIds, kinds and grouping of devices, labels, etc. Related: what purpose does the deviceId serve prior to granting of permission (as opposed to just knowing the kinds/capabilities)?
> Persistence for a site that has previously used the camera, and wants to present a different dialog when returning (perhaps opening the camera without comment if it already knows which one to use, asking the user to choose the "special camera" if it doesn't know which to use).
> 
>> Does the site need to know that I have a microphone and a grouped microphone/webcam and a separate webcam *before* asking me for permission to access my camera? If the enumeration and identifiers are only present after asking for permission, then no additional permission prompt is needed and leakage of information can be reduced.
>> 
>> Imagine that you were writing a browser that wanted to reduce fingerprinting and was willing to limit functionality but didn't want to drop functionality altogether. Is there any compliant way for that browser to indicate prior to the permission prompt, "yes, video/audio are supported" without enumerating the configuration of devices?
> No, this is not supported now.
> 
> Note that a recent change (https://github.com/w3c/mediacapture-main/pull/219) removed the ability to persist IDs without successfully requesting a device.

Cool. It sounds like that would address the concern about access to persistent deviceIds prior to a permission grant.

>> It seems like user agents are given some flexibility on how they select constraints for the constrainability pattern, can we provide similar flexibility as to how they indicate capabilities rather than device enumeration?
> I don't understand what this question means, so I'll skip answering it.

Sorry if this was unclear or hypothetical. I was trying to note that getSupportedConstraints() gives user agents a way to tell site developers which constrainable properties they support. An analogous alternative to the enumerateDevices() model would be to provide constraints; a browser concerned about these fingerprintability issues could either support just capabilities (camera attached, microphone attached) or limit which properties were returned in an enumeration.
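A hypothetical sketch of such a capabilities-only mode: before a grant, the page learns only whether a camera and a microphone exist; after a grant, it gets the full list. exposeToPage and the types below are invented for illustration; the record fields only loosely follow MediaDeviceInfo (kind, deviceId, groupId, label).

```typescript
// Hypothetical: what a fingerprinting-conscious UA might expose before and
// after a camera/microphone grant. Not the spec's enumerateDevices().
type DeviceClass = "audioinput" | "audiooutput" | "videoinput";

interface DeviceRecord {
  kind: DeviceClass;
  deviceId: string;
  groupId: string;
  label: string;
}

interface CapabilitySummary {
  hasCamera: boolean;
  hasMicrophone: boolean;
}

type Exposed =
  | { granted: false; capabilities: CapabilitySummary }
  | { granted: true; devices: DeviceRecord[] };

function exposeToPage(devices: readonly DeviceRecord[], granted: boolean): Exposed {
  if (!granted) {
    // Before permission: one bit per class; no counts, ids, groups, or labels.
    return {
      granted: false,
      capabilities: {
        hasCamera: devices.some(d => d.kind === "videoinput"),
        hasMicrophone: devices.some(d => d.kind === "audioinput"),
      },
    };
  }
  // After permission: the full list, copied so the page cannot mutate UA state.
  return { granted: true, devices: devices.map(d => ({ ...d })) };
}

const oneCamera: DeviceRecord[] = [
  { kind: "videoinput", deviceId: "v1", groupId: "g1", label: "Built-in camera" },
  { kind: "audioinput", deviceId: "a1", groupId: "g1", label: "Built-in mic" },
];
const twoCameras: DeviceRecord[] = [
  ...oneCamera,
  { kind: "videoinput", deviceId: "v2", groupId: "g2", label: "USB webcam" },
];
// Same answer for both configurations until the user grants permission.
console.log(JSON.stringify(exposeToPage(oneCamera, false)) ===
            JSON.stringify(exposeToPage(twoCameras, false))); // true
```

A site can still decide whether to show a video-call button from hasCamera; it simply cannot distinguish one camera from two until the prompt has been answered.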

>> ### Events
>> 
>>> When a new media input or output device is made available, the user agent MUST queue a task that fires a simple event named devicechange at the MediaDevices object.
>> 
>> This event appears to be fired even for web pages that have not requested any permissions from the user. Is that intended?
> Yes. However, that was not a very deeply discussed decision - so it might be changeable.
>> 
>> Particularly if this event will be fired before any permission is granted, it is important that it not be fired simultaneously in all browsing contexts. Sites can use simultaneous firing to correlate browsing activity in different tabs, different windows (including private windows), different browsers, in a way that may be unexpected to the user and undermine other protections they're attempting to implement. Some specs have resolved this problem by noting that the event should only be fired for the front-most or active browsing context.
> I'm not sure how this would work for availability of devices - it would be strange indeed if my comms client would only notice new devices if it was the foreground tab.

I'm not sure I would expect a tab that I wasn't using but had access to a microphone to take action immediately upon plugging in a new microphone.

Separate and more important, does every tab and iframe that *doesn't* have permission to access cameras or microphones need to receive a simultaneous event when you plug a camera or microphone in? That seems like functionality unlikely to help the user, while making it harder to isolate activity via private browsing windows or separate browsers.

> Playing around with timing might alleviate the problem - but I'm not clear that the added complexity buys enough defense against a particular attack to be worth it.

I agree: fuzzing the timing of events is one possible mitigation, but it seems complicated, and it would be difficult to guarantee a given level of protection. (Limiting the event to active tabs seems more straightforward and effective.)
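One hypothetical policy that would reconcile the two positions above: deliver devicechange to contexts whose origin already holds camera/microphone permission (so a background comms client still notices a new headset) and to the active context, and to no one else. A minimal sketch, with BrowsingContext and devicechangeRecipients invented for illustration:

```typescript
// Hypothetical dispatch policy for devicechange; not what the draft specifies.
interface BrowsingContext {
  id: string;
  hasMediaPermission: boolean; // camera or microphone already granted to its origin
  isActive: boolean;           // front-most / focused context
}

function devicechangeRecipients(contexts: readonly BrowsingContext[]): string[] {
  // Contexts with neither permission nor focus get no event, so plugging in a
  // device is not a simultaneous signal that links background tabs or private windows.
  return contexts.filter(c => c.hasMediaPermission || c.isActive).map(c => c.id);
}

const contexts: BrowsingContext[] = [
  { id: "comms-client", hasMediaPermission: true, isActive: false },
  { id: "news-tab", hasMediaPermission: false, isActive: true },
  { id: "private-window-tracker", hasMediaPermission: false, isActive: false },
];
console.log(devicechangeRecipients(contexts)); // [ 'comms-client', 'news-tab' ]
```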

>> The "note" section includes a description of a very serious attack. Is there anything that can be done about this beyond a note to website implementers, who may never read this section of the specification? Is it the case that any site that requests getUserMedia permission that subsequently suffers any sort of XSS vulnerability or URL parameter failure as you note will silently give live access to the user's video/audio to an attacker? As a site developer, am I liable if I use getUserMedia in one part of my site, users persist the permission and then somewhere else on my site I have a bug that allows for XSS or a URL parameter failure?
> 
> I'm not sure what the word "liable" means in this context - it's a word I usually try to avoid using unless I'm sure what I mean by it.

I don't mean to make an assessment of legal liability. I mean, as a site developer, am I going to be putting my users at serious risk by asking for this permission in the case where I have another security bug somewhere on my site?

> Any site that requests getUserMedia permission and has that permission persisted will have access to that permission - that's a given. If the site can be tricked into running other sites' Javascript - that site has a problem. I think that's an issue in all contexts, since running other sites' Javascript immediately renders all the considerations for "secure origin" null and void.
> 
> I don't know that this is something that Media Capture needs to call out specifically.

There's no doubt that a site has a security problem (whether it uses Media Capture or not) if it allows reflected XSS vulnerabilities or session fixation/URL parameter attacks. Persisted camera/microphone permission makes a difference here, though: beyond letting an attacker take advantage of the site's good name by making it execute JavaScript, it means the user won't be prompted to decide whether to grant access to the camera (a prompt that would let the user do things like judge whether access makes sense for that site, or decide whether they're in an intimate setting).

Hope this helps,
Nick
Received on Saturday, 24 October 2015 00:07:11 UTC