Re: Request for feedback: Media Capture and Streams Last Call

PING, here are some notes from reviewing the Media Capture and Streams spec. I haven't gone through the complete TAG questionnaire or fingerprinting best practices, but have used some of them. Has someone volunteered to "shepherd" this review and gather all comments to send them along to the WebRTC folks?

To Mike's point, I agree we should be careful about the uniqueness and persistence of identifiers. To Ekr's and Jan-Ivar's point, though, I think that if the persistence is identical to that of cookies and other local storage mechanisms, then there isn't a new problem, just a persistence property that should be noted.

Sorry I missed the call this week, but I'm enjoying the conversations on the list.
Thanks,
Nick


## persisted permissions

"when the page is secure"

"secure" is a word that often gets defined in different ways. Would it be more precise to refer to "privileged contexts"?
http://www.w3.org/TR/powerful-features/#settings-privileged

Not persisting permissions in such settings is a good base-line requirement.

I believe you've heard from the TAG already about whether use of the API ever makes sense in unprivileged contexts. That is, when the user is asked for permission to access their camera, do they understand that they're granting this permission to all network attackers as well as the site they think they're talking to? I suspect this PING email thread is not going to change your minds about that already-discussed topic. However, I think it would be worthwhile to note this security threat in the security considerations section, and to point out to user agent implementers how difficult it is to design a permission prompt that conveys it.

Best Practice 2 is in a section entitled "Implementation Suggestions", but contains a normative MUST statement. If this is an interoperability requirement and MUST is defined as in RFC 2119, then I think "suggestions" (and indeed, "best practice") is probably incorrect terminology.

Permissions for getUserMedia seem to be specific to entry script origin. Is this what users will expect? For example, if I grant and persist permission to callmyfriends.com to use their service and later browse to example.com, which has an embedded iframe of callmyfriends.com, will I be shocked to see my camera turn on and a picture of myself on the screen? Permission breadth may be a flexible option for the user agent ("Optionally, e.g., based on a previously-established user preference, for security reasons"), but it might be useful for the spec to establish some expectations here. Top-level origin/embedded origin pairs, for example, might be a useful model, as in some implementations of Geolocation.
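
To make the pair-keyed model concrete, here is a minimal sketch. The class and method names are hypothetical and do not come from the spec or from any browser; it only shows that a grant persisted for a direct visit would not silently carry over to an embedding on another site.

```typescript
// Hypothetical permission store keyed by (top-level origin, embedded origin).
// None of these names come from the Media Capture and Streams spec.
type Decision = "granted" | "denied";

class PairedPermissionStore {
  private decisions = new Map<string, Decision>();

  // JSON-encode the pair so neither origin string can spoof a separator.
  private key(topLevelOrigin: string, embeddedOrigin: string): string {
    return JSON.stringify([topLevelOrigin, embeddedOrigin]);
  }

  persist(topLevelOrigin: string, embeddedOrigin: string, decision: Decision): void {
    this.decisions.set(this.key(topLevelOrigin, embeddedOrigin), decision);
  }

  // Returns "prompt" when the user has made no decision for this exact pair.
  query(topLevelOrigin: string, embeddedOrigin: string): Decision | "prompt" {
    const found = this.decisions.get(this.key(topLevelOrigin, embeddedOrigin));
    return found === undefined ? "prompt" : found;
  }
}

const store = new PairedPermissionStore();
// The user visits callmyfriends.com directly and persists a grant.
store.persist("https://callmyfriends.com", "https://callmyfriends.com", "granted");

const directVisit = store.query("https://callmyfriends.com", "https://callmyfriends.com");
const embeddedVisit = store.query("https://example.com", "https://callmyfriends.com");
```

Under this model the iframe on example.com falls back to a prompt rather than turning the camera on unannounced.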


## identifiers

> All enumerable devices have an identifier that MUST be unique to the application and persistent across browsing sessions.

To say that such an identifier MUST persist across browsing sessions is a guarantee that the requirement won't be satisfied. Many users, for example, configure their browsers to delete all cookies on closing the browser. How about:
"Identifiers MAY be persistent across browsing sessions. Persistent identifiers let the application save, identify the availability of, and directly request specific sources."
Any site that assumes that identifiers will persist sets itself up for failure (for example, when the user clears cookies); the spec should not encourage that false assurance.

"unique to the application" does not seem to be fully or clearly defined. Does that mean "unique among all deviceIds available to a particular origin"? Or does it mean "different from the deviceId presented for the same device to all other origins"?

Per our teleconference conversation about this some months ago, it's not entirely clear to me why a GUID is necessary rather than, say, ["1","2","3"], but it probably doesn't make a difference to the user's privacy. However, specifying exactly what the scope of an identifier is and how it differs is important.
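
As an illustration of the scoping question, here is a rough sketch of identifiers that are unique only within one origin and that disappear when the user clears site data. Everything in it (the class, the `hardwareKey` strings, the counter scheme) is an assumption made for the example, not spec behavior.

```typescript
// Hypothetical per-origin deviceId allocator; not an API from the spec.
class OriginScopedIds {
  // origin -> (underlying hardware key -> identifier exposed to that origin)
  private byOrigin = new Map<string, Map<string, string>>();

  idFor(origin: string, hardwareKey: string): string {
    let ids = this.byOrigin.get(origin);
    if (ids === undefined) {
      ids = new Map<string, string>();
      this.byOrigin.set(origin, ids);
    }
    let id = ids.get(hardwareKey);
    if (id === undefined) {
      // Small counters are enough for uniqueness within a single origin.
      id = String(ids.size + 1);
      ids.set(hardwareKey, id);
    }
    return id;
  }

  // Called when the user clears cookies/site data for the origin.
  clear(origin: string): void {
    this.byOrigin.delete(origin);
  }
}

const ids = new OriginScopedIds();
const camAtA = ids.idFor("https://a.example", "usb-camera");
const micAtA = ids.idFor("https://a.example", "usb-mic");
const camAgainAtA = ids.idFor("https://a.example", "usb-camera");
const micAtB = ids.idFor("https://b.example", "usb-mic");
```

Counters like these satisfy only the first reading ("unique among all deviceIds available to a particular origin"). The second reading, where the same device must look different to every origin, would need something like a random per-origin value instead.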

## device enumeration

Can we specify the order in which devices should be listed? If the order varies, it becomes that much easier to fingerprint the user agent, based not only on what kinds of devices are attached but also on the order in which the software happened to list them. (For example, see our experience with font listing.)
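
One way to picture a specified order is a canonical sort applied before results are returned. The sketch below is only an assumption about what such a rule could look like (sort by kind, then by deviceId); the spec does not define this, and the type and function names are made up for the example.

```typescript
// Hypothetical canonical ordering for enumeration results; the sort key is an
// assumption, not something the spec defines.
type Kind = "audioinput" | "audiooutput" | "videoinput";

interface DeviceEntry {
  kind: Kind;
  deviceId: string;
}

const KIND_RANK: Record<Kind, number> = { audioinput: 0, audiooutput: 1, videoinput: 2 };

function canonicalOrder(devices: DeviceEntry[]): DeviceEntry[] {
  // Copy first so the caller's array is not mutated.
  return devices.slice().sort((a, b) => {
    const byKind = KIND_RANK[a.kind] - KIND_RANK[b.kind];
    if (byKind !== 0) return byKind;
    // Plain code-unit comparison, which does not vary by locale.
    return a.deviceId < b.deviceId ? -1 : a.deviceId > b.deviceId ? 1 : 0;
  });
}

// The same three devices as two different systems might happen to list them.
const osOrderOne: DeviceEntry[] = [
  { kind: "videoinput", deviceId: "b" },
  { kind: "audioinput", deviceId: "c" },
  { kind: "videoinput", deviceId: "a" },
];
const osOrderTwo: DeviceEntry[] = [
  { kind: "videoinput", deviceId: "a" },
  { kind: "videoinput", deviceId: "b" },
  { kind: "audioinput", deviceId: "c" },
];
const listedOne = canonicalOrder(osOrderOne);
const listedTwo = canonicalOrder(osOrderTwo);
```

Both inputs come out in the same order, so the listing order itself no longer carries information about the underlying software.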

Does this need to be enumerable? Fingerprintability of plugins, for example, can be dramatically reduced by moving from enumeration to a query model. Does the user have a camera attached? Does the user have a microphone attached? If so, the site can then ask the user for permission, and once permission is granted it can get deviceIds, kinds and grouping of devices, labels, etc. Related: what purpose does the deviceId serve prior to granting of permission (as opposed to just knowing the kinds/capabilities)? Does the site need to know that I have a microphone and a grouped microphone/webcam and a separate webcam *before* asking me for permission to access my camera?

Imagine that you were writing a browser that wanted to reduce fingerprinting and was willing to limit functionality but didn't want to drop functionality altogether. Is there any compliant way for that browser to indicate prior to the permission prompt, "yes, video/audio are supported" without enumerating the configuration of devices? User agents seem to be given some flexibility in how they select constraints for the constrainability pattern; can we provide similar flexibility in how they indicate capabilities, rather than requiring device enumeration?
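
A minimal sketch of the query model, assuming a hypothetical pre-permission interface that answers only "is at least one device of this kind present?". The names are illustrative and not part of the spec.

```typescript
// Hypothetical pre-permission capability query: one boolean per kind,
// with no counts, identifiers, labels, or grouping.
type MediaKind = "audioinput" | "audiooutput" | "videoinput";

interface KindAvailability {
  audioinput: boolean;
  audiooutput: boolean;
  videoinput: boolean;
}

function availableKinds(attached: MediaKind[]): KindAvailability {
  return {
    audioinput: attached.some((k) => k === "audioinput"),
    audiooutput: attached.some((k) => k === "audiooutput"),
    videoinput: attached.some((k) => k === "videoinput"),
  };
}

// Two quite different device configurations.
const laptop = availableKinds(["audioinput", "videoinput"]);
const studio = availableKinds([
  "videoinput", "audioinput", "videoinput", "audioinput", "videoinput",
]);
```

Both configurations produce the same answer, so a site learns enough to decide whether to ask for permission without learning how many devices there are or how they are grouped.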

Can we mark the fingerprintability of the device enumeration section?
http://w3c.github.io/fingerprinting-guidance/#mark-fingerprinting

## events

> When a new media input or output device is made available, the user agent MUST queue a task that fires a simple event named devicechange at the MediaDevices object.

This event appears to be fired even for web pages that have not requested any permissions from the user. Is that intended?

Particularly if this event will be fired before any permission is granted, it is important that it not be fired simultaneously in all browsing contexts. Sites can use simultaneous firing to correlate browsing activity in different tabs, different windows (including private windows), different browsers, in a way that may be unexpected to the user and undermine other protections they're attempting to implement. Some specs have resolved this problem by noting that the event should only be fired for the front-most or active browsing context.
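
The "active context only" approach could be sketched roughly as below. The `ContextState` model and the function name are assumptions for illustration; a real user agent would have its own notion of which browsing context is front-most.

```typescript
// Hypothetical model of open browsing contexts; only the active one is
// notified, so other tabs and windows cannot use the event as a shared clock.
interface ContextState {
  id: string;
  active: boolean;
}

function devicechangeTargets(contexts: ContextState[]): string[] {
  return contexts.filter((c) => c.active).map((c) => c.id);
}

const openContexts: ContextState[] = [
  { id: "tab-news", active: false },
  { id: "tab-call", active: true },
  { id: "private-window", active: false },
];
const notified = devicechangeTargets(openContexts);
```

Here only `tab-call` would receive devicechange, so the background tab and the private window see nothing to correlate.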

(I believe that isn't a problem for the other events, which are specific to a media stream already accessed by a script. Muting could be an event that would be fired for all open audio streams at once, which might reveal some information, but that seems like a much lesser concern since the user would have already granted specific permission to those sites to access media from the user.)

## privacy considerations

I think it's a fine model to have the security and privacy considerations section be a summary of normative requirements noted elsewhere, rather than adding them after the fact, as it were.

However, not all of the comments in this section seem to correspond to normative requirements. For example,
> In the case of a case-by-case authorization, it is important that the user be able to say "no" in a way that prevents the UI from blocking user interaction until permission is given - either by offering a way to say a "persistent NO" or by not using a modal permissions dialog.
I don't see any normative requirements regarding the modality of the permissions dialog. If this is important and intended to be an interoperability requirement, it should be specified as such in the getUserMedia method description.

The "note" section includes a description of a very serious attack. Is there anything that can be done about this beyond a note to website implementers, who may never read this section of the specification? Is it the case that any site that requests getUserMedia permission that subsequently suffers any sort of XSS vulnerability or URL parameter failure as you note will silently give live access to the user's video/audio to an attacker? As a site developer, am I liable if I use getUserMedia in one part of my site, users persist the permission and then somewhere else on my site I have a bug that allows for XSS or a URL parameter failure?

One way to help developers avoid this catastrophic scenario would be to allow sites to indicate whether they were confident about opting in to persisted permissions. A casual developer who calls getUserMedia() wouldn't have to worry about that failure due to some bug on their site. A serious developer who is confident about their security situation and whose functionality would benefit greatly from persistence could call getUserMedia(allowPersist=true).
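
The opt-in could be sketched as a simple decision rule. This is a hypothetical illustration: `allowPersist` is only a proposal in this message, and combining it here with the privileged-context requirement discussed earlier is an assumption, not spec text.

```typescript
// Hypothetical rule for whether a user agent stores a grant. All field names
// are made up for this sketch.
interface PersistRequest {
  privilegedContext: boolean; // page qualifies as a privileged context
  siteAllowsPersist: boolean; // site opted in, e.g. via a proposed allowPersist flag
  userChosePersist: boolean;  // user asked the browser to remember the grant
}

function mayPersistGrant(req: PersistRequest): boolean {
  // Persist only when all three parties' conditions hold.
  return req.privilegedContext && req.siteAllowsPersist && req.userChosePersist;
}

const casualSite = mayPersistGrant({
  privilegedContext: true,
  siteAllowsPersist: false,
  userChosePersist: true,
});
const confidentSite = mayPersistGrant({
  privilegedContext: true,
  siteAllowsPersist: true,
  userChosePersist: true,
});
```

With this rule a site that never opts in cannot end up holding a persisted grant, so a later XSS bug on that site would still trigger a prompt.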

Received on Tuesday, 30 June 2015 18:15:49 UTC