RE: Separating simple camera access and P2P authorization (was: Re: Clarification on media capture split between WebRTC and DAP) from Adam Bergkvist on 2011-08-17 (public-webrtc@w3.org from August 2011)

From: Adam Bergkvist <adam.bergkvist@ericsson.com>
Date: Wed, 17 Aug 2011 18:09:00 +0200
To: Rich Tibbett <richt@opera.com>, Harald Alvestrand <harald@alvestrand.no>
CC: "roBman@mob-labs.com" <roBman@mob-labs.com>, "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <A1249B08688639468D1CB181445EE79D25DC194CAC@ESESSCMS0355.eemea.ericsson.se>
On 17 August 2011 14:47, Rich Tibbett wrote:

> Harald Alvestrand wrote:
>> On 08/16/11 17:07, Rich Tibbett wrote:
>>> Harald Alvestrand wrote:
>>>> On 08/16/11 12:10, Rob Manson wrote:
>>>>> +1 for finer grained separate authorisation between streaming to a
>>>>> remote server vs. streaming into a <video> or <audio> tag. This is
>>>>> essential.
>>>> -1.
>>>> 
>>>> I have problems with this - it seems to say that when asking for a
>>>> camera, the JS has to specify API parameters that specify what the
>>>> purpose of the stream is - with an as-yet undefined vocabulary.
>>> 
>>> 1. Always provide a tainted Stream object to a camera-requesting web
>>> page requiring no up-front user authorization. Treat this Stream
>>> object as a cross-origin resource, thereby tainting any resulting
>>> <canvas> copy and disabling all camera snapshot and camcorder
>>> recording functionality against that Stream object in its
>>> tainted state.
>> When you say "taint", are you referring to the CORS concept as
>> described in http://dvcs.w3.org/hg/cors/raw-file/tip/Overview.html?
>> I'm afraid that document doesn't define the term, but it seems to be
>> used a lot in discussions around CORS - do you mean that we regard
>> the Stream object as having an Origin outside any current context?
> 
> This section of the WHATWG spec summarizes the concept of
> tainting an object via the origin-clean flag, applied to
> <canvas> elements with content coming from a different origin:
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/the-canvas-element.html#security-with-canvas-elements
> 
>> 
>>> 
>>> 2. Require the web app to assign and display the tainted Stream
>>> object in a <video> element in order for the user to authorize its
>>> allowed capabilities (in Step 4 below).
>> Is this synonymous with "let only a Video element have the capability
>> of untainting a stream"?
>>> 
>>> 3. Let the web app provide a hint to the Stream object (or
>>> <video> element) for the type of access to the camera it actually
>>> requires - a list of one or more of the following tokens: 'camera',
>>> 'camcorder', 'streaming', 'telephony'. 
>>> 
>>> 4. Let the UA present the following <video>-overlaid stream control
>>> buttons to the user depending on the access hint(s) registered
>>> above: 
>>> 
>>> - a 'camera' button > to take a still image capture to generate an
>>> image file that the web app can register a callback to receive.
>>> - a 'camcorder' button > to take a video capture to generate a video
>>> file that the web app can register a callback to receive.
>>> - a 'streaming' button > to allow the current web page to un-taint
>>> the Stream object and allow the web app to access that streaming
>>> data (e.g. via a <canvas> element or an Audio API). Either ON or
>>> OFF (default).
>>> - a 'telephony' button > to allow the current web page to then
>>> assign the Stream object to the P2P communication API without
>>> throwing e.g. a Security Violation error. Either ON or OFF
>>> (default).
>>> 
>>> 5. On user click of any of the stream control buttons presented,
>>> enable the inferred functionality, fire a callback to the web app
>>> and let it continue about its intended business.
>> Doesn't this mean that we're back to "click a button to allow access
>> every time you call"?
> 
> Clicking to provide 'telephony' or 'streaming' could be
> sticky. When you activate such a permission it remains active
> until the user actively clicks to deactivate it again or
> disconnects by some other means such as browsing to a different URL.
> 
>> 
>> I kind of like the idea that we use a <video> element with
>> browser-controlled controls on it for the authorization step
>> (presumably the app can hide that <video> element after having
>> obtained the authorization, if he so desires), but I'm fundamentally
>> worried about "extra click every time you call".
> 
> True. A developer could hide the video element immediately
> after obtaining some specific user authorization. That may
> actually be part of the intended behavior for a web app.
> Maybe local camera playback doesn't need to be visible at all
> times in that app. It's useful for it to be visible initially
> so the user knows the camera is to be accessed.
> 
> The step I failed to mention is to also require the UA to
> display a chrome-based indicator when permissions are "on".
> This is already a recommendation included in the WHATWG P2P proposal:
> 
> "If the user grants permission to use local recording
> devices, user agents are encouraged to include a prominent
> indicator that the devices are "hot" (i.e. an "on-air" or "recording"
> indicator)." 
> 
> ..from which the user should be able to revoke access by
> clicking such an indicator regardless of whether the video
> element remains visible on the web page or not.
> 
>>> 
>>> This approach immediately lets the user see that the page is ready
>>> to do something with their camera without requiring an up-front
>>> prompt/access authorization. It lets the user get comfortable with
>>> the fact that the camera is on rather than it happening all at once.
>>> It lets them adjust their hair before they go live. It presents
>>> UA-controlled buttons within a <video> element for the user to
>>> authorize (or not) the requested, targeted usage when they are
>>> ready. 
>>> 
>>> FWIW, this discussion is entirely orthogonal to the P2P aspects
>>> ('streaming' in this email refers to streaming the video to the web
>>> page, or, untainting the provided Stream object. 'telephony' refers
>>> to the p2p communication stuff).
>>> 
>>> I can understand the push-back but I'd be interested for webrtc to
>>> explore such an approach a little further. Accessing the camera
>>> without streaming the results to a remote server has
>>> wide-applicability e.g. AR guides, bar-code scanning web apps,
>>> camera/snapshot/fx web apps, on-demand a/v recording uploads,
>>> personal introductions, etc.
>> It does, and if we can accept "click every time you <scan,
>> photograph, videotape, .....>" in all those contexts, this might be
>> a general mechanism. 
>> 
>> I'm just very unsure about this mechanism being generally acceptable.
>> 
> 
> The model is a reversal on previous thinking: provide an
> unauthorized but tainted webcam/microphone view to the web
> page and allow the user to elevate the permissions at their
> discretion as and when they are requested by the web page.
> 
> If we simplify to the point of sticky permission sets, does
> that alleviate some of the concern? Once you've clicked the
> telephony button the page can make as many calls as it likes
> with the untainted Stream object.
> 
> - Rich

Hi

Is it a serious privacy issue when you trust the AR application enough
to run it? If it can't access the content in your video stream it would
have to know other things like your position and orientation to overlay
the proper information.

In other cases when you give the web app access to, e.g. an image or
video file with <input type=file> you don't put any constraints on what
the web app may do with it. If you don't trust the app you don't give it
access to your data anyhow.

I think an indicator in the browser chrome that shows whether the
camera/mic is hot is good enough for version one.

Buttons overlaid on the video would be vulnerable to click-jacking.
It's a general problem with in-page UI since the web app can, e.g., fake
the revoke button. To solve this, the browser would have to render a
separate self-view and then we're back to the
<input type="file" accept="video"> case.

Security-wise there wouldn't be much difference between 'streaming' and
'telephony' since 'streaming' gives the web app access to a continuous
stream of images which can be sent over the network.
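
To make that last point concrete, here is a toy model in Python. It is
only a sketch: the Stream class, its method names, and the upload()
helper are hypothetical stand-ins made up for this illustration, not any
proposed API. Only the 'streaming' and 'telephony' tokens come from the
proposal quoted above.

```python
class SecurityViolation(Exception):
    """Raised when the page uses a capability the user has not granted."""


class Stream:
    """Hypothetical model of the tainted Stream object in the proposal."""

    def __init__(self):
        self.granted = set()  # sticky permissions the user has switched on
        self._frame_no = 0

    def grant(self, token):
        self.granted.add(token)

    def revoke(self, token):
        self.granted.discard(token)

    @property
    def tainted(self):
        # The stream stays tainted until the user turns 'streaming' on.
        return "streaming" not in self.granted

    def read_frame(self):
        # Models reading pixels back, e.g. via an untainted <canvas>.
        if self.tainted:
            raise SecurityViolation("stream is tainted")
        self._frame_no += 1
        return b"frame-%d" % self._frame_no

    def attach_to_p2p(self):
        # Models handing the stream to the P2P communication API.
        if "telephony" not in self.granted:
            raise SecurityViolation("'telephony' not granted")
        return "p2p-attached"


def upload(sent, stream, count):
    """Stand-in for the page shipping frames itself (XHR, WebSocket, ...)."""
    for _ in range(count):
        sent.append(stream.read_frame())
```

With only 'streaming' granted, attach_to_p2p() still raises, but
upload() succeeds and the frames leave the page anyway. In this model
the 'telephony' switch does not protect anything that 'streaming' has
not already exposed.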

/Adam
Received on Wednesday, 17 August 2011 16:09:36 UTC