Separating simple camera access and P2P authorization (was: Re: Clarification on media capture split between WebRTC and DAP) from Rich Tibbett on 2011-08-17 (public-webrtc@w3.org from August 2011)

From: Rich Tibbett <richt@opera.com>
Date: Wed, 17 Aug 2011 14:46:31 +0200
To: Harald Alvestrand <harald@alvestrand.no>
CC: roBman@mob-labs.com, "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <4E4BB827.4020606@opera.com>
Harald Alvestrand wrote:
> On 08/16/11 17:07, Rich Tibbett wrote:
>> Harald Alvestrand wrote:
>>> On 08/16/11 12:10, Rob Manson wrote:
>>>> +1 for finer grained separate authorisation between streaming to a
>>>> remote server vs. streaming into a <video> or <audio> tag. This is
>>>> essential.
>>> -1.
>>>
>>> I have problems with this - it seems to say that when asking for a
>>> camera, the JS has to specify API parameters that specify what the
>>> purpose of the stream is - with an as-yet undefined vocabulary.
>>
>> 1. Always provide a tainted Stream object to a camera-requesting web
>> page requiring no up-front user authorization. Treat this Stream
>> object as a cross-origin resource, thereby tainting any resulting
>> <canvas> copy and disabling all camera snapshot and camcorder
>> recording functionality against that Stream object in its tainted state.
> When you say "taint", are you referring to the CORS concept as described
> in http://dvcs.w3.org/hg/cors/raw-file/tip/Overview.html? I'm afraid
> that document doesn't define the term, but it seems to be used a lot in
> discussions around CORS - do you mean that we regard the Stream object
> as having an Origin outside any current context?

This section of the WHATWG spec summarizes the concept of tainting an 
object via the origin-clean flag, applied to <canvas> elements with 
content coming from a different origin:

http://www.whatwg.org/specs/web-apps/current-work/multipage/the-canvas-element.html#security-with-canvas-elements
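
As a rough illustration of that origin-clean idea applied to a camera Stream, here is a minimal TypeScript sketch. It models the flag only; CanvasModel, DrawableSource, drawSource and exportPixels are hypothetical names made up for this example, not APIs from the WHATWG spec or any browser.

```typescript
// Illustrative model only: none of these names are real browser APIs.
interface DrawableSource {
  readonly label: string;
  readonly tainted: boolean; // true when treated as a cross-origin resource
}

class CanvasModel {
  // Mirrors the origin-clean flag: starts true, flips to false for good
  // as soon as tainted content is drawn.
  private originClean = true;

  drawSource(source: DrawableSource): void {
    if (source.tainted) {
      this.originClean = false;
    }
  }

  // Stands in for any read-back of pixel data (snapshot, recording, ...).
  exportPixels(): string {
    if (!this.originClean) {
      throw new Error("SecurityError: canvas is not origin-clean");
    }
    return "pixel-data";
  }
}

// A camera Stream handed out without user authorization is tainted.
const cameraStream: DrawableSource = { label: "camera", tainted: true };
const canvasModel = new CanvasModel();
canvasModel.drawSource(cameraStream); // displaying the stream is allowed

let exportBlocked = false;
try {
  canvasModel.exportPixels(); // reading the data back is not
} catch (_err) {
  exportBlocked = true;
}
```

The point of the sketch is that display and read-back are separated: the page can show the camera view, but any copy it makes is unusable until the Stream is untainted.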

>
>>
>> 2. Require the web app to assign and display the tainted Stream object
>> in a <video> element in order for the user to authorize its allowed
>> capabilities (in Step 4 below).
> Is this synonymous with "let only a Video element have the capability of
> untainting a stream"?
>>
>> 3. Let the web app provide a hint to the Stream object (or <video>
>> element) for the type of access to the camera it actually requires - a
>> list of one or more of the following tokens: 'camera', 'camcorder',
>> 'streaming', 'telephony'.
>>
>> 4. Let the UA present the following <video>-overlaid stream control
>> buttons to the user depending on the access hint(s) registered above:
>>
>> - a 'camera' button > to take a still image capture to generate an
>> image file that the web app can register a callback to receive.
>> - a 'camcorder' button > to take a video capture to generate a video
>> file that the web app can register a callback to receive.
>> - a 'streaming' button > to allow the current web page to un-taint the
>> Stream object and allow the web app to access that streaming data
>> (e.g. via a <canvas> element or an Audio API). Either ON or OFF
>> (default).
>> - a 'telephony' button > to allow the current web page to then assign
>> the Stream object to the P2P communication API without throwing e.g. a
>> Security Violation error. Either ON or OFF (default).
>>
>> 5. On user click of any of the stream control buttons presented,
>> enable the inferred functionality, fire a callback to the web app and
>> let it continue about its intended business.
> Doesn't this mean that we're back to "click a button to allow access
> every time you call"?

The 'telephony' and 'streaming' grants could be sticky. Once the user
activates such a permission, it remains active until the user clicks to
deactivate it again or disconnects by some other means, such as
browsing to a different URL.
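
To make the lifetime of a sticky grant concrete, here is a minimal TypeScript sketch under my own assumptions. StickyGrantModel and its methods are hypothetical names for illustration; nothing here is a proposed API.

```typescript
// Illustrative model only: hypothetical names, not a proposed API.
type StickyCapability = "streaming" | "telephony";

class StickyGrantModel {
  private readonly granted = new Set<StickyCapability>();

  constructor(private currentUrl: string) {}

  // The user clicks a UA-controlled button to switch a capability ON.
  userClicksOn(capability: StickyCapability): void {
    this.granted.add(capability);
  }

  // The user clicks again to switch it OFF.
  userClicksOff(capability: StickyCapability): void {
    this.granted.delete(capability);
  }

  // Browsing to a different URL drops every grant.
  navigateTo(url: string): void {
    if (url !== this.currentUrl) {
      this.granted.clear();
      this.currentUrl = url;
    }
  }

  isGranted(capability: StickyCapability): boolean {
    return this.granted.has(capability);
  }
}

const grants = new StickyGrantModel("https://example.org/call");
grants.userClicksOn("telephony");            // one click...
const firstCall = grants.isGranted("telephony");
const secondCall = grants.isGranted("telephony"); // ...covers later calls too
grants.navigateTo("https://example.org/elsewhere");
const afterNavigation = grants.isGranted("telephony");
```

In this model the grant is checked on every call but only given once, so there is no extra click per call while the user stays on the page.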

>
> I kind of like the idea that we use a <video> element with
> browser-controlled controls on it for the authorization step (presumably
> the app can hide that <video> element after having obtained the
> authorization, if he so desires), but I'm fundamentally worried about
> "extra click every time you call".

True. A developer could hide the video element immediately after 
obtaining some specific user authorization. That may actually be part of 
the intended behavior for a web app. Maybe local camera playback doesn't 
need to be visible at all times in that app. It's useful for it to be 
visible initially so the user knows the camera is to be accessed.

The step I failed to mention is to also require the UA to display a 
chrome-based indicator when permissions are "on". This is already a 
recommendation included in the WHATWG P2P proposal:

"If the user grants permission to use local recording devices, user 
agents are encouraged to include a prominent indicator that the devices 
are "hot" (i.e. an "on-air" or "recording" indicator)."

The user should be able to revoke access by clicking that indicator,
regardless of whether the video element remains visible on the web
page or not.

>>
>> This approach immediately lets the user see that the page is ready to
>> do something with their camera without requiring an up-front
>> prompt/access authorization. It lets the user get comfortable with the
>> fact that the camera is on rather than it happening all at once. It
>> lets them adjust their hair before they go live. It presents
>> UA-controlled buttons within a <video> element for the user to
>> authorize (or not) the requested, targeted usage when they are ready.
>>
>> FWIW, this discussion is entirely orthogonal to the P2P aspects
>> ('streaming' in this email refers to streaming the video to the web
>> page, or, untainting the provided Stream object. 'telephony' refers to
>> the p2p communication stuff).
>>
>> I can understand the push-back but I'd be interested for webrtc to
>> explore such an approach a little further. Accessing the camera
>> without streaming the results to a remote server has
>> wide applicability, e.g. AR guides, bar-code scanning web apps,
>> camera/snapshot/fx web apps, on-demand a/v recording uploads, personal
>> introductions, etc.
> It does, and if we can accept "click every time you <scan, photograph,
> videotape, .....>" in all those contexts, this might be a general
> mechanism.
>
> I'm just very unsure about this mechanism being generally acceptable.
>

The model reverses previous thinking: provide a tainted
webcam/microphone view to the web page without up-front authorization,
and let the user elevate the permissions at their discretion as and
when the web page requests them.

If we simplify to the point of sticky permission sets, does that 
alleviate some of the concern? Once you've clicked the telephony button 
the page can make as many calls as it likes with the untainted Stream 
object.
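
A minimal TypeScript sketch of how the hint tokens and the telephony gate might fit together, under my own assumptions. TaintedStreamModel, requestHints, userClicks and attachToPeerConnection are hypothetical names for illustration only; the error thrown merely stands in for the "Security Violation" mentioned above.

```typescript
// Illustrative model only: hypothetical names, not a proposed API.
type AccessHint = "camera" | "camcorder" | "streaming" | "telephony";

class TaintedStreamModel {
  private readonly hints = new Set<AccessHint>();
  private readonly enabled = new Set<AccessHint>();

  // The web app hints at the kinds of access it actually needs.
  requestHints(...hints: AccessHint[]): void {
    for (const hint of hints) {
      this.hints.add(hint);
    }
  }

  // The UA overlays one button per registered hint on the <video>.
  visibleButtons(): AccessHint[] {
    return Array.from(this.hints);
  }

  // A click only counts for a button the UA is actually showing.
  userClicks(button: AccessHint): boolean {
    if (!this.hints.has(button)) {
      return false;
    }
    this.enabled.add(button);
    return true;
  }

  // The Stream stays tainted until 'streaming' is switched on.
  get tainted(): boolean {
    return !this.enabled.has("streaming");
  }

  // Stands in for assigning the Stream to the P2P communication API.
  attachToPeerConnection(): string {
    if (!this.enabled.has("telephony")) {
      throw new Error("Security Violation: telephony not authorized");
    }
    return "attached";
  }
}

const stream = new TaintedStreamModel();
stream.requestHints("telephony");

let blockedBeforeClick = false;
try {
  stream.attachToPeerConnection();
} catch (_err) {
  blockedBeforeClick = true; // no authorization yet
}

stream.userClicks("telephony"); // a single click by the user
const calls = [stream.attachToPeerConnection(), stream.attachToPeerConnection()];
```

Here one click on the telephony button is enough for any number of later calls, while the Stream itself remains tainted because 'streaming' was never requested or granted.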

- Rich
Received on Wednesday, 17 August 2011 12:47:10 UTC