Re: Clarification on media capture split between WebRTC and DAP from Harald Alvestrand on 2011-08-17 (public-webrtc@w3.org from August 2011)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Wed, 17 Aug 2011 13:30:37 +0200
To: Rich Tibbett <richt@opera.com>
CC: roBman@mob-labs.com, "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <4E4BA65D.6030906@alvestrand.no>

I'm sure what you say makes sense, but I'm not sure I understand it...

On 08/16/11 17:07, Rich Tibbett wrote:
> Harald Alvestrand wrote:
>> On 08/16/11 12:10, Rob Manson wrote:
>>> +1 for finer grained separate authorisation between streaming to a
>>> remote server vs. streaming into a <video> or <audio> tag. This is
>>> essential.
>> -1.
>>
>> I have problems with this - it seems to say that when asking for a
>> camera, the JS has to specify API parameters that specify what the
>> purpose of the stream is - with an as-yet undefined vocabulary.
>
> 1. Always provide a tainted Stream object to a camera-requesting web 
> page requiring no up-front user authorization. Treat this Stream 
> object as a cross-origin resource, thereby tainting any resulting 
> <canvas> copy and disabling all camera snapshot and camcorder 
> recording functionality against that Stream object in its tainted state.
When you say "taint", are you referring to the CORS concept as described 
in http://dvcs.w3.org/hg/cors/raw-file/tip/Overview.html? I'm afraid 
that document doesn't define the term, but it seems to be used a lot in 
discussions around CORS - do you mean that we regard the Stream object 
as having an Origin outside any current context?

>
> 2. Require the web app to assign and display the tainted Stream object 
> in a <video> element in order for the user to authorize its allowed 
> capabilities (in Step 4 below).
Is this synonymous with "let only a Video element have the capability of 
untainting a stream"?
>
> 3. Let the web app provide a hint to the Stream object (or <video> 
> element) for the type of access to the camera it actually requires - a 
> list of one or more of the following tokens: 'camera', 'camcorder', 
> 'streaming', 'telephony'.
>
> 4. Let the UA present the following <video>-overlaid stream control 
> buttons to the user depending on the access hint(s) registered above:
>
>   - a 'camera' button > to take a still image capture to generate an 
> image file that the web app can register a callback to receive.
>   - a 'camcorder' button > to take a video capture to generate a video 
> file that the web app can register a callback to receive.
>   - a 'streaming' button > to allow the current web page to un-taint 
> the Stream object and allow the web app to access that streaming data 
> (e.g. via a <canvas> element or an Audio API). Either ON or OFF 
> (default).
>   - a 'telephony' button > to allow the current web page to then 
> assign the Stream object to the P2P communication API without throwing 
> e.g. a Security Violation error. Either ON or OFF (default).
>
> 5. On user click of any of the stream control buttons presented, 
> enable the inferred functionality, fire a callback to the web app and 
> let it continue about its intended business.
Doesn't this mean that we're back to "click a button to allow access 
every time you call"?
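
To make the concern concrete, here is a minimal sketch of steps 1-5 as a small state model. Everything in it is hypothetical: `TaintedStream`, `visibleButtons`, `userClicked`, `isTainted` and `attachToPeerConnection` are illustrative names, not part of any proposed or existing API. It also assumes that the user's grant is stored on the Stream object itself, which the proposal does not actually say.

```typescript
// Hypothetical model of the quoted proposal; no real browser API is used.
type AccessHint = "camera" | "camcorder" | "streaming" | "telephony";

class TaintedStream {
  // Step 1: a stream starts with nothing granted, i.e. tainted.
  private granted: AccessHint[] = [];

  // Step 3: the page declares up front which kinds of access it wants.
  constructor(private readonly hints: AccessHint[]) {}

  // Step 4: the UA overlays one button per registered hint.
  visibleButtons(): AccessHint[] {
    return this.hints.slice();
  }

  // Step 5: invoked by the UA (never by page script) on a user click.
  userClicked(button: AccessHint): void {
    if (this.hints.indexOf(button) < 0) {
      throw new Error("no '" + button + "' button is shown");
    }
    if (this.granted.indexOf(button) < 0) {
      this.granted.push(button);
    }
  }

  // Only the 'streaming' grant untaints the stream for page access.
  isTainted(): boolean {
    return this.granted.indexOf("streaming") < 0;
  }

  // Stand-in for assigning the stream to the P2P communication API.
  attachToPeerConnection(): void {
    if (this.granted.indexOf("telephony") < 0) {
      throw new Error("SecurityError: 'telephony' not authorized");
    }
  }
}

// One call: the user has to click before the page can proceed.
const firstCall = new TaintedStream(["streaming", "telephony"]);
firstCall.userClicked("telephony");
firstCall.attachToPeerConnection(); // allowed after the click

// Under the per-stream assumption, the next call starts from scratch.
const nextCall = new TaintedStream(["streaming", "telephony"]);
```

In this model `nextCall` carries no grants, so the page needs another click before it can attach the stream. Whether a UA would remember grants per origin instead is exactly the part the proposal leaves open.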

I kind of like the idea that we use a <video> element with 
browser-controlled controls on it for the authorization step (presumably 
the app can hide that <video> element after having obtained the 
authorization, if it so desires), but I'm fundamentally worried about 
"extra click every time you call".
>
> This approach immediately lets the user see that the page is ready to 
> do something with their camera without requiring an up-front 
> prompt/access authorization. It lets the user get comfortable with the 
> fact that the camera is on rather than it happening all at once. It 
> lets them adjust their hair before they go live. It presents 
> UA-controlled buttons within a <video> element for the user to 
> authorize (or not) the requested, targeted usage when they are ready.
>
> FWIW, this discussion is entirely orthogonal to the P2P aspects 
> ('streaming' in this email refers to streaming the video to the web 
> page, or, untainting the provided Stream object. 'telephony' refers to 
> the p2p communication stuff).
>
> I can understand the push-back but I'd be interested for webrtc to 
> explore such an approach a little further. Accessing the camera 
> without streaming the results to a remote server has 
> wide applicability, e.g. AR guides, bar-code scanning web apps, 
> camera/snapshot/fx web apps, on-demand a/v recording uploads, personal 
> introductions, etc.
It does, and if we can accept "click every time you <scan, photograph, 
videotape, .....>" in all those contexts, this might be a general mechanism.

I'm just very unsure about this mechanism being generally acceptable.
Received on Wednesday, 17 August 2011 11:31:19 UTC