Re: CHANGE: Use a JS Object as an argument to getUserMedia from Randell Jesup on 2011-10-06 (public-webrtc@w3.org from October 2011)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Thu, 06 Oct 2011 18:55:08 -0400
To: public-webrtc@w3.org
Message-ID: <4E8E31CC.2020505@jesup.org>
On 10/6/2011 6:09 PM, Ben Strong wrote:
> On Thu, Oct 6, 2011 at 3:10 PM, Anant Narayanan <anant@mozilla.com> wrote:
>
>     I agree that the question is tricky, but it is something that we no
>     doubt have to handle. I think that we absolutely should let the user
>     deny either video or audio even if the application requested both
>     (what are the arguments for not allowing this?). The real question
>     is of how this information is relayed back to the application.
>
>
> I have a few arguments against this:
>
> 1) It allows the user to accidentally make contradictory/invalid
> selections. For example, a user may indicate to the app that they want
> to start a voice+video call, then when prompted by the browser
> mistakenly select the "just video" option. What should the app do at
> this point? Display an error message and force the user to start over
> again? Start a call with just the video track and leave the user to
> wonder why the other party can't hear them?

That's a fair question, though it depends on the application - if the 
application knows that the user will be asked to choose audio, video or 
both, it has far less reason to have the user indicate that before 
calling getUserMedia().  I.e., the permission prompt necessarily becomes 
part of the application's UI, because the app can't avoid it.
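
A minimal sketch of that flow, assuming a hypothetical `MediaGrant` record that says which kinds the user allowed in the browser's prompt (none of these names come from the draft API): rather than asking "voice or video?" in its own UI first, the application derives the kind of call from the answer the user gave the prompt.

```typescript
// Hypothetical record of what the user allowed in the browser's prompt;
// not part of any draft getUserMedia API.
interface MediaGrant {
  audio: boolean;
  video: boolean;
}

type CallMode = "audio+video" | "audio-only" | "video-only" | "none";

// Derive the kind of call to place from the user's answer to the permission
// prompt, instead of asking the same question in the app beforehand.
function callModeFromGrant(grant: MediaGrant): CallMode {
  if (grant.audio && grant.video) {
    return "audio+video";
  }
  if (grant.audio) {
    return "audio-only";
  }
  if (grant.video) {
    return "video-only";
  }
  return "none";
}

console.log(callModeFromGrant({ audio: true, video: false })); // prints audio-only
```

A "video-only" result is then a state the application knows about and can reflect in its own UI, rather than a silent mismatch with an earlier in-app choice.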

> 2) It requires the user to understand the difference between granting
> permissions and enabling tracks. Let's say that a user wants to start a
> voice call with the option of enabling video after the call starts. As I
> understand it, the app should request audio+video access and then
> disable the video track, re-enabling it when the user elects to start
> video. But the user, upon seeing a request for audio+video permissions
> despite the fact that they hit the "start voice" button in the app, may
> choose "just audio", unwittingly denying themselves the ability to
> enable video later.
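
The pattern described above can be modeled as follows. This is a sketch under assumptions: `Track` is a plain stand-in loosely modeled on a media track with an `enabled` flag, and `startVoiceCall`/`enableVideo` are hypothetical helpers, not draft API. It shows why picking "just audio" in the prompt removes the later option: there is no video track left to re-enable.

```typescript
// Stand-in for a media track, loosely modeled on a track with an `enabled` flag.
interface Track {
  kind: "audio" | "video";
  enabled: boolean;
}

// Hypothetical helper: start a voice call with whatever tracks the prompt
// granted, keeping any video track disabled until the user asks for it.
function startVoiceCall(granted: Track[]): Track[] {
  for (const t of granted) {
    t.enabled = t.kind === "audio";
  }
  return granted;
}

// Called when the user presses the app's "start video" button.
// Returns false if no video track was ever granted.
function enableVideo(tracks: Track[]): boolean {
  let found = false;
  for (const t of tracks) {
    if (t.kind === "video") {
      t.enabled = true;
      found = true;
    }
  }
  return found;
}

// The user allowed audio+video: video can be switched on later.
const both = startVoiceCall([
  { kind: "audio", enabled: true },
  { kind: "video", enabled: true },
]);
console.log(enableVideo(both)); // prints true

// The user picked "just audio" in the prompt: there is nothing to re-enable.
const audioOnly = startVoiceCall([{ kind: "audio", enabled: true }]);
console.log(enableVideo(audioOnly)); // prints false
```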

I have an issue here also - adding video is likely to be a common 
operation, but when you start the call you won't know if you want to 
permit access to the camera later (especially if you can't trust the 
app, and our threat model says we don't).  Adding video (or adding 
audio, or a second video or audio channel) after a call starts without 
video should invoke a user prompt (which can also allow camera 
selection, etc.).  The need to (possibly) select cameras and/or mics 
when adding streams implies a user prompt on its own, independent of 
security concerns.  I don't have a proposed API solution, but it 
shouldn't be hard.
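
One way to read that suggestion, sketched under assumptions: `PromptFn` is a hypothetical stand-in for a browser-owned prompt that both asks permission and lets the user pick a device, and `Call`/`addSource` are illustrative names, not a proposed API. Callbacks are used to match the callback style discussed in this thread. Adding a source mid-call becomes a fresh, user-mediated request at the moment it is needed, rather than a track held back (disabled) from the start of the call.

```typescript
// Hypothetical browser-owned prompt: asks the user for one kind of media,
// lets them pick a device, then calls `done` with the chosen device ID
// (or null if the user declined).
type PromptFn = (
  kind: "audio" | "video",
  done: (deviceId: string | null) => void
) => void;

class Call {
  // Sources currently attached to the call, e.g. "video:cam-1".
  readonly sources: string[] = [];
  private readonly prompt: PromptFn;

  constructor(prompt: PromptFn) {
    this.prompt = prompt;
  }

  // Adding video (or a second audio or video source) mid-call goes through a
  // fresh prompt, so the app only gets devices the user has just approved.
  addSource(kind: "audio" | "video", onResult: (added: boolean) => void): void {
    this.prompt(kind, (deviceId) => {
      if (deviceId === null) {
        onResult(false); // user declined; the call continues unchanged
        return;
      }
      this.sources.push(kind + ":" + deviceId);
      onResult(true);
    });
  }
}

// Usage with a fake prompt that approves a camera but refuses anything else.
const fakePrompt: PromptFn = (kind, done) => done(kind === "video" ? "cam-1" : null);
const call = new Call(fakePrompt);
call.addSource("video", (added) => console.log(added)); // prints true
```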

> 3) The dialog presented to the user when an app requests audio+video
> will most likely have two checkboxes (audio and video) and two buttons
> (allow and cancel). If the app can specify that both tracks are
> required, the dialog can get away with just the buttons, which makes for
> a much more pleasant user experience.
>
> As an app developer, I'd like to be able to specify one of three options
> for each media kind: REQUIRED, REQUIRED_IF_CAPABILITY_EXISTS, and
> OPTIONAL. When implementing a voice+video calling app, I'd pass REQUIRED
> for audio and REQUIRED_IF_CAPABILITY_EXISTS for video, which would make
> video required iff the device has a camera available. That way, if the
> call to getUserMedia() succeeds, I know that I have the "right" tracks
> given the user's intent and device capabilities (as an aside, detecting
> failure would be a lot easier with the callback API).
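
Ben's three levels can be made concrete with a small check. This is only a sketch: `Requirement`, `PerKind`, and `requestSatisfied` are hypothetical names, and the rule encoded here is one reading of the proposal, namely that a request succeeds only if every REQUIRED kind was granted, and every REQUIRED_IF_CAPABILITY_EXISTS kind was granted whenever the device has it.

```typescript
// The three per-kind levels from the proposal, as string literals.
type Requirement = "REQUIRED" | "REQUIRED_IF_CAPABILITY_EXISTS" | "OPTIONAL";

// One value per media kind.
interface PerKind<T> {
  audio: T;
  video: T;
}

type Kind = "audio" | "video";

// Would this request succeed, given which devices exist and what the user allowed?
function requestSatisfied(
  request: PerKind<Requirement>,
  hasDevice: PerKind<boolean>,
  granted: PerKind<boolean>
): boolean {
  const kinds: Kind[] = ["audio", "video"];
  for (const kind of kinds) {
    const level = request[kind];
    if (level === "REQUIRED" && !granted[kind]) {
      return false; // a required kind was not granted
    }
    if (level === "REQUIRED_IF_CAPABILITY_EXISTS" && hasDevice[kind] && !granted[kind]) {
      return false; // the device has it, so it was required, but it was not granted
    }
    // OPTIONAL never causes failure.
  }
  return true;
}

// The voice+video calling app from the message: audio REQUIRED,
// video required only if a camera is present.
const callRequest: PerKind<Requirement> = {
  audio: "REQUIRED",
  video: "REQUIRED_IF_CAPABILITY_EXISTS",
};

// No camera on this device: audio alone is enough.
console.log(requestSatisfied(callRequest, { audio: true, video: false }, { audio: true, video: false })); // prints true

// Camera present, but the user allowed only audio: the request fails.
console.log(requestSatisfied(callRequest, { audio: true, video: true }, { audio: true, video: false })); // prints false
```

Under this reading, a single success/failure result tells the application whether it has the tracks it needs; OPTIONAL kinds never affect that result.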

In many cases I'd be leery of a REQUIRED style.  What if the people 
using it are deaf - are you going to require them to use audio?  What if 
they're blind - will you require them to use video?  In other cases it 
might legitimately be required, but it needs careful thought.  Also, 
the UI can be optimized for the common/suggested case, and don't forget 
my comment above - this is effectively part of the app's UI, so you need 
to think of it that way, not as an interstitial popup.

That also implies that while, for security reasons, it can't be under 
the full control or styling of the application, it may make sense to give 
the application more control over which choices are presented to the user 
and how they're presented, so that the prompt fits into the app's UI 
better.  The downside is that too much divergence from the "standard" 
way of presenting these choices could confuse the user, and we can't allow 
the application to provide the text shown.  (No "answer audio-only" button 
that causes video to be allowed.)  But we could provide a small set of 
"standard" options that the application can ask the browser to show the 
user, per your comments above (though not necessarily in that 
exact manner/API).
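
To illustrate the "small set of standard options" idea, here is a sketch in which every name (`Preset`, `Choice`, `choicesFor`) and every label is hypothetical. The application selects a preset by name only; the browser owns both the wording of each button and the grant that button produces.

```typescript
// Hypothetical presets an application could ask the browser to show.
type Preset = "audio-or-audio+video" | "audio+video" | "audio";

interface Grant {
  audio: boolean;
  video: boolean;
}

// One button in the browser-owned prompt. Both the label and the grant it
// produces are fixed by the browser; the application never supplies text.
interface Choice {
  label: string;
  grants: Grant;
}

// Browser side: map a preset name to its buttons. Labels are placeholders.
function choicesFor(preset: Preset): Choice[] {
  const both: Choice = { label: "Allow microphone and camera", grants: { audio: true, video: true } };
  const micOnly: Choice = { label: "Allow microphone only", grants: { audio: true, video: false } };
  const deny: Choice = { label: "Don't allow", grants: { audio: false, video: false } };
  if (preset === "audio-or-audio+video") {
    return [both, micOnly, deny];
  }
  if (preset === "audio+video") {
    return [both, deny];
  }
  return [micOnly, deny]; // preset === "audio"
}

// Application side: it can only name a preset, e.g. for "answer with or without video".
for (const choice of choicesFor("audio-or-audio+video")) {
  console.log(choice.label);
}
```

Because the application passes only a preset name, it has no way to put an "answer audio-only" label on a button that grants video.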

> I'm actually having a hard time thinking of a use-case for OPTIONAL,
> since apps will generally ask the user to select media types before
> calling getUserMedia(). For example, a recording app will almost
> certainly have separate "record audio" and "record audio+video" buttons
> rather than a generic "record" button which relies on the permissions
> dialog to determine the user's intent.

And that's where you may find that working inside WebRTC implies a 
different UI flow, though in some cases you will do as you suggest.

-- 
Randell Jesup
randell-ietf@jesup.org
Received on Thursday, 6 October 2011 22:59:46 UTC