Re: CHANGE: Use a JS Object as an argument to getUserMedia from Ben Strong on 2011-10-07 (public-webrtc@w3.org from October 2011)

From: Ben Strong <bstrong@gmail.com>
Date: Fri, 7 Oct 2011 10:33:05 -0500
To: public-webrtc@w3.org, Randell Jesup <randell-ietf@jesup.org>
Message-ID: <CAK_eQ=kNsW7n4uR+8Ao_p4C9167EkSwx4vkkq_dUKkO+NK4WLw@mail.gmail.com>
Thanks for the response. A few comments inline.

On Thu, Oct 6, 2011 at 5:55 PM, Randell Jesup <randell-ietf@jesup.org> wrote:

>> 1) It allows the user to accidentally make contradictory/invalid
>> selections. For example, a user may indicate to the app that they want
>> to start a voice+video call, then when prompted by the browser
>> mistakenly select the "just video" option. What should the app do at
>> this point? Display an error message and force the user to start over
>> again? Start a call with just the video track and leave the user to
>> wonder why the other party can't hear them?
>>
>
> That's a question, though it depends on the application - if the
> application knows that the user will be asked to choose audio, video or
> both, it has far less reason to have the user indicate that before calling
> getUserMedia().  I.e. this perforce becomes part of the UI of the
> application, because the app can't avoid it.


I'm not sure this works in practice. The user doesn't know that the dialog
is coming, so someone who wants to initiate a voice-only call may be
disinclined to click a generic "start" button out of fear that they will
begin transmitting video. They'd feel a lot more comfortable if the button
said "start voice". On top of that, if there is an "always allow" option in
the dialog and the user has previously chosen to always allow both audio and
video, then there is subsequently no way for the user to start a voice-only
call unless the app itself presents that option (that said, I hope there
isn't an "always allow" option, for reasons noted below).



>> 2) It requires the user to understand the difference between granting
>> permissions and enabling tracks. Let's say that a user wants to start a
>> voice call with the option of enabling video after the call starts. As I
>> understand it, the app should request audio+video access and then
>> disable the video track, re-enabling it when the user elects to start
>> video. But the user, upon seeing a request for audio+video permissions
>> despite the fact that they hit the "start voice" button in the app, may
>> choose "just audio", unwittingly denying themselves the ability to
>> enable video later.
>>
>
> I have an issue here also - adding video is likely to be a common
> operation, but when you start the call you won't know if you want to permit
> access to the camera later (especially if you can't trust the app, and our
> threat model says we don't).  Adding video (or adding audio, or a second
> video or audio channel) after a call starts without video should invoke a
> user prompt (which also can allow camera selection, etc).  Simply the need
> to (possibly) select cameras and/or mics when adding streams implies a user
> prompt, not just for security concerns.  I don't have a proposed API
> solution, but it shouldn't be hard.


I disagree that adding video implies a user prompt for non-security reasons.
The vast majority of the time, either the default camera is the "correct"
option or the browser will be able to choose the correct camera based on a
hint from the app. And I actually worry that prompting the user twice will
decrease security, since it will increase pressure on browsers to add an
"always allow" option and on users to select it. That scares the heck out of
me, since it would allow apps to subsequently surreptitiously capture and
transmit audio and video (or worse, allow others to do so via XSS attacks),
and I have a hard time imagining a more serious vulnerability.

I also think it's desirable to associate the video track with the stream up
front so the PeerConnection can negotiate audio and video at the same time.
That way, when the user enables video, it can begin transmitting immediately
instead of having to perform a whole new media negotiation.
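A rough sketch of the flow described above: both tracks are part of the stream from the start, and a voice call simply begins with the video track disabled. The `Track` interface and both helper functions are hypothetical stand-ins for illustration, not the actual stream/track API under discussion.

```typescript
// Hypothetical minimal stand-in for a stream track; not the real browser type.
interface Track {
  kind: "audio" | "video";
  enabled: boolean;
}

// Start a voice call: both tracks stay in the stream (so they can be
// negotiated together), but only the audio track is enabled.
function startVoiceCall(tracks: Track[]): Track[] {
  return tracks.map((t) => ({ ...t, enabled: t.kind === "audio" }));
}

// Later the user elects to start video: only the enabled flag changes,
// the set of tracks in the stream stays the same.
function enableVideo(tracks: Track[]): Track[] {
  return tracks.map((t) => (t.kind === "video" ? { ...t, enabled: true } : t));
}

// Tracks as they might look right after the user grants audio+video.
const granted: Track[] = [
  { kind: "audio", enabled: true },
  { kind: "video", enabled: true },
];

const voiceOnly = startVoiceCall(granted);
const withVideo = enableVideo(voiceOnly);
```

In this sketch the stream always carries two tracks, which is what would let a PeerConnection negotiate both kinds once; whether a disabled track sends nothing at all or placeholder media is left open here.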


>> 3) The dialog presented to the user when an app requests audio+video
>> will most likely have two checkboxes (audio and video) and two buttons
>> (allow and cancel). If the app can specify that both tracks are
>> required, the dialog can get away with just the buttons, which makes for
>> a much more pleasant user experience.
>>
>> As an app developer, I'd like to be able to specify one of three options
>> for each media kind: REQUIRED, REQUIRED_IF_CAPABILITY_EXISTS, and
>> OPTIONAL. When implementing a voice+video calling app, I'd pass REQUIRED
>> for audio and REQUIRED_IF_CAPABILITY_EXISTS for video, which would make
>> video required iff the device has a camera available. That way, if the
>> call to getUserMedia() succeeds, I know that I have the "right" tracks
>> given the user's intent and device capabilities (as an aside, detecting
>> failure would be a lot easier with the callback api).
>>
>
> In many cases I'd be leery of a REQUIRED style - what if the people using
> it are deaf - you're going to require them to use audio?  What if they're
> blind; you'll require them to use video?
>
> Now in other cases it might be legitimately required, but it needs to be
> thought about.  Also, the UI can be optimized for the common/suggested case,
> and don't forget my comment above - this effectively is part of the app's
> UI, so you need to think of it that way, not as an interstitial popup.
>
> That also implies that while for security concerns it can't be under full
> control or styling of the application, it may make sense to give the
> application more control over what choices are presented to the user and how
> they're presented, in order to allow this to fit into the app UI better.
>  The downside of that is too much divergence from the "standard" way to
> indicate these could confuse the user, and we can't allow the application to
> provide the text shown.  (No "answer audio-only" button that causes video to
> be allowed.)  But we could provide a small set of "standard" options for the
> application to request be shown to the user to choose from, per your
> comments above (though not necessarily in that exact manner/API).


As noted above, I think that good UI design dictates that the app put a
descriptive label on the button that brings up the dialog. Given that,
making the user choose the same thing all over again in a dialog still
strikes me as redundant and confusing. The browser can and should display a
message describing what permissions are being requested and allow the user
to allow/deny them, but I worry that giving any more options than allow/deny
will result in a user experience like the Flash capture permissions dialog,
which is so confusing that it's common practice for apps to display a
screenshot with arrows pointing to what the user needs to click on.
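To make the three requirement levels quoted earlier (REQUIRED, REQUIRED_IF_CAPABILITY_EXISTS, OPTIONAL) more concrete, here is a minimal sketch of how a browser might turn them into a plain allow/deny prompt. Everything except the three level names is an assumption made for illustration: the `MediaRequest`, `Capabilities` and `Plan` shapes and the `planPrompt` function are hypothetical and not part of any proposed API.

```typescript
type Requirement = "REQUIRED" | "REQUIRED_IF_CAPABILITY_EXISTS" | "OPTIONAL";
type Kind = "audio" | "video";

// Hypothetical request object: one requirement level per media kind.
interface MediaRequest {
  audio?: Requirement;
  video?: Requirement;
}

// Hypothetical description of what capture devices exist.
interface Capabilities {
  audio: boolean;
  video: boolean;
}

interface Plan {
  ok: boolean;       // false when a REQUIRED kind has no device at all
  mandatory: Kind[]; // kinds covered by a single allow/deny decision
  optional: Kind[];  // kinds the user could still opt out of individually
}

function planPrompt(req: MediaRequest, caps: Capabilities): Plan {
  const plan: Plan = { ok: true, mandatory: [], optional: [] };
  for (const kind of ["audio", "video"] as Kind[]) {
    const level = req[kind];
    if (level === undefined) {
      continue; // the app did not ask for this kind
    }
    if (level === "REQUIRED") {
      if (caps[kind]) {
        plan.mandatory.push(kind);
      } else {
        plan.ok = false; // cannot be satisfied; fail without prompting
      }
    } else if (caps[kind]) {
      if (level === "OPTIONAL") {
        plan.optional.push(kind);
      } else {
        plan.mandatory.push(kind); // required because the device exists
      }
    }
  }
  return plan;
}

// The voice+video calling app from the quoted proposal.
const callRequest: MediaRequest = {
  audio: "REQUIRED",
  video: "REQUIRED_IF_CAPABILITY_EXISTS",
};
const noCamera = planPrompt(callRequest, { audio: true, video: false });
const withCamera = planPrompt(callRequest, { audio: true, video: true });
```

When `optional` comes back empty, as it does for both calls above, the prompt in this sketch needs only the allow and cancel buttons, with no per-kind checkboxes.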

Of course, this leaves open the question of how to specify which camera/mic
to use, so here's what I would like to see as an app developer: 1) The
LocalMediaStream returned by getUserMedia() is always associated with the
default camera and mic (or the ones indicated by the hints). 2) After
associating the stream with a video element that has the 'controls'
attribute set to true, the video element displays UI for changing the
capture devices. For example, on a mobile device with front and rear
cameras, the video element associated with a local stream could overlay a
"camera toggle" button. As an added benefit, this would provide a mechanism
to change the capture device associated with an existing track without
further complicating the stream/track APIs.

Ben
Received on Friday, 7 October 2011 15:33:38 UTC