Re: Scenarios doc updated from Randell Jesup on 2012-01-16 (public-media-capture@w3.org from January 2012)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Mon, 16 Jan 2012 15:42:03 -0500
To: public-media-capture@w3.org
Message-ID: <4F148B9B.8010009@jesup.org>

On 1/16/2012 3:08 PM, Travis Leithead wrote:
> Great feedback.
>
> A few thoughts below. I'll get to work on incorporating this feedback today or tomorrow.
>
> -Travis
>
>> -----Original Message-----
>> From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
>>
>> On scenarios:
>> =============
>> * It is not stated what should happen when the browser tab that has been
>> allowed to capture is not in focus. I bring this up since one of the
>> Speech JavaScript API Specifications
>> (http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-
>> 1696/speechapi.html#security)
>> proposed that capture should stop in such cases, while that kind of
>> behavior is not at all what you would want in e.g. a conferencing
>> scenario (where you would often like to use another browser tab to check
>> out things while being in the conference)
> This indeed is an interesting scenario. I would love to hear others' thoughts on this. We'll walk a fine line between user privacy (e.g., seems like a bad idea to just leave the user's camera on when they switch away from an active browser tab), and usability (e.g., in conferencing scenarios). Perhaps the use of a PeerConnection can trigger some state change in implementations such that they persist after switching tabs?

For WebRTC, we've been assuming that our UI must show camera/mic 
activity separate from the selected tab (and in any case we can't count 
on the JS app UI to show activity for us, given the threat model).  So 
we should have the ability to show activity if you navigate away, though 
users may forget they have an open camera/mic - there are definite 
user-interaction issues to consider here (doorhanger prompts, etc) that 
can help.

Having the ability to keep capturing and deciding what the default 
behavior is when you switch away are two different things.  (A third 
question is whether the JS app can change the default behavior.)
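
To make that distinction concrete, here is a minimal sketch in TypeScript. 
Everything in it is hypothetical (the BlurPolicy type, decideOnTabBlur, and 
the appMayOverride flag are illustration only, not part of any spec or 
implementation). It assumes the browser owns the activity indicator and 
shows it whenever capture continues in the background.

```typescript
// Hypothetical model: what happens to capture when the tab loses focus.
type BlurPolicy = "keep-capturing" | "stop-capturing";

interface BlurDecision {
  capturing: boolean;     // does capture continue while the tab is in the background?
  showIndicator: boolean; // browser-owned indicator, independent of the page's own UI
}

// browserDefault: what the browser does if nobody says otherwise.
// appRequest:     what the JS app asks for (may be absent).
// appMayOverride: whether the browser lets the app change the default at all.
function decideOnTabBlur(
  browserDefault: BlurPolicy,
  appRequest?: BlurPolicy,
  appMayOverride: boolean = false,
): BlurDecision {
  const effective: BlurPolicy =
    appMayOverride && appRequest !== undefined ? appRequest : browserDefault;
  const capturing = effective === "keep-capturing";
  // Assumption: the indicator never depends on the app, only on capture state.
  return { capturing, showIndicator: capturing };
}
```

A conferencing app would ask for "keep-capturing"; whether that request is 
honored is a separate policy decision, which is the point above.
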

>> * There is no scenario that uses the screen of the device as input. To
>> give some background, screen sharing is a use case for webrtc (use case
>> 4.7.2 in
>> http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-
>> requirements/?include_text=1).
>> Some experimentation has been carried out, using getUserMedia, and it
>> was found to be a viable way forward (using "screen" as hint). I don't
>> know how this should be handled vis-a-vis the Media Capture TF, but it
>> seems to me that it should be included. What do others think?
> Good scenario, I'll see if I can incorporate it into an existing scenario as a variation; if not, I'll spin up another scenario for it.

getUserMedia should consider the option of selecting a camera (and which 
one or ones), a file in place of a camera, (perhaps) an image, or the 
screen/window/tab.  Which options are offered (in particular screen, but 
maybe file/image) can be hinted.  I'm not sure I'd want to mandate that 
screen (or window or tab) be available, but I do want to know how an 
implementation would select those.
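
As a rough sketch of hint-driven source selection (TypeScript; the 
SourceKind names and the offeredSources function are hypothetical, not an 
existing getUserMedia API): cameras are always offered, the other kinds are 
offered only when the app hints for them, and nothing is offered that the 
implementation does not support.

```typescript
// Hypothetical source kinds taken from the list above.
type SourceKind = "camera" | "file" | "image" | "screen" | "window" | "tab";

// Assumption: only cameras are offered without a hint.
const ALWAYS_OFFERED: ReadonlySet<SourceKind> = new Set<SourceKind>(["camera"]);

// supported: what this implementation can provide (screen etc. are not mandated).
// hints:     what the app says it would like the user to be offered.
function offeredSources(
  supported: readonly SourceKind[],
  hints: readonly SourceKind[],
): SourceKind[] {
  const hinted = new Set<SourceKind>(hints);
  return supported.filter(
    (kind) => ALWAYS_OFFERED.has(kind) || hinted.has(kind),
  );
}
```

Under this model a hint for "screen" on an implementation without screen 
capture simply yields the camera choices.
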

>> * There is no scenario with several cams used in parallel; in section
>> 4.2.9 of
>> http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-
>> requirements/?include_text=1
>> two cameras are used in parallel for part of the session. Perhaps this
>> is too constrained, but I think something where more than one is used
>> should be in the doc.
> Scenario 2.4 (Video diary at the Coliseum) uses two webcams in parallel, but only records from one of them at a time. How would you suggest that be changed?

Well, WebRTC supports multiple video and audio streams incoming *and* 
outgoing, so it will be needed for that (the 'hockey' example).  For 
media capture, this would apply to the video diary as well - recording 
both the view seen, and the face of the commentator for later use in 
insetting or editing.  Or someone recording a sports event from the 
sideline using two webcams, or two people at the same computer each with 
a webcam pointed at them (one built in, one on a cable to the coworker 
sitting next to or across from them).  Etc.

>> * Other comments:
>> =================
>> * Section 5.3.1. About audio preview, I don't really understand all of
>> that. What we (in our prototyping) have been using for pre-view is a
>> video element which we set up as "muted". Would that not be the normal
>> way to do it? And of course you should be able to add a meter of some
>> kind indicating the levels you get from the mic (to be able to adjust).
> I'm calling out the problem of previewing your own local audio input (not hearing
> audio from a PeerConnection or other source). I suspect (but have not confirmed)
> that this can be a major problem. First off, none of the six scenarios require
> this capability.
> The concern I see is that in most simple use cases, the developer will request
> both audio and video capabilities from getUserMedia. Then they'll attach
> that MediaStream to a video element. Let's say the device is a laptop. The
> laptop's array microphone will be enabled. When the user speaks, the array-mic
> will send the signal through to the PC's speakers which will amplify and blast that
> sound back to the user, some of which will get picked up by the array-mic again and
> be sent out the speakers... I'm somewhat familiar with this based on some amateur work
> I've done in live sound performances.

a) echo-cancellation should work, but
b) for audio preview, a typical way would be to play it back delayed 
long enough to not be confusing, or use a two-state preview -- click to 
record, click again to play it back (or off an n-second timer).  That 
assumes a VU meter isn't good enough for audio.
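
The delayed-playback option in (b) could be sketched as a plain ring-buffer 
delay line (TypeScript). DelayLine and delayInSamples are hypothetical 
helper names, and the sketch works on raw sample blocks rather than any 
particular browser audio API.

```typescript
// Hypothetical helper: delays audio by a fixed number of samples so the
// preview is heard late enough not to be confused with live speech.
class DelayLine {
  private readonly buffer: Float32Array;
  private index = 0;

  constructor(delaySamples: number) {
    if (!Number.isInteger(delaySamples) || delaySamples <= 0) {
      throw new RangeError("delaySamples must be a positive integer");
    }
    this.buffer = new Float32Array(delaySamples); // starts out as silence
  }

  // Returns a block of the same length, delayed by the configured amount.
  process(input: Float32Array): Float32Array {
    const output = new Float32Array(input.length);
    for (let i = 0; i < input.length; i++) {
      output[i] = this.buffer[this.index]; // sample captured delaySamples ago
      this.buffer[this.index] = input[i];  // store the new sample in its place
      this.index = (this.index + 1) % this.buffer.length;
    }
    return output;
  }
}

// Converts a delay in seconds to a whole number of samples.
function delayInSamples(seconds: number, sampleRate: number): number {
  return Math.round(seconds * sampleRate);
}
```

For example, new DelayLine(delayInSamples(2, 48000)) would hold back two 
seconds of mono audio at an assumed 48 kHz sample rate.
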

-- 
Randell Jesup
randell-ietf@jesup.org
Received on Monday, 16 January 2012 20:43:26 UTC