Re: Scenarios doc updated from Stefan Hakansson LK on 2012-01-17 (public-media-capture@w3.org from January 2012)

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Tue, 17 Jan 2012 09:15:14 +0100
To: Travis Leithead <travis.leithead@microsoft.com>
CC: "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <4F152E12.1080906@ericsson.com>

On 01/16/2012 09:08 PM, Travis Leithead wrote:
> Great feedback.
>
> A few thoughts below. I'll get to work on incorporating this feedback
> today or tomorrow.
>
> -Travis
>
>> -----Original Message----- From: Stefan Hakansson LK
>> [mailto:stefan.lk.hakansson@ericsson.com]
>>
>> On scenarios: ============= * It is not stated what should happen
>> when the browser tab that has been allowed to capture is not in
>> focus. I bring this up since one of the Speech JavaScript API
>> Specifications
>> (http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#security)
>> proposed that capture should stop in such cases, while that kind
>> of behavior is not at all what you would want in e.g. a
>> conferencing scenario (where you would often like to use another
>> browser tab to check out things while being in the conference)
>
> This indeed is an interesting scenario. I would love to hear others'
> thoughts on this. We'll walk a fine line between user privacy (e.g.,
> seems like a bad idea to just leave the user's camera on when they
> switch away from an active browser tab), and usability (e.g., in
> conferencing scenarios). Perhaps the use of a PeerConnection can
> trigger some state change in implementations such that they persist
> after switching tabs?

I think Randell provided good feedback here - I have little to add.

>
>
>
>> * There is no scenario that uses the screen of the device as input.
>> To give some background, screen sharing is a use case for webrtc
>> (use case 4.7.2 in
>> http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1).
>> Some experimentation has been carried out using getUserMedia, and
>> it was found to be a viable way forward (using "screen" as a hint).
>> I don't know how this should be handled vis-à-vis the Media Capture
>> TF, but it seems to me that it should be included. What do others
>> think?
>
> Good scenario, I'll see if I can incorporate it into an existing
> scenario as a variation; if not, I'll spin up another scenario for
> it.

Ditto.

>
>
>
>> * There is no scenario with several cameras used in parallel. In
>> section 4.2.9 of
>> http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1
>> two cameras are used in parallel for part of the session. Perhaps
>> this is too constrained, but I think something where more than one
>> is used should be in the doc.
>
> Scenario 2.4 (Video diary at the Coliseum) uses two webcams in
> parallel, but only records from one of them at a time. How would you
> suggest that be changed?

Would it not be possible to use both at the same time while recording,
capturing video of both himself and the Coliseum? When
"playing/viewing" the diary, the layout could be such that Albert's head
is overlaid as a small video in the corner of the main video (showing
the Coliseum) for parts of the sequence.
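
To make the layout idea concrete, here is a rough sketch of how an
application might place the small self-view in a corner of the main
video. The helper name, the 25% scale and the 16 pixel margin are
assumptions made up for illustration; nothing like this is in the
scenarios doc.

```typescript
// Hypothetical helper: none of these names come from the scenarios doc.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Compute where the small self-view sits inside the main video frame,
// anchored to the bottom-right corner.
// `overlayAspect` is width / height of the second camera's video.
// `scale` is the overlay width as a fraction of the main width.
// `margin` is the gap to the frame edge, in pixels.
function overlayRect(
  mainWidth: number,
  mainHeight: number,
  overlayAspect: number,
  scale: number = 0.25,
  margin: number = 16,
): Rect {
  const width = Math.round(mainWidth * scale);
  const height = Math.round(width / overlayAspect);
  return {
    x: mainWidth - width - margin,
    y: mainHeight - height - margin,
    width,
    height,
  };
}
```

When compositing, the application could draw the main video over the
whole canvas and then draw the second video into the returned
rectangle. Both streams stay independent, so the layout remains an
application decision.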

Or incorporate the "hockey" use case. (Randell had further suggestions).

>
>
>> * Other comments: ================= * Section 5.3.1. About audio
>> preview, I don't really understand all of that. What we (in our
>> prototyping) have been using for pre-view is a video element which
>> we set up as "muted". Would that not be the normal way to do it?
>> And of course you should be able to add a meter of some kind
>> indicating the levels you get from the mic (to be able to adjust).
>
> I'm calling out the problem of previewing your own local audio input
> (not hearing audio from a PeerConnection or other source). I suspect
> (but have not confirmed), that this can be a major problem. First
> off, none of the six scenarios require this capability. The concern I
> see is that in most simple use cases, the developer will request both
> audio and video capabilities from getUserMedia. Then they'll attach
> that MediaStream to a video element. Let's say the device is a
> laptop. The laptop's array microphone will be enabled. When the user
> speaks, the array-mic will send the signal through to the PC's
> speakers which will amplify and blast that sound back to the user,
> some of which will get picked up by the array-mic again and be sent
> out the speakers... I'm somewhat familiar with this based on some
> amateur work I've done in live sound performances.

This is indeed a problem, but I see it like this: first, we could have
code examples in the spec that mute the audio for the self view (with a
comment on why); second, would the application developer not detect
this problem in the very first basic test, and then fix it long before
the app is used by anyone else?
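
A minimal sketch of such a muted self view might look like the
following. The function and interface names are made up for
illustration, and `srcObject` is the attribute current browsers use for
attaching a stream, which is an assumption relative to the drafts
discussed in this thread.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings;
// a real HTMLVideoElement satisfies it.
interface SelfViewElement {
  muted: boolean;
  srcObject: unknown;
}

// Attach a local capture stream for preview (hypothetical helper).
// The element is muted before the stream is attached, so the user's own
// microphone signal is never played through the speakers and cannot feed
// back into the microphone. The stream itself is not modified, so its
// audio track can still be recorded or sent elsewhere.
function attachSelfView<T>(video: SelfViewElement, stream: T): T {
  video.muted = true;
  video.srcObject = stream;
  return stream;
}

// In a browser this could be used roughly as follows (not executed here):
//   const video = document.querySelector("video")!;
//   navigator.mediaDevices
//     .getUserMedia({ audio: true, video: true })
//     .then((stream) => attachSelfView(video, stream));
```

Muting the element rather than dropping the audio track keeps the
captured audio available to the rest of the application.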

>
> I could be making this into a bigger problem than it is, however.
> Implementor feedback testing on a variety of devices would be
> helpful.
>
>
>
>> * Again, 5.3.1: I don't understand why you would limit display to
>> one version of the captured video. Sure, that is the most natural
>> way, but should we not let that be up to the application?
>
> I tend to agree. This is more of an opportunity for an implementation
> to optimize if desired, not something necessarily for the spec to
> mandate.
>
>
>> * 5.3.1: I tend to think this is out of scope for the TF (just as
>> is said for Pre-processing). There are already ways to do pre-view
>> in the toolbox.
>
> Which point is this referring to specifically?

Sometimes I'm very unclear :-(. What I meant was that perhaps the TF 
does not have to deal with pre-viewing at all since there are already 
tools available (audio/video/media elements among others) that can be used.

>
>
>> * 5.5, 5.6: I think a lot of the tools under 5.6.1 are actually
>> usable also for pre-processing. And note that the Audio WG now has
>> _three_ documents in the making, in addition to the "Web Audio API"
>> there is now the "Audio Processing API" and the "MediaStream
>> Processing API"! I have no clue which will prevail, but for the
>> sake of completeness perhaps all should be listed?
>
> I'll note this, and see about linking to those other specs.
>
>
>> * 5.7, 5.8: Here we have some challenges! Re. 5.7, it may be
>> difficult to select the right devices without involving the user.
>> The app may ask for a specific video, and the user must select the
>> camera that makes most sense.
>>
>> * 5.9: I think we should allow for several active capturing devices
>> in parallel.
>
> Sure; it just may not be physically possible in some devices. As long
> as we define appropriate error conditions then it's fine.
>
>
Received on Tuesday, 17 January 2012 08:15:51 UTC