RE: Scenarios doc updated from Travis Leithead on 2012-01-16 (public-media-capture@w3.org from January 2012)

From: Travis Leithead <travis.leithead@microsoft.com>
Date: Mon, 16 Jan 2012 20:08:57 +0000
To: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <9768D477C67135458BF978A45BCF9B38381EE737@TK5EX14MBXW604.wingroup.windeploy.ntde>
Great feedback. 

A few thoughts below. I'll get to work on incorporating this feedback today or tomorrow.

-Travis

>-----Original Message-----
>From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
>
>On scenarios:
>=============
>* It is not stated what should happen when the browser tab that has been
>allowed to capture is not in focus. I bring this up since one of the
>Speech JavaScript API Specifications
>(http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#security)
>proposed that capture should stop in such cases, while that kind of
>behavior is not at all what you would want in e.g. a conferencing
>scenario (where you would often like to use another browser tab to check
>out things while being in the conference)

This is indeed an interesting scenario. I would love to hear others' thoughts on this. We'll walk a fine line between user privacy (e.g., it seems like a bad idea to just leave the user's camera on when they switch away from an active browser tab) and usability (e.g., in conferencing scenarios). Perhaps the use of a PeerConnection could trigger some state change in implementations so that capture persists after the user switches tabs?
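
To make that trade-off concrete, here is a rough sketch of the kind of rule I have in mind. It is only an illustration: `TabState`, `CaptureDecision`, and `decideCapture` are hypothetical names, not part of any spec or implementation, and the rule itself (an active PeerConnection keeps capture alive in the background) is just the idea floated above.

```typescript
// Hypothetical model of a capture-on-blur policy. None of these names come
// from a spec or a browser; they exist only to illustrate the trade-off.
type TabState = {
  focused: boolean;           // is the capturing tab in the foreground?
  hasPeerConnection: boolean; // is the stream feeding an active PeerConnection?
};

type CaptureDecision = "continue" | "suspend";

// While the tab is focused, capture continues. Once it loses focus, capture
// continues only for the conferencing case (stream attached to a PeerConnection).
function decideCapture(tab: TabState): CaptureDecision {
  if (tab.focused) {
    return "continue";
  }
  return tab.hasPeerConnection ? "continue" : "suspend";
}
```

Under this assumed rule, a background tab in a conference keeps its camera, while a background tab that only shows a local preview has capture suspended.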



>* There is no scenario that uses the screen of the device as input. To
>give some background, screen sharing is a use case for webrtc (use case
>4.7.2 in
>http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1).
>Some experimentation has been carried out, using getUserMedia, and it
>was found to be a viable way forward (using "screen" as hint). I don't
>know how this should be handled vis-à-vis the Media Capture TF, but it
>seems to me that it should be included. What do others think?

Good scenario, I'll see if I can incorporate it into an existing scenario as a variation; if not, I'll spin up another scenario for it.



>* There is no scenario with several cams used in parallel. In section
>4.2.9 of
>http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1
>two cameras are used in parallel for part of the session. Perhaps this
>is too constrained, but I think something where more than one is used
>should be in the doc.

Scenario 2.4 (Video diary at the Coliseum) uses two webcams in parallel, but only records from one of them at a time. How would you suggest that be changed?


>* Other comments:
>=================
>* Section 5.3.1. About audio preview, I don't really understand all of
>that. What we (in our prototyping) have been using for pre-view is a
>video element which we set up as "muted". Would that not be the normal
>way to do it? And of course you should be able to add a meter of some
>kind indicating the levels you get from the mic (to be able to adjust).

I'm calling out the problem of previewing your own local audio input (not hearing
audio from a PeerConnection or other source). I suspect (but have not confirmed)
that this can be a major problem. First off, none of the six scenarios require this
capability. The concern I see is that in most simple use cases, the developer will
request both audio and video capabilities from getUserMedia. Then they'll attach
that MediaStream to a video element. Let's say the device is a laptop. The
laptop's array microphone will be enabled. When the user speaks, the array mic
will send the signal through to the PC's speakers, which will amplify and blast that
sound back to the user, some of which will get picked up by the array mic again and
be sent out the speakers... I'm somewhat familiar with this based on some amateur work
I've done in live sound performances.

I could be making this into a bigger problem than it is, however. Implementer feedback
and testing on a variety of devices would be helpful.
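
For anyone who wants to see why the loop can run away, here is a toy numeric model. It is a sketch under a big simplifying assumption: each trip from speakers back into the mic multiplies the signal level by a single constant, which I call `loopGain` (a made-up parameter, not something measured on real hardware). `feedbackLevels` and `feedbackSustains` are hypothetical helper names.

```typescript
// Toy model of an acoustic feedback loop. `loopGain` is the assumed factor by
// which the level changes on each mic -> speaker -> mic pass (hypothetical).
function feedbackLevels(initial: number, loopGain: number, passes: number): number[] {
  const levels: number[] = [initial];
  for (let i = 0; i < passes; i++) {
    // Each pass re-captures the previous level scaled by the loop gain.
    levels.push(levels[levels.length - 1] * loopGain);
  }
  return levels;
}

// In this model the sound never dies away once the loop gain reaches 1.
function feedbackSustains(loopGain: number): boolean {
  return loopGain >= 1;
}

// A loop gain above 1 grows on every pass; below 1 it decays.
const growing = feedbackLevels(1, 2, 3);    // [1, 2, 4, 8]
const decaying = feedbackLevels(1, 0.5, 2); // [1, 0.5, 0.25]
// A muted preview element corresponds to a loop gain of 0: nothing is replayed.
const muted = feedbackLevels(1, 0, 2);      // [1, 0, 0]
```

In these terms, muting the preview element (as Stefan describes) forces the loop gain to zero, which is why it avoids the problem regardless of speaker volume.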



>* Again, 5.3.1., I don't understand why you would limit to display one
>version of the captured video. Sure, that is the most natural way, but
>should we not let that be up to the application?

I tend to agree. This is more of an opportunity for an implementation to optimize if desired, not something necessarily for the spec to mandate.


>* 5.3.1: I tend to think this is out of scope for the TF (just as is
>said for Pre-processing). There are already ways to do pre-view in the
>toolbox.

Which point is this referring to specifically?


>* 5.5, 5.6: I think a lot of the tools under 5.6.1 are actually usable
>also for pre-processing. And note that the Audio WG now has _three_
>documents in the making, in addition to the "Web Audio API" there is now
>the "Audio Processing API" and the "MediaStream Processing API"! I have
>no clue which will prevail, but for the sake of completeness perhaps all
>should be listed?

I'll note this, and see about linking to those other specs.


>* 5.7, 5.8: Here we have some challenges! Re. 5.7, it may be difficult
>to select the right devices without involving the user. The app may ask
>for a specific video, and the user must select the camera that makes
>most sense.
>
>* 5.9: I think we should allow for several active capturing devices in
>parallel.

Sure; it just may not be physically possible in some devices. As long as 
we define appropriate error conditions then it's fine.
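
As a rough illustration of what "appropriate error conditions" might look like, here is a sketch of a request for several devices at once that fails cleanly when the hardware cannot satisfy it. Everything here is hypothetical: `startParallelCapture`, the `maxConcurrent` limit, and the two error codes are invented for the example and are not proposed spec text.

```typescript
// Hypothetical result type: either the list of devices now capturing, or a
// named error condition the application can react to.
type CaptureResult =
  | { ok: true; active: string[] }
  | { ok: false; error: "UNKNOWN_DEVICE" | "TOO_MANY_DEVICES"; detail: string };

// `available` lists the device ids present; `maxConcurrent` is the assumed
// number of devices the hardware can run at the same time.
function startParallelCapture(
  requested: string[],
  available: string[],
  maxConcurrent: number
): CaptureResult {
  // Reject ids that do not correspond to any present device.
  const unknown = requested.filter((id) => available.indexOf(id) === -1);
  if (unknown.length > 0) {
    return { ok: false, error: "UNKNOWN_DEVICE", detail: unknown.join(", ") };
  }
  // Reject requests the device physically cannot serve in parallel.
  if (requested.length > maxConcurrent) {
    return {
      ok: false,
      error: "TOO_MANY_DEVICES",
      detail: "requested " + requested.length + ", limit " + maxConcurrent,
    };
  }
  return { ok: true, active: requested.slice() };
}
```

A phone that can only power one camera at a time would be modeled with `maxConcurrent` set to 1, so asking for both the front and back cameras returns `TOO_MANY_DEVICES` rather than silently dropping one.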
Received on Monday, 16 January 2012 20:09:30 UTC