
Re: Scenarios doc updated

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Mon, 16 Jan 2012 12:52:15 +0100
Message-ID: <4F140F6F.4040608@ericsson.com>
To: public-media-capture@w3.org
On 01/13/2012 11:42 PM, Travis Leithead wrote:
> Just pushed an update to the MediaStream Capture Scenarios document:
> http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html
>
>  The update incorporates various bits of feedback that were posted to
> this list last year.

Thanks for the update!

>
> Please review the 6 scenarios. They cover most of the use cases that
> I've envisioned for media capture. We can add variations on the
> existing scenarios or completely new scenarios to cover other
> use-cases that this TF feels are important for the final spec.

I've done a quick scan through, and these are my findings (more or less 
in the order they appear when reading).

On scenarios:
=============
* It is not stated what should happen when the browser tab that has been 
allowed to capture loses focus. I bring this up since one of the 
Speech JavaScript API Specifications 
(http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#security) 
proposed that capture should stop in such cases, while that kind of 
behavior is not at all what you would want in e.g. a conferencing 
scenario (where you would often like to use another browser tab to look 
things up while staying in the conference).

* There is no scenario that uses the screen of the device as input. To 
give some background, screen sharing is a use case for webrtc (use case 
4.7.2 in 
http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1). 
Some experimentation has been carried out using getUserMedia, and it 
was found to be a viable way forward (using "screen" as a hint). I don't 
know how this should be handled vis-à-vis the Media Capture TF, but it 
seems to me that it should be included. What do others think?
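
To make the idea concrete, here is a rough sketch of what the hint 
could look like (the option name and shape are assumptions from our 
experiments, not anything specified):

```javascript
// Hypothetical sketch: an options object for requesting the screen as a
// capture source. The "source" field and the "screen" value are only the
// hint we experimented with, not a standardized option name.
function screenCaptureOptions() {
  return { video: { source: 'screen' }, audio: false };
}

// In a browser one would then do something like (callback names made up;
// getUserMedia was still vendor-prefixed at the time):
// navigator.getUserMedia(screenCaptureOptions(), gotScreenStream, onError);
```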

* There is no scenario with several cameras used in parallel. In section 
4.2.9 of 
http://datatracker.ietf.org/doc/draft-ietf-rtcweb-use-cases-and-requirements/?include_text=1 
two cameras are used in parallel for part of the session. Perhaps this 
is too constrained, but I think something where more than one camera is 
used should be in the doc.
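
Something like the following is what I have in mind (just a sketch, 
assuming the UA lets the user pick a different camera for each 
getUserMedia call; the callback names are made up):

```javascript
// Rough sketch of the bookkeeping for two parallel camera captures.
// Each successful getUserMedia call hands us one more live stream.
var activeStreams = [];

function gotStream(stream) {
  activeStreams.push(stream);   // keep every stream around
  return activeStreams.length;  // e.g. to assign each one its own <video>
}

// Browser usage (illustrative only):
// navigator.getUserMedia({ video: true }, gotStream, onError);
// navigator.getUserMedia({ video: true }, gotStream, onError);
```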

* Apart from the above the scenarios cover all aspects I can think of.

Other comments:
===============
* Section 4: It is stated that a MediaStream "can be conceptually 
understood as a tube or conduit between a source (the stream's 
generator) and a destination (the sink)". I think that, based both on 
the scenarios in this document and on earlier discussions, a MediaStream 
can have several sinks (just think of the self-view in a conferencing 
scenario).

* Section 5.1.2, "Issues": my thinking was that the UA would _not_ be 
able to find out whether cameras or microphones are available without 
involving user consent; I thought allowing that would enable 
fingerprinting.

* Section 5.3.1: About audio preview, I don't really understand all of 
that. What we (in our prototyping) have been using for preview is a 
video element set up as "muted". Would that not be the normal way to do 
it? And of course you should be able to add a meter of some kind 
indicating the levels you get from the mic (to be able to adjust them).
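
In other words, roughly (a sketch of our prototype approach; in a 
browser the stream URL would come from URL.createObjectURL(stream) in 
the getUserMedia success callback, and the names here are illustrative):

```javascript
// Minimal sketch of the muted <video> self-view described above:
// video previews locally while local audio playback stays off,
// which avoids audio feedback.
function attachPreview(videoElement, streamUrl) {
  videoElement.muted = true;     // no local audio playback
  videoElement.autoplay = true;  // start rendering as soon as frames arrive
  videoElement.src = streamUrl;  // e.g. URL.createObjectURL(stream)
  return videoElement;
}

// Browser usage (illustrative only):
// navigator.getUserMedia({ video: true, audio: true }, function (stream) {
//   attachPreview(document.querySelector('video'),
//                 URL.createObjectURL(stream));
// }, onError);
```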

* Again, 5.3.1: I don't understand why you would limit the app to 
displaying one version of the captured video. Sure, that is the most 
natural way, but should we not leave that up to the application?

* 5.3.1: I tend to think this is out of scope for the TF (just as is 
said for pre-processing). There are already ways to do preview in the 
toolbox.

* 5.5, 5.6: I think a lot of the tools under 5.6.1 are also usable for 
pre-processing. And note that the Audio WG now has _three_ documents in 
the making: in addition to the "Web Audio API" there are now the "Audio 
Processing API" and the "MediaStream Processing API"! I have no clue 
which will prevail, but for the sake of completeness perhaps all should 
be listed?

* 5.7, 5.8: Here we have some challenges! Re. 5.7, it may be difficult 
to select the right devices without involving the user. The app may ask 
for a specific video source, and the user must select the camera that 
makes the most sense.

* 5.9: I think we should allow for several active capturing devices in 
parallel.

Thanks,

Stefan

>
> Thanks, -Travis
>
>
>
Received on Monday, 16 January 2012 11:52:59 GMT
