
RE: Scenarios doc updated

From: Travis Leithead <travis.leithead@microsoft.com>
Date: Thu, 16 Feb 2012 19:34:23 +0000
To: Josh Soref <jsoref@rim.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <9768D477C67135458BF978A45BCF9B383821E233@TK5EX14MBXW603.wingroup.windeploy.ntdev.microsoft.com>
Thanks Josh. I'll make some of the easy edits in the next few days. (The rest may take longer...)

-----Original Message-----
From: Josh Soref [mailto:jsoref@rim.com] 
Sent: Tuesday, February 14, 2012 3:27 PM
To: public-media-capture@w3.org
Subject: RE: Scenarios doc updated

Travis wrote:
> Just pushed an update to the MediaStream Capture Scenarios document:
> http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html

Could I ask you to do the following:
* add an index (with links) for each of the scenarios.

> She clicks a "select photo" drop-down widget on the site, and choses the "from webcam" option.


> 7. Persisting the capture while in a background tab

s/Persisting the/continuing to/

> Video diary at the Coliseum (multiple webcams and error handling)

Google has the following to say:
| Showing results for roman coliseum
| Search instead for Coliseum


> 6. Use of battery status to automatically manage video and audio 
> capture

I suspect that technically this is out of scope and involves a different DAP deliverable.

> While still on his Italy vacation,

s/Italy/Italian/ :) -- or vacation in Italy...

> Albert hears that the Pope might make a public appearance at the vatican.


> He activates both front and rear cameras so that he can capture both 
> himself and the camera's view.

"camera" here isn't fair, "his view" perhaps?

> Albert excitely describes the sense of the crowd around him while 
> simultaneously capturing the Pope's appearance.


> minutes and also saving the associated meeting video for later 
> archiving.

Probably "archival" or "review from the archives".

> After the five other field agents checkin

s/checkin/check in/

> (screen as an local media input source)


> 1. Video capture from local screen/display

Most sharing tools allow one to specify windows instead of just taking the whole screen....

I'd also like to add a "2. Provide a video file instead of an actual live capture", in case the developer has already recorded the video and is unlucky enough not to have a camera currently working on his computer (this is the usual case...).

> MediaStream vs "media stream" or "stream"

Once you use "mediastream" :(

Can you add whitespace before/after the definitions of MediaStream and MediaStream format?

> 1) another application already using the webcam must yield it up

"yield it" or "give it up"

> A web application must be able to initiate a request for access to the 
> user's webcam
> (s) and/or microphone(s).

I wish MSIE9@W7x64 didn't consider the (s) to be a good word wrapping point when compared with the readily available space before it ;-).

> it is not recommended that the browser automatically shut down capture 
> devices when the capturing browser tab is sent to the background.

It might not be recommended to automatically shut it down, but it wouldn't be unreasonable to recommend that the web browser remind the user that the camera/microphone is still actively sending content to that tab.

A tiny preview at a corner of the window for 5s with a pin button could be great....

> Specific information about a given webcam and/or microphone must not 
> be available until after the user has granted consent.

> Otherwise "drive-by" fingerprinting of a UA's devices and 
> characteristics can be obtained without the user's knowledge— a 
> privacy issue.

My general preference is to restrict specific information even after granting permission.

In my view, there are three states:
1. No access
2. General access
3. Advanced access

For General access, there's a prompt of sorts.
For Advanced access, I'd lean towards a way for the user to poke the chrome and say ~let 'er rip~.

Most applications shouldn't need such advanced details, so we shouldn't default to exposing them even when we do provide access to an input source.
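As a rough sketch of the three states above (all type and function names here are hypothetical and come from no spec draft), a UA could decide what the App sees from the access level alone, so that device details stay hidden unless the user has opted into Advanced access:

```typescript
// Hypothetical model of the three access states; not part of any proposed API.
type AccessLevel = "none" | "general" | "advanced";

interface DeviceDetails {
  label: string;     // e.g. the OS-level device name
  maxWidth: number;
  maxHeight: number;
}

interface AppView {
  hasStream: boolean;
  details?: DeviceDetails; // only populated for "advanced"
}

// Returns what the App is allowed to observe at a given access level.
function viewFor(level: AccessLevel, device: DeviceDetails): AppView {
  if (level === "none") {
    return { hasStream: false };
  }
  if (level === "general") {
    // The stream is usable, but nothing identifying the hardware is exposed.
    return { hasStream: true };
  }
  // "advanced": the user explicitly asked the chrome to expose details.
  return {
    hasStream: true,
    details: { label: device.label, maxWidth: device.maxWidth, maxHeight: device.maxHeight },
  };
}
```

The point of the sketch is only that General access carries no fingerprintable details by default.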

> ◦If the user doesn’t have a webcam/mic, and the developer requests it, 
> a UA would be expected to invoke the error callback immediately.

I’d rather the UA be able to let the User select a canned video file. This enables all devices to participate in Video instead of just the ones which have genuine cameras.

> ◦Depending on the timing of the invocation of the error callback, 
> scripts can still profile whether the UA does or does not have a given 
> device capability.

We should design so that the response isn't anywhere close to instant. If the user has to stop another application which is using the Camera in order to grant access to a requesting application, that time should be taken into account. Similarly, if a user wants to use a pre-existing recording, the user will need to find it and then select it.

If the model is that the user is always able to select <live stream> | <canned stream> | <refuse stream>, then all users always have a choice, and there's no reason / way for an app to distinguish that choice from <canned stream> | <refused stream>.
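A minimal sketch of that model, with hypothetical names (`choicesFor`, `resolveChoice`) and without modelling the time the user takes to choose: the result handed to the App has the same shape for a live and a canned stream, so the App cannot tell whether a camera exists.

```typescript
// Hypothetical picker model; the names are illustrative, not proposed API.
type UserChoice = "live" | "canned" | "refuse";

interface PickerResult {
  granted: boolean;
  // Opaque handle; deliberately carries no hint about where frames come from.
  streamId?: string;
}

// What the UA offers the user. Without a camera, "live" is simply absent
// from the user's menu; the App never sees this list.
function choicesFor(hasCamera: boolean): UserChoice[] {
  return hasCamera ? ["live", "canned", "refuse"] : ["canned", "refuse"];
}

// What the UA reports back to the App for a given user choice.
function resolveChoice(choice: UserChoice, nextId: number): PickerResult {
  if (choice === "refuse") {
    return { granted: false };
  }
  // "live" and "canned" are indistinguishable from the App's side.
  return { granted: true, streamId: "stream-" + nextId };
}
```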

> 2. In the case of a user with multiple video and/or audio capture 
> devices, what specific permission is expected to be granted for the 
> "video" and "audio" options presented to getUserMedia?

> For example, does "video" permission mean that the user grants 
> permission to any and all video capture devices?

I'd expect that a request for video would cause the UA to show the user previews from all available sources and then let the user select sources to share to the App. Only the one(s) the user select(s) are granted. If the App only asks for one source, and the user selects two, then the UA will let the user pick which one to use initially and should allow the user to change (w/o relying on the App) the source at a later time.

If the App requests a second source and the user has granted two, then the UA can show the user which source is currently active for the first stream and let the user decide which to send to each stream (the current, and the new). Again letting the user make a selection at this point instead of instantly granting it avoids profiling and also allows for the user to substitute a canned stream.
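Roughly, the UA-side bookkeeping for this could look like the following sketch. The class and method names are made up for illustration; the assumption is that the user, not the App, supplies every pick.

```typescript
// Hypothetical UA-side bookkeeping: which granted source feeds which stream.
class SourceGrant {
  private assignments: Record<string, string> = {};

  // grantedSources: the sources the user selected in the UA's picker.
  constructor(private readonly grantedSources: string[]) {}

  // Called when the user picks (or later changes) the source for a stream.
  // Sources the user never granted are rejected.
  assign(streamId: string, userPick: string): boolean {
    if (this.grantedSources.indexOf(userPick) === -1) {
      return false;
    }
    this.assignments[streamId] = userPick;
    return true;
  }

  // Which source currently feeds a stream, if any.
  sourceOf(streamId: string): string | undefined {
    return this.assignments[streamId];
  }
}
```

Calling `assign` again for an existing stream models the user switching sources later without involving the App.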

> Given the privacy point above, my recommendation is that "video" 
> permission represents permission to all possible video capture devices 
> present on the user's device,

Assuming we get Intents() working or the user has DLNA (or similar), it isn't impossible for a video camera to not be within arm's length of the user.

> therefore enabling switching scenarios (among video devices) to be 
> possible without re-acquiring user consent.

I think this can be done as I've described above without requiring additional "consent" but still allowing the user control.

If I have more than 2 cameras, it doesn't mean that I really want the web app to have access to all of them; maybe I'm reserving one for my Baby-Monitor app (which only takes pictures once every minute).

> 3. When a user has only one of two requested device capabilities (for 
> example only "audio" but not "video", and both "audio" and "video" are 
> requested), should access be granted without the video or should the 
> request fail?

We partially address this by using Hints, but we can make it better by letting a UA mux in the missing element from an alternate source. So if an application says that audio+video are mandatory and the user only has a mic, then the UA should be able to mix in Video from a canned source (or a canned still).

I expect to be able to use a Video Conferencing app that "demands" audio+video by providing a canned-still of me so that I can wear my pajamas instead of putting on my dress suit.
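The muxing idea above could be sketched like this. `fulfilRequest` and its types are hypothetical, and the sketch assumes the UA knows which kinds of device are present and whether the user allows canned substitutes.

```typescript
// Hypothetical sketch of filling a mandatory audio+video request.
type MediaKind = "audio" | "video";

interface TrackSource {
  kind: MediaKind;
  origin: "device" | "canned";
}

// Returns one source per mandatory kind, substituting canned content
// for missing hardware, or null if the request cannot be satisfied.
function fulfilRequest(
  mandatory: MediaKind[],
  devices: MediaKind[],
  cannedAllowed: boolean
): TrackSource[] | null {
  const sources: TrackSource[] = [];
  for (const kind of mandatory) {
    if (devices.indexOf(kind) !== -1) {
      sources.push({ kind: kind, origin: "device" });
    } else if (cannedAllowed) {
      // e.g. a canned video file or a canned still
      sources.push({ kind: kind, origin: "canned" });
    } else {
      return null;
    }
  }
  return sources;
}
```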

> 4. Enabling control configuration of webcam based on age (parental 
> control)

If a UA could support a policy that it wouldn't send real video but would only send video from a set of canned videos -- but would allow audio, that could address part of this.

> In such a scenario it should be reasonably simple for the application 
> to be notified of the situation, and for the application to re-request 
> access to the stream.

My general preference is for the UA to offer choices before it says anything to the App.

> 2. What's the expected interaction model with regard to user-consent?
> For example, if the re-initialization request is for the same 
> device(s), will the user be prompted for consent again?
> Minor glitches in the stream source connection should not revoke the user-consent.

One concern I have is the "long running app". Ignoring the video use case, there are devices which have a tutorial that you see when you buy them and use them for the first time. If you're not the first person to use the device, then you've missed the tutorial and have no idea what it does.

It's pretty easy for one person to go in and trigger a video chat app, then turn off video, another person walks in, they talk (using the chat) for a while, and then the first person walks out. -- This should probably be captured in a scenario.

> 3. How can tug-of-war scenarios be avoided between two web 
> applications both attempting to gain access to a non-shared device at 
> the same time?

> Should the API support the ability to request exclusive use of the device?


If a second app asks for access, the UA should be responsible for indicating to the User which app is currently using the resource and letting the user decide on a policy. If the user wants to move the source, the user should be able to do something "nice" for the other consumer, possibly sending the last frame as frozen until the user wants to return the source to the first app. -- There are examples of how this works involving news telecasts where there's a poor uplink from one reporter and another reporter starts talking.
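One way to picture this arbitration, as a sketch only: the names are hypothetical, and `askUser` stands in for whatever UI the UA would show to let the user decide the policy.

```typescript
// Hypothetical arbitration of a non-shared device between two apps.
type HandoffPolicy = "keep" | "move";

interface DeviceClaim {
  holder: string | null;    // app currently receiving live frames
  frozenFor: string | null; // app being shown the last frame, frozen
}

// A new app asks for the device; the user decides via askUser.
function requestDevice(
  claim: DeviceClaim,
  requester: string,
  askUser: (holder: string, requester: string) => HandoffPolicy
): DeviceClaim {
  if (claim.holder === null) {
    return { holder: requester, frozenFor: null };
  }
  if (claim.holder === requester) {
    return claim;
  }
  if (askUser(claim.holder, requester) === "keep") {
    return claim;
  }
  // Moved: the previous holder keeps seeing its last frame, frozen.
  return { holder: requester, frozenFor: claim.holder };
}

// The user hands the source back to the app that was left frozen.
function returnDevice(claim: DeviceClaim): DeviceClaim {
  if (claim.frozenFor === null) {
    return claim;
  }
  return { holder: claim.frozenFor, frozenFor: null };
}
```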

> The application should be able to affect changes to the media capture 
> device(s) settings via the media stream

> and view those changes happen in the preview.

This last line doesn't make sense. -- i.e. I don't understand what it's trying to say.

> 1. Audio tag preview is somewhat problematic because of the acoustic 
> feedback problem (interference that can result from a loop between a 
> microphone input that picks up the output from a nearby speaker).

Is it reasonable to ask for other clients to provide some sort of visual preview instead? Can normal people recognize noise if it's rendered as an image?

> 1. Is there a scenario where end-users will want to stop just a single 
> device, rather than all devices participating in the current media 
> stream?

Certainly. If I have a video + audio stream, I could easily decide that I need to drop audio to listen to a second conversation but I might not mind leaving the video open to let people know that I'm still alive and when my interruption is over.

If I have two cameras, I might want to temporarily turn off one or the other for privacy reasons. Maybe the phone is positioned to record something and I want to do a wardrobe change while it's streaming the other. 

> the user just wants to drop down to audio because they decide they don't need video.

I have a tendency to start a conference call at one location, then go mobile, and then resume. While I'm mobile, I'm typically on cellular and would want to be able to drop video, but when I return to wifi, I'd like to be able to resume video.

> (see next section).

s/next/the next/

> 1. Connecting the media stream to a sink (such as the video or audio 
> elements

You're missing a )

> 2. HTML5 canvas element and the Canvas 2D context.
> The canvas element employs a fairly extensive 2D drawing API and will 
> soon be extended with audio capabilities as well (RichT, can you 
> provide a link?).

Does RichT know he was called out here? :)

--- on device naming.
I've run into systems where the device names are actually random at the OS level (my favorite is that my Phone when attached to a Mac would get a new device number for each attachment instance).

I think that a user can typically identify which camera is which by seeing a preview of the live frame. If I'm unsure which is which, I'll wave my hand or shake my fist at each camera until I identify the one I need to poke. The same applies with mics (see "who is making noise" and "testing, 1 .. 2 .. 3").

> any changes to the device's manipulatable state should by isolated


> Additionally, script code can be written to change device 
> characteristics without careful error-detection

There should be a warning to UAs to look out for cases where there appears to be a fight within a single "application" between multiple settings. Sure, someone might be doing HDR, but it's probably a bug (just as we have infinite loop or slow script detection).
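A UA heuristic for this might look something like the sketch below. The function name, the window, and the threshold are all assumptions for illustration; it counts how often a setting flips back to the value it held two changes earlier, and a legitimate use such as HDR bracketing would trip it too.

```typescript
// Hypothetical "settings fight" heuristic; thresholds are illustrative only.
interface SettingChange {
  setting: string;
  value: number;
  atMs: number; // timestamp of the change, in milliseconds
}

// True if, within windowMs of the latest change to `setting`, the value
// flipped back and forth at least `threshold` times.
function looksLikeFight(
  changes: SettingChange[],
  setting: string,
  windowMs: number,
  threshold: number
): boolean {
  const relevant = changes.filter((c) => c.setting === setting);
  if (relevant.length === 0) {
    return false;
  }
  const latest = relevant[relevant.length - 1].atMs;
  const recent = relevant.filter((c) => latest - c.atMs <= windowMs);
  let flips = 0;
  for (let i = 2; i < recent.length; i++) {
    // A flip: back to the value from two changes ago, via a different value.
    if (recent[i].value === recent[i - 2].value && recent[i].value !== recent[i - 1].value) {
      flips++;
    }
  }
  return flips >= threshold;
}
```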

> to the application requesting the change

Note that here "application" = "web page", not UA (in the case that it hosts multiple "web pages" that use the device) -- this should probably be clarified.

> devices with manipulatable state


> Depending on convenience and scenario usefullness


> It may be desireable


---- sorry for the delay.

Received on Thursday, 16 February 2012 19:34:59 UTC
