Re: Distinguishing video and audio streams to enable fine-grained feature control from Harald Alvestrand on 2012-08-14 (public-media-capture@w3.org from August 2012)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Tue, 14 Aug 2012 22:56:19 +0200
To: public-media-capture@w3.org
Message-ID: <502ABB73.5090704@alvestrand.no>
This seems like a good start on manipulating a camera, but I'm worried 
about calling this a VideoStreamTrack.

I suspect there will be many VideoStreamTracks that do not originate in 
cameras. My first favourite, of course, is the one that comes in through a 
PeerConnection, but other examples are VideoStreamTracks that are fed 
from a file or synthesized from a Canvas. We don't have those APIs yet, 
but I'm pretty sure we will.

In the proposal from MS we're currently debating over in WEBRTC, the 
concept of "decorating" a MediaStream occurs. I'm not sure how such 
decorating really works (i.e., can you use both the original MediaStream 
object and the "decorated" object after decorating it, or does the old 
MediaStream object "disappear" somehow, and how do we carry this through 
WebKit and friends?), but if that's a viable approach, I'd like to think 
of a lot of the functionality Rich is proposing as a "camera decorator", 
rather than a "video decorator".
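
To make the "camera decorator" idea concrete, here is a minimal sketch in TypeScript. It assumes the original track object stays usable after decorating, which is exactly the open question above. TrackLike, its source field, and CameraDecorator are hypothetical names for illustration only; none of them come from the MS proposal or from Rich's.

```typescript
// Hypothetical minimal track shape; NOT the real MediaStreamTrack interface.
interface TrackLike {
  readonly kind: "audio" | "video";
  // Assumed field saying where the media comes from.
  readonly source: "camera" | "peer" | "file" | "canvas";
}

// Focus modes listed in the quoted proposal.
type FocusMode = "auto" | "continuous" | "edof" | "fixed" | "infinity" | "macro";

// Wraps a camera-backed track. The original track is kept as-is and
// remains reachable through the `track` property.
class CameraDecorator {
  readonly track: TrackLike;
  private focusMode: FocusMode = "auto";

  private constructor(track: TrackLike) {
    this.track = track;
  }

  // Returns null for tracks that do not originate in a camera,
  // e.g. ones arriving through a PeerConnection.
  static decorate(track: TrackLike): CameraDecorator | null {
    if (track.kind !== "video" || track.source !== "camera") {
      return null;
    }
    return new CameraDecorator(track);
  }

  get currentFocusMode(): FocusMode {
    return this.focusMode;
  }

  setFocusMode(mode: FocusMode): void {
    this.focusMode = mode;
  }
}
```

With this shape, camera controls never appear on a track fed from a file, a Canvas or a PeerConnection, so the generic video track interface stays small.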

Another aspect of the proposal is that it seems to add switching between 
cameras - how does this interact with the permissions UI model, where 
the user thinks he knows which camera(s) he gave a particular page the 
right to access?

Note - the CLUE WG in the IETF (stuff for immersive telepresence) has 
decided that they want to be able to represent a camera's position and 
direction in detail in a room's coordinate system (X, Y, Z, horizontal 
and vertical angle, horizontal and vertical field-of-view). We often 
won't have that information, but if it is available, there should be a 
single way to expose it.
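
As a rough illustration of what a "single way to expose it" could look like, here is a sketch of an optional pose record in TypeScript. The type names, field names and units are my assumptions for illustration; they are not taken from the CLUE documents or from any existing API.

```typescript
// Hypothetical pose record; names and units are assumptions.
interface CameraPose {
  // Position in the room's coordinate system.
  x: number;
  y: number;
  z: number;
  // Direction the camera points, in degrees.
  horizontalAngle: number;
  verticalAngle: number;
  // Field of view, in degrees.
  horizontalFieldOfView: number;
  verticalFieldOfView: number;
}

interface CameraDescription {
  label: string;
  // Absent when the platform has no position information,
  // which will often be the case.
  pose?: CameraPose;
}

// Formats the pose if it is known, and says so if it is not.
function formatPose(camera: CameraDescription): string {
  const pose = camera.pose;
  if (pose === undefined) {
    return `${camera.label}: pose unknown`;
  }
  return (
    `${camera.label}: at (${pose.x}, ${pose.y}, ${pose.z}), ` +
    `FOV ${pose.horizontalFieldOfView}x${pose.verticalFieldOfView}`
  );
}
```

Making the whole record optional, rather than each field, keeps the "we don't know" case to a single check.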

On 08/14/2012 01:38 PM, Rich Tibbett wrote:
> Hi,
>
> During the last few months we've received a number of requests from 
> developers to provide more granular control over camera streams 
> provided via getUserMedia. Such requests have centered around 
> providing auto-focus feature detection/setting/monitoring, zoom 
> detection/setting/monitoring, enabling camera flash in a sensible way, 
> and changing the rotation/orientation of a webcam stream. Because the 
> current specification does not incorporate these features at present 
> we did some brainstorming and came up with the following idea.
>
> Taking all of these use cases into account, we would like to present 
> the following proposal that allows a web application to:
>
> 1. Distinguish between audio and video media streams obtained via 
> getUserMedia (by introducing VideoStreamTrack and AudioStreamTrack 
> interfaces that inherit from the currently specified - generic - 
> MediaStreamTrack interface).
>
> 2. Apply special behavior to VideoStreamTrack objects that allow a web 
> developer to feature detect on the capabilities of a web camera, set 
> their desired modes for a number of features related to a given Camera 
> object, trigger certain features on-demand and listen for Camera 
> feature start and end events where that makes sense.
>
> Once we had the ability to distinguish video stream tracks from audio 
> stream tracks (in point 1 above) it became clear that there were many 
> other use cases that we could then naturally fit into the same API 
> architecture.
>
> The proposal herein therefore incorporates an initial set of controls 
> for developers to tune certain web camera features. We expect this will 
> be of interest for a large number of other use cases, such as Camera 
> capabilities detection and control in the form of a best-effort 
> approach that is inclusive and 'webby' by design.
>
> Our primary concern at this time is to get a sense for whether there's 
> agreement around supplying these options to developers or not and then 
> to subsequently discuss whether the interface proposal provided is fit 
> for purpose. If anyone else has alternative proposals then we'd be 
> interested to discuss them further here of course.
>
>
> *** PROPOSAL START ***
>
> // +++ VIDEOSTREAMTRACK
>
> interface VideoStreamTrack {
>
>   readonly attribute boolean locked; // true if object is not a
>                                      // camera object or manipulation
>                                      // is not otherwise supported
>                                      // false if object can be
>                                      // modified
>
>   // CAMERA SELECTION
>
>   readonly attribute boolean cameraChangeSupported; // true if camera
>                                                     // can be changed.
>                                                     // Otherwise, false.
>
>   readonly attribute unsigned short numberOfCameras;
>
>   readonly attribute unsigned short currentCamera;
>
>   CameraInfo     getCameraInfo(in unsigned short value);
>
>   void           setCamera(in unsigned short value);
>
>   // AUTO-FOCUS
>
>   readonly attribute boolean focusSupported; // true if camera focus
>                                              // can be changed.
>                                              // Otherwise, false.
>
>   readonly attribute DOMString currentFocusMode;
>
>   void           setFocusMode(in DOMString value);
>
>   // value argument above may be one of 'auto' (default),
>   // 'continuous', 'edof' (Extended Depth of Field), 'fixed',
>   // 'infinity' or 'macro' (close-up focus mode).
>
>   void           autoFocus(in optional Object xy);
>   void           cancelAutoFocus();
>
>   // ZOOM
>
>   readonly attribute boolean zoomSupported; // true if camera zoom can
>                                             // be changed. Otherwise,
>                                             // false.
>
>   readonly attribute unsigned short currentZoom;
>   readonly attribute unsigned short maxZoom;
>
>   void           startZoom(in unsigned short value);
>   void           stopZoom();
>
>   // FLASH
>
>   readonly attribute boolean flashSupported; // true if camera flash
>                                              // mode can be changed.
>                                              // Otherwise, false.
>
>   readonly attribute DOMString currentFlashMode;
>
>   void           setFlashMode(in DOMString value);
>
>   // value argument above may be one of 'auto' (default), 'off', 'on',
>   // 'red-eye' or 'torch'.
>
>   // ORIENTATION
>
>   readonly attribute boolean orientationSupported; // true if camera
>                                                    // orientation can be
>                                                    // changed.
>                                                    // Otherwise, false.
>
>   readonly attribute unsigned short currentDisplayOrientation;
>   // 0<->360 degrees
>
>   void           setDisplayOrientation(in unsigned short degrees);
>
>   // GENERAL FEATURES
>
>   void           takePicture();
>
>   // takePicture first applies currentFocusMode and currentFlashMode
>   // before taking a JPEG image snapshot from currentCamera and firing
>   // a new 'picture' event, setting its data attribute as the value of
>   // the JPEG image's ImageData.
>
>   // VIEWING ANGLE
>
>   readonly attribute float horizontalViewAngle;
>   readonly attribute float verticalViewAngle;
>
>   // EVENT LISTENERS
>
>            attribute EventListener  onfocusstart;
>            attribute EventListener  onfocusend;
>
>            attribute EventListener  onzoomstart;
>            attribute EventListener  onzoomend;
>
>            attribute EventListener  oncamerachange;
>
>            attribute EventListener  onpicture;
>
> }
>
> VideoStreamTrack implements MediaStreamTrack;
>
> VideoStreamTrack implements EventTarget;
>
> // VideoStreamTrack events: 'focusstart', 'focusend', 'zoomstart',
> // 'zoomend', 'camerachange' and 'picture'.
>
> // +++ CAMERAINFO
>
> [NoInterfaceObject]
> interface CameraInfo {
>
>            attribute DOMString facing;
>
>   // facing above will be either 'user' or 'environment'
>
>            attribute unsigned short orientation;
>
>   // orientation will be in the range 0 to 360 degrees
>
> }
>
> *** PROPOSAL END ***
>
>
> -- 
> Rich Tibbett (richt)
> CORE Platform Architect - Opera Software ASA
>
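
For reference, the feature-detection pattern in the quoted proposal might be used roughly as follows. This is a sketch in TypeScript against a subset of the proposed members only; the interface is a proposal under discussion, not a shipped API. requestZoom and FakeCameraTrack are hypothetical helpers, and clamping to maxZoom is my assumption, since the proposal does not say what startZoom does with out-of-range values.

```typescript
// Subset of the VideoStreamTrack members from the quoted proposal.
interface ProposedVideoStreamTrack {
  readonly locked: boolean;
  readonly zoomSupported: boolean;
  readonly currentZoom: number;
  readonly maxZoom: number;
  startZoom(value: number): void;
}

// Hypothetical helper: requests a zoom level only when the track says
// zoom can be changed. Returns the level actually requested, or null
// when the track is locked or zoom is unsupported.
function requestZoom(
  track: ProposedVideoStreamTrack,
  value: number
): number | null {
  if (track.locked || !track.zoomSupported) {
    return null;
  }
  // Assumption: out-of-range values are clamped to [0, maxZoom].
  const clamped = Math.min(Math.max(value, 0), track.maxZoom);
  track.startZoom(clamped);
  return clamped;
}

// Hypothetical in-memory stand-in for a camera-backed track,
// used only to exercise the helper above.
class FakeCameraTrack implements ProposedVideoStreamTrack {
  readonly locked = false;
  readonly zoomSupported = true;
  readonly maxZoom = 4;
  currentZoom = 0;

  startZoom(value: number): void {
    this.currentZoom = value;
  }
}
```

Note that a track arriving through a PeerConnection would report locked as true under this proposal, so every caller has to go through a check like this before touching any camera control.
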
Received on Tuesday, 14 August 2012 20:56:49 UTC