- From: Rich Tibbett <richt@opera.com>
- Date: Tue, 14 Aug 2012 23:41:09 +0200
- To: Harald Alvestrand <harald@alvestrand.no>
- Cc: "public-media-capture@w3.org" <public-media-capture@w3.org>
On Aug 14, 2012, at 10:56 PM, Harald Alvestrand <harald@alvestrand.no> wrote:

> This seems like a good start on manipulating a camera, but I'm worried about calling this a VideoStreamTrack.
>
> I suspect there will be many VideoStreamTracks that do not originate in cameras (of course my first favourite is the one that comes in through a PeerConnection, but other examples are VideoStreamTracks that are fed from a file or synthesized from a Canvas). We don't have those APIs yet, but I'm pretty sure we will.

I'm certainly not attached to the name, but having a consistent object type from MediaStream.videoTracks would be a good idea. We added a 'locked' attribute for the cases where manipulation is not possible. Each feature also has its own isSupported method, e.g. zoomSupported. We'd have to come up with reasonable defaults in cases where particular features are not supported, but that seems doable.

> In the proposal from MS we're currently debating over in WEBRTC, the concept of "decorating" a MediaStream occurs. I'm not sure how such decorating really works (i.e. can you use both the original MediaStream object and the "decorated" object after decorating it, or does the old MediaStream object "disappear" somehow, and how do we carry this through WebKit and friends?), but if that's a viable approach, I'd like to think of a lot of the functionality Rich is proposing as a "camera decorator", rather than a "video decorator".

I think that's a good summary. It's mostly about camera features, but there may be other features that could apply to video stream tracks obtained from non-camera sources. For example, we may want to 'bind' WebGL filters directly to a video stream track; if you then happen to stream that track P2P, the filters are applied on the sender side and streamed to the receiver as part of that video stream (think watermarks).

> Another aspect of the proposal is that it seems to add switching between cameras - how does this interact with the permissions UI model, where the user thinks he knows which camera(s) he gave a particular page the right to access?

Great point, yes, and camera selection does have implications for the way we present the initial user opt-in getUserMedia UI. In most respects it will simplify that UI. We're considering a UI where the user gives a page permission to access 'the webcam' as a general concept. The opt-in would by default result in a single video stream track and a single audio stream track in most cases, but there may be other options. A developer can then change the source of the video track via this API but still only has a single stream at any given time. The use case is when a user supplies the page with one camera and then, purposefully, changes their mind and wants to switch to another webcam. It has proven difficult for pages to explain to users how to do that via the browser UI (particularly since different browsers have different UIs for switching cameras). We'd therefore like to make camera switching a function of web apps themselves.
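To make the shape a little more concrete, here's a rough sketch of the kind of thing we have in mind. The 'locked' and 'zoomSupported' names are from the current proposal; 'zoom', 'sourceList' and 'selectSource' are placeholder names used purely for illustration, not an agreed API surface:

    // Rough sketch only: 'locked' and 'zoomSupported' are from the current
    // proposal; 'zoom', 'sourceList' and 'selectSource' are placeholders
    // used here for illustration.
    navigator.getUserMedia({ video: true, audio: true }, function (stream) {
      var track = stream.videoTracks[0];

      // Manipulation may not be possible, e.g. for non-camera sources.
      if (!track.locked) {
        // Each feature advertises its own support flag.
        if (track.zoomSupported) {
          track.zoom = 2.0;
        }

        // Switching cameras changes the source of this track; the page
        // still holds only a single video track at any given time.
        if (track.sourceList && track.sourceList.length > 1) {
          track.selectSource(track.sourceList[1]);
        }
      }
    }, function (error) {
      // The user declined, or no camera was available.
    });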
> Note - the CLUE WG in the IETF (stuff for immersive telepresence) has decided that they want to be able to represent a camera's position and direction in detail in a room's coordinate system (X, Y, Z, horizontal and vertical angle, horizontal and vertical field-of-view). We often won't have that information, but if it is available, there should be a single way to expose it.

Sounds interesting. A similar topic came up early on in the WHATWG efforts [1]. Essentially, we should take a look at the DeviceOrientation API and see whether it helps us here: whether we could send that data out-of-band to a remote peer, or whether we could repurpose that API to hang off VideoStreamTrack in either a push/event-based model or a pull model. In reality, though, camera orientation (as opposed to standard device orientation) is going to be hard to report correctly, since this information is not something that many devices or cameras currently provide via their APIs. I'd prefer to defer this from the initial discussion if possible, pending support in underlying OSs.
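For reference, the out-of-band option I mention above would be roughly this shape; 'dataChannel' here stands in for an already-open data channel or any other out-of-band path to the remote peer, so it's illustrative only:

    // Relay DeviceOrientation readings alongside the stream instead of
    // extending VideoStreamTrack itself. 'dataChannel' stands in for any
    // already-open out-of-band path to the remote peer.
    window.addEventListener('deviceorientation', function (event) {
      dataChannel.send(JSON.stringify({
        alpha: event.alpha,  // rotation around the z axis, in degrees
        beta:  event.beta,   // rotation around the x axis, in degrees
        gamma: event.gamma   // rotation around the y axis, in degrees
      }));
    }, false);

That only gives us device orientation, of course, not a camera's position in a room's coordinate system as CLUE wants.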
Many thanks,

Rich

[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/030944.html

Received on Tuesday, 14 August 2012 21:41:42 UTC