Re: Distinguishing video and audio streams to enable fine-grained feature control

On 8/14/2012 5:41 PM, Rich Tibbett wrote:
> On Aug 14, 2012, at 10:56 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
>
>> This seems like a good start on manipulating a camera, but I'm worried about calling this a VideoStreamTrack.
>>
>> I suspect there will be many VideoStreamTracks that do not originate in cameras (of course my first favourite is the one that comes in through a PeerConnection, but other examples are VideoStreamTracks that are fed from a file or synthesized from a Canvas. We don't have those APIs yet, but I'm pretty sure we will.
> I'm certainly not attached to the name, but having a consistent object type from MediaStream.videoTracks would be a good idea. We added a 'locked' attribute for the cases where manipulation is not possible. Each feature also has its own isSupported method, e.g. zoomSupported. We'd have to come up with reasonable defaults for the cases where particular features are not supported, but that seems doable.

Maybe this isn't the Way Things Are Done in HTML5/JS, but normally I'd 
think of defining an extension of VideoStreamTrack that's a 
CameraStreamTrack.  You might have something similar but simpler for 
audio, with a MicrophoneTrack.  CameraStreamTrack would have all the 
camera control details.
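
To make that concrete, here's a rough sketch of what I have in mind 
(the camera-specific names are hypothetical, reusing the 'locked' and 
zoomSupported ideas from Rich's proposal; callback-style getUserMedia 
and the videoTracks list are from the current drafts):

    navigator.getUserMedia({ video: true, audio: true }, function (stream) {
      var vtrack = stream.videoTracks[0];
      // Camera controls live on the CameraStreamTrack subtype, so tracks
      // sourced from a PeerConnection, a file or a canvas simply aren't
      // instances of it and expose no camera-control surface at all.
      if (vtrack instanceof CameraStreamTrack && !vtrack.locked) {
        if (vtrack.zoomSupported) {
          vtrack.zoom = 2.0;   // hypothetical attribute
        }
      }
      var atrack = stream.audioTracks[0];
      if (atrack instanceof MicrophoneTrack) {
        // simpler, audio-only controls (gain, mute, ...) would go here
      }
    });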

>> In the proposal from MS we're currently debating in WEBRTC, the concept of "decorating" a MediaStream occurs. I'm not sure how such decorating really works (i.e. can you use both the original MediaStream object and the "decorated" object after decorating it, or does the old MediaStream object "disappear" somehow, and how do we carry this through WebKit and friends?), but if that's a viable approach, I'd like to think of a lot of the functionality Rich is proposing as a "camera decorator", rather than a "video decorator".
> I think that's a good summary. It's mostly about camera features, but there may be other features that could be applicable to video stream tracks obtained from non-camera sources. For example, we may want to 'bind' WebGL filters directly to a video stream track; if you then happen to stream that track P2P, the filters are applied on the sender side and streamed to the receiver as part of that video stream (think watermarks).
>
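
If that kind of binding pans out, I'd imagine it looking roughly like 
this (purely hypothetical - no such API exists yet; the point is just 
that the filter attaches to the track itself, so it runs sender-side):

    var vtrack = stream.videoTracks[0];
    vtrack.bindFilter(watermarkFilter);  // hypothetical WebGL filter hook
    pc.addStream(stream);                // pc: a PeerConnection; the far
                                         // end receives watermarked video
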
>> Another aspect of the proposal is that it seems to add switching between cameras - how does this interact with the permissions UI model, where the user thinks he knows which camera(s) he gave a particular page the right to access?
> Great point, yes - camera selection does have implications for the way we present the initial user opt-in getUserMedia UI. In most respects it will simplify that UI.
>
> We're considering a UI where the user gives a page permission to access 'the webcam' as a general concept. The opt-in would by default result in a single video stream track and a single audio stream track in most cases, but there may be other options.
>
> A developer can then change the source of the video track via this API but still only has a single stream at any given time.

This appears to violate the security/privacy parameters we've discussed, 
which are that access to specific camera(s) is granted, and if the app 
wants two cameras in one stream, it calls getUserMedia() twice before 
reaching a stable state.  The case proposed here is slightly different: 
it's one stream that's retargetable.  For this case it would be a single 
getUserMedia() call, but the user would indicate that more than one 
camera is allowed.  I would suggest that an app wanting to switch 
cameras tell the UI (via the constraints) that it wants access to all 
cameras, and the UI then needs to make that clear to the user when 
approving.
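
In other words, something along these lines (the constraint name is 
made up; the mandatory/optional structure is from the current 
constraints proposal):

    navigator.getUserMedia(
      { video: { optional: [ { switchableSource: true } ] } },  // hypothetical
      function (stream) {
        // One video track; the app can later retarget it to any other
        // camera the user approved in the (clearly-worded) permission UI.
      },
      function (error) { /* user declined, or no matching device */ }
    );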

> The use case is when a user supplies the page with one camera and then, purposefully, changes their mind and wants to switch to another webcam. It has proven difficult for pages to explain to users how to do that via the browser UI (particularly since different browsers have different UIs for switching cameras). We'd therefore like to make camera switching a function of web apps themselves.

Agreed - video chat and camera apps on phones/tablets (and FaceTime, I 
believe) often have a front/back camera button, and this API should 
allow for that without asking the user in real time or making two 
requests at startup.
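
Something as simple as this (the 'facing' attribute and its values are 
hypothetical):

    var cameras = [ "user", "environment" ];   // front, back
    var current = 0;
    switchButton.onclick = function () {
      current = (current + 1) % cameras.length;
      // Retarget the existing track instead of calling getUserMedia()
      // again, so no new permission prompt fires mid-call.
      stream.videoTracks[0].facing = cameras[current];   // hypothetical
    };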

>> Note - the CLUE WG in the IETF (stuff for immersive telepresence) has decided that they want to be able to represent a camera's position and direction in detail in a room's coordinate system (X, Y, Z, horizontal and vertical angle, horizontal and vertical field-of-view). We often won't have that information, but if it is available, there should be a single way to expose it.
> Sounds interesting. A similar topic came up early on in the WHATWG efforts [1]. Essentially, we should take a look at the DeviceOrientation API and see if it helps us here: whether we could send that data out-of-band to a remote peer, or whether we could repurpose that API to hang off VideoStreamTrack in either a push/event-based model or a pull model. In reality, though, camera orientation (as opposed to standard device orientation) is going to be hard to report correctly, considering that this information is not something a lot of devices or cameras currently provide via their APIs. I'd prefer to defer this from the initial discussion if possible, pending support in underlying OSs.

Orientation is certainly something available to the app in most mobile 
OSes.  Orientation of a camera would be relative to device orientation 
in those cases.  In most, if not all, other cases (barring a fancy 
camera), orientation would be unavailable.  I could see returning the 
orientation of the camera relative to the standard device orientation 
(or unknown).  The app could combine that with the device orientation to 
determine the video orientation, and could then signal that on a 
PeerConnection (WebRTC, not media-capture).
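
Roughly like this (track.orientation is hypothetical; deviceorientation 
is the existing DOM event):

    window.addEventListener("deviceorientation", function (e) {
      // Hypothetical attribute on the camera's video track obtained
      // earlier: the camera's angle relative to the device's standard
      // orientation, or null when unknown (e.g. an external camera).
      var cameraOffset = vtrack.orientation;
      if (cameraOffset !== null) {
        var videoOrientation = (e.alpha + cameraOffset) % 360;
        // Signal videoOrientation to the far end via the app's own
        // channel or PeerConnection signaling, not media-capture.
      }
    });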

-- 
Randell Jesup
randell-ietf@jesup.org
