Track ducktyping (was: RE: Settings retrieval/application API Proposal (formerly: constraint modification API v3)) from Travis Leithead on 2012-08-28 (public-media-capture@w3.org from August 2012)

From: Travis Leithead <travis.leithead@microsoft.com>
Date: Tue, 28 Aug 2012 00:40:38 +0000
To: Rich Tibbett <richt@opera.com>
CC: "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <9768D477C67135458BF978A45BCF9B38384355DA@TK5EX14MBXW602.wingroup.windeploy.ntde>
New title for this conversation :)

> From: Rich Tibbett [mailto:richt@opera.com]
> Hi Travis,
> 
> I want to comment briefly on some things I thought we could optimise
> regarding the overall API structure here as it seems we have both been
> working on APIs with almost identical scope but slightly different wiring.
> 
> How would you feel about the following:
> 
> 1. Rather than isolating devices from tracks could we allow .videoTracks and
> .audioTracks to return any of the following objects (depending on what any
> particular media stream track object is actually representing):
> 
> - a MediaStreamTrack (a generic interface for general purpose stream track
> behaviour). Identical to the interface currently defined in the getUserMedia
> specification (minus the 'kind' attribute). Not ever directly exposed via
> MediaStream.videoTracks or MediaStream.audioTracks.
> - a VideoStreamTrack (an interface that implements the existing
> MediaStreamTrack interface + takePicture/onpicture). Used to represent
> non-camera video stream tracks (or read-only camera stream tracks such as
> those obtained from a remote peer) in MediaStream.videoTracks.
> - a VideoDeviceTrack (an interface that implements both VideoStreamTrack
> interface above and your proposed VideoInfo dictionary from below). Used
> to represent (local) camera video stream tracks in MediaStream.videoTracks.
> - an AudioStreamTrack (an interface that implements the existing
> MediaStreamTrack interface). Used to represent non-microphone audio
> streams (or read-only microphone stream tracks such as those obtained from
> a remote peer) in MediaStream.audioTracks.
> - an AudioDeviceTrack (an interface that implements both the
> AudioStreamTrack interface above and your proposed AudioInfo dictionary
> from below). Used to represent (local) microphone streams in
> MediaStream.audioTracks.
> - Other tracks as required (e.g. DataTracks)

The proposal went out of its way to strongly associate LocalMediaStream objects with devices. This seems like a win to me because local device configuration is always on the local media stream. This makes for a stable, dependable API surface from all local media stream instances (no guesswork). This is the opposite of the duck-typing approach above.

The proposal also does away with track lists on local media streams -- track lists seem like the wrong design considering the WG's decision to only ever have one audio/video track per gUM request. It just so happens that LocalMediaStreams inherited this wrongness from the generic MediaStream interface. Additionally, what's the point of a LocalMediaStream object at all, if it can be just as generic as a MediaStream once tracks are added/removed from its track list?

The proposal ensures that any MediaStreams received remotely, or those created dynamically, do not have the API surface to allow for configuration changes (which would imply some type of magical back-channel). The principle at play is that only the entity that first obtained permission for the media device(s) is authorized to make changes to their configuration. This association of ownership is lost if you start mixing and matching track types in arbitrary MediaStream objects. This is also the reason why I don't like folding the configuration APIs into a "VideoDeviceTrack", and why I have a separate "track" property to get from the device object to the track.

Given these points, I'm pretty hesitant to embrace the duck-typing approach you suggest, though I can easily imagine how to adapt this proposal to it :)

I hope this helped explain some of the rationale behind the design choices I made.


> 2. Allow developers to utilise duck typing (or object type checking if they
> wish) to check for, and set, capabilities against an object provided in either
> MediaStream.videoTracks or MediaStream.audioTracks. If an object supports
> certain properties then it can be assumed to be able to apply those settings
> to the object it is representing (i.e. if it looks like a duck and it acts like a duck
> then treat it like a duck).


Did you notice the asynchronous nature of applying the settings? I don't really want a design that forces implementations to make synchronous blocking calls on the UI thread to devices that may take an arbitrarily long time to make a configuration change. Typically, the property-setter design approach assumes synchronicity; in other words, if you set width = 1920 in one line and read it in the next line, you expect to see the change reflected. I like having property setters, as it's easy to grab the info--in fact, in an early draft I filled the VideoDevice with a bunch of read-only properties for this purpose. However, it just started to look like API pollution after a while, and given that I was able to easily factor out the "picture settings" (which may produce different supported resolutions, aspect ratios, etc.) as a dictionary, it became clear that returning a JS object with all these settings on it was the right design.
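
To illustrate the read-after-write expectation, here is a minimal sketch using a mock device. Only the method names getSettings() and changeSettings() come from this thread; the makeMockDevice helper, the completion callback, and the 10 ms delay are assumptions made for illustration, not part of the proposal.

```javascript
// Hypothetical mock of a device whose configuration changes take time.
function makeMockDevice() {
  let current = { width: 640, height: 480 };
  const device = {
    // Returns a snapshot (copy) of the current settings.
    getSettings() {
      return { ...current };
    },
    // Assumed signature: the change is applied later, then the callback runs.
    changeSettings(requested, onApplied) {
      setTimeout(() => {
        current = { ...current, ...requested };
        onApplied(device.getSettings());
      }, 10); // stands in for an arbitrarily slow device
    },
  };
  return device;
}

const device = makeMockDevice();
device.changeSettings({ width: 1920, height: 1080 }, (applied) => {
  console.log("applied later:", applied.width, applied.height);
});

// Reading back on the very next line still shows the old values,
// which is why a plain property setter would be misleading here.
const immediately = device.getSettings();
console.log("right after the call:", immediately.width, immediately.height);
```

With an explicit apply step, nothing suggests that the new values are visible before the device has actually made the change.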

I don't know if I stated it explicitly, but the expectation is that an implementation always populates all of the properties in the dictionary with a value (even if it doesn't support them explicitly) when getSettings() is called. That way you can treat the dictionary like an object with a bunch of properties--the only difference is that you control when the settings are applied.
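
A rough sketch of that expectation, assuming a made-up set of setting names and default values (DEFAULT_SETTINGS and makeMockVideoDevice are hypothetical; only getSettings() is named in this thread):

```javascript
// Assumed full set of dictionary members and their fallback values.
const DEFAULT_SETTINGS = { width: 0, height: 0, zoom: 1, flashMode: "off" };

// Hypothetical mock: `supported` holds the values this device really has.
function makeMockVideoDevice(supported) {
  return {
    getSettings() {
      const settings = {};
      for (const key of Object.keys(DEFAULT_SETTINGS)) {
        // Every member is always present, even when the device
        // has no explicit support for it.
        settings[key] = key in supported ? supported[key] : DEFAULT_SETTINGS[key];
      }
      return settings;
    },
  };
}

const webcam = makeMockVideoDevice({ width: 640, height: 480 });
const settings = webcam.getSettings();
console.log(Object.keys(settings));
console.log(settings.flashMode);
```

Because every key is always present, callers never need an existence check before reading a setting.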


> What we end up with is really similar to what you propose but we can use
> fewer interfaces than you introduced below :)

It seems like there are a lot of interfaces, but more than half of that bulk is dictionaries. There is a little factoring around MediaStream that I'm not too proud of, but it was the most logical way (to me) of making LocalMediaStreams special, without having to say: (MediaStream or LocalMediaStream) in all the relevant places.


> Here is a rewrite of one of your examples (applying resolution constraints)
> based only on the two principles described above.
> 
> function gotMedia(localStream) {
> 
>   for(var i = 0; i < localStream.videoTracks.length; i++) {
>     var s = localStream.videoTracks[i];
> 
>     // If s looks like a VideoDeviceTrack and it acts like a
>     // VideoDeviceTrack then it's a VideoDeviceTrack :)
>     if(s.maxWidth >= 1920 && s.maxHeight >= 1080) {
>       // See if I need to change the current settings...
>       if (s.width != 1920 || s.height != 1080) {
>         s.width = 1920;
>         s.height = 1080;
>         if (s.width != 1920 || s.height != 1080)
>           console.error("Device doesn't support at least 1080p");
>       }
>     }
>   }
> }

In the above example, you can't tell whether the outer 'if' condition fails because the properties are undefined or because the maxWidth/maxHeight values don't meet the requirements. You're also introducing complexity with the 'for' loop in the first place, even though there's likely only a single videoTrack in that list.
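
The ambiguity can be shown with plain objects standing in for tracks (plainTrack, lowResCamera and the classify helper are illustrative names, not part of either proposal):

```javascript
// The duck-typed check from the quoted example.
function meetsMinimum(track) {
  return track.maxWidth >= 1920 && track.maxHeight >= 1080;
}

const plainTrack = {}; // no device properties at all
const lowResCamera = { maxWidth: 640, maxHeight: 480 };

// Both checks fail, for entirely different reasons:
// `undefined >= 1920` is false, and so is `640 >= 1920`.
const plainResult = meetsMinimum(plainTrack);
const lowResResult = meetsMinimum(lowResCamera);
console.log(plainResult, lowResResult);

// Telling the cases apart needs an explicit existence check first.
function classify(track) {
  if (!("maxWidth" in track) || !("maxHeight" in track)) {
    return "not-a-device-track";
  }
  return meetsMinimum(track) ? "supports-1080p" : "below-1080p";
}

console.log(classify(plainTrack), classify(lowResCamera));
```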

Also, when working with a bunch of potentially diverse camera and microphone devices, an async approach to setting the properties is probably better. Consider that setting the width in one statement might fail because that particular width, paired with the current height, may not be a compatible combination. Since you have to set these properties (width/height) serially, these problems are introduced when things are synchronous. Being able to create the settings snapshot that you want and then apply it all in one shot eliminates these types of problems.
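
A small sketch of the serial-versus-snapshot problem, assuming a mock camera that supports exactly two modes. SUPPORTED_MODES, makeMockCamera, and the boolean return value of changeSettings() are assumptions for illustration, and the asynchronous delivery is left out to keep the example short.

```javascript
// Assumed: the device only supports these width/height pairs.
const SUPPORTED_MODES = [[640, 480], [1920, 1080]];
const isSupported = (w, h) =>
  SUPPORTED_MODES.some(([mw, mh]) => mw === w && mh === h);

function makeMockCamera() {
  let width = 640;
  let height = 480;
  return {
    get width() { return width; },
    // Each setter is validated against the *current* other dimension.
    set width(w) { if (isSupported(w, height)) width = w; },
    get height() { return height; },
    set height(h) { if (isSupported(width, h)) height = h; },
    // The whole snapshot is validated and applied together.
    changeSettings(snapshot) {
      if (!isSupported(snapshot.width, snapshot.height)) return false;
      width = snapshot.width;
      height = snapshot.height;
      return true;
    },
  };
}

// Serial: 1920x480 and 640x1080 are both invalid, so neither step sticks.
const serialCamera = makeMockCamera();
serialCamera.width = 1920;
serialCamera.height = 1080;
console.log(serialCamera.width, serialCamera.height);

// Snapshot: the same target mode is accepted when applied in one shot.
const snapshotCamera = makeMockCamera();
const applied = snapshotCamera.changeSettings({ width: 1920, height: 1080 });
console.log(applied, snapshotCamera.width, snapshotCamera.height);
```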


 
> Here's the example for taking a picture also re-written based on the above:
> 
> function gotMedia(localStream) {
>   // If a video track looks like it can take a picture, then it
>   // can take a picture :)
>   if(localStream.videoTracks[0].takePicture) {
>     localStream.videoTracks[0].onpicture = showPicture;
>     // attempt to set flash
>     // or let it fail silently if flash is not supported or
>     // flash cannot be changed
>     localStream.videoTracks[0].flashMode = 'on';
>     localStream.videoTracks[0].takePicture();
>   }
> }

When I look at this, I see a bunch of hard-coded '[0]' statements, which supports my argument for dropping the track list concept on local media streams.


> 
> function showPicture(e) {
>   var ctx = document.querySelector("canvas").getContext("2d");
>   // e.data is the ImageData property of the PictureEvent interface.
>   ctx.canvas.width = e.data.width;
>   ctx.canvas.height = e.data.height;
>   ctx.putImageData(e.data, 0, 0);
>   // TODO: can get this picture as an encoded Blob via:
>   // ctx.canvas.toBlob(callbackFunction, "image/jpeg");
> }
> 
> I think I will leave the rewriting of e.g. the zoom example included below as
> an exercise for the reader at this point (Spoiler alert: all the examples below
> are supported equally well in either approach).
> 
> What I guess the upshot here is is that I am suggesting that we don't hide
> these object capabilities behind changeSettings and getSettings API
> methods. Although we discussed recently that a smaller surface area could
> be an improvement, the main reasons why we probably don't want to have
> those kinds of constructs is perhaps best described here:
> http://robert.ocallahan.org/2012/05/canvas-getcontext-mistake.html

Yeah, I kind-of see how that applies, yet at the same time, it seems to contradict your duck-typing approach as well. I view the track list objects as a kind of "getContext()" call--you never really know what you're going to get!  With getSettings() you always get the same dictionary format, with the same properties on it (In WebIDL, Section 4.2.20, converting IDL dictionary to ECMAScript object-- in step 3.1.2 assume that the dictionary member named key *is* always present on V).


> Hope this helps and thanks again for putting such a detailed proposal like this
> together.

Thanks!
Received on Tuesday, 28 August 2012 00:41:17 UTC