RE: Synchronous getUserMedia proposal from Travis Leithead on 2012-11-14 (public-webrtc@w3.org from November 2012)

From: Travis Leithead <travis.leithead@microsoft.com>
Date: Wed, 14 Nov 2012 22:20:11 +0000
To: Adam Bergkvist <adam.bergkvist@ericsson.com>, Martin Thomson <martin.thomson@gmail.com>, Stefan Håkansson LK (stefan.lk.hakansson@ericsson.com) <stefan.lk.hakansson@ericsson.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <9768D477C67135458BF978A45BCF9B3853AA49A4@TK5EX14MBXW604.wingroup.windeploy.ntde>

> From: Adam Bergkvist [mailto:adam.bergkvist@ericsson.com]
> On 2012-11-02 19:32, Martin Thomson wrote:
> > In its simplest form:
> >
> > MediaStream getUserMedia(MediaConstraints constraints);
> >
> > This returns a stream that provides no content (open option: a tainted
> > stream that can only be displayed locally).
> >
> > Consent is indicated with a new onconsent event on the stream; failure
> > reuses the onended event.  A new reason parameter is added to the
> > onended event indicating the reason (this includes all existing
> > onended reason codes, if any, plus all getUserMedia error codes).
> >
> > The major complaint with this is that it leads to an
> > inaccurate/misleading expectation about the usability of the stream.
> > That expectation can lead to the assumption that consent is granted,
> > which would be a bad assumption.
> 
> This approach is not flawless, but to me it seems like the most reasonable
> one at the moment.
> 
> We already have the concept of a stream that is dispatched to JavaScript but
> the source is not ready to provide data yet. This currently happens when you
> receive a stream over a PeerConnection and all the tracks are muted until
> data arrives over the network. I think gUM() with a return value could be
> treated similarly, and local data is suspended until the user grants permission.
> 
> In the network case, a media description is used to create the stream at the
> receiving side, and it's quite capable of describing future stream content. In
> our local case, the user may grant only one media component. Perhaps the
> ended track state is good enough to solve this.
> 
> I think we'll freak people out if a tainted stream is delivered at once.
> Even though page authors can't access the content or transport the stream,
> they can mix the camera view into the page content and that may make
> people uncomfortable (depending on the page they're visiting).

I am surprised that I haven't heard much more pushback on this design approach. I suppose that means it's an inevitable transition.
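For concreteness, here is a minimal model of the quoted approach. It assumes tracks start muted until the user decides (as Adam suggests for the remote-stream analogy), a `consent` event on the stream, and an `ended` event carrying a reason (as Martin proposes). Every name here (`SketchTrack`, `SketchStream`, `EndedEvent`, `getUserMediaSync`, the `decide` callback and the `PERMISSION_DENIED` code) is hypothetical and not part of any spec. Only `EventTarget`/`Event` are real (browsers, Node 15+).

```javascript
// Hypothetical model of a synchronous getUserMedia; not a real browser API.

class SketchTrack extends EventTarget {
  constructor(kind) {
    super();
    this.kind = kind;
    this.muted = true;        // no data yet, like a remote track awaiting media
    this.readyState = "live";
  }
}

// Hypothetical "ended" event carrying a reason code, as proposed.
class EndedEvent extends Event {
  constructor(reason) {
    super("ended");
    this.reason = reason;
  }
}

class SketchStream extends EventTarget {
  constructor(tracks) {
    super();
    this.tracks = tracks;
  }
}

// `decide` stands in for the permission prompt: it receives the requested
// kinds and returns the kinds the user granted (hypothetical).
function getUserMediaSync(constraints, decide) {
  const kinds = Object.keys(constraints).filter((k) => constraints[k]);
  const stream = new SketchStream(kinds.map((k) => new SketchTrack(k)));

  // The decision is delivered later, so listeners attached right after the
  // call still see the events.
  queueMicrotask(() => {
    const granted = decide(kinds);
    for (const track of stream.tracks) {
      if (granted.includes(track.kind)) {
        track.muted = false;
        track.dispatchEvent(new Event("unmute"));
      } else {
        // The user granted only some media components: end the rest.
        track.readyState = "ended";
        track.dispatchEvent(new EndedEvent("PERMISSION_DENIED"));
      }
    }
    if (granted.length > 0) {
      stream.dispatchEvent(new Event("consent"));
    } else {
      stream.dispatchEvent(new EndedEvent("PERMISSION_DENIED"));
    }
  });

  return stream; // usable as an object immediately, but carries no content yet
}
```

With `getUserMediaSync({ video: true, audio: true }, () => ["video"])`, both tracks start muted; then the video track unmutes, the audio track ends, and the stream fires `consent`. The sketch also shows the complaint quoted above: nothing in the returned object tells the page whether consent will ever arrive.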

A few questions:
1. If the user agent doesn't have any cameras, what happens? (Perhaps a null value is returned? A fake MediaStream in the ENDED state?) Generally speaking, what do we do with all the old error conditions?
2. How are multiple cameras supported? By multiple calls to the API as before? It seems like this aspect of the old design needs to change.

An alternative idea is to use getUserMedia as an approval/activation method for track "promises". In that case you'd need a way to create appropriate track "placeholders", and getUserMedia would "upgrade" these to actual media-containing tracks. Consider:
var videoPlaceholder = new MediaStreamTrack("video");
var audioPlaceholder = new MediaStreamTrack("audio");
var placeholderMS = new MediaStream([videoPlaceholder, audioPlaceholder]);

The above objects are in a "not started" state and are not tied to any source [yet]. getUserMedia would then try to bind each placeholder track to a real media source; depending on how many placeholder tracks were requested and what is available, it may succeed, fail, or partially succeed. Each binding success or failure is reported as an event on the respective placeholder track.

void getUserMedia(MediaConstraints constraints, MediaStream unlockMyTracks);
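A rough model of the binding step under the signature above, assuming each placeholder is matched to the first unused device of the same kind. `PlaceholderTrack` stands in for the proposed `new MediaStreamTrack(kind)`. `bindPlaceholders`, the `devices` list, and the `bound`/`bindfailed` event names are all hypothetical illustrations, not a real API.

```javascript
// Hypothetical sketch of the placeholder proposal; not a real browser API.

class PlaceholderTrack extends EventTarget {
  constructor(kind) {
    super();
    this.kind = kind;
    this.state = "not started"; // not tied to any source yet
    this.source = null;
  }
}

class PlaceholderStream {
  constructor(tracks) {
    this.tracks = tracks;
  }
}

// One call receives every placeholder, so the UA knows at invocation time how
// many sources are wanted and could show a single prompt. `constraints` mirrors
// the proposed signature but is ignored here; `devices` stands in for what the
// UA and the user make available.
function bindPlaceholders(constraints, unlockMyTracks, devices) {
  const remaining = devices.slice(); // do not mutate the caller's list
  queueMicrotask(() => {
    for (const track of unlockMyTracks.tracks) {
      const i = remaining.findIndex((d) => d.kind === track.kind);
      if (i === -1) {
        track.state = "failed";
        track.dispatchEvent(new Event("bindfailed")); // hypothetical event name
      } else {
        track.source = remaining.splice(i, 1)[0]; // each device is used at most once
        track.state = "live";
        track.dispatchEvent(new Event("bound")); // hypothetical event name
      }
    }
  });
}
```

Requesting two video placeholders and one audio placeholder against one camera and one microphone yields a partial success: the first video track and the audio track bind, and the second video track gets `bindfailed`.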

The advantages: getUserMedia knows at invocation time how many media sources to try to activate, so it can show the user a suitably relevant UI only once. The user stays in control of activating media devices and can do so at their leisure, and this doesn't impact the use of the MediaStream and/or track objects in the meantime.

Disadvantages: upgrading tracks in place from the generic MediaStreamTrack type to a derived type such as VideoDeviceTrack might be a little weird, since the "upgrade" would suddenly expose new APIs that didn't exist on the MediaStreamTrack before. This is another argument for a composition-based model over the current track inheritance model.
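To make the inheritance-versus-composition point concrete, here is a toy contrast in plain JavaScript. `GenericTrack`, `VideoDeviceTrack`, `VideoDeviceSource`, `ComposedTrack` and the `zoomLevels()` method are hypothetical stand-ins. The point is only that an in-place subclass "upgrade" means swapping the object's prototype, whereas composition keeps the track's shape fixed and attaches device-specific API as a separate object.

```javascript
// --- Inheritance: the upgrade changes what API the object itself exposes. ---
class GenericTrack {
  constructor(kind) {
    this.kind = kind;
  }
}

class VideoDeviceTrack extends GenericTrack {
  zoomLevels() { // hypothetical device-specific method
    return [1, 2, 4];
  }
}

function upgradeInPlace(track) {
  // Swapping the prototype is the only way to "become" a subclass in place.
  Object.setPrototypeOf(track, VideoDeviceTrack.prototype);
  return track;
}

// --- Composition: the track's shape stays fixed; the device API is attached. ---
class VideoDeviceSource {
  zoomLevels() { // same hypothetical capability, on a separate object
    return [1, 2, 4];
  }
}

class ComposedTrack {
  constructor(kind) {
    this.kind = kind;
    this.source = null; // placeholder: not bound to a device yet
  }
  bind(source) {
    this.source = source;
  }
}
```

After `upgradeInPlace(track)`, `track.zoomLevels` appears on an object that previously lacked it. After `composed.bind(new VideoDeviceSource())`, the track's own interface is unchanged and the capability is reached via `composed.source.zoomLevels()`.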

Received on Wednesday, 14 November 2012 22:21:36 UTC