Re: V4 comments (Re: Settings API Proposal v3 feedback summary)

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Tue, 9 Oct 2012 14:40:38 +0200
Message-ID: <50741B46.1090003@ericsson.com>
To: public-media-capture@w3.org
On 10/06/2012 01:09 PM, Harald Alvestrand wrote:
> On 10/05/2012 08:04 PM, Travis Leithead wrote:
>>> From: Harald Alvestrand [mailto:harald@alvestrand.no]
>>> On 10/03/2012 03:10 AM, Travis Leithead wrote:
>>>> V4 is now ready. I've posted it on Mercurial for easier reading:
>>>> http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-
>>> capture/proposals/SettingsAPI_proposal_v4.html
>>>> Thanks! Looking forward to your continued feedback.
>>> Thanks Travis!
>>> I like the way this is moving - in particular, I like the idea of
>>> reducing the number of places where apps have to care what kind of
>>> stream or track they're handling.
>>> Some worries persist, however:
>>> - I really like the idea in section 3 where you model the request
>>> interface as "construct a constraints object and apply it". This unifies
>>> the constraints namespace and the namespace for requested attributes
>>> quite neatly.
>>> HOWEVER - the namespaces are a bit non-overlapping, in particular the
>>> control of height and width, where you introduce a new enumerated
>>> "dimension" namespace. When items in both namespaces collide, do we
>>> synthesize a union, or do we declare a winner?
>> My concept of constraints is non-persistent, which is where I think this
>> confusion is coming from. You seem to have the view that constraints
>> persist,
>> and that whatever changes the applications wants to make after an initial
>> set of constraints have been established must be layered on top of those
>> initial constraints. Hence there's the notion of collisions, unions,
>> etc.,
>> and the need to deal with the complexity that introduces.
>> In my view (the way I've structured the proposal), constraints are
>> non-persistent, and behave conceptually similar to a "picker" or "search"
>> API. The result of applying constraints produces a single device that
>> has been
>> suitably configured to match the requested constraints. (Note that
>> getUserMedia
>> actually allows for at most two [different] devices to be selected,
>> but this
>> can be thought of as two separate applications of constraints to each
>> class
>> of device.) There is no "memory" associated with the previous constraints
>> after they have been applied. Having selected and configured a device,
>> the
>> previous constraints are forgotten.
>> Thus when you start a set of "request()" changes to change the
>> settings of
>> a particular device, this is building up a brand-new set of
>> constraints to
>> apply to the device (which could contradict a previous set of constraints
>> that were applied). If it didn't work this way, then it would be very
>> challenging to configure a setting to the minimum, then later to the
>> maximum
>> and then back to the minimum again, as a particular scenario (or
>> end-user) might
>> want to do.
> I see where we differ now .... since I'm assuming from the get-go that
> the UA will need to dynamically adapt to the conditions of CPUs,
> resources and network bandwidth available, I've just assumed that the
> min/max constraints applied at creation time establish a space within
> which the UA is not only able to pick a current value, it is able to
> change to a differing operation point within that range without
> requiring any interaction with the JavaScript application.
> This kind of adaptation happens today, and not only in network related
> matters; for instance, the Mac iSight cameras are famous for dropping
> their framerate in low light conditions - trading picture quality for
> framerate.
> That's also the background for my continued preference for "the range of
> acceptable operation points" rather than "this specific mode" as what
> gets passed from the user's side.

I guess the preferred model depends on what should happen if the wanted 
setting cannot be met (even temporarily). If that meant the device was 
stopped completely (putting that DeviceTrack into state "ENDED"), then I 
think we would need ranges (and if I developed an app I would specify a 
very wide range, because even a very low frame rate is better than a 
stopped track in most cases - and I think this applies to resolution and 
other settings as well).

But if it only meant that a lower frame rate was delivered temporarily, 
with the desired frame rate resumed once the resources were again 
available, then I think using "request()" to set a specific value rather 
than a range would be OK.

Note that the application can check what is being delivered (as long as 
the device and driver combination delivers the truth!) via the 
VideoStreamTrack interface - but we should add framerate to it. We could 
even redefine the "constrainterror" event so that it can fire any time 
any setting cannot be met (and tell the app which setting is for the 
time being unmet), rather than the current model where it can only fire 
as a result of a "request()" operation.

In practice I expect the settings to behave more like wishes in many 
cases anyway - depending on driver and device they may or may not have 
the intended result, as pointed out (we've all heard about devices lying 
about the framerate).
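A sketch of what such a broadened event could be built on - a check that 
compares what the track reports delivering against what was requested 
and lists the settings that are currently unmet (all names here are 
illustrative, nothing is taken from the proposal):

```javascript
// Illustrative only: compute which requested settings are currently unmet,
// the kind of comparison a broadened "constrainterror" event could carry.
function unmetSettings(requested, delivered) {
  const unmet = [];
  for (const name of Object.keys(requested)) {
    if (delivered[name] !== requested[name]) {
      unmet.push({ name, requested: requested[name], delivered: delivered[name] });
    }
  }
  return unmet;
}

// E.g. the app asked for 30 fps at 1280x720 but the device is delivering
// 15 fps (a lying driver would of course make `delivered` wrong anyway):
const requested = { frameRate: 30, width: 1280, height: 720 };
const delivered = { frameRate: 15, width: 1280, height: 720 };
console.log(unmetSettings(requested, delivered));
// → [ { name: 'frameRate', requested: 30, delivered: 15 } ]
```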

> Good to have this explicitly framed!
>>> - It's not clear to me what happens to existing constraints placed on a
>>> track after a series of "request" calls and subsequent applications; are
>>> the constraint lists merged into an intersection? are they replaced? or
>>> is this device and implementation dependent?
>>> I'm particularly thinking of usage patterns where an implementation uses
>>> constraints at the getUserMedia phase to get a suitable device, and then
>>> uses the request interface to manipulate it afterwards; is he allowed to
>>> set constraints outside the originally specified ones, or isn't he?
>>> Not clear what the right answer is, but there should be only one.
>> I hope my previous explanation made the answer to the above questions
>> clear.
>>> - I've worried about this before and gotten some pushback, but I'm still
>>> worried about the wisdom of setting specific values rather than
>>> "acceptable ranges" - for instance, some systems will drop resolution,
>>> framerate or encoding complexity based on CPU temperature, to avoid
>>> systems shutting down mid-call under "hot" conditions. Setting specific
>>> values will seem to take away that freedom; setting a range of "values I
>>> can live with" would preserve it.
>> Interesting. My proposal doesn't account for the user agent dynamically
>> changing the min/max ranges of the settings. In practice, I don't know
>> how important this case really is. For example, if the device becomes
>> too hot, then as an implementer, I'd just turn off the device to let it
>> cool down!
> And cut the videoconference in mid-sentence, rather than continuing at a
> lower quality?
> We would not get good user feedback from that.
>> I'm pretty confident that setting specific values is desirable, given the
>> settings tell the developer exactly what the acceptable value ranges are.
> Nit: In many cases, there is no source that can be queried for the list
> of acceptable settings; certain camera/driver combinations will happily
> accept any setting you request, and just scale the stuff coming off the
> hardware to fit, without informing you that this is going on. It's a
> strange world down there....
>> I also envision a use case where these settings are likely to be
>> hooked up
>> directly to the application's UI, in which case a select control or input
>> range control will represent the various settings, and the application
>> just
>> passes those values directly into the request() API to apply the changes.
>>> - I wonder if experimentation and flexibility could be served if we
>>> could generalize the interface, so that we don't have to rev the API
>>> every time we add a setting - the long list of settings in
>>> PictureAndVideoSettings could possibly be expressed as
>>> interface MediaSetting {
>>>       DOMString name;
>>>       (MediaSettingsList or MediaSettingsRange) setting;
>>> }
>>> interface PictureAndVideoSettings {
>>>      MediaSetting[] settings;
>>> }
>>> I don't know if it is wise, but I'd like to hear others' thoughts on
>>> such an API:
>> I definitely thought about this. However, there are many more
>> drawbacks to
>> this approach (in my opinion) than otherwise:
>> 1. The settings list makes feature detection much more difficult.
> How?
>> 2. The settings list moves the actual settings two levels deeper off
>> of the
>>     object. (Harder to use.)
>> 3. The settings list does not solve name collisions (it's still a
>> problem that
>>     would need to be dealt with).
> In constraints, a registry is proposed as the solution to collisions.
> Wouldn't that work here too?
>> 4. The settings list does not make it easier/more flexible to rev the API
>>     (partial interfaces in WebIDL allow simple extensions for v2 specs)
> I'm not so worried about future revs of the API, I'm worried about
> future revs of the implementations.
> New APIs take 2 years to complete; new implementations take six weeks.
>>> - lastly, I think that in some cases, for some constraints, it makes
>>> sense to offer a media settings interface on *any* media stream track,
>>> including remote ones. I wonder if we can generalize it to be an
>>> interface on MediaStream itself without hurting it?
>> I agree that this hypothetical constraint could exist. If we wanted to
>> add it
>> we could simply define it in terms of the proposed interface like so:
>> partial interface MediaStreamTrack {
>>     readonly attribute MediaSettingsRange? hypotheticalSetting;
>> };
>> However, I think we've gotten to the point where we need to stop
>> talking about
>> hypotheticals, and talk about actual settings and actual
>> constraints--something that
>> implementers can start building with confidence. To make progress
>> along those lines
>> my proposal puts a stake in the ground about what settings and
>> constraints could be
>> relevant for device capture in v1. We should have a conversation about
>> what track types
>> might be necessary for RTCPeerConnection, and what
>> settings/constraints could be
>> exposed on those objects (though it's somewhat orthogonal to
>> completing the Media Capture
>> and Streams spec IMO).
> OK, I'll make the specific requirements clear from the requirements we
> (Google Chrome Media) are being fed:
> Horizontal size, vertical size, aspect ratio, frame rate and (max /
> target) bitrate.

This seems like a reasonable list of initial things that are needed.
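Expressed as acceptable ranges, the four track-level items could look 
roughly like the sketch below (bitrate left out, since it only makes 
sense on a PeerConnection); the names are placeholders of mine, not 
settled constraint names, and the small check just shows how a UA might 
test a candidate device mode against such ranges:

```javascript
// Placeholder names, not spec names: width, height, aspect ratio and
// frame rate as acceptable ranges, plus a check that a candidate mode fits.
const wanted = {
  width:       { min: 640,   max: 1920 },
  height:      { min: 480,   max: 1080 },
  aspectRatio: { min: 4 / 3, max: 16 / 9 },
  frameRate:   { min: 10,    max: 30 },
};

function modeSatisfies(mode, ranges) {
  return Object.keys(ranges).every(
    (name) => mode[name] >= ranges[name].min && mode[name] <= ranges[name].max
  );
}

console.log(modeSatisfies(
  { width: 1280, height: 720, aspectRatio: 16 / 9, frameRate: 25 }, wanted)); // → true
console.log(modeSatisfies(
  { width: 320, height: 240, aspectRatio: 4 / 3, frameRate: 15 }, wanted));   // → false (too small)
```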

> The last one is a constraint that can only make sense in the context of
> a PeerConnection.
> There are several scenarios where the 4 first ones make sense to request
> from either an incoming or an outgoing MediaStream, where the decision
> on how it is implemented (whether by changing the camera configuration
> or by inserting a scaling function in the pipeline) should be taken by
> the UA depending on the configuration-of-the-moment.
> For instance, a camera may feed a local preview, a LAN-carried
> HD-quality feed into a TV studio, and a 3G-carried feed into a mobile
> phone-based monitor. It may make sense for the UA to choose to use a
> rescaler in this case for the mobile-phone based monitor.
> The application has to be able to express those desires. The UA needs to
> figure out the best ways to satisfy them.
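The UA-side freedom described above can be pictured as a per-sink 
decision: drive the camera for the most demanding sink and insert a 
rescaler in the branch of any sink that wants less. A toy sketch (all 
names are mine, purely illustrative):

```javascript
// Illustrative sketch of the decision the UA would take: run the camera
// at the highest resolution any sink needs, and rescale for the others.
function planPipeline(sinks) {
  // Drive the camera for the most demanding sink...
  const cameraHeight = Math.max(...sinks.map((s) => s.wantedHeight));
  // ...and mark every sink that wants something smaller for rescaling.
  return sinks.map((s) => ({
    name: s.name,
    rescale: s.wantedHeight < cameraHeight,
  }));
}

// Local preview + HD studio feed + 3G-carried phone monitor:
const plan = planPipeline([
  { name: 'preview',  wantedHeight: 720 },
  { name: 'studio',   wantedHeight: 1080 },
  { name: 'phoneMon', wantedHeight: 240 },
]);
console.log(plan);
// studio gets the raw 1080-line feed; preview and phoneMon get rescaled copies
```

The application only states the four per-sink desires; whether the 
camera is reconfigured or a scaler is inserted stays invisible to it.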
Received on Tuesday, 9 October 2012 12:41:08 UTC