Re: Constraints structure and Capabilities API

Hi Dan,

Dan Burnett wrote:
> I'd like to get any general feedback now before fleshing out the precise wording to go into the spec documents and registry drafts.

Comments inline.

> *******************************************
>
> Overview
> ---------
> Instead of Hints we have Constraints.  Why?  What's the difference?
> The word "Hints" implies that they are suggestions by a web developer that may be completely ignored by the browser.  To address both the needs of implementers and the needs of web developers, there are four properties that the API should have:
> 1) it should allow the web developer to specify what he would like, IN TERMS THAT ARE MEANINGFUL TO THE BROWSER/IMPLEMENTER
> 2) it should allow the web developer to specify preferences when he can't get his first choice
> 3) it should allow the web developer to specify minimal criteria that he will accept
> 4) it should provide enough information to the browser that it can, on its own, handle many real-world occurrences such as congestion, user-driven screen resizing, etc.
>
> We believe the Constraints structure satisfies these properties.
>
> With regard to the specific list of constraints, the goal was to identify constraints that a) are meaningful to the browser, b) are likely to apply to all media of the given type (video or audio), and c) are likely to be of use to web developers.
>
> Note that we may eventually need to introduce a distinction between PeerConnection capabilities and constraints and getUserMedia capabilities and constraints.  The former is for the actual media that will be communicated between the browsers.  The APIs below were designed with the former in mind but may be similar to what we will want for getUserMedia.

Media that is going to be sent over a p2p connection and media that is 
simply intended for local playback (e.g. as the backdrop for an AR app), 
local recording (e.g. for conference/dating/social network introductions 
and pre-recorded messages) or local manipulation (e.g. barcode scanning, 
face recognition) inherently have very different properties.

I think the focus on the p2p use case has come at the expense of the 
local use cases. In all three of the local cases above, none of which 
require peer-to-peer streaming, it would be ideal simply to have the 
highest-quality video and audio the UA can provide returned for local 
usage.

It is only when such a stream is then recorded, or streamed via 
additional methods in our spec, that a number of characteristics need to 
be applied to the stream object, and for that we have SDP negotiation 
and could have implicit recording quality standards.

So I'd argue that recording and p2p could apply their own sampling based 
on e.g. bandwidth negotiation, *transparently* to the developer or user, 
and separate from the local in-page playback of that video, which should 
always represent the highest-quality stream available from the 
Webcam/OS/UA in unison.
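
Concretely, the flow I have in mind is something like the sketch below 
(callback-style getUserMedia, modulo vendor prefixes; the PeerConnection 
constructor arguments and the signalling helper are illustrative 
placeholders for whatever we end up standardising):

  // 1. Local capture: no constraints; the UA hands back its best stream.
  navigator.getUserMedia({ video: true, audio: true }, function (stream) {
    // 2. Local playback at full quality in an in-page <video> element.
    var video = document.querySelector('video');
    video.src = URL.createObjectURL(stream);

    // 3. Only when the stream leaves the page does quality adaptation
    //    kick in, transparently, via SDP rather than developer hints.
    var config = 'STUN stun.example.org';           // placeholder config
    var pc = new PeerConnection(config, function (signal) {
      sendToSignallingChannel(signal);              // app-defined (hypothetical)
    });
    pc.addStream(stream); // UA adapts resolution/framerate/bandwidth itself
  }, function (error) {
    console.log('getUserMedia failed: ' + error);
  });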

Starting from that position, it *may* then be possible to select one or 
two metrics on which to obtain a modified media stream. The use case: 
with current hardware capabilities being what they are, it may be 
preferable to obtain a particular sample rate for local playback video 
that is intended for real-time analysis, e.g. computer vision use cases 
(barcode scanning, face recognition). Rather than being a central part 
of the design, this would be a provision for lower-end hardware; as 
hardware capabilities improve, such self-imposed quality metrics 
requested by developers can be abandoned (meaning the metrics will still 
exist, but the hardware will simply have enough capability to 
comfortably process the full highest-quality streams in real time).
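
If we do keep such a provision, I would expect it to be on the order of 
one or two keys, e.g. (the constraint shape and names here are purely 
illustrative):

  // Hypothetical: a barcode-scanning page on weak hardware caps the
  // framerate; everything else stays at the UA's best quality.
  navigator.getUserMedia(
    { video: true, constraints: [ { 'video-max-framerate': 15 } ] },
    function (stream) {
      analyseFramesForBarcodes(stream); // app-defined analysis loop (hypothetical)
    },
    function (error) { console.log(error); }
  );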

Feedback on specific constraints below...

>
> Constraints API/structure
> --------
> The Constraints API is not really an API, but rather a data structure.  The structure allows a web developer to specify an ordered list of key-value pairs, where each key is a constraint the browser is to attempt to satisfy.  For any two listed constraints, if the browser is able to satisfy one or the other, but not both, the browser MUST satisfy the one that comes earlier in the list. Additionally, each constraint may be marked as mandatory.  If the browser is unable to satisfy all mandatory constraints, the media request call MUST fail with an appropriate error message and list the mandatory constraints it was unable to satisfy.
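
(For my own understanding, the selection rule described here amounts to 
roughly the sketch below, with satisfiable() standing in for whatever 
device/codec matching the UA actually performs:)

  // Walk the ordered list, keep each constraint the UA can still satisfy,
  // and fail hard only if a mandatory one cannot be met.
  function resolveConstraints(constraints, satisfiable) {
    var applied = [];
    var failedMandatory = [];
    for (var i = 0; i < constraints.length; i++) {
      var c = constraints[i];
      if (satisfiable(applied.concat([c]))) {
        applied.push(c);         // earlier entries win over later conflicts
      } else if (c.mandatory) {
        failedMandatory.push(c); // must be reported back to the caller
      }                          // non-mandatory and unsatisfiable: dropped
    }
    if (failedMandatory.length > 0) {
      // Illustrative error shape; the spec would define the real one.
      throw { name: 'ConstraintNotSatisfiedError', constraints: failedMandatory };
    }
    return applied;
  }
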
>
> An initial list of valid constraints is:
>
> video-min-width:  minimum width in pixels
> video-max-width:  maximum width in pixels
> video-min-height:  minimum height in pixels
> video-max-height:  maximum height in pixels

Given the highest-quality stream, I can use video.width and video.height 
to resize it on my current page. The same applies if I receive a remote 
stream.
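
For the local case that is something like the following (element id and 
sizes are mine):

  // Display size is a per-page presentation concern:
  navigator.getUserMedia({ video: true }, function (stream) {
    var video = document.getElementById('preview');
    video.src = URL.createObjectURL(stream);
    video.width = 320;  // scale the rendering down for this page
    video.height = 240; // while the underlying stream stays full quality
  }, function (error) { console.log(error); });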

> video-min-aspectratio:  minimum width-to-height ratio
> video-max-aspectratio:  maximum width-to-height ratio
> video-min-framerate:  minimum number of frames per second
> video-max-framerate:  maximum number of frames per second
> video-min-pixelrate:  minimum pixel transmission rate, in megapixels per second
> video-max-pixelrate:  maximum pixel transmission rate, in megapixels per second
> video-min-timebetweenkeyframes:  minimum time, in milliseconds, between key/reference frames
> video-max-timebetweenkeyframes:  maximum time, in milliseconds, between key/reference frames
> video-min-bandwidth:  minimum bandwidth, in megabits per second
> video-max-bandwidth:  maximum bandwidth, in megabits per second
> video-lowmotion:  whether the coding is to be optimized for scenes where capturing fine detail is more important than motion.  Allowed values are "lowmotion" and "generic".
> video-autowhitebalance:  whether Automatic White Balancing is to be turned on.  Allowed values are "on" and "off".
> audio-min-bandwidth:  minimum bandwidth, in kilobits per second
> audio-max-bandwidth:  maximum bandwidth, in kilobits per second
> audio-min-mos:  minimum Mean Opinion Score, ranging from 1 to 5
> audio-max-mos:  maximum Mean Opinion Score, ranging from 1 to 5
> audio-min-codinglatency:  minimum coding latency, in milliseconds
> audio-max-codinglatency:  maximum coding latency, in milliseconds
> audio-min-samplingrate:  minimum sampling rate, in samples per second
> audio-max-samplingrate:  maximum sampling rate, in samples per second

All of the above become unnecessary if I can simply obtain the 
highest-quality stream that the Webcam/OS/UA supports locally. When that 
stream is negotiated with a peer or sent for recording I could apply 
other metrics, but for the most part the system should adapt these 
metrics as needed.

> audio-voiceorgeneric:  whether the coding is to be optimized for voice or for general audio such as music.  Allowed values are "voice" and "generic".

This is a useful constraint as, right now, it affects the codec used to 
render local playback.

> audio-gaincontrol:  whether automatic gain control is to be turned on.  Allowed values are "on" and "off".
> audio-echocancellation:  whether echo cancellation is to be turned on.  Allowed values are "on" and "off".

These could be implicit in the audio-voiceorgeneric constraint.
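
e.g. a UA could derive them as defaults along these lines (the mapping 
below is my guess at sensible behaviour, not part of the proposal):

  // Hypothetical implied defaults, making the two switches redundant:
  var impliedAudioProcessing = {
    voice:   { 'audio-gaincontrol': 'on',  'audio-echocancellation': 'on'  },
    generic: { 'audio-gaincontrol': 'off', 'audio-echocancellation': 'off' }
  };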

>
> These constraints will be stored in a Constraints IANA registry which can be extended via the Expert Review policy.
>
>
> Example:
>   {0:{video-min-height:600, mandatory:true},
>    1:{video-max-aspectratio:1.333333333333},
>    2:{video-min-timebetweenkeyframes:20},
>    3:{video-max-bandwidth:500, mandatory:true},
>    4:{video-min-framerate:30},
>    5:{video-autowhitebalance:on}}
>
> Capabilities API
> ---------------
> The capabilities API, broadly speaking, is the constraints API, per device/media stream/channel.
>
> A call to getCapabilities() returns a JavaScript Array containing, for each device/media stream/channel, all relevant constraints (the constraints as specified by the constraints API). For example,
>
> {camera001:{
>     video-min-width:  800,
>     video-max-width:  1024,
>     video-min-height:  600,
>     video-max-height:  768,
>     video-min-aspectratio:  1.333333333333,
>     video-max-aspectratio:  1.333333333333,
>     video-min-framerate:  24,
>     video-max-framerate:  60,
>     video-min-pixelrate:  15,
>     video-max-pixelrate:  47,
>     video-min-timebetweenkeyframes:  20,
>     video-max-timebetweenkeyframes:  40,
>     video-min-bandwidth:  1.5,
>     video-max-bandwidth:  3.5},
>   camera002:{
>     video-min-width:  1600,
>     video-max-width:  1920,
>     video-min-height:  1080,
>     video-max-height:  1200,
>     video-min-aspectratio:  1.33333333333,
>     video-max-aspectratio:  1.77777777777,
>     video-min-framerate:  24,
>     video-max-framerate:  120,
>     video-min-pixelrate:  57.6,
>     video-max-pixelrate:  248,
>     video-min-timebetweenkeyframes:  20,
>     video-max-timebetweenkeyframes:  40,
>     video-min-bandwidth:  8,
>     video-max-bandwidth:  29.4},
>   audio001:{
>     audio-min-bandwidth:  1.4,
>     audio-max-bandwidth:  128,
>     audio-min-mos:  2,
>     audio-max-mos:  5,
>     audio-min-codinglatency:  10,
>     audio-max-codinglatency:  50,
>     audio-min-samplingrate:  8000,
>     audio-max-samplingrate:  48000}}
>
> Note that bandwidth and latency capabilities reflect the combined effect of that device and any codecs we have available to use with it.
>

This list seems far too verbose for the use cases we have, to say 
nothing of the fact that it makes for a great fingerprinting API.

While designing for the untrusted environment is harder, it must be the 
starting point for this proposal. Likewise, local media playback should 
be the starting point for obtaining a stream; how that stream behaves 
when it is streamed or recorded is a downstream API requirement.
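
To make the fingerprinting point concrete: a capabilities dump this 
detailed reduces trivially to a stable device identifier, per the sketch 
below (using the getCapabilities() call proposed above and an ordinary 
string hash):

  // A page hashes the full capabilities dump into a tracking ID.
  function hashString(str) {      // djb2; any stable hash will do
    var h = 5381;
    for (var i = 0; i < str.length; i++) {
      h = ((h << 5) + h + str.charCodeAt(i)) | 0;
    }
    return (h >>> 0).toString(16);
  }
  var fingerprint = hashString(JSON.stringify(getCapabilities()));
  // Stable across visits; nothing in the proposal gates this behind a
  // permission prompt.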

- Rich

Received on Friday, 24 February 2012 12:23:40 UTC