Re: Capabilities API proposal from Cullen Jennings on 2012-01-30 (public-webrtc@w3.org from January 2012)

From: Cullen Jennings <fluffy@cisco.com>
Date: Mon, 30 Jan 2012 13:24:57 -0800
To: Dan Burnett <dburnett@voxeo.com>
Cc: public-webrtc@w3.org
Message-Id: <5D1357BE-6487-460B-BD53-3697B7ADF036@cisco.com>
I think what I am pushing at is that there is approximately an infinite set of configurations. So instead of treating the configuration space as a discrete set problem, treat it as a multidimensional space and express constraints on the feasible space. The getCapabilities can return the rough shape of the valid configuration space. Trying to return every point in the space does not seem feasible, and even in the limited cases where it is, it does not seem better or easier to work with than the alternative way of looking at it. The hints can put constraints on the part of the configuration space that will be used.
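
A rough sketch of that framing, in TypeScript: capabilities describe a range along each axis, hints narrow those ranges, and the result is the part of the space that both allow. The names here (`Range`, `Space`, `intersect`, and the axes `width`, `height`, `fps`) and all the numbers are assumptions for illustration, not anything proposed on the list.

```typescript
// Hypothetical type: a closed interval along one axis of the configuration space.
interface Range {
  min: number;
  max: number;
}

// One range per axis; the axis names used below are illustrative only.
type Space = Record<string, Range>;

// Intersect two spaces axis by axis. Returns null when some axis has no
// overlap, meaning no configuration satisfies both capabilities and hints.
function intersect(capabilities: Space, hints: Space): Space | null {
  const result: Space = {};
  for (const axis of Object.keys(capabilities)) {
    const cap = capabilities[axis];
    if (!cap) continue;
    // An axis with no hint is left as the capabilities describe it.
    const hint: Range | undefined = hints[axis];
    const min = hint ? Math.max(cap.min, hint.min) : cap.min;
    const max = hint ? Math.min(cap.max, hint.max) : cap.max;
    if (min > max) return null;
    result[axis] = { min, max };
  }
  return result;
}

// Made-up values: the rough shape getCapabilities might return.
const capabilities: Space = {
  width: { min: 160, max: 1920 },
  height: { min: 120, max: 1080 },
  fps: { min: 1, max: 120 },
};

// Made-up hints constraining the part of the space the app will use.
const hints: Space = {
  height: { min: 480, max: 720 },
  fps: { min: 15, max: 60 },
};

const feasible = intersect(capabilities, hints);
console.log(feasible);
```

Note that this treats the axes as independent, which is the simplification argued for here; it does not capture combinations that are individually in range but not available together.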

At a certain level, I think the most important thing is agreeing on roughly what the various axes of the configuration space are, and on that I think we are on the same page.
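
To make one such set of axes concrete, here is a minimal sketch of the max-width, max-height, and max-fps reporting described in my Jan 20 message quoted below: each maximum is taken independently across every mode of every camera, so the reported values need not be achievable together. The `Mode` and `ReportedMax` types and the `reportMax` function are hypothetical names, and the WVGA dimensions (800x480) are an assumption.

```typescript
// Hypothetical description of one capture mode of a camera.
interface Mode {
  width: number;
  height: number;
  fps: number;
}

// Hypothetical shape of the single summary reported to the application.
interface ReportedMax {
  maxWidth: number;
  maxHeight: number;
  maxFps: number;
}

// Take the maximum along each axis independently, across all modes of all
// attached cameras. The result may not correspond to any single real mode.
function reportMax(cameras: Mode[][]): ReportedMax {
  const reported: ReportedMax = { maxWidth: 0, maxHeight: 0, maxFps: 0 };
  for (const modes of cameras) {
    for (const mode of modes) {
      reported.maxWidth = Math.max(reported.maxWidth, mode.width);
      reported.maxHeight = Math.max(reported.maxHeight, mode.height);
      reported.maxFps = Math.max(reported.maxFps, mode.fps);
    }
  }
  return reported;
}

// The camera from the quoted example: WVGA at 120 fps, 1080p at 30 fps.
const camera: Mode[] = [
  { width: 800, height: 480, fps: 120 },
  { width: 1920, height: 1080, fps: 30 },
];

const reported = reportMax([camera]);
// reported is { maxWidth: 1920, maxHeight: 1080, maxFps: 120 }
console.log(reported);
```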


On Jan 24, 2012, at 6:06 , Dan Burnett wrote:

> Cullen,
> 
> I'm glad you like my proposal overall.  Your email below makes many specific comments about the example that I used, which I found strange since I made no attempt to actually define legal capability values.
> 
> The most important takeaways from my proposal were 1) that authors needed to be able to provide a set of configurations they would be willing to accept, and 2) that the capabilities needed to match the hints.  The latter obviates the need for a separate Capabilities registry.
> 
> While I understand your concern about an exponential explosion of configurations, I would rather discuss how to make this approach scale than drop the *extremely important* takeaways above.  Ultimately, if authors cannot explicitly control what they will get, they will at least demand to be able to provide a set of configurations they would find acceptable.  Look at HTML today -- application authors are not happy and have never been happy with purely logical specifications of page content.  They care very much about how the logical descriptions are rendered as specific layout.
> The same will be true for what we provide.
> 
> -- dan
> 
> On Jan 20, 2012, at 2:16 PM, Cullen Jennings wrote:
> 
>> 
>> Dan … first at the high order, I like this. I think this is about the right level of detail. The one part I don't like is having an exponential combinatorial explosion of options. I don't think that will be scalable as more capabilities are added over time. I'd rather see an absolute set of capabilities that did not try to represent the fact that not all combinations are possible. Introspection of streams will still allow the application to find out what it got.  I do have a few comments on the details but overall, I like this.
>> 
>> First, on the topic of video resolution - I'm always scared of a fixed set of labels such as cga, vga, xvga and so on - the problem is that apps get coded with a set of labels that corresponds to what they had at the time they were coded and then cannot take advantage of new higher resolutions as they come out. My proposal would be that instead we report max-width, max-height, and max-fps. The camera reports the max it supports in any mode even if they are not available together. For example, a camera that can do WVGA at 120 fps but 1080p at 30 fps would report its max-height as 1080 and its max-fps as 120 even though it may not be able to do both at the same time. Similarly, if several cameras were attached to the browser at the same time, it would report a single max that represented the max across all the cameras. This may sound very limiting, but it substantially reduces the complexity of the API, reduces fingerprinting privacy concerns, and still meets the use cases I heard about of trying to render reasonable user interfaces given the possible capabilities of the machine.
>> 
>> For audio: I like the music and speech. I'm far more dubious about fm, studio, etc … I will note that I cannot imagine any browser that does not have the capability for both music and speech if it does audio at all, so I sort of wonder about the value of this as a capability. I do see it needed as a hint.
>> 
>> For video: I like action for temporal optimization and I think we could call the spatial resolution "detail" or something like that. Again, not clear that the capability is here - this is more of a hint.
>> 
>> I don't really get what face2face would be, but I think it is worth being able to indicate that something is interactive media vs streaming media. I'd expect the browser to enable echo cancellation and such for interactive.
>> 
>> The whole bandwidth thing I am confused on. Assuming we even had any idea what "broadband" bandwidth was, I sort of doubt that a browser would be able to reliably figure out if it had it or not before it was sending media.
>> 
>> The handsfree attribute is interesting - I don't think it can be implemented for many devices today, but that does not mean we should not have it in the API so that we can move towards having it. However, I'm not sure handsfree is the right term. I mean a headset is hands free but I doubt that is what you mean.
>> 
>> Reading this email, it looks like I have a bunch of issues with this, but really, at the high level I like this and think it is about the right level of information. 
>> 
>> Cullen
>> 
>> On Jan 19, 2012, at 8:04 , Dan Burnett wrote:
>> 
>>> Here is a proposal for a Capabilities API.  Note that the syntax of what is returned will ultimately be determined by the Hints API.  I have included a brief example return value and answers to questions I expect people will have.
>>> 
>>> -- dan
>>> 
>>> 
>>> 
>>> interface Capabilities {
>>>   HintList getCapabilities();
>>> };
>>> 
>>> The getCapabilities method returns a JSON array of profiles representing local capabilities, where each profile is a collection of internally consistent options.  The intent is that the author can select the profiles that are acceptable, rank them in order from most to least preferred, and send the ordered list of acceptable profiles as input to the Hints API.  The Hints API will define the options that are allowed within profiles, as well as defining the HintList type.
>>> 
>>> Examples:
>>> (Full examples are TBD when we have a Hints API.  These are just to give an idea of what the structure would look like)
>>> 
>>> This is an example of a HintList, a list of profiles which the app author will subset and priority-order before sending to the Hints API.
>>> 
>>> [{Video:
>>>  {Purpose: whiteboard,
>>>   Resolution: high},
>>> Audio:
>>>  {Type: speech,
>>>   Resolution: mobile,
>>>   Handsfree: true}},
>>> {Video:
>>>  {Purpose: action,
>>>   Bandwidth: lan-internet},
>>> Audio:
>>>  {Type: music,
>>>   Bandwidth: broadband-high,
>>>   Handsfree: false}},
>>> {Video:
>>>  {Purpose: face2face,
>>>   Resolution: 720p},
>>> Audio:
>>>  {Type: speech,
>>>   Bandwidth: broadband-medium,
>>>   Handsfree: true}},
>>> {Video:
>>>  {Resolution: 1080p,
>>>   Bandwidth: broadband-low},
>>> Audio:
>>>  {Resolution: landline,
>>>   Bandwidth: streaming-high,
>>>   Handsfree: false}},
>>> {Video:
>>>  {Purpose: action,
>>>   Resolution: 480p},
>>> Audio:
>>>  {Resolution: fm,
>>>   Bandwidth: streaming-low}},
>>> {Video:
>>>  {Purpose: face2face,
>>>   Resolution: medium},
>>> Audio:
>>>  {Type: music,
>>>   Resolution: hi-fi}},
>>> {Video:
>>>  {Purpose: whiteboard,
>>>   Resolution: low},
>>> Audio:
>>>  {Type: speech,
>>>   Resolution: studio}},
>>> {Video:
>>>  {Purpose: face2face,
>>>   Resolution: high}},
>>> {Audio:
>>>  {Type: music,
>>>   Bandwidth: broadband-high}}]
>>> 
>>> 
>>> FAQ
>>> Why do we need a Capabilities API?
>>> 1. Web developers want to be able to grey out/hide capabilities that are not available, such as a webcam or microphone.
>>> 2. Web developers may not know which configurations are allowed or feasible.  For example, for a video device it may be possible to specify high quality or low bandwidth, but not both.  It is quite likely that an app author who wants both would rather specify a preference for one than trust the hint system to make the right selection between the two.
>>> 
>>> Why use a list of profiles to select from, order, and send to the Hints API?
>>> 1. This ensures that the profiles provided as hints are feasible and contain compatible and legal options.
>>> 2. It allows the author to see precisely the levels of detail available as hints.  For example, one video device might allow resolutions to be specified as 'high', 'medium', 'low', '640x480', and '1024x768'.  Even though these options are not all mutually exclusive, they are all valuable since some authors may wish to use the descriptive labels and others the precise resolutions.
>>> 3. This may allow for easy extensibility.  For example, if a video device supports 'avatar mode' and a browser supports that capability, it will appear in the capability profiles and be selectable via hints on that browser.
>>> 
>>> What about remote capabilities?
>>> 1. Remote capabilities can be obtained through separate negotiation between the parties, since each end can obtain its own local capabilities and then communicate, e.g., via the data channel.  There is no need to standardize a specific protocol for obtaining remote capabilities.
>>> 2. By leaving the communication of capabilities up to the JavaScript application, the application can control how much information is made available to the remote end, thus implementing privacy restrictions appropriate to the application.  The browser can always, of course, restrict the information it provides to ensure that even the local side does not have inappropriate capability information.
>> 
> 
>
Received on Monday, 30 January 2012 23:41:46 UTC