
Re: Capabilities API proposal

From: Cullen Jennings <fluffy@iii.ca>
Date: Sun, 22 Jan 2012 19:42:33 -0800
Cc: public-webrtc@w3.org
Message-Id: <9E818FE4-79FB-4964-B8E6-4435A2C142BC@iii.ca>
To: Randell Jesup <randell-ietf@jesup.org>
Bits inline.

On Jan 21, 2012, at 4:38, Randell Jesup wrote:

> On 1/20/2012 2:16 PM, Cullen Jennings wrote:
>> First, on the topic of video resolution - I'm always scared of a fixed set of labels such as cga, vga, xvga and so on - the problem is that apps get coded against the set of labels that existed at the time they were written and then cannot take advantage of new higher resolutions as they come out. My proposal would be that instead we report max-width, max-height, and max-fps. The camera reports the max it supports in any mode, even if those maxima are not available together. For example, a camera that can do WVGA at 120 fps but 1080p at 30 fps would report its max-height as 1080 and its max-fps as 120 even though it may not be able to do both at the same time. Similarly, if several cameras were attached to the browser at the same time, it would report a single max that represented the max across all the cameras. This may sound very limiting, but it substantially reduces the complexity of the API, reduces fingerprinting privacy concerns, and still meets the use cases I have heard about for trying to render a reasonable user interface given the possible capabilities of the machine.
> 
> I respectfully disagree, somewhat.  Reporting is one issue, but for selecting I want to be able to give priority to either resolution or frame rate, or if we think more control is needed, a target minimum frame rate.

I think I agree with you, but we may be talking past each other. I think of the hints structure as the thing you use when an app wants to tell the browser what type of stream to capture, while the capabilities are just for the browser to give the user interface of the app some idea of what the browser might be capable of. For the hints on what type of stream, I agree with you; I just did not see it as needed in the context of the Capabilities. That seems more like what you would want when setting up a stream via the hints.
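To make the distinction concrete, here is a minimal sketch of what such a capabilities object might look like. All field names are illustrative assumptions, not anything from a spec: a single max per axis across all attached cameras, even when the maxima are not achievable simultaneously.

```javascript
// Hypothetical capabilities object reported by the browser. The values
// follow the example in the thread: a 1080p@30 camera plus a WVGA@120
// camera collapse into one set of per-axis maxima.
const videoCapabilities = {
  maxWidth: 1920,   // highest width any attached camera supports
  maxHeight: 1080,  // highest height (from the 1080p camera)
  maxFps: 120       // highest frame rate (from the WVGA@120 mode)
};

// An app UI would use this only to scale its expectations, e.g. to
// decide whether offering an "HD" option makes sense at all.
function canOfferHd(caps) {
  return caps.maxWidth >= 1280 && caps.maxHeight >= 720;
}

console.log(canOfferHd(videoCapabilities)); // true for the example above
```

Note that `canOfferHd` returning true does not promise 720p is actually deliverable; the whole point of the coarse maxima is that the UI gets a rough ceiling, not a mode list.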

> 
> Generally my experience for person-to-person calls is that frame rate (especially a *consistently high* frame rate) is more important than resolution.  I really, really want to see 25-30 fps, and a steady rate, not one that dips every time someone talks with their hands or adjusts their chair.  Now, different apps (and different users/use-cases) have different needs, so the main selectors I see are: minimum frame rate (a request, not an absolute limit), favor resolution over frame rate or vice versa, and maybe maximum resolution.

Makes sense for hints - perhaps a max frame rate too, given that one of my cameras does 72 fps and another does 120 fps, and for talking-heads style video conferencing I'd prefer more resolution as long as the frame rate was at least 30.
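The trade-off being discussed could be sketched as a hints object plus a mode selector. Everything here is hypothetical - the field names, the selector, and the idea that the browser picks among discrete camera modes - but it shows how "favor resolution as long as the frame rate stays above a floor" might work:

```javascript
// Hypothetical hints an app might pass when requesting a stream.
const captureHints = {
  minFrameRate: 30,     // a request (a floor), not a hard limit
  maxFrameRate: 60,     // optional cap; a 120 fps camera need not run flat out
  favor: "resolution"   // "resolution" | "frameRate"
};

// Sketch of a browser-side selector: keep modes meeting the frame-rate
// floor, then pick the best one along the favored axis.
function pickMode(modes, hints) {
  const meeting = modes.filter(m => m.fps >= hints.minFrameRate);
  const pool = meeting.length ? meeting : modes; // fall back if nothing meets the floor
  const score = m => hints.favor === "resolution" ? m.width * m.height : m.fps;
  return pool.reduce((best, m) => (score(m) > score(best) ? m : best));
}

// The two cameras from the thread, as discrete modes:
const modes = [
  { width: 800,  height: 480,  fps: 120 }, // WVGA @ 120
  { width: 1920, height: 1080, fps: 30 }   // 1080p @ 30
];
console.log(pickMode(modes, captureHints)); // { width: 1920, height: 1080, fps: 30 }
```

With `favor: "frameRate"` instead, the same selector would return the WVGA@120 mode; the floor-then-favor shape is one simple way to encode the preference Randell describes.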

> 
> Note that separate control of (encoded) bandwidth is required, which feeds back through the mediastream to change capture parameters as needed while capturing!  This is not necessarily part of the JS api per se, but if you hook a mediastream up to an encoder/sink (like webrtc), that sink needs to be able to adjust parameters of the capturing device.  If that means the API needs to be defined here (and it may), then we need that.

I can imagine that the app would want to at least know that the capture parameters had been changed. 
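The notification idea could be as simple as an event on the track. The event name, the settings shape, and the listener pattern below are all hypothetical; this is just a sketch of "the app at least hears about it", using a tiny stand-in object rather than a real MediaStreamTrack:

```javascript
// Stand-in for a track whose capture parameters a sink can change.
// "capturechange" is a made-up event name, not from any spec.
const listeners = [];
const track = {
  addEventListener(name, fn) { if (name === "capturechange") listeners.push(fn); },
  dispatch(settings) { listeners.forEach(fn => fn(settings)); }
};

// The app records whatever the capture side last settled on.
let lastSettings = null;
track.addEventListener("capturechange", s => { lastSettings = s; });

// Simulate the encoder dropping resolution to meet a bandwidth cap.
track.dispatch({ width: 640, height: 360, frameRate: 30 });

console.log(lastSettings); // { width: 640, height: 360, frameRate: 30 }
```

The point is only that the change is observable after the fact; whether the app can veto or renegotiate it is a separate design question.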

> 
>> 
>> For audio: I like the music and speech. I'm far more dubious about fm, studio, etc. I will note that I cannot imagine any browser that does not have the capability for both music and speech if it does audio at all, so I sort of wonder about the value of this as a capability. I do see it needed as a hint.
> 
> I agree.
> 
>> For video: I like action for temporal optimization, and I think we could call the spatial resolution "detail" or something like that. Again, it's not clear that the capability belongs here - this is more of a hint.
>> 
>> I don't really get what face2face would be, but I think it is worth being able to indicate that something is interactive media vs streaming media. I'd expect the browser to enable echo cancellation and such for interactive.
> 
> Where is this API/description being used?  If this is for GetUserMedia(), then the things you mention are higher-level constructs.
> 
> Another issue here is "can there be multiple audio or video tracks" in a MediaStream?
> 
>> 
>> The whole bandwidth thing I am confused on. Assuming we even had any idea what "broadband" bandwidth was, I sort of doubt that the browser would be able to reliably figure out whether it had it before it was sending media.
> 
> bandwidth is more an issue (from a capabilities standpoint) of the power of the encoder and decoder.  It's hard to specify that without knowledge of the codec used (and even then, it's more an issue of maximum resolution than maximum bandwidth), so I'm quite unclear on the meaning or utility of this.
> 
> 
> 
> -- 
> Randell Jesup
> randell-ietf@jesup.org
> 
> 
> 
Received on Tuesday, 24 January 2012 13:13:51 UTC
