Re: Mozilla/Cisco API Proposal from Cullen Jennings on 2011-07-12 (public-webrtc@w3.org from July 2011)

From: Cullen Jennings <fluffy@cisco.com>
Date: Tue, 12 Jul 2011 07:40:51 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Ralph Giles <giles@thaumas.net>, Anant Narayanan <anant@mozilla.com>, public-webrtc@w3.org
Message-Id: <5A564C7D-A55D-4946-A422-32E99245C3CB@cisco.com>
On Jul 11, 2011, at 20:37 , Ian Hickson wrote:

> On Mon, 11 Jul 2011, Ralph Giles wrote:
>> On 11 July 2011 15:09, Ian Hickson <ian@hixie.ch> wrote:
>>> One of the differences is that your proposal allows the author to set 
>>> things like the quality of the audio. It's not clear to me what the 
>>> use case is for that. Can you elaborate on that?
>> 
>> Perhaps one example is the sort of thing described by 
>> MediaStreamTrackHints in the proposal. The Opus audio codec from the 
>> IETF standardization effort can switch between separate "voip" and 
>> "audio" coding modes. The script setting up the connection may have 
>> context information about which of these are more appropriate for its 
>> application. These are qualitative encoding choices, with significant 
>> overlap on any quality/bitrate scale.
>> 
>> Information like that is always going to be codec-specific, so a 
>> concrete way of passing ad hoc parameters to the user-agent would find 
>> use there.
> 
> To me that argues for a way to say "make sure this audio channel supports 
> DTMF" and for a way to generate DTMF tones. But I don't think it means we 
> need to expose codec information.

Just a side note: the best way to do DTMF (RFC 4733) is not as an audio tone but as a separate codec for transferring the data about which keys were pressed.

> 
> As a general rule, I think we should follow a model where authors give the 
> browser the constraints and let it work out the implications, rather than 
> a model where authors make the low-level decisions. This is because if we 
> have the authors making low-level decisions, the browser can't improve 
> matters later.
> 
> Consider for instance the situation where a Web page says it wants things 
> encoded using H.264 1280x768 60fps, non-interlaced, with an audio channel 
> that uses the Opus codec and so on. It works beautifully, the author is 
> happy, the users are happy.
> 
> Ten years later, browsers add support for a new codec which supports 
> seamless 3D video and 7.2 surround sound audio, and all computers are 
> shipping with lovely 3D webcams running at 8K resolutions that beat the 
> quality of today's Red cams, and ultraviolet laser microphones that can 
> build a perfect audio representation of the user's surroundings.
> 
> The aforementioned Web page gets none of this. It looks like it's ten 
> years old. To improve it, the author has to go and update all the codec 
> stuff to make it work.
> 
> But now consider what happens if instead of saying the codec, the author 
> had just said "give me video and give me audio". Ten years later, the 
> browsers can all just make 3D work and the site, unmodified, becomes 
> qualitatively better without the author having had to do any work at all.
> 
> This is why the Web uses a declarative high-level model. It lets us make 
> the Web better in the future.
> 
> 

So, I 100% agree with your use case and that we want things to get the new stuff without any change to the web page. That said, I think it is still important to have the possibility to specify other constraints and intent. Let me give a few examples.

1) How you want to handle music vs. spoken voice is totally different. For spoken voice, you want to filter out background noise, fan hum, etc., while for music you typically want to capture all the nuances.

2) Battery life. My iPhone might be capable of doing awesome H.265 HD video but with a battery life of 45 minutes due to no hardware acceleration. However, with H.264 in the right mode it might get 10x that time. And if I was in an application where HD provided no value, I might want to use a smaller resolution. So the browser might always want to provide the "best" it can, but "best" gets more complicated when trading off battery life. I used 264/265 in this example but you will have the same problem with VP8/VP9. Of course the default simple examples should make it so a user does not have to think about this. And the API should be designed such that people don't override with lame defaults. But I think advanced applications need to have some influence over the definition of "best". Note I'm not saying they should choose the codec and do all that, just that they need to be able to provide the information that helps the right "best" get chosen. On a separate note, for debugging and comparison, I do think there needs to be a way for the JS to find out the exact details about what actually got chosen.

3) Pages that do more than one thing. Imagine that I am doing a web page that has one video stream that shows the main display of some game or task I am doing. Perhaps the front-facing view of driving a mobile robot. I also have several other video feeds of less importance. Perhaps views from other mobile robots or views from other angles. Say I want 75% of my bandwidth allocated to the main video stream and the remaining 25% split across the other windows. It would be nice to have a way to do something roughly like this. One thing that would be an absolutely awful experience as a web developer would be if the first stream I add takes 100% of available bandwidth and the rest of the streams get nothing. Equally bad would be if all streams got exactly the same amount of bandwidth.
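
As a rough illustration of the split described in point 3, here is a minimal sketch of weight-based bandwidth sharing. It is not part of any proposal or browser API: the names StreamShare and allocateBandwidth, the stream ids, and the 2000 kbps budget are all assumptions made up for this example.

```typescript
// Hypothetical types and helper, for illustration only.
interface StreamShare {
  id: string;      // label the page uses for a stream
  weight: number;  // relative importance; only ratios between weights matter
}

// Split an estimated bandwidth budget across streams in proportion to weight.
function allocateBandwidth(
  totalKbps: number,
  streams: StreamShare[],
): Map<string, number> {
  const totalWeight = streams.reduce((sum, s) => sum + s.weight, 0);
  const result = new Map<string, number>();
  if (totalWeight <= 0) {
    return result; // nothing sensible to allocate
  }
  for (const s of streams) {
    result.set(s.id, (totalKbps * s.weight) / totalWeight);
  }
  return result;
}

// Weights 9:1:1:1 give the main view 9/12 = 75% of the budget and
// leave the remaining 25% split evenly across the three side views.
const shares = allocateBandwidth(2000, [
  { id: "main", weight: 9 },
  { id: "side-1", weight: 1 },
  { id: "side-2", weight: 1 },
  { id: "side-3", weight: 1 },
]);
```

Expressing the intent as relative weights rather than fixed bitrates would leave the browser free to rescale everything as the available bandwidth changes, which avoids both failure cases above (first stream takes everything, or every stream gets an identical share).
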
Received on Tuesday, 12 July 2011 14:41:20 UTC