
Re: Mozilla/Cisco API Proposal

From: Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com>
Date: Mon, 18 Jul 2011 16:39:35 +0200
Message-ID: <4E2445A7.2080504@ericsson.com>
To: Koen Vos <koen.vos@skype.net>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
On 2011-07-14 01:54, Koen Vos wrote:
> Stefan Håkansson wrote:
>> More and more can be done by analysing the input signal (e.g.
>> determining if it is speech or music), so perhaps there will be no
>> need for API support.
> That may work in the long term.  But Opus currently has no speech/music detector, and I think it will take a while to build one that is good enough for most use cases.  So for now the API seems the only way we could set the Opus mode.
> What are you actually proposing: to hard code the Opus mode, or to quickly invent a reliable speech/music detector?
I'm proposing that we should not have "audio" or "voip" attributes in 
the API, at least not initially. Once the initial version is in use, and 
actual use indicates that such attributes would be really beneficial (in 
certain use cases), we could add them - or perhaps technology catches up 
by then (maybe a good speech/music detector becomes available, and it 
can be introduced with no alteration of the API).

Maybe practical use will indicate that the most important improvement 
of the API is something completely different.

I have zero knowledge of Opus, so I will refrain from commenting on that.

> best,
> koen.
> Stefan Håkansson wrote:
>>>> could help the codec perform at its optimum). And this set could be
>>>> irrelevant for a new generation of codecs. "audio" vs "voip" is just one
>>>> example, and it is specific for one codec. I think the general trend also is
>>> On the contrary, things like AGC and noise suppression are independent
>>> of _any_ codec (at least they are in the WebRTC stack Google
>>> open-sourced). Opus implements a few more things internally, but there's
>>> no reason in principle why those things couldn't be done outside the
>>> codec as well. The point is that this switch is the difference between,
>>> "Please actively distort the input I give you to make it sound
>>> 'better'," vs. "Please preserve the original input as closely as
>>> possible," and that semantic has little to do with the actual codec.
>> I still think we should not go in this direction - at least not initially. Let's add it later if there is a clear need. More and more can be done by analysing the input signal (e.g. determining if it is speech or music), so perhaps there will be no need for API support.
>> Stefan
Received on Monday, 18 July 2011 14:39:59 UTC
