Re: [Bug 18485] Change DTMF API to be on PeerConnection from Randell Jesup on 2012-08-15 (public-webrtc@w3.org from August 2012)

From: Randell Jesup <randell-ietf@jesup.org>
Date: Wed, 15 Aug 2012 17:20:45 -0400
To: public-webrtc@w3.org
Message-ID: <502C12AD.8070505@jesup.org>
On 8/15/2012 12:03 PM, Martin Thomson wrote:
> On 15 August 2012 01:38,  <bugzilla@jessica.w3.org> wrote:
>> --- Comment #5 from Harald Alvestrand <harald@alvestrand.no> 2012-08-15 08:38:15 UTC ---
>> Based on further discussion, a need has been identified to play back tones in
>> synchrony with the DTMF signals going out.
> I've been following this, but I didn't reach that conclusion.
>
> Providing audio feedback to a user when a button is pressed is
> commonplace and evidently useful for some applications.  (Personally,
> I find it irritating - the first customization I make on any PC is to
> turn off all forms of audible feedback.)
>
> That doesn't immediately mean that adding audible feedback to another
> media stream is an obvious conclusion.  The only argument presented
> for synchronization was to make echo cancellation easier to implement.

It isn't really that echo cancellation is easier to implement - it's
that if the app is responsible for playing the feedback tone, you can't
guarantee that the tone won't leak into the outgoing stream, even if
echo cancellation is in the OS driver.  (If the app makes the tone short
and slightly delayed, it will likely fall within the "don't listen to
the mic" timeframe - but it is rather clownshoes for the app to have to
work around this in a common case.)

If the app stays away from frequencies near DTMF tones, it might not be
a big problem, though many people like to make their apps generate
"correct" DTMF tones, because some users rely on them as feedback that
they typed numbers correctly on a keypad.
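
To make "staying away from DTMF frequencies" concrete, here is a rough
Python sketch. The row/column frequencies are the standard DTMF ones;
the function names and the 10% guard band are assumptions made up for
illustration, not anything proposed in this thread or taken from a spec.

```python
# Standard DTMF grid: each key is the sum of one row and one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (row_hz, col_hz) pair for a single DTMF key."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError("not a DTMF key: %r" % key)


def is_clear_of_dtmf(freq_hz, guard=0.10):
    """True if freq_hz is more than `guard` (as a fraction) away from
    every DTMF frequency. The 10% default is an arbitrary placeholder;
    real detectors differ in how tolerant they are."""
    return all(abs(freq_hz - f) > guard * f for f in ROW_HZ + COL_HZ)
```

With these assumptions, a 440 Hz feedback beep passes the check, while
a 700 Hz beep does not, because it sits right next to the 697 Hz row
tone.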

Also, if you want to implement (at some point in the future) a 
start/stop setup (for long presses), it may be good to have this API.

All that said: not a huge deal, and only of any import for interacting
with IVR systems.  (For example, some security IVR systems have you call
in, and you can control which camera and camera position by keypad -
press-and-hold to pan, for example.  Edge usecase, but real.)  The main
case for worrying about this is normal PSTN gateway interop: calling
ordinary voice IVRs like a doctor's office, Expedia, etc.

If we punt on it (and we can), we'll need to strongly suggest that apps
stay well away from DTMF tones, and give suggestions for duration and
timing.  (I've even seen user speech recognized as DTMF at times.)  And
if they don't stay away from DTMF, it may fail randomly (extra presses)
in some situations.  And we'll need to standardize (or give strong
recommendations) on tone times for WebRTC implementers.
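
As a sketch of what a "tone times" recommendation would pin down, the
Python below lays out when each digit's tone starts and ends for a given
tone duration and inter-tone gap. The function name and the default
values of 100 ms and 70 ms are placeholders assumed for illustration;
they are not figures agreed in this discussion.

```python
def tone_schedule(digits, tone_ms=100, gap_ms=70):
    """Return a list of (digit, start_ms, end_ms) tuples, with each tone
    lasting tone_ms and followed by gap_ms of silence.

    The defaults are illustrative placeholders only."""
    schedule = []
    start = 0
    for digit in digits:
        end = start + tone_ms
        schedule.append((digit, start, end))
        start = end + gap_ms  # next tone begins after the silent gap
    return schedule
```

An app playing its own feedback sound could use a schedule like this to
keep that sound short and inside a known window relative to the
outgoing DTMF.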


>   Given that this is only one potential source of sounds that could
> cause echo, I'm not sure how this actually makes it easier anyhow.
>
> If we wanted synchronization with an incoming stream, wouldn't that be
> based on when the remote peer heard the tones and not when we sent
> them?  That's the only other reason for that level of synchronization
> I can imagine (and it's an awful reason if you start thinking about
> corner cases).
>
> --Martin

-- 
Randell Jesup
randell-ietf@jesup.org
Received on Wednesday, 15 August 2012 21:21:21 UTC