Re: Specifying the audio buffer size from Harald Alvestrand on 2015-05-11 (public-media-capture@w3.org from May 2015)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Mon, 11 May 2015 12:22:32 +0200
To: Charlie Kehoe <ckehoe@google.com>, Justin Uberti <juberti@google.com>
CC: "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <555082E8.1040407@alvestrand.no>
On 5 May 2015 00:00, Charlie Kehoe wrote:
> Any additional thoughts here? The May 15th deadline is not too far away.

My feeling based on the discussion is that "buffer size" is still the
wrong number, because it doesn't describe what the user's constraint is.
"maxMediaDelay" may be more appropriate - "I can tolerate this many
milliseconds of delay with no impairment to my service" might be a good
description of semantics.

For systems that deliver low latency in their only available code path,
they would continue to deliver low latency.

At the other extreme, apps that want low latency could request that
(using ideal) or require that (using max), and get appropriate behavior.
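
To make the ideal/max distinction concrete, here is a minimal sketch of how
a user agent might map such a constraint onto a capture buffer duration.
Everything in it is an assumption for illustration: "maxMediaDelay" is only
the name floated above and is not part of any specification, and the
ConstrainDelay type, the pickBufferMs helper and the selection policy are
hypothetical.

```typescript
// Hypothetical constraint shape. `maxMediaDelay` is only a name suggested in
// this thread; it is not a member of any shipped MediaTrackConstraintSet.
interface ConstrainDelay {
  ideal?: number; // preferred delay tolerance, in milliseconds
  max?: number;   // hard upper bound on delay, in milliseconds
}

// Pick a capture buffer duration from the durations a device supports.
// Returns undefined when `max` cannot be met (the overconstrained case).
function pickBufferMs(
  supportedMs: number[],
  c: ConstrainDelay,
): number | undefined {
  // `max` is a requirement: drop every duration that exceeds it.
  const allowed = supportedMs.filter(
    (ms) => c.max === undefined || ms <= c.max,
  );
  if (allowed.length === 0) return undefined;

  if (c.ideal === undefined) {
    // No preference stated: assume the largest allowed buffer is wanted,
    // since fewer device reads should cost less power.
    return Math.max(...allowed);
  }

  // `ideal` is a preference: take the allowed duration closest to it.
  // On a tie, the entry that appears earlier in `supportedMs` is kept.
  const ideal = c.ideal;
  return allowed.reduce((best, ms) =>
    Math.abs(ms - ideal) < Math.abs(best - ideal) ? ms : best,
  );
}
```

With supported durations of 10, 20 and 100 ms, a request of { max: 50 }
yields 20, { ideal: 90 } yields 100, and { max: 5 } yields undefined, which
a real implementation would presumably report as an overconstrained error.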

One good question is what part of the system we measure delay across,
though.


> 
> 
> On Wed, Apr 22, 2015 at 4:59 PM Justin Uberti <juberti@google.com> wrote:
> 
> 
> 
>     On Tue, Apr 21, 2015 at 4:42 AM, Harald Alvestrand
>     <harald@alvestrand.no> wrote:
> 
>         On 21 April 2015 02:32, Charlie Kehoe wrote:
>         > Some applications involve listening to audio for a potentially extended
>         > period of time (with user consent, of course), and are not particularly
>         > latency-sensitive. An example would be the "Ok Google" hotwording
>         > available on the Chrome new tab page, or other types of continuous
>         > speech recognition. For these applications, a typical low-latency audio
>         > configuration can lead to excessive power usage. I've measured 20% CPU
>         > usage for audio capture in Chrome, for example.
>         >
>         > My proposed solution is to offer a way to change the audio buffer size.
>         > This enables a tradeoff between latency and power usage. For example, a
>         > member could be added to MediaTrackConstraintSet
>         > <http://w3c.github.io/mediacapture-main/getusermedia.html#dictionary-mediatrackconstraintset-members>:
>         >
>         > dictionary MediaTrackConstraintSet {
>         >    ...
>         >    audioBufferDurationMs of type ConstrainLong
>         > };
>         >
>         > This would be an integer number of milliseconds. Perhaps the name could
>         > mention latency instead (e.g. audioLatencyMs).
>         >
>         > How does this simple change sound?
> 
>         I'd prefer to actually look at where the thing is connected, and
>         do the configuration there.
> 
>         If it goes to a MediaStreamRecorder, that already has all the
>         information needed (chunk size).
>         If it goes to a PeerConnection, buffering may belong there, but
>         I'm not sure how to represent it, or where it makes sense
>         (.permissibleBufferDelay on an RTPSender? Perhaps..)
> 
> 
>     One could argue that this would also apply for things like
>     resolution or sample rate, yet we allow those things to be specified
>     as inputs to gUM.
> 
>     Part of the reason for this is that you have the track before you
>     wire it up to something, which means you only learn about the
>     downstream needs at some point in the future, so you may have to go
>     reopen the device.
>      
> 
> 
>         If it's only which code path is chosen in the Google Chrome
>         browser, I'd prefer a constraint like "googLowLatencyPath = false";
>         this is an implementation concern, not an architectural concern.
> 
> 
>     I don't think this is a code path question; it's a generic question
>     of how often we should be reading data from the device, which
>     applications could have vastly different, non-binary opinions on. An
>     application that wants to do live music performance might choose to
>     read the device every 2.5 ms, and send as 2.5 ms packetized Opus,
>     whereas an application that is passively listening for commands
>     might want to read the device 10x less often.
>
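
To put numbers on the quoted example of reading the device every 2.5 ms
versus 10x less often, here is a small worked calculation. The 48 kHz
sample rate is an assumption chosen for illustration, and readPlan is a
hypothetical helper, not an existing API.

```typescript
// Hypothetical helper: for a given device read interval, compute how many
// audio frames each read carries and how many reads happen per second.
function readPlan(intervalMs: number, sampleRateHz: number) {
  return {
    framesPerRead: Math.round((sampleRateHz * intervalMs) / 1000),
    readsPerSecond: 1000 / intervalMs,
  };
}

// Assumed sample rate of 48 kHz; the thread does not state one.
const live = readPlan(2.5, 48000);    // live music performance case
const passive = readPlan(25, 48000);  // passive listening, 10x less often
console.log(live, passive);
```

Under that assumption the 2.5 ms case is 120 frames per read at 400 reads
per second, and the 25 ms case is 1200 frames per read at 40 reads per
second.
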
Received on Monday, 11 May 2015 10:23:05 UTC