Re: Specifying the audio buffer size

On Tue, Apr 21, 2015 at 4:42 AM, Harald Alvestrand <harald@alvestrand.no>
wrote:

> On 21 April 2015 at 02:32, Charlie Kehoe wrote:
> > Some applications involve listening to audio for a potentially extended
> > period of time (with user consent, of course), and are not particularly
> > latency-sensitive. An example would be the "Ok Google" hotwording
> > available on the Chrome new tab page, or other types of continuous
> > speech recognition. For these applications, a typical low-latency audio
> > configuration can lead to excessive power usage. I've measured 20% CPU
> > usage for audio capture in Chrome, for example.
> >
> > My proposed solution is to offer a way to change the audio buffer size.
> > This enables a tradeoff between latency and power usage. For example, a
> > member could be added to MediaTrackConstraintSet
> > <
> http://w3c.github.io/mediacapture-main/getusermedia.html#dictionary-mediatrackconstraintset-members
> >:
> >
> > dictionary MediaTrackConstraintSet {
> >    ...
> >    ConstrainLong audioBufferDurationMs;
> > };
> >
> > This would be an integer number of milliseconds. Perhaps the name could
> > mention latency instead (e.g. audioLatencyMs).
> >
> > How does this simple change sound?
>
> I'd prefer to actually look at where the thing is connected, and do the
> configuration there.
>
> If it goes to a MediaStreamRecorder, that already has all the
> information needed (chunk size).
> If it goes to a PeerConnection, buffering may belong there, but I'm not
> sure how to represent it, or where it makes sense
> (.permissibleBufferDelay on an RTPSender? Perhaps..)
>

One could argue that this would also apply to things like resolution or
sample rate, yet we allow those to be specified as inputs to gUM.

Part of the reason for this is that you have the track before you wire it
up to anything, which means you only learn about the downstream needs at
some point in the future and may then have to go back and reopen the device.
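
To make that concrete, an application would express the preference up
front when it opens the device, just as it does for resolution today. A
rough sketch, using the audioBufferDurationMs name proposed above (the
name and the values are purely illustrative, not something that exists
today):

navigator.mediaDevices.getUserMedia({
  audio: {
    // Hypothetical constraint from this proposal: ask for large
    // capture buffers, trading latency for lower power draw.
    audioBufferDurationMs: { ideal: 100 }
  }
}).then(function (stream) {
  // The buffer size is settled when the device is opened, before the
  // track gets wired up to a recorder, PeerConnection, etc.
}).catch(function (err) {
  console.error('getUserMedia failed:', err);
});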


>
> If it's only which code path is chosen in the Google Chrome browser, I'd
> prefer a constraint like "googLowLatencyPath = false"; this is an
> implementation concern, not an architectural concern.
>

I don't think this is a code path question; it's a generic question of how
often we should be reading data from the device, on which applications could
have vastly different, non-binary opinions. An application doing live music
performance might choose to read the device every 2.5 ms and send it out as
2.5 ms Opus packets, whereas an application that is passively listening for
commands might want to read the device 10x less often.
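
As a rough sketch of those two extremes (again, the constraint name and
values are illustrative only; an integer-millisecond constraint would
also have to round the 2.5 ms case up):

// Live music performance: the smallest buffer the device will give us.
var musicConstraints = { audio: { audioBufferDurationMs: { max: 3 } } };

// Passive command listening: large buffers, read the device rarely.
var hotwordConstraints = { audio: { audioBufferDurationMs: { ideal: 25 } } };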

Received on Wednesday, 22 April 2015 23:58:28 UTC