Requesting feedback for adding "concealedAudibleSamples" to getStats()

Hi,

I've written a PR (#215 <https://github.com/w3c/webrtc-stats/pull/215>)
against the Identifiers for WebRTC's Statistics API
<https://w3c.github.io/webrtc-stats/>, but the PR has gone stale.

The suggestion is to add a getStats()
<https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection-getstats()> metric
for audio tracks (RTCMediaStreamTrackStats). There already exist
totalSamplesReceived and concealedSamples
<https://w3c.github.io/webrtc-stats/#dom-rtcmediastreamtrackstats-concealedsamples>,
which is *"The total number of inbound audio samples that are concealed
samples. A concealed sample is a sample that is based on data that was
synthesized to conceal packet loss and does not represent incoming data."*

Concealment can occur due to packet loss while someone is speaking, or it
may insert "silent" packets into the stream if packet loss occurs while the
stream contains only silence or background noise. To differentiate the two
cases, the suggested new metric is concealedAudibleSamples:

*Only present for inbound audio tracks. The total number of concealed audio
samples (see concealedSamples) that were played out during audible
portions of the stream. Audible means that the received audio is not
considered background noise or silence by the user agent. It is up to the
implementation to determine what is considered background noise, but
concealments of audible samples SHOULD in general have a greater impact on
user experience than concealment of non-audible samples. If the voice
activity flag is present in RTP packets as per [[RFC6464]] this MAY be used
to indicate audibility. Audibility MAY also be based on audio levels or
more sophisticated analysis of the stream.*
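
To make the definition above concrete, here is a minimal sketch of how an
implementation might maintain the counter. Everything in it is an assumption
for illustration: the FrameInfo shape, the SILENCE_THRESHOLD value and the
helper names are hypothetical and come from neither the PR nor any browser.
For a concealed frame there is no incoming packet, so the sketch assumes the
voice activity flag and level are carried over from the most recently
received packet.

```typescript
// Hypothetical description of one played-out audio frame (not from any spec).
interface FrameInfo {
  samples: number;         // number of audio samples in this frame
  concealed: boolean;      // true if the frame was synthesized to conceal loss
  voiceActivity?: boolean; // RFC 6464 "V" bit, if the header extension was present
  level: number;           // RFC 6464 audio level in -dBov: 0 = loudest, 127 = silence
}

interface ConcealmentCounters {
  concealedSamples: number;
  concealedAudibleSamples: number; // the proposed metric
}

// Assumed cut-off; the proposal leaves this entirely to the implementation.
const SILENCE_THRESHOLD = 60;

// Prefer the voice activity flag when present, else fall back to the level.
function isAudible(frame: FrameInfo): boolean {
  if (frame.voiceActivity !== undefined) {
    return frame.voiceActivity;
  }
  return frame.level < SILENCE_THRESHOLD;
}

// Update the cumulative counters for one played-out frame.
function accumulate(counters: ConcealmentCounters, frame: FrameInfo): ConcealmentCounters {
  if (frame.concealed) {
    counters.concealedSamples += frame.samples;
    if (isAudible(frame)) {
      counters.concealedAudibleSamples += frame.samples;
    }
  }
  return counters;
}
```

With this sketch, concealedAudibleSamples can never exceed concealedSamples,
and two implementations that pick different thresholds would report
different values for the same stream.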


The problem with this metric is that there is no standard way to determine
what is or is not considered background noise, so it would be
implementation-specific. This carries a risk that different browsers
implement it to mean different things. Still, the definition gives some
guidance on what it is supposed to mean, and when analyzing call quality it
would be useful to know whether concealment events occurred during "audible"
or "inaudible" portions of the stream, even if this involves some guesswork
on the part of the implementation.
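
For the call-quality use case, an application could compare two getStats()
snapshots and look at what share of the newly concealed samples was audible.
The sketch below is only an illustration under assumptions: the
TrackStatsSnapshot type and the audibleConcealmentShare helper are
hypothetical names, and concealedAudibleSamples is the proposed metric, so
it is treated as possibly absent.

```typescript
// Hypothetical subset of the track stats relevant here, copied out of a
// getStats() report by the application.
interface TrackStatsSnapshot {
  concealedSamples: number;
  concealedAudibleSamples?: number; // proposed metric; may be missing
}

// Returns the fraction (0..1) of samples concealed between the two snapshots
// that were audible, or undefined if the metric is unavailable or nothing
// was concealed in the interval.
function audibleConcealmentShare(
  prev: TrackStatsSnapshot,
  curr: TrackStatsSnapshot
): number | undefined {
  if (prev.concealedAudibleSamples === undefined ||
      curr.concealedAudibleSamples === undefined) {
    return undefined;
  }
  const concealed = curr.concealedSamples - prev.concealedSamples;
  if (concealed <= 0) {
    return undefined;
  }
  const audible = curr.concealedAudibleSamples - prev.concealedAudibleSamples;
  return audible / concealed;
}
```

Because the audibility decision is implementation-specific, a share computed
this way would be comparable over time within one browser but not
necessarily across browsers.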

Does anyone have an opinion about this? Feel free to comment on the PR.

Cheers,
/Henrik (henbos on github)

Received on Tuesday, 5 September 2017 09:44:37 UTC