Opus Capabilities, Options and Settings (Issues 252, 258, 274, 277) from Bernard Aboba on 2015-11-21 (public-ortc@w3.org from November 2015)

From: Bernard Aboba <Bernard.Aboba@microsoft.com>
Date: Sat, 21 Nov 2015 01:24:34 +0000
To: "public-ortc@w3.org" <public-ortc@w3.org>
Message-ID: <BLUPR03MB1499F949493AB5859964FF3EC190@BLUPR03MB149.namprd03.prod.outlook.com>

As of the October 2015 Editor's draft, the ORTC API took a minimalist approach to Opus capabilities and settings, providing only two capabilities/settings (maxplaybackrate and stereo).

Given that RFC 7587<https://tools.ietf.org/html/rfc7587> Section 6.1 defines a number of optional SDP parameters (see below), the question has arisen as to what capabilities, settings and options are needed.

Some thoughts below. Comments welcome.

RFC 7587 Section 6.1:

Optional parameters:

maxplaybackrate: a hint about the maximum output sampling rate that
   the receiver is capable of rendering in Hz. The decoder MUST be
   capable of decoding any audio bandwidth, but, due to hardware
   limitations, only signals up to the specified sampling rate can be
   played back. Sending signals with higher audio bandwidth results
   in higher than necessary network usage and encoding complexity, so
   an encoder SHOULD NOT encode frequencies above the audio bandwidth
   specified by maxplaybackrate. This parameter can take any value
   between 8000 and 48000, although commonly the value will match one
   of the Opus bandwidths (Table 1). By default, the receiver is
   assumed to have no limitations, i.e., 48000.

[BA] In the October Editor's draft, maxplaybackrate was provided as an optional receiver capability as well as a potential sender setting (so as to keep within the receiver's capability). However, for the browser to provide this receiver capability, it needs detailed knowledge of the capabilities of the underlying hardware.
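To make the sender-setting side concrete, here is a minimal sketch of how an encoder might turn a maxplaybackrate hint into one of the Opus audio bandwidths from RFC 7587 Table 1. The function name and the clamping of out-of-range hints are my own assumptions for illustration; neither comes from the ORTC draft or the RFC.

```typescript
// Opus audio bandwidths and their effective sample rates (RFC 7587, Table 1).
type OpusBandwidth = "NB" | "MB" | "WB" | "SWB" | "FB";

const OPUS_BANDWIDTHS: ReadonlyArray<{ name: OpusBandwidth; sampleRateHz: number }> = [
  { name: "NB", sampleRateHz: 8000 },
  { name: "MB", sampleRateHz: 12000 },
  { name: "WB", sampleRateHz: 16000 },
  { name: "SWB", sampleRateHz: 24000 },
  { name: "FB", sampleRateHz: 48000 },
];

// Hypothetical helper: pick the widest bandwidth whose effective sample rate
// does not exceed the receiver's maxplaybackrate hint.
function bandwidthForMaxPlaybackRate(maxplaybackrate?: number): OpusBandwidth {
  // Section 6.1 default when the parameter is absent: no limitation (48000).
  const hint = maxplaybackrate ?? 48000;
  // Assumption: hints outside 8000..48000 are clamped rather than rejected.
  const rate = Math.min(48000, Math.max(8000, hint));
  let chosen: OpusBandwidth = "NB";
  for (const bw of OPUS_BANDWIDTHS) {
    if (bw.sampleRateHz <= rate) {
      chosen = bw.name;
    }
  }
  return chosen;
}
```

With this sketch, a hint of 20000 selects wideband, because 16000 is the largest Table 1 rate that does not exceed the hint.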

sprop-maxcapturerate: a hint about the maximum input sampling rate
   that the sender is likely to produce. This is not a guarantee
   that the sender will never send any higher bandwidth (e.g., it
   could send a prerecorded prompt that uses a higher bandwidth), but
   it indicates to the receiver that frequencies above this maximum
   can safely be discarded. This parameter is useful to avoid
   wasting receiver resources by operating the audio processing
   pipeline (e.g., echo cancellation) at a higher rate than
   necessary. This parameter can take any value between 8000 and
   48000, although commonly the value will match one of the Opus
   bandwidths (Table 1). By default, the sender is assumed to have
   no limitations, i.e., 48000.

[BA] This was not included in the October Editor's draft. Is this a sender capability that the browser can easily determine?

maxptime: the maximum duration of media represented by a packet
   (according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to
   receive, in milliseconds rounded up to the next full integer
   value. Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
   multiple of an Opus frame size rounded up to the next full integer
   value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>. If
   no value is specified, the default is 120.

[BA] This is already available in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is not needed as an Opus capability or setting.

ptime: the preferred duration of media represented by a packet
   (according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to
   receive, in milliseconds rounded up to the next full integer
   value. Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
   multiple of an Opus frame size rounded up to the next full integer
   value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>. If
   no value is specified, the default is 20.

[BA] I believe we previously discussed that ptime support is not needed in RTCRtpCodecCapability/RTCRtpCodecParameters. So do we need this as an Opus capability or setting?
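For reference, the value rules quoted above for ptime and maxptime can be checked mechanically. The sketch below reflects my reading of "an arbitrary multiple of an Opus frame size": since every Opus frame size (2.5, 5, 10, 20, 40, 60 ms) is a multiple of 2.5 ms, a value is acceptable if it equals some multiple of 2.5 ms, rounded up, and does not exceed 120. The function names, the fallback to the defaults on invalid input, and the rule that ptime is capped by maxptime are assumptions for illustration only.

```typescript
// Every Opus frame size (2.5, 5, 10, 20, 40, 60 ms) is a multiple of 2.5 ms.
const SMALLEST_FRAME_MS = 2.5;
const MAX_PACKET_MS = 120;

// True if `value` is a whole number of milliseconds obtained by rounding up
// a multiple of 2.5 ms that is no longer than 120 ms.
function isValidOpusPtime(value: number): boolean {
  if (Math.floor(value) !== value || value <= 0) {
    return false;
  }
  for (let ms = SMALLEST_FRAME_MS; ms <= MAX_PACKET_MS; ms += SMALLEST_FRAME_MS) {
    if (Math.ceil(ms) === value) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: resolve the packet duration a sender would aim for.
// Assumption: invalid or absent values fall back to the Section 6.1 defaults
// (ptime 20, maxptime 120), and ptime is capped by maxptime.
function effectivePtime(ptime?: number, maxptime?: number): number {
  const max = maxptime !== undefined && isValidOpusPtime(maxptime) ? maxptime : 120;
  const preferred = ptime !== undefined && isValidOpusPtime(ptime) ? ptime : 20;
  return Math.min(preferred, max);
}
```

Under this reading, 8 is valid (three 2.5 ms frames, 7.5 ms rounded up) while 4 and 7 are not.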

maxaveragebitrate: specifies the maximum average receive bitrate of
   a session in bits per second (bit/s). The actual value of the
   bitrate can vary, as it is dependent on the characteristics of the
   media in a packet. Note that the maximum average bitrate MAY be
   modified dynamically during a session. Any positive integer is
   allowed, but values outside the range 6000 to 510000 SHOULD be
   ignored. If no value is specified, the maximum value specified in
   Section 3.1.1<https://tools.ietf.org/html/rfc7587#section-3.1.1> for the corresponding mode of Opus and corresponding
   maxplaybackrate is the default.

[BA] We currently have maxBitrate in RTCRtpEncodingParameters. So do we need this as an Opus capability or setting?
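As one data point for that question, here is a rough sketch of how an application could fold a remote maxaveragebitrate into the existing maxBitrate setting without a separate Opus setting. The range check follows the RFC text quoted above; the helper names and the "take the smaller of the two" rule are purely my assumptions and not something the ORTC draft specifies.

```typescript
// Range outside of which RFC 7587 says the value SHOULD be ignored.
const MIN_OPUS_BITRATE = 6000;
const MAX_OPUS_BITRATE = 510000;

// Parse a raw maxaveragebitrate value; return undefined if it is absent,
// not a positive integer, or outside 6000..510000 (i.e., ignored).
function usableMaxAverageBitrate(raw: string | undefined): number | undefined {
  if (raw === undefined || !/^[0-9]+$/.test(raw)) {
    return undefined;
  }
  const value = parseInt(raw, 10);
  if (value < MIN_OPUS_BITRATE || value > MAX_OPUS_BITRATE) {
    return undefined;
  }
  return value;
}

// Hypothetical helper: the cap a sender would apply, given its own maxBitrate
// and the receiver's maxaveragebitrate. Assumption: the smaller value wins.
function senderBitrateCap(
  maxBitrate: number | undefined,
  maxaveragebitrate: string | undefined
): number | undefined {
  const fromReceiver = usableMaxAverageBitrate(maxaveragebitrate);
  if (maxBitrate === undefined) {
    return fromReceiver;
  }
  if (fromReceiver === undefined) {
    return maxBitrate;
  }
  return Math.min(maxBitrate, fromReceiver);
}
```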

stereo: specifies whether the decoder prefers receiving stereo or
   mono signals. Possible values are 1 and 0, where 1 specifies that
   stereo signals are preferred, and 0 specifies that only mono
   signals are preferred. Independent of the stereo parameter, every
   receiver MUST be able to receive and decode stereo signals, but
   sending stereo signals to a receiver that signaled a preference
   for mono signals may result in higher than necessary network
   utilization and encoding complexity. If no value is specified,
   the default is 0 (mono).

[BA] In the October Editor's draft, stereo was provided as an optional receiver capability as well as a potential sender setting. However, we already have numChannels in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is possible to determine if a codec decoder/encoder supports stereo and to indicate if this is enabled or not.

sprop-stereo: specifies whether the sender is likely to produce
   stereo audio. Possible values are 1 and 0, where 1 specifies that
   stereo signals are likely to be sent, and 0 specifies that the
   sender will likely only send mono. This is not a guarantee that
   the sender will never send stereo audio (e.g., it could send a
   prerecorded prompt that uses stereo), but it indicates to the
   receiver that the received signal can be safely downmixed to mono.
   This parameter is useful to avoid wasting receiver resources by
   operating the audio processing pipeline (e.g., echo cancellation)
   in stereo when not necessary. If no value is specified, the
   default is 0 (mono).

[BA] Whether a sender is likely to produce stereo depends on the application as well as the underlying capability, so it is not clear to me that this makes sense as a distinct capability or setting.

cbr: specifies if the decoder prefers the use of a constant bitrate
   versus a variable bitrate. Possible values are 1 and 0, where 1
   specifies constant bitrate, and 0 specifies variable bitrate. If
   no value is specified, the default is 0 (vbr). When cbr is 1, the
   maximum average bitrate can still change, e.g., to adapt to
   changing network conditions.

[BA] It is not clear what value this adds as a capability or setting.

useinbandfec: specifies that the decoder has the capability to take
   advantage of the Opus in-band FEC. Possible values are 1 and 0.
   Providing 0 when FEC cannot be used on the receiving side is
   RECOMMENDED. If no value is specified, useinbandfec is assumed to
   be 0. This parameter is only a preference, and the receiver MUST
   be able to process packets that include FEC information, even if
   it means the FEC part is discarded.

[BA] Some implementations (e.g. Edge ORTC) may not support inband FEC, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether FEC should be generated).

usedtx: specifies if the decoder prefers the use of DTX. Possible
   values are 1 and 0. If no value is specified, the default is 0.

[BA] Some implementations (e.g. Edge ORTC) may not support DTX, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether DTX should be enabled).
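To illustrate the capability/setting split suggested for useinbandfec and usedtx, here is a minimal sketch. The interface and function names below are hypothetical and are not part of the ORTC API; the idea is only that a sender setting is enabled when the application wants it, the local encoder supports it, and the remote receiver reports support for it.

```typescript
// Hypothetical capability shape: whether an implementation supports a feature.
interface OpusFeatureSupport {
  inbandFec: boolean;
  dtx: boolean;
}

// Hypothetical sender settings derived from the capabilities on both sides.
interface OpusSenderSettings {
  useInbandFec: boolean;
  useDtx: boolean;
}

// Enable a feature only if the application wants it and both the local
// sender and the remote receiver report support for it.
function chooseSenderSettings(
  senderCaps: OpusFeatureSupport,
  receiverCaps: OpusFeatureSupport,
  wanted: OpusFeatureSupport = { inbandFec: true, dtx: true }
): OpusSenderSettings {
  return {
    useInbandFec: wanted.inbandFec && senderCaps.inbandFec && receiverCaps.inbandFec,
    useDtx: wanted.dtx && senderCaps.dtx && receiverCaps.dtx,
  };
}
```

In this sketch, an implementation that supports neither feature ends up with both settings disabled regardless of what the remote side reports, which matches the RFC defaults of 0 for both parameters.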

Received on Saturday, 21 November 2015 01:25:11 UTC