- From: Bernard Aboba <Bernard.Aboba@microsoft.com>
- Date: Sat, 21 Nov 2015 01:24:34 +0000
- To: "public-ortc@w3.org" <public-ortc@w3.org>
- Message-ID: <BLUPR03MB1499F949493AB5859964FF3EC190@BLUPR03MB149.namprd03.prod.outlook.com>
As of the October 2015 Editor's draft, the ORTC API took a minimalist approach to Opus capabilities and settings, providing only two capabilities/settings (maxplaybackrate and stereo).
Given that RFC 7587<https://tools.ietf.org/html/rfc7587> Section 6.1 defines a number of optional SDP parameters (see below), the question has arisen as to what capabilities, settings and options are needed.
Some thoughts below. Comments welcome.
RFC 7587 Section 6.1:
Optional parameters:
maxplaybackrate: a hint about the maximum output sampling rate that
the receiver is capable of rendering in Hz. The decoder MUST be
capable of decoding any audio bandwidth, but, due to hardware
limitations, only signals up to the specified sampling rate can be
played back. Sending signals with higher audio bandwidth results
in higher than necessary network usage and encoding complexity, so
an encoder SHOULD NOT encode frequencies above the audio bandwidth
specified by maxplaybackrate. This parameter can take any value
between 8000 and 48000, although commonly the value will match one
of the Opus bandwidths (Table 1). By default, the receiver is
assumed to have no limitations, i.e., 48000.
[BA] In the October Editor's draft, maxplaybackrate was provided as an optional receiver capability as well as a potential sender setting (so as to keep within the receiver's capability). However, for the browser to be able to provide this receiver capability, it needs detailed knowledge of the capabilities of the underlying hardware.
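To make the sender-side use of maxplaybackrate concrete, here is a rough sketch of how a sender might map the value onto one of the Opus bandwidths in RFC 7587 Table 1. The helper name bandwidthForMaxPlaybackRate is hypothetical and is not part of ORTC or of any Opus library; the sketch only assumes the effective sample rates listed in Table 1.

```typescript
// Opus audio bandwidths from RFC 7587 Table 1, with effective sample rates in Hz.
type OpusBandwidth = "NB" | "MB" | "WB" | "SWB" | "FB";

const OPUS_BANDWIDTHS: ReadonlyArray<{ name: OpusBandwidth; sampleRate: number }> = [
  { name: "NB", sampleRate: 8000 },
  { name: "MB", sampleRate: 12000 },
  { name: "WB", sampleRate: 16000 },
  { name: "SWB", sampleRate: 24000 },
  { name: "FB", sampleRate: 48000 },
];

// Hypothetical helper: pick the widest bandwidth whose sample rate does not
// exceed maxplaybackrate. The default of 48000 follows RFC 7587.
function bandwidthForMaxPlaybackRate(maxplaybackrate: number = 48000): OpusBandwidth {
  if (!isFinite(maxplaybackrate) || maxplaybackrate < 8000 || maxplaybackrate > 48000) {
    throw new RangeError("maxplaybackrate must be between 8000 and 48000");
  }
  let chosen: OpusBandwidth = "NB";
  for (const bw of OPUS_BANDWIDTHS) {
    if (bw.sampleRate <= maxplaybackrate) {
      chosen = bw.name;
    }
  }
  return chosen;
}
```

A value that falls between two table entries (e.g., 20000) maps to the next lower bandwidth, so the encoder never exceeds what the receiver says it can render.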
sprop-maxcapturerate: a hint about the maximum input sampling rate
that the sender is likely to produce. This is not a guarantee
that the sender will never send any higher bandwidth (e.g., it
could send a prerecorded prompt that uses a higher bandwidth), but
it indicates to the receiver that frequencies above this maximum
can safely be discarded. This parameter is useful to avoid
wasting receiver resources by operating the audio processing
pipeline (e.g., echo cancellation) at a higher rate than
necessary. This parameter can take any value between 8000 and
48000, although commonly the value will match one of the Opus
bandwidths (Table 1). By default, the sender is assumed to have
no limitations, i.e., 48000.
[BA] This was not included in the October Editor's draft. Is this a sender capability that the browser can easily determine?
maxptime: the maximum duration of media represented by a packet
(according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to
receive, in milliseconds rounded up to the next full integer
value. Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
multiple of an Opus frame size rounded up to the next full integer
value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>. If
no value is specified, the default is 120.
[BA] This is already available in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is not needed as an Opus capability or setting.
ptime: the preferred duration of media represented by a packet
(according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to
receive, in milliseconds rounded up to the next full integer
value. Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary
multiple of an Opus frame size rounded up to the next full integer
value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>. If
no value is specified, the default is 20.
[BA] Previously, I believe we discussed the lack of a need for ptime support in RTCRtpCodecCapability/RTCRtpCodecParameters. So do we need this as an Opus capability or setting?
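For reference, the value rule that ptime and maxptime share could be checked roughly as follows. This is only a sketch: isValidOpusPtime is a hypothetical helper, and it assumes that "an arbitrary multiple of an Opus frame size" can be modeled as a multiple of 2.5 ms, since the other Opus frame sizes (5, 10, 20, 40 and 60 ms) are themselves multiples of 2.5 ms.

```typescript
// Largest packet duration allowed by RFC 7587, in milliseconds.
const MAX_PTIME_MS = 120;
// Assumption: the smallest Opus frame is 2.5 ms and all other frame sizes are multiples of it.
const SMALLEST_FRAME_MS = 2.5;

// Hypothetical helper: true if ptimeMs is an integer that equals some multiple
// of 2.5 ms rounded up to the next full integer, and does not exceed 120 ms.
function isValidOpusPtime(ptimeMs: number): boolean {
  if (Math.floor(ptimeMs) !== ptimeMs || ptimeMs < 3 || ptimeMs > MAX_PTIME_MS) {
    return false;
  }
  // Walk the multiples of 2.5 ms and compare each rounded-up duration with the candidate.
  for (let n = 1; n * SMALLEST_FRAME_MS <= MAX_PTIME_MS; n++) {
    if (Math.ceil(n * SMALLEST_FRAME_MS) === ptimeMs) {
      return true;
    }
  }
  return false;
}
```

Under this reading, values such as 3, 8 and 20 pass, while 4 and 121 do not.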
maxaveragebitrate: specifies the maximum average receive bitrate of
a session in bits per second (bit/s). The actual value of the
bitrate can vary, as it is dependent on the characteristics of the
media in a packet. Note that the maximum average bitrate MAY be
modified dynamically during a session. Any positive integer is
allowed, but values outside the range 6000 to 510000 SHOULD be
ignored. If no value is specified, the maximum value specified in
Section 3.1.1<https://tools.ietf.org/html/rfc7587#section-3.1.1> for the corresponding mode of Opus and corresponding
maxplaybackrate is the default.
[BA] We currently have maxBitrate in RTCRtpEncodingParameters. So do we need this as an Opus capability or setting?
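To illustrate how maxaveragebitrate and an application-level cap such as maxBitrate might interact, here is a minimal sketch. Both function names are hypothetical, and taking the lower of the two caps is my assumption about a sensible combination, not something RFC 7587 or the ORTC draft specifies. The 6000 to 510000 range comes from the RFC text above.

```typescript
const OPUS_MIN_BITRATE = 6000;   // bit/s, lower bound from RFC 7587
const OPUS_MAX_BITRATE = 510000; // bit/s, upper bound from RFC 7587

// Returns the signaled cap, or undefined when the value should be ignored
// (not a positive integer, or outside the 6000..510000 range).
function usableMaxAverageBitrate(raw: string): number | undefined {
  if (!/^[0-9]+$/.test(raw)) {
    return undefined;
  }
  const value = parseInt(raw, 10);
  if (value < OPUS_MIN_BITRATE || value > OPUS_MAX_BITRATE) {
    return undefined;
  }
  return value;
}

// Hypothetical combination: apply whichever cap is lower; undefined means "no cap".
function effectiveBitrateCap(signaled?: string, maxBitrate?: number): number | undefined {
  const fromPeer = signaled === undefined ? undefined : usableMaxAverageBitrate(signaled);
  if (fromPeer === undefined) {
    return maxBitrate;
  }
  if (maxBitrate === undefined) {
    return fromPeer;
  }
  return Math.min(fromPeer, maxBitrate);
}
```

If the two values always collapse into a single cap like this, that would argue for maxBitrate alone being sufficient on the sender side.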
stereo: specifies whether the decoder prefers receiving stereo or
mono signals. Possible values are 1 and 0, where 1 specifies that
stereo signals are preferred, and 0 specifies that only mono
signals are preferred. Independent of the stereo parameter, every
receiver MUST be able to receive and decode stereo signals, but
sending stereo signals to a receiver that signaled a preference
for mono signals may result in higher than necessary network
utilization and encoding complexity. If no value is specified,
the default is 0 (mono).
[BA] In the October Editor's draft, stereo was provided as an optional receiver capability as well as a potential sender setting. However, we already have numChannels in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is possible to determine if a codec decoder/encoder supports stereo and to indicate if this is enabled or not.
sprop-stereo: specifies whether the sender is likely to produce
stereo audio. Possible values are 1 and 0, where 1 specifies that
stereo signals are likely to be sent, and 0 specifies that the
sender will likely only send mono. This is not a guarantee that
the sender will never send stereo audio (e.g., it could send a
prerecorded prompt that uses stereo), but it indicates to the
receiver that the received signal can be safely downmixed to mono.
This parameter is useful to avoid wasting receiver resources by
operating the audio processing pipeline (e.g., echo cancellation)
in stereo when not necessary. If no value is specified, the
default is 0 (mono).
[BA] Whether a sender is likely to produce stereo depends on the application as well as the underlying capability, so it is not clear to me that this makes sense as a distinct capability or setting.
cbr: specifies if the decoder prefers the use of a constant bitrate
versus a variable bitrate. Possible values are 1 and 0, where 1
specifies constant bitrate, and 0 specifies variable bitrate. If
no value is specified, the default is 0 (vbr). When cbr is 1, the
maximum average bitrate can still change, e.g., to adapt to
changing network conditions.
[BA] It is not clear what value this adds as a capability or setting.
useinbandfec: specifies that the decoder has the capability to take
advantage of the Opus in-band FEC. Possible values are 1 and 0.
Providing 0 when FEC cannot be used on the receiving side is
RECOMMENDED. If no value is specified, useinbandfec is assumed to
be 0. This parameter is only a preference, and the receiver MUST
be able to process packets that include FEC information, even if
it means the FEC part is discarded.
[BA] Some implementations (e.g. Edge ORTC) may not support in-band FEC, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether FEC should be generated).
usedtx: specifies if the decoder prefers the use of DTX. Possible
values are 1 and 0. If no value is specified, the default is 0.
[BA] Some implementations (e.g. Edge ORTC) may not support DTX, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether DTX should be enabled).
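To summarize the defaults quoted above, here is a sketch of how the format-specific parameters could be read from an SDP fmtp parameter string into a settings object. The OpusFmtp interface and parseOpusFmtp function are hypothetical and are not part of the ORTC API. ptime and maxptime are left out because they are already covered separately above, and maxaveragebitrate is left out because its default depends on the Opus mode.

```typescript
// Hypothetical container for the Opus parameters of RFC 7587 Section 6.1.
interface OpusFmtp {
  maxplaybackrate: number;
  spropMaxcapturerate: number;
  stereo: boolean;
  spropStereo: boolean;
  cbr: boolean;
  useinbandfec: boolean;
  usedtx: boolean;
}

// Parse a sampling rate, falling back to the default when it is missing or out of range.
function parseRate(value: string, fallback: number): number {
  const rate = parseInt(value, 10);
  return isNaN(rate) || rate < 8000 || rate > 48000 ? fallback : rate;
}

// Parse "key=value;key=value" pairs, starting from the RFC 7587 defaults.
function parseOpusFmtp(params: string): OpusFmtp {
  const result: OpusFmtp = {
    maxplaybackrate: 48000,
    spropMaxcapturerate: 48000,
    stereo: false,
    spropStereo: false,
    cbr: false,
    useinbandfec: false,
    usedtx: false,
  };
  for (const part of params.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) {
      continue; // skip empty or malformed entries
    }
    const key = part.slice(0, eq).trim().toLowerCase();
    const value = part.slice(eq + 1).trim();
    switch (key) {
      case "maxplaybackrate": result.maxplaybackrate = parseRate(value, 48000); break;
      case "sprop-maxcapturerate": result.spropMaxcapturerate = parseRate(value, 48000); break;
      case "stereo": result.stereo = value === "1"; break;
      case "sprop-stereo": result.spropStereo = value === "1"; break;
      case "cbr": result.cbr = value === "1"; break;
      case "useinbandfec": result.useinbandfec = value === "1"; break;
      case "usedtx": result.usedtx = value === "1"; break;
      default: break; // unknown parameters are ignored
    }
  }
  return result;
}
```

For example, parseOpusFmtp("maxplaybackrate=16000; stereo=1;useinbandfec=1") yields a 16000 Hz playback limit with stereo and in-band FEC preferred, and every other field at its default.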
Received on Saturday, 21 November 2015 01:25:11 UTC