Opus Capabilities, Options and Settings (Issues 252, 258, 274, 277)

As of the October 2015 Editor's draft, the ORTC API took a minimalist approach to Opus capabilities and settings, providing only two capabilities/settings (maxplaybackrate and stereo).

Given that RFC 7587<https://tools.ietf.org/html/rfc7587> Section 6.1 defines a number of optional SDP parameters (see below), the question has arisen as to what capabilities, settings and options are needed.

Some thoughts below.  Comments welcome.

RFC 7587 Section 6.1:


   Optional parameters:



   maxplaybackrate:  a hint about the maximum output sampling rate that

      the receiver is capable of rendering in Hz.  The decoder MUST be

      capable of decoding any audio bandwidth, but, due to hardware

      limitations, only signals up to the specified sampling rate can be

      played back.  Sending signals with higher audio bandwidth results

      in higher than necessary network usage and encoding complexity, so

      an encoder SHOULD NOT encode frequencies above the audio bandwidth

      specified by maxplaybackrate.  This parameter can take any value

      between 8000 and 48000, although commonly the value will match one

      of the Opus bandwidths (Table 1).  By default, the receiver is

      assumed to have no limitations, i.e., 48000.



[BA] In the October Editor's draft, maxplaybackrate was provided as an optional receiver capability as well as a potential sender setting (so as to keep within the receiver's capability).  However, for the browser to be able to provide this receiver capability, it needs to be very aware of the capabilities of the underlying hardware.



   sprop-maxcapturerate:  a hint about the maximum input sampling rate

      that the sender is likely to produce.  This is not a guarantee

      that the sender will never send any higher bandwidth (e.g., it

      could send a prerecorded prompt that uses a higher bandwidth), but

      it indicates to the receiver that frequencies above this maximum

      can safely be discarded.  This parameter is useful to avoid

      wasting receiver resources by operating the audio processing

      pipeline (e.g., echo cancellation) at a higher rate than

      necessary.  This parameter can take any value between 8000 and

      48000, although commonly the value will match one of the Opus

      bandwidths (Table 1).  By default, the sender is assumed to have

      no limitations, i.e., 48000.



[BA] This was not included in the October editor's draft. Is this a sender capability that the browser can easily determine?





   maxptime:  the maximum duration of media represented by a packet

      (according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to

      receive, in milliseconds rounded up to the next full integer

      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

      multiple of an Opus frame size rounded up to the next full integer

      value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>.  If

      no value is specified, the default is 120.



[BA] This is already available in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is not needed as an Opus capability or setting.



   ptime:  the preferred duration of media represented by a packet

      (according to Section 6 of [RFC4566]<https://tools.ietf.org/html/rfc4566#section-6>) that a decoder wants to

      receive, in milliseconds rounded up to the next full integer

      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

      multiple of an Opus frame size rounded up to the next full integer

      value, up to a maximum value of 120, as defined in Section 4<https://tools.ietf.org/html/rfc7587#section-4>.  If

      no value is specified, the default is 20.



[BA] Previously, I believe we discussed the lack of a need for ptime support in RTCRtpCodecCapability/RTCRtpCodecParameters. So do we need this as an Opus capability or setting?





   maxaveragebitrate:  specifies the maximum average receive bitrate of

      a session in bits per second (bit/s).  The actual value of the

      bitrate can vary, as it is dependent on the characteristics of the

      media in a packet.  Note that the maximum average bitrate MAY be

      modified dynamically during a session.  Any positive integer is

      allowed, but values outside the range 6000 to 510000 SHOULD be

      ignored.  If no value is specified, the maximum value specified in

      Section 3.1.1<https://tools.ietf.org/html/rfc7587#section-3.1.1> for the corresponding mode of Opus and corresponding

      maxplaybackrate is the default.



[BA] We currently have maxBitrate in RTCRtpEncodingParameters.  So do we need this as an Opus capability or setting?



   stereo:  specifies whether the decoder prefers receiving stereo or

      mono signals.  Possible values are 1 and 0, where 1 specifies that

      stereo signals are preferred, and 0 specifies that only mono

      signals are preferred.  Independent of the stereo parameter, every

      receiver MUST be able to receive and decode stereo signals, but

      sending stereo signals to a receiver that signaled a preference

      for mono signals may result in higher than necessary network

      utilization and encoding complexity.  If no value is specified,

      the default is 0 (mono).



[BA] In the October Editor's draft, stereo was provided as an optional receiver capability as well as a potential sender setting.  However, we already have numChannels

in RTCRtpCodecCapability/RTCRtpCodecParameters, so it is possible to determine if a codec decoder/encoder supports stereo and to indicate if this is enabled or not.



   sprop-stereo:  specifies whether the sender is likely to produce

      stereo audio.  Possible values are 1 and 0, where 1 specifies that

      stereo signals are likely to be sent, and 0 specifies that the

      sender will likely only send mono.  This is not a guarantee that

      the sender will never send stereo audio (e.g., it could send a

      prerecorded prompt that uses stereo), but it indicates to the

      receiver that the received signal can be safely downmixed to mono.

      This parameter is useful to avoid wasting receiver resources by

      operating the audio processing pipeline (e.g., echo cancellation)

      in stereo when not necessary.  If no value is specified, the

      default is 0 (mono).



[BA] Whether a sender is likely to produce stereo depends on the application as well as the underlying capability, so not clear to me this makes sense as a distinct capability or setting.





   cbr:  specifies if the decoder prefers the use of a constant bitrate

      versus a variable bitrate.  Possible values are 1 and 0, where 1

      specifies constant bitrate, and 0 specifies variable bitrate.  If

      no value is specified, the default is 0 (vbr).  When cbr is 1, the

      maximum average bitrate can still change, e.g., to adapt to

      changing network conditions.



[BA] Not clear what value this adds as a capability or setting.





   useinbandfec:  specifies that the decoder has the capability to take

      advantage of the Opus in-band FEC.  Possible values are 1 and 0.

      Providing 0 when FEC cannot be used on the receiving side is

      RECOMMENDED.  If no value is specified, useinbandfec is assumed to

      be 0.  This parameter is only a preference, and the receiver MUST

      be able to process packets that include FEC information, even if

      it means the FEC part is discarded.



[BA] Some implementations (e.g. Edge ORTC) may not support inband FEC, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether FEC should be generated).





   usedtx:  specifies if the decoder prefers the use of DTX.  Possible

      values are 1 and 0.  If no value is specified, the default is 0.



[BA] Some implementations (e.g. Edge ORTC) may not support DTX, so it may make sense to have this as a receiver/sender capability (whether it is supported) and sender setting (whether DTX should be enabled).

Received on Saturday, 21 November 2015 01:25:11 UTC