- From: Peter Thatcher <pthatcher@google.com>
- Date: Wed, 23 May 2018 09:34:40 -0700
- To: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
- Cc: public-webrtc@w3.org
- Message-ID: <CAJrXDUGWgBZmLA4Hrvbd_Q6nQawDVb0K3S7=uZ-Ky4CMKeY=YA@mail.gmail.com>
On Wed, May 23, 2018 at 4:04 AM Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com> wrote:

> On 22/05/2018 23:21, Peter Thatcher wrote:
>
> I made a diagram of my thinking of the different possible API layers. Here it is:
>
> [image: RTP object stack.png]
>
> IMHO the jitter buffer should not be coupled with the decoder, as it is an RTP thing,

The jitter buffer and decoder are not RTP things. They can work with any transport (they can work with media over QUIC or SCTP, for example).

Although, I should be clear, I'm talking about the *frame* jitter buffer, not the *packet* buffer. For audio, there is no distinction between the two. But for video, there are different parts. I include the packet buffer in the (de)packetizer, or the RtpFrameTransport, not in the frame buffer. The RTP packet buffer is, indeed, RTP-specific. But the frame buffer isn't.

> if you are going to feed full frames to the decoder, it has to be on the RtpFrameTransport (why "frame"?), or better, split it into two different components.

The packet buffer, yes. The frame buffer, no. It's better that the frames coming out of the frame transport are out of order, and that the code for dealing with out-of-order frames is common across transports and more united with the decoding than with the transport.

For video, it's fine if there are two different components (a frame buffer and a decoder). In fact, I'd be in favor of that if we can make sure the performance is good and it doesn't add too much extra complexity to the API.

However, it's not that simple for audio. Doing so is theoretically possible for audio, but the audio jitter buffer implementation used by all but one WebRTC-capable browser (NetEq) has the jitter buffer and decoder tightly coupled. Decoupling them would be a very large amount of work, assuming it's possible at all while retaining the same level of performance/quality. Plus, it's a nice conceptual model: you put in out-of-order encoded frames and you get out in-order decoded frames.

As for the name "frame", that's what it's called everywhere in the WebRTC implementation and how everyone talks about it. Do you have a suggestion for a better name for a collection of pixels or a chunk of audio samples?
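To make that conceptual model concrete, here's a rough TypeScript sketch of the reordering contract. Every name in it is a placeholder invented for illustration, not proposed API, and it ignores the playout-delay and loss handling a real jitter buffer does:

// Minimal sketch of the model above: out-of-order encoded frames go in,
// in-order decoded frames come out. All names (EncodedFrame, FrameBuffer,
// decode) are hypothetical placeholders, not proposed API.

interface EncodedFrame {
  sequence: number;     // frame order assigned by the sender
  data: Uint8Array;     // depacketized, encoded payload
}

interface DecodedFrame {
  sequence: number;
  media: Uint8Array;    // stand-in for decoded pixels or audio samples
}

class FrameBuffer {
  private pending = new Map<number, EncodedFrame>();
  private nextSequence = 0;

  constructor(private decode: (frame: EncodedFrame) => DecodedFrame,
              private onDecoded: (frame: DecodedFrame) => void) {}

  // Frames may arrive in any order (straight off an RtpFrameTransport, or off
  // QUIC/SCTP); each one is held until all earlier frames have been decoded.
  putFrame(frame: EncodedFrame): void {
    this.pending.set(frame.sequence, frame);
    let next = this.pending.get(this.nextSequence);
    while (next !== undefined) {
      this.pending.delete(this.nextSequence);
      this.nextSequence += 1;
      this.onDecoded(this.decode(next));
      next = this.pending.get(this.nextSequence);
    }
  }
}

The point is just the shape of the contract: whatever hands you frames (RTP, QUIC, SCTP) can deliver them as they complete, and the reordering lives with the decoding rather than with the transport.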
> This highlights the question: what happens to RtpParameters? Some are per-transport, some are per-stream, some are per-encoder (which is different than per-stream), and some don't make sense any more.
>
> Per-transport:
> - Payload type <-> codec mapping
> - RTCP reduced size
> - Header extension ID <-> URI mapping
>
> Per-stream:
> - PTs used for PT-based demux
> - priority
> - mid
> - cname
> - ssrcs

> How about simulcast and rids?

Yeah, simulcast/SVC is a rathole of a topic I was trying to avoid dragging into this email thread.

> Per-encoder:
> - dtx
> - ptime
> - codec parameters
> - bitrate
> - maxFramerate
> - scaleResolutionDownBy
>
> Don't make sense:
> - active (just stop the encoder)
> - degradation preference (just change the encoder directly)

> Not sure if I want to get rid of the degradation preference, as currently the encoder is affected by both the bandwidth reported by the transport layer and CPU utilization.

Yeah, I could see a degradation preference on the encoder. Good point.

> Would it make sense to extend the degradation preference to also cover that scenario? Also, I am not sure if I want to get rid of the backchannel between the encoder and the transport and have to do it manually.

This is something the WG can discuss.

I think the app has to be able to control it, because the rules for bitrate allocation between audio, multiple streams of video, simulcast, and SVC just get too complex, and we should let the app decide directly how to allocate bits. However, I do see your point that easy things should be easy, so perhaps there's an automatic default behavior that works well. On the other hand, I'm in favor of low-level APIs with libraries on top of them. Wiring a BWE value into an encoder "manually" is not very hard.

> Best regards
> sergio
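For what it's worth, here's roughly what that manual wiring could look like from the app. The onbandwidthestimate event and setTargetBitrate method are invented for the sketch; nothing here is existing API:

// Rough sketch only: the event and setter names below are assumptions made
// up for illustration, not real API.

interface BandwidthEstimate {
  availableBitrate: number;  // estimated available send bitrate, in bits/s
}

interface EncoderLike {
  setTargetBitrate(bitsPerSecond: number): void;
}

interface TransportLike {
  onbandwidthestimate: ((estimate: BandwidthEstimate) => void) | null;
}

// App-chosen allocation policy: reserve up to 64 kbps for audio, give video
// whatever is left. This is exactly the kind of rule the app could own.
function wireBweToEncoders(transport: TransportLike,
                           audio: EncoderLike,
                           video: EncoderLike): void {
  transport.onbandwidthestimate = (estimate) => {
    const audioBitrate = Math.min(64_000, estimate.availableBitrate);
    audio.setTargetBitrate(audioBitrate);
    video.setTargetBitrate(Math.max(0, estimate.availableBitrate - audioBitrate));
  };
}

A library could wrap a default policy like this so the easy case stays easy, while an app doing simulcast or SVC can replace it with its own allocation rules.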
Attachments
- image/png attachment: RTP_object_stack.png
Received on Wednesday, 23 May 2018 16:35:23 UTC