- From: Peter Thatcher <pthatcher@google.com>
- Date: Tue, 12 Jun 2018 23:59:36 -0700
- To: WebRTC WG <public-webrtc@w3.org>
- Message-ID: <CAJrXDUHs2f6txU-o3x+FS1LHE-ok_wiLWud8W_i5=BqqQCWKZg@mail.gmail.com>
Emails #3 and #4 of Harald's recent set of 5 emails covered how to get encoded data in and out of RtpSender/RtpReceiver, and that could work fine if you do the encode and decode in wasm/js. But what if you want the browser to handle the codecs, or provide hardware codecs? There's one more piece to the puzzle: an API for encoders and decoders. So here is email #6 (which Harald asked me to write) describing how those would look.

Basically an encoder is "track in; encoded frame out" and a decoder is "encoded frame in; track out". An encoded frame is the encoded bytes of the pixels of a raw video frame at a particular point in time, or the encoded bytes of the samples of a raw audio "frame" over a range of time. While the app doesn't feed the raw frames directly from the track to the encoder (or from the decoder to the track), it does have direct control over how the encoder encodes and can change it at any time.

Here is how the objects could look:

interface AudioEncoder {
  // Can be called any time to change parameters
  void start(MediaStreamTrack rawAudio, AudioEncodeParameters encodeParameters);
  void stop();

  attribute EventHandler onencodedaudio;  // of EncodedAudioFrame
}

dictionary AudioEncodeParameters {
  unsigned long frameLength;  // aka ptime, in ms
  unsigned long bitrate;
  // ...
}

dictionary EncodedAudioFrame {
  // Start timestamp in the samplerate clock
  unsigned long startSampleIndex;
  unsigned long sampleCount;
  unsigned long channelCount;
  CodecType codecType;
  ByteArray encodedData;
}

interface AudioDecoder {
  void decodeAudio(EncodedAudioFrame frame);
  readonly attribute MediaStreamTrack decodedAudio;
}

interface VideoEncoder {
  // Can be called any time to change parameters
  void start(MediaStreamTrack rawVideo, VideoEncodeParameters encodeParameters);
  void stop();

  attribute EventHandler onencodedvideo;  // of EncodedVideoFrame
}

dictionary VideoEncodeParameters {
  unsigned long bitrate;
  boolean generateKeyFrame;
  // TODO: SVC/simulcast, resolutionScale, framerateScale, ...
  // ...
}

dictionary EncodedVideoFrame {
  unsigned short width;
  unsigned short height;
  unsigned short rotationDegrees;
  unsigned long timestampMs;
  CodecType codecType;
  ByteArray encodedData;
}

interface VideoDecoder {
  void decodeVideo(EncodedVideoFrame frame);
  readonly attribute MediaStreamTrack decodedVideo;
}

If you're paying attention, you may be wondering the following:

1. Where is the jitter buffer? Answer: it's in the decoder. The decoder can take out-of-order encoded frames and produce an in-order track. This is much simpler than exposing separate jitter buffer and decoder objects.

2. What about SVC/simulcast? There are a few ways we could go about it, depending on what we want "encoder" and "encoded frame" to mean (1 layer or many?). I'm sure we'll cover that in the f2f.
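To make the "track in; encoded frame out / encoded frame in; track out" flow concrete, here is a rough JavaScript sketch of how an app might wire these objects together. It assumes the shapes above; the constructors, the sendToNetwork/onNetworkFrame helpers, and remoteVideoElement are placeholders, since this email doesn't specify how instances are created or how encoded frames are transported:

async function runVideoPipeline(sendToNetwork, onNetworkFrame, remoteVideoElement) {
  // Capture a raw video track to feed the encoder.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const camera = stream.getVideoTracks()[0];

  // "track in; encoded frame out"
  const encoder = new VideoEncoder();  // hypothetical constructor; the email
                                       // doesn't say how instances are created
  encoder.onencodedvideo = (frame) => {
    // frame is an EncodedVideoFrame; how it reaches the other side is up to
    // the app (e.g. the RtpSender hooks from emails #3/#4, or a data channel).
    sendToNetwork(frame);
  };
  encoder.start(camera, { bitrate: 1000000, generateKeyFrame: false });

  // Parameters can be changed at any time by calling start() again,
  // e.g. to request a key frame:
  encoder.start(camera, { bitrate: 1000000, generateKeyFrame: true });

  // "encoded frame in; track out" -- the decoder owns the jitter buffer, so
  // frames may arrive out of order and still come out as an in-order track.
  const decoder = new VideoDecoder();  // hypothetical constructor
  remoteVideoElement.srcObject = new MediaStream([decoder.decodedVideo]);
  onNetworkFrame((frame) => decoder.decodeVideo(frame));
}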
Received on Wednesday, 13 June 2018 07:00:12 UTC