Raw data API - 6 - Encoders/decoders

Emails #3 and #4 of Harald's recent set of 5 emails covered how to get
encoded data in and out of RtpSender/RtpReceiver.  That works fine if
you do the encode and decode in wasm/js.  But what if you want the
browser to handle the codecs, or to provide hardware codecs?

There's one more piece to the puzzle: an API for encoders and decoders.  So
here is email #6 (which Harald asked me to write) describing how those
would look.


Basically, an encoder is "track in; encoded frame out" and a decoder is
"encoded frame in; track out".  An encoded frame is either the encoded
bytes of the pixels of a raw video frame at a particular point in time,
or the encoded bytes of the samples of a raw audio "frame" over a range
of time.

While the app doesn't feed raw frames directly from the track to the
encoder (or from the decoder to the track), it does have direct control
over how the encoder encodes, and it can change the encoding parameters
at any time.

Here is how the objects could look:

interface AudioEncoder {
  // Can be called at any time to change parameters
  void start(MediaStreamTrack rawAudio, AudioEncodeParameters encodeParameters);
  void stop();
  attribute EventHandler onencodedaudio;  // of EncodedAudioFrame
}

dictionary AudioEncodeParameters {
  unsigned long frameLength;  // aka ptime, in ms
  unsigned long bitrate;
  // ...
}

dictionary EncodedAudioFrame {
  // Start timestamp in the sample-rate clock
  unsigned long startSampleIndex;
  unsigned long sampleCount;
  unsigned long channelCount;
  CodecType codecType;
  ByteArray encodedData;
}

interface AudioDecoder {
  void decodeAudio(EncodedAudioFrame frame);
  readonly attribute MediaStreamTrack decodedAudio;
}
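
To make that concrete, here is a rough usage sketch in TypeScript.  It
is a sketch under the IDL above, not a spec: how an AudioEncoder gets
constructed is left open here, and sendEncodedAudio is a hypothetical
stand-in for whatever transport the app uses (e.g. the RtpSender API
from emails #3 and #4).

// Minimal TS bindings mirroring the sketch IDL above (assumptions).
interface AudioEncodeParameters { frameLength: number; bitrate: number; }
interface EncodedAudioFrame {
  startSampleIndex: number;
  sampleCount: number;
  channelCount: number;
  codecType: string;
  encodedData: Uint8Array;
}
interface AudioEncoder {
  start(rawAudio: MediaStreamTrack, params: AudioEncodeParameters): void;
  stop(): void;
  onencodedaudio: ((frame: EncodedAudioFrame) => void) | null;
}

declare const micTrack: MediaStreamTrack;  // e.g. from getUserMedia
declare const encoder: AudioEncoder;       // however the browser hands these out
declare function sendEncodedAudio(frame: EncodedAudioFrame): void;  // hypothetical

// Encoded frames come out as events; the app owns the bytes from there.
encoder.onencodedaudio = (frame) => sendEncodedAudio(frame);

// 20 ms frames at 32 kbps; calling start() again later reconfigures in place.
encoder.start(micTrack, { frameLength: 20, bitrate: 32000 });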

interface VideoEncoder {
  // Can be called at any time to change parameters
  void start(MediaStreamTrack rawVideo, VideoEncodeParameters encodeParameters);
  void stop();
  attribute EventHandler onencodedvideo;  // of EncodedVideoFrame
}

dictionary VideoEncodeParameters {
  unsigned long bitrate;
  boolean generateKeyFrame;
  // TODO: SVC/simulcast, resolutionScale, framerateScale, ...
  // ...
}

dictionary EncodedVideoFrame {
  unsigned short width;
  unsigned short height;
  unsigned short rotationDegrees;
  unsigned long timestampMs;
  CodecType codecType;
  ByteArray encodedData;
}

interface VideoDecoder {
  void decodeVideo(EncodedVideoFrame frame);
  readonly attribute MediaStreamTrack decodedVideo;
}
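
Same idea for video.  This hedged sketch also shows the point from
above: the app re-calls start() at any time to change how the encoder
encodes (here, dropping the bitrate and requesting a key frame), and
the decode side just feeds frames in and renders the resulting track.
Again, sendOverNetwork and onNetworkFrame are hypothetical app plumbing.

// Minimal TS bindings mirroring the sketch IDL above (assumptions).
interface VideoEncodeParameters { bitrate: number; generateKeyFrame?: boolean; }
interface EncodedVideoFrame {
  width: number;
  height: number;
  rotationDegrees: number;
  timestampMs: number;
  codecType: string;
  encodedData: Uint8Array;
}
interface VideoEncoder {
  start(rawVideo: MediaStreamTrack, params: VideoEncodeParameters): void;
  stop(): void;
  onencodedvideo: ((frame: EncodedVideoFrame) => void) | null;
}
interface VideoDecoder {
  decodeVideo(frame: EncodedVideoFrame): void;
  readonly decodedVideo: MediaStreamTrack;
}

declare const cameraTrack: MediaStreamTrack;
declare const encoder: VideoEncoder;
declare const decoder: VideoDecoder;
declare function sendOverNetwork(frame: EncodedVideoFrame): void;              // hypothetical
declare function onNetworkFrame(h: (frame: EncodedVideoFrame) => void): void;  // hypothetical

// Send side: encode the camera track and ship the frames.
encoder.onencodedvideo = (frame) => sendOverNetwork(frame);
encoder.start(cameraTrack, { bitrate: 500000 });

// Later, on congestion: re-call start() with new parameters and ask for
// a key frame so the receiver can resync.
encoder.start(cameraTrack, { bitrate: 200000, generateKeyFrame: true });

// Receive side: feed encoded frames in, render the decoded track.
onNetworkFrame((frame) => decoder.decodeVideo(frame));
const video = document.createElement('video');
video.srcObject = new MediaStream([decoder.decodedVideo]);
video.play();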


If you're paying attention, you may be wondering the following:

1.  Where is the jitter buffer?  Answer: it's in the decoder.  The decoder
can take out-of-order encoded frames and produce an in-order track (see
the sketch after this list).  This is much simpler than exposing separate
jitter buffer and decoder objects.

2.  What about SVC/simulcast?  There are a few ways we could go about it,
depending on what we want "encoder" and "encoded frame" to mean (1 layer or
many?).  I'm sure we'll cover that at the f2f.
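
To illustrate point 1, here is a small sketch; the out-of-order arrival
order is made up for illustration, and the bindings are trimmed-down
assumptions mirroring the audio IDL above.

// Trimmed TS bindings of the sketch IDL above (assumptions).
interface EncodedAudioFrame { startSampleIndex: number; encodedData: Uint8Array; }
interface AudioDecoder {
  decodeAudio(frame: EncodedAudioFrame): void;
  readonly decodedAudio: MediaStreamTrack;
}

declare const decoder: AudioDecoder;
declare const framesInArrivalOrder: EncodedAudioFrame[];  // e.g. frames 3, 1, 2

// The app hands the decoder frames in arrival order, even when the
// network has reordered them; buffering/reordering happens inside.
for (const frame of framesInArrivalOrder) {
  decoder.decodeAudio(frame);  // no app-side jitter buffer needed
}
// decoder.decodedAudio plays out in order; the decoder can use
// startSampleIndex to put the frames back in sequence.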
