Re: Use cases / requirements for raw data access functions

IMHO we (me first) are not providing use cases, but 
features/functionalities we would like to bring into the API, which is fine, 
but I think we are overlooking one fact: QUIC and RTP have one 
fundamental difference, UDP fragmentation/packetization.

While in QUIC the packetization is codec-agnostic and performed deep 
inside the stack, in RTP the packetization is codec-dependent and done 
before the data reaches the RTP stack.

Why am I bringing this topic up? Because I feel that some use 
cases/features, for example raw access to encoded frames or bring-your-own 
crypto, make a lot of sense for QUIC (where you just need to 
pass the raw binary data as a whole) but much less sense for RTP.

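To make the difference concrete, here is a minimal sketch (the stream 
and packetizer interfaces below are purely illustrative, not any 
existing API):

    // QUIC: an encoded frame can be handed to the transport as one
    // opaque blob; the stack splits it into packets without knowing
    // anything about the codec.
    interface QuicStreamLike { write(data: Uint8Array): void; }
    function sendOverQuic(stream: QuicStreamLike, encodedFrame: Uint8Array) {
      stream.write(encodedFrame);
    }

    // RTP: the same frame must first go through a codec-specific payload
    // packetizer (H.264, VP8, ...) before it ever reaches the RTP stack.
    interface Packetizer { packetize(encodedFrame: Uint8Array): Uint8Array[]; }
    function sendOverRtp(packetizer: Packetizer,
                         sendRtpPacket: (payload: Uint8Array) => void,
                         encodedFrame: Uint8Array) {
      for (const payload of packetizer.packetize(encodedFrame)) {
        sendRtpPacket(payload);
      }
    }

So a "raw access to encoded frames" API maps naturally onto the QUIC 
case, while the RTP case still needs the codec-dependent packetization 
step somewhere.
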
Don't get me wrong, I am not against those use cases at all (I like 
them), but as I see it, we should consider QUIC as an alternative to 
DataChannels from an API/use-case point of view, and not as an 
alternative to RTP.

My worry is that we will try to create an API covering those use cases 
that works for both QUIC and RTP, which will create an awful experience 
for those wanting to use RTP, or even worse, not consider the RTP 
specifics at all (since you already have the "raw" functionality and you 
can implement that on your own in your JavaScript app), with RTP becoming 
a second-class citizen in WebRTC.

Best regards
Sergio



On 18/05/2018 7:17, Justin Uberti wrote:
> On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com> wrote:
>
>
>>     On May 16, 2018, at 10:06 AM, Harald Alvestrand
>>     <harald@alvestrand.no> wrote:
>>
>>     This is a copy of a document we've been working on in order to
>>     collect thoughts about the need for new APIs in WebRTC "TNG".
>>
>>     It should bring out certain requirements that make some API
>>     proposals obvious (or not).
>>     Please comment!
>>
>>     (PDF version attached, so that the picture survives)
>>
>>     Certain things are hard to do in the present WebRTC /
>>     MediaStreamTrack API.
>>     In particular, anything involving manipulation of raw data
>>     involves convoluted interfaces that impose burdens of format
>>     conversion and/or buffer copying on the user.
>>
>>     This document sketches the use cases that can be made possible if
>>     this access is made a lot easier and with lower overhead.
>>
>>     For reference, a model of the encoding / decoding pipeline in the
>>     communications use case:
>>
>>     When doing other types of processing, the pipeline stages may be
>>     connected elsewhere; for instance, when saving to file
>>     (MediaRecorder), the “Encode” step links to a “Storage” step, not
>>     “Transport”.
>>
>>     The “Decode” process will include alignment of media timing with
>>     real time (NetEq / jitter buffer); the process from raw data to
>>     display will happen “as fast as possible”.
>>
>>
>>         Raw Image Use Cases
>>
>>     This set of use cases involves the manipulation of video after it
>>     comes from the camera, but before it goes out for transmission,
>>     or vice versa.
>>     Examples of apps that consume raw data from a camera or other
>>     source, producing raw data that goes out for processing:
>>
>>      * Funny hats
>>      * Background removal
>>      * In-browser compositing (merge video streams)
>>
>>
>>     Needed APIs:
>>
>>      * Get raw frames from input device or path
>>      * Insert (processed) raw frames into output device or path
>>
>
>
>     This makes huge sense to me.
>     It would make sense to mirror the capabilities of Web Audio here:
>     - The API should be able to process any source (camera, peer
>     connection, probably canvas, meaning handling of potentially
>     different frame formats)
>     - The API should be able to produce a source consumable by a peer
>     connection or video elements.
>     - The API should allow doing as much processing (ideally the whole
>     processing) off the main thread.
>     - The API should allow leveraging existing APIs such as WASM, WebGL...
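>
>     As a rough sketch of the shape this could take (every name below
>     is hypothetical, not an existing or proposed interface):
>
>       // Worker-side processing loop: pull raw frames from a track,
>       // process them (e.g. via WASM or WebGL), and push the result into
>       // a new track that a peer connection or <video> element consumes.
>       interface RawFrame { width: number; height: number; data: Uint8Array; }
>       interface TrackReader { readFrame(): Promise<RawFrame | null>; }
>       interface TrackWriter { writeFrame(frame: RawFrame): void; }
>
>       async function processTrack(reader: TrackReader, writer: TrackWriter,
>                                    transform: (f: RawFrame) => RawFrame) {
>         for (;;) {
>           const frame = await reader.readFrame();  // raw frame (I420, RGBA, ...)
>           if (!frame) break;                       // track ended
>           writer.writeFrame(transform(frame));     // e.g. background removal
>         }
>       }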
>
>>
>>
>>         Non-Standard Encoders
>>
>>     This set of tools can be useful for either special types of
>>     operations (like detecting face movement and sending only those
>>     for backprojection on a model head rather than sending the
>>     picture of the face) or for testing out experimental codecs
>>     without involving browser changes (such as novel SVC or simulcast
>>     strategies).
>
>     Given the potential complexity here and below, compelling use
>     cases seem really important to me.
>     I am not sure experimental codecs meet the bar to require a
>     standard API.
>     An experiment can always be done using a proprietary API,
>     available to browser extensions for instance.
>
>     As for special types of operations like detecting face movement,
>     there might be alternatives using the raw image API:
>     - Skip frames (say there is no head being detected)
>     - Generate structured data (an image descriptor, e.g.) and send it
>     over a data channel (see the sketch below)
>     - Transform an image before encoding/after decoding
>
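>     A sketch of that second alternative (the data channel API is
>     standard; pc, detectFaceLandmarks and renderOnModelHead are
>     assumed application-side pieces):
>
>       type Landmarks = { x: number; y: number }[];
>       declare const pc: RTCPeerConnection;
>       declare function detectFaceLandmarks(frame: ImageData): Landmarks | null;
>       declare function renderOnModelHead(landmarks: Landmarks): void;
>
>       const dc = pc.createDataChannel('face-landmarks');
>       function onRawFrame(frame: ImageData) {
>         const landmarks = detectFaceLandmarks(frame);       // e.g. WASM-based
>         if (landmarks) dc.send(JSON.stringify(landmarks));  // tiny payload
>         // the video frame itself never has to be encoded or sent
>       }
>       dc.onmessage = (event) => renderOnModelHead(JSON.parse(event.data));
>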
>>     Needed APIs, send side:
>>
>>      * Get raw frames from input device
>>      * Insert encoded frames on output transmission channel
>>      * Manipulate transmission setup so that normal encoder
>>        resources are not needed
>>
>>
>>     Needed APIs, receive side:
>>
>>      * Signalling access so that one knows what codec has been
>>        agreed for use
>>      * Get encoded frames from the input transmission channel
>>      * Insert raw (decoded) frames into output device or path
>>
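>>     A rough sketch of how the send side could fit together (all of
>>     the interfaces below are illustrative only; none of them exist
>>     today). The receive side would be the mirror image: encoded
>>     frames come out of the transmission channel, raw decoded frames
>>     go into the output path.
>>
>>       // An app-provided (e.g. WASM) encoder sits between raw capture
>>       // frames and the transmission channel, bypassing the built-in encoder.
>>       interface RawFrameSource { onframe: ((raw: Uint8Array) => void) | null; }
>>       interface EncodedFrameSink { sendEncodedFrame(data: Uint8Array): void; }
>>       interface AppEncoder { encode(raw: Uint8Array): Uint8Array; }
>>
>>       function attachCustomEncoder(source: RawFrameSource,
>>                                    sink: EncodedFrameSink,
>>                                    encoder: AppEncoder) {
>>         source.onframe = (raw) => {
>>           sink.sendEncodedFrame(encoder.encode(raw));
>>         };
>>       }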
>>
>>         Pre/post-transmission processing - Bring Your Own Encryption
>>
>>
>>     This is the inverse of the situation above: One has a video
>>     stream and wishes to encode it into a known codec, but process
>>     the data further in some way before sending it. The example in the
>>     title is one use case. The same APIs will also allow the usage of
>>     different transmission media (media over the data channel, or
>>     media over protobufs over QUIC streams, for instance).
>>
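>>     A minimal sketch of the BYO-encryption case (the per-frame hooks
>>     on sender and receiver are hypothetical; appEncrypt/appDecrypt
>>     stand for application-supplied crypto):
>>
>>       interface EncodedFrame { data: Uint8Array; }
>>       declare const sender: {
>>         onEncodedFrame: (frame: EncodedFrame) => void;
>>         sendEncodedFrame(frame: EncodedFrame): void;
>>       };
>>       declare const receiver: {
>>         onEncodedFrame: (frame: EncodedFrame) => void;
>>         deliverToDecoder(frame: EncodedFrame): void;
>>       };
>>       declare function appEncrypt(data: Uint8Array, key: CryptoKey): Uint8Array;
>>       declare function appDecrypt(data: Uint8Array, key: CryptoKey): Uint8Array;
>>       declare const key: CryptoKey;
>>
>>       // Hook between the encoder and the transport...
>>       sender.onEncodedFrame = (frame) => {
>>         frame.data = appEncrypt(frame.data, key);
>>         sender.sendEncodedFrame(frame);
>>       };
>>       // ...and its inverse between the transport and the decoder.
>>       receiver.onEncodedFrame = (frame) => {
>>         frame.data = appDecrypt(frame.data, key);
>>         receiver.deliverToDecoder(frame);
>>       };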
>
>     I like this BYO encryption use case.
>     Note though that it does not specifically require getting access to
>     the encoded frames before doing the encryption.
>     We could envision an API to provide the encryption parameters
>     (e.g. keys) so that the browser does the encryption by itself.
>     Of course, it has pros (simple to implement, simple to use) and
>     cons (narrow scope).
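>
>     Something as small as this might do for that narrow version (the
>     setFrameEncryption method is purely illustrative, it does not exist):
>
>       declare const sender: {
>         setFrameEncryption(opts: { algorithm: string; key: CryptoKey }): void;
>       };
>       declare const appKey: CryptoKey;   // key the app derived/exchanged itself
>       sender.setFrameEncryption({ algorithm: 'AES-GCM', key: appKey });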
>
>     I am not against adding support for scripting between encoding
>     frames and sending the encoded frames.
>     It seems like a powerful API.
>     We must weigh, though, how much ground we gain versus how much
>     complexity we add, and how much we resolve actual needs of the
>     community...
>
>
> It should be noted that mobile platforms currently provide this level 
> of access via MediaCodec (Android) and VideoToolbox (iOS). But I agree 
> that having compelling use cases is important.
>
>
>     It should also be noted that getting the encoded frames, processing
>     them and sending them to the network is currently done off the main
>     thread.
>     One general concern is that the more we add JavaScript at various
>     points of the pipeline, the more we might decrease the
>     efficiency/stability/interoperability of the realtime pipeline.
>
>
> These encoded frames are typically going to go over the network, where 
> unexpected delays are a fact of life, and so the system is already 
> prepared to deal with them, e.g., via jitter buffers. (Or, the frames 
> will be written to a file, and this issue is entirely moot.)
>
> This is in contrast to the "raw image" cases, which will often be 
> operating in a performance-critical part of the pipeline.
>
>
>>     Needed APIs, encode:
>>
>>      * Codec configuration - the stuff that usually happens at
>>        offer/answer time
>>      * Getting the encoded frames from the “output” channel
>>      * Inserting the processed encoded frames into the real “output”
>>        channel
>>      * Reaction to congestion information from the output channel
>>      * Feeding congestion signals into the encoder
>>
>>
>>     Needed APIs, decode:
>>
>>      * Codec configuration information
>>      * Getting the encoded frames from the input transport
>>      * Inserting the processed encoded frames into the input
>>        decoding process
>>
>>
>>     The same APIs are needed for other functions, such as:
>>
>>      * ML-NetEq: Jitter buffer control in other ways than the
>>        built-in browser
>>          o This also needs the ability to turn off the built-in
>>            jitter buffer, and therefore makes this API have the same
>>            timing requirements as dealing with raw data
>>      * ML-FEC: Application-defined strategies for recovering from
>>        lost packets.
>>      * Alternative transmission: Using something other than the
>>        browser’s built-in realtime transport (currently SRTP) to
>>        move the media data (see the sketch below)
>>
>>
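>>     For the "alternative transmission" item, an illustrative sketch;
>>     the data channel side is the standard API, while getting at the
>>     encoded frames (onEncodedFrame / deliverEncodedFrame) is the
>>     hypothetical part:
>>
>>       declare const pc: RTCPeerConnection;
>>       declare const sender: { onEncodedFrame: (frame: { data: Uint8Array }) => void };
>>       declare const receiver: { deliverEncodedFrame(data: Uint8Array): void };
>>
>>       // Unordered, no-retransmission channel, roughly mimicking RTP semantics.
>>       const mediaChannel = pc.createDataChannel('media',
>>           { ordered: false, maxRetransmits: 0 });
>>       mediaChannel.binaryType = 'arraybuffer';
>>       sender.onEncodedFrame = (frame) => mediaChannel.send(frame.data);
>>       mediaChannel.onmessage = (event) =>
>>           receiver.deliverEncodedFrame(new Uint8Array(event.data));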
>>
>>     -- 
>>     Surveillance is pervasive. Go Dark.
>>     <Raw Data Access - Explainer.pdf>
>

Received on Friday, 18 May 2018 08:24:05 UTC