Re: Use cases / requirements for raw data access functions

On Fri, May 18, 2018 at 1:28 AM Sergio Garcia Murillo <
sergio.garcia.murillo@gmail.com> wrote:

> IMHO we (me first) are not providing use cases, but
> features/functionalities we would like to bring into the API, which is
> fine, but I think we are overlooking one fact: QUIC and RTP have one
> fundamental difference: UDP fragmentation/packetization.
>
> While in QUIC the packetization is codec-agnostic and performed deep in
> the stack, in RTP the packetization is codec-dependent and done before
> reaching the RTP stack.
>
> Why am I bringing this topic up? Because I feel that some use
> cases/features, for example raw access to encoded frames or
> bring-your-own-crypto, make a lot of sense for QUIC (where you just need
> to pass the raw binary data as a whole) but much less sense for RTP.
>
>
It makes just as much sense for RTP.  Having access to encoded frames
before RTP packetization is useful for end-to-end encryption (e2ee) over
RTP as well.  And if the RTP transport API is low-level enough, it would
be just as easy to add arbitrary metadata (another use case mentioned) as
it is for QUIC.
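
As a sketch of what that could look like on the send side (purely
hypothetical shapes; createEncodedStreams() and the frame type here are
assumptions for illustration, not a shipped API):

  // Hypothetical sketch: intercept encoded frames between the encoder
  // and the RTP packetizer and apply app-level e2ee with WebCrypto.
  interface EncodedVideoFrame {
    timestamp: number;
    data: ArrayBuffer;  // encoded payload, before RTP packetization
  }

  async function attachE2ee(sender: any /* RTCRtpSender */, key: CryptoKey) {
    // Assumed API: a readable/writable pair exposing encoded frames.
    const { readable, writable } = sender.createEncodedStreams();
    const transform = new TransformStream<EncodedVideoFrame, EncodedVideoFrame>({
      async transform(frame, controller) {
        const iv = crypto.getRandomValues(new Uint8Array(12));  // per-frame IV
        frame.data = await crypto.subtle.encrypt(
            { name: "AES-GCM", iv }, key, frame.data);
        // A real scheme must also convey the IV to the receiver.
        controller.enqueue(frame);  // hand back to the packetizer
      },
    });
    readable.pipeThrough(transform).pipeTo(writable);
  }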

> Don't get me wrong, I am not against those use cases at all (I like
> them), but as I see it, we should consider QUIC as an alternative to
> DataChannels from an API/use case point of view, and not as an
> alternative to RTP.
>

QUIC is a transport, not a replacement for RTP.  But you can build a
replacement for RTP on top of QUIC (or on top of SCTP, for that matter).
Just as you could make an RTP data channel and build a replacement for RTP
on top of RTP.

We should separate transports from encoders (split the RtpSender in half)
to give more flexibility to apps.
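
Roughly what I mean, as a hypothetical shape (every name below is invented
for illustration, not a concrete proposal):

  // A standalone encoder that turns raw frames into encoded frames, and
  // a standalone RTP transport that packetizes and sends whatever encoded
  // frames it is given. The transport no longer owns the encoder, so the
  // app can put anything it likes in between.
  interface RawVideoFrame { timestamp: number; data: ArrayBuffer; }
  // Same shape as the earlier sketch, plus a keyframe flag.
  interface EncodedVideoFrame {
    timestamp: number;
    keyFrame: boolean;
    data: ArrayBuffer;
  }

  interface StandaloneVideoEncoder {
    configure(config: { codec: string; bitrateBps: number }): void;
    encode(frame: RawVideoFrame): Promise<EncodedVideoFrame>;
  }

  interface RtpMediaTransport {
    sendFrame(frame: EncodedVideoFrame, metadata?: ArrayBuffer): void;
    onCongestion?: (targetBitrateBps: number) => void;
  }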


> My worry is that we will try to create an API covering those use cases
> that works for both QUIC and RTP, which will create an awful experience
> for those willing to use RTP or, even worse, will not consider the RTP
> specifics at all (as you already have the "raw" functionality and can
> implement it on your own in your JavaScript app), with RTP becoming a
> second-class citizen in WebRTC.
>

How would splitting an RtpSender into an encoder and a transport be an
awful experience?  You can do everything you can currently do, just more.

I don't think this is a choice between RTP, QUIC, and SCTP.  We can make
them all work.  But the first step is decoupling encoders/decoders,
transports, and ICE.  Then apps/developers can assemble the parts they
want the way they want.
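
For example, reusing the hypothetical interfaces sketched above, the app
itself could own the loop from raw frame to encoder to transport:

  // Hypothetical wiring: the app pulls raw frames from some source,
  // encodes them, and hands them to the transport, feeding congestion
  // signals from the transport back into the encoder.
  async function pump(source: AsyncIterable<RawVideoFrame>,
                      encoder: StandaloneVideoEncoder,
                      transport: RtpMediaTransport) {
    transport.onCongestion =
        (bps) => encoder.configure({ codec: "VP8", bitrateBps: bps });
    for await (const raw of source) {
      const encoded = await encoder.encode(raw);
      transport.sendFrame(encoded);  // could encrypt or attach metadata first
    }
  }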


But, as you say, let's start with use cases, not solutions.

By the way, this is my use case: as a web/mobile developer, I want to send
media over QUIC as my own replacement for RTP.  QUIC isn't the replacement
for RTP, but my own protocol on top of QUIC (or SCTP, or any data channel)
is.  Luckily, this is "free" as soon as you add QUIC data channels and
split the RtpSender into encoder and transport.  That's all I need.
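
For concreteness, the sort of framing I mean, assuming only some QUIC data
channel object with a write() method (RTCQuicStream is an invented name,
and EncodedVideoFrame is the shape sketched earlier):

  // Hypothetical protocol-on-top-of-QUIC: length-prefixed encoded frames.
  interface RTCQuicStream { write(data: Uint8Array): void; }

  function sendOverQuic(stream: RTCQuicStream, frame: EncodedVideoFrame) {
    const header = new DataView(new ArrayBuffer(9));
    header.setUint32(0, frame.data.byteLength);  // payload length
    header.setUint32(4, frame.timestamp >>> 0);  // media timestamp
    header.setUint8(8, frame.keyFrame ? 1 : 0);  // keyframe flag
    stream.write(new Uint8Array(header.buffer));
    stream.write(new Uint8Array(frame.data));
  }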

> Best regards
>
> Sergio
>
>
>
>
> On 18/05/2018 7:17, Justin Uberti wrote:
>
> On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com> wrote:
>
>
>> On May 16, 2018, at 10:06 AM, Harald Alvestrand <harald@alvestrand.no>
>> wrote:
>>
>> This is a copy of a document we've been working on in order to collect
>> thoughts about the need for new APIs in WebRTC "TNG". It should bring
>> out certain requirements that make some API proposals obvious (or not).
>> Please comment! (PDF version attached, so that the picture survives.)
>>
>> Certain things are hard to do in the present WebRTC / MediaStreamTrack
>> API. In particular, anything involving manipulation of raw data involves
>> convoluted interfaces that impose burdens of format conversion and/or
>> buffer copying on the user. This document sketches the use cases that
>> can be made possible if this access is made a lot easier and with lower
>> overhead.
>>
>> For reference, a model of the encoding / decoding pipeline in the
>> communications use case: [pipeline figure; see the attached PDF]
>>
>> When doing other types of processing, the pipeline stages may be
>> connected elsewhere; for instance, when saving to file (MediaRecorder),
>> the “Encode” step links to a “Storage” step, not “Transport”. The
>> “Decode” process will include alignment of media timing with real time
>> (NetEq / jitter buffer); the process from raw data to display will
>> happen “as fast as possible”.
>>
>> Raw Image Use Cases
>>
>> This set of use cases involves the manipulation of video after it comes
>> from the camera, but before it goes out for transmission, or vice versa.
>> Examples of apps that consume raw data from a camera or other source,
>> producing raw data that goes out for processing:
>> - Funny hats
>> - Background removal
>> - In-browser compositing (merge video streams)
>>
>> Needed APIs:
>> - Get raw frames from input device or path
>> - Insert (processed) raw frames into output device or path
>>
>>
>>
>> This makes huge sense to me.
>> It would make sense to mirror the capabilities of web audio here:
>> - The API should be able to process any source (camera, peer connection,
>> canvas probably, meaning handling of potentially different frame formats)
>> - The API should be able to produce a source consumable by a peer
>> connection or by video elements.
>> - The API should allow doing as much of the processing as possible
>> (ideally all of it) off the main thread.
>> - The API should allow leveraging existing APIs such as WASM, WebGL...
>>
>> Non-Standard Encoders
>>
>> This set of tools can be useful either for special types of operations
>> (like detecting face movement and sending only those for backprojection
>> on a model head rather than sending the picture of the face) or for
>> testing out experimental codecs without involving browser changes (such
>> as novel SVC or simulcast strategies).
>>
>>
>> Given the potential complexity here and below, compelling use cases seem
>> really important to me.
>> I am not sure experimental codecs meet the bar of requiring a standard
>> API.
>> An experiment can always be done using a proprietary API, available to
>> browser extensions for instance.
>>
>> As for special types of operations like detecting face movement, there
>> might be alternatives using the raw image API:
>> - Skip frames (say, when no head is detected)
>> - Generate structured data (e.g. an image descriptor) and send it over
>> a data channel
>> - Transform an image before encoding/after decoding
>>
>> Needed APIs, send side:
>> - Get raw frames from input device
>> - Insert encoded frames on output transmission channel
>> - Manipulate transmission setup so that normal encoder resources are
>> not needed
>>
>> Needed APIs, receive side:
>> - Signalling access so that one knows what codec has been agreed for use
>> - Get encoded frames from the input transmission channel
>> - Insert raw (decoded) frames into output device or path
>>
>> Pre/post-transmission processing - Bring Your Own Encryption
>>
>> This is the inverse of the situation above: one has a video stream and
>> wishes to encode it into a known codec, but process the data further in
>> some way before sending it. The example in the title is one use case.
>> The same APIs will also allow the usage of different transmission media
>> (media over the data channel, or media over protobufs over QUIC streams,
>> for instance).
>>
>>
>> I like this BYO encryption use case.
>> Note, though, that it does not specifically require getting access to
>> the encoded frames before doing the encryption.
>> We could envision an API to provide the encryption parameters (e.g.
>> keys) so that the browser does the encryption by itself.
>> Of course, it has pros (simple to implement, simple to use) and cons
>> (narrow scope).
>>
>> I am not against adding support for scripting between encoding frames and
>> sending the encoded frames.
>> It seems like a powerful API.
>> We must weigh, though, how much ground we gain versus how much
>> complexity we add and how much we address the actual needs of the
>> community...
>>
>
> It should be noted that mobile platforms currently provide this level of
> access via MediaCodec (Android) and VideoToolbox (iOS). But I agree that
> having compelling use cases is important.
>
>>
>> Note also that getting the encoded frames, processing them, and sending
>> them to the network is currently done off the main thread.
>> One general concern is that the more we add JavaScript at various points
>> of the pipeline, the more we might decrease the
>> efficiency/stability/interoperability of the realtime pipeline.
>>
>
> These encoded frames are typically going to go over the network, where
> unexpected delays are a fact of life, and so the system is already prepared
> to deal with them, e.g., via jitter buffers. (Or, the frames will be
> written to a file, and this issue is entirely moot.)
>
> This is in contrast to the "raw image" cases, which will often be
> operating in a performance-critical part of the pipeline.
>
>>
>> Needed APIs, encode:
>> - Codec configuration - the stuff that usually happens at offer/answer
>> time
>> - Getting the encoded frames from the “output” channel
>> - Inserting the processed encoded frames into the real “output” channel
>> - Reaction to congestion information from the output channel
>> - Feeding congestion signals into the encoder
>>
>> Needed APIs, decode:
>> - Codec configuration information
>> - Getting the encoded frames from the input transport
>> - Inserting the processed encoded frames into the input decoding process
>>
>> The same APIs are needed for other functions, such as:
>> - ML-NetEq: Jitter buffer control in ways other than the browser's
>> built-in one - This also needs the ability to turn off the built-in
>> jitter buffer, and therefore makes this API have the same timing
>> requirements as dealing with raw data
>> - ML-FEC: Application-defined strategies for recovering from lost
>> packets
>> - Alternative transmission: Using something other than the browser's
>> built-in realtime transport (currently SRTP) to move the media data
>>
>> --
>> Surveillance is pervasive. Go Dark.
>>
>> <Raw Data Access - Explainer.pdf>
>>
>>
>>

Received on Friday, 18 May 2018 21:34:30 UTC