Re: Use cases / requirements for raw data access functions from Lorenzo Miniero on 2018-05-18 (public-webrtc@w3.org from May 2018)

From: Lorenzo Miniero <lorenzo@meetecho.com>
Date: Fri, 18 May 2018 10:39:13 +0200
To: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Cc: public-webrtc@w3.org
Message-ID: <20180518103913.5177a91e@lminiero>
On Fri, 18 May 2018 10:24:17 +0200
Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com> wrote:

> IMHO we (me first) are not providing use cases, but 
> features/functionalities we would like to bring into the API, which is
> fine, but I think we are overlooking one fact: QUIC and RTP have one 
> fundamental difference, UDP fragmentation/packetization.
> 
> While in QUIC the packetization is codec agnostic and performed deep 
> in the stack, in RTP the packetization is codec dependent and done 
> before reaching the RTP stack.
> 
> Why am I bringing this topic up? Because I feel that some use 
> cases/features, for example raw access to encoded frames or 
> bring-your-own crypto, make a lot of sense for QUIC (where you just need
> to pass the raw binary data as a whole) but much less sense for RTP.
> 
> Don't get me wrong, I am not against those use cases at all (I like 
> them), but as I see it, we should consider QUIC as an alternative to 
> DataChannels from an API/use case point of view, and not as an 
> alternative to RTP.
> 
> My worry is that we try to create an API covering those use
> cases that works for both QUIC and RTP, which will create an awful
> experience for those willing to use RTP, or even worse, that we do not
> consider the RTP specifics at all (as you already have the "raw"
> functionality and can implement that on your own in your
> JavaScript app), with RTP becoming a second-class citizen in WebRTC.
> 


+1!

Lorenzo


> Best regards
> Sergio
> 
> 
> 
> On 18/05/2018 7:17, Justin Uberti wrote:
> > On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com>
> > wrote:
> >
> >  
> >>     On May 16, 2018, at 10:06 AM, Harald Alvestrand
> >>     <harald@alvestrand.no> wrote:
> >>
> >>     This is a copy of a document we've been working on in order to
> >>     collect thoughts about the need for new APIs in WebRTC "TNG".
> >>     It should bring out certain requirements that make some API
> >>     proposals obvious (or not).
> >>     Please comment!
> >>     (PDF version attached, so that the picture survives)
> >>
> >>     Certain things are hard to do in the present WebRTC /
> >>     MediaStreamTrack API.
> >>     In particular, anything involving manipulation of raw data
> >>     involves convoluted interfaces that impose burdens of format
> >>     conversion and/or buffer copying on the user.
> >>
> >>     This document sketches the use cases that can be made possible
> >> if this access is made a lot easier and with lower overhead.
> >>
> >>     For reference, a model of the encoding / decoding pipeline in
> >> the communications use case:
> >>
> >>     When doing other types of processing, the pipeline stages may
> >> be connected elsewhere; for instance, when saving to file
> >>     (MediaRecorder), the “Encode” step links to a “Storage” step,
> >> not “Transport”.
> >>
> >>     The “Decode” process will include alignment of media timing
> >> with real time (NetEq / jitter buffer); the process from raw data
> >> to display will happen “as fast as possible”.
> >>
> >>
> >>         Raw Image Use Cases
> >>
> >>     This set of use cases involves the manipulation of video after
> >> it comes from the camera, but before it goes out for transmission,
> >>     or vice versa.
> >>     Examples of apps that consume raw data from a camera or other
> >>     source, producing raw data that goes out for processing:
> >>
> >>      * Funny hats
> >>      * Background removal
> >>      * In-browser compositing (merge video streams)
> >>
> >>
> >>     Needed APIs:
> >>
> >>      * Get raw frames from input device or path
> >>      * Insert (processed) raw frames into output device or path
> >>
> >
> >
> >     This makes huge sense to me.
> >     It would make sense to mirror the capabilities of web audio
> > here:
> >     - The API should be able to process any source (camera, peer
> >     connection, canvas probably, meaning handling of potentially
> >     different frame formats)
> >     - The API should be able to produce a source consumable by peer
> >     connection, video elements.
> >     - The API should allow doing as much processing as possible
> >     (ideally the whole processing) off the main thread.
> >     - The API should allow leveraging existing APIs such as WASM,
> > WebGL... 
> >>
> >>
> >>         Non-Standard Encoders
> >>
> >>     This set of tools can be useful for either special types of
> >>     operations (like detecting face movement and sending only those
> >>     for backprojection on a model head rather than sending the
> >>     picture of the face) or for testing out experimental codecs
> >>     without involving browser changes (such as novel SVC or
> >> simulcast strategies).
> >
> >     Given the potential complexity here and below, compelling use
> >     cases seem really important to me.
> >     I am not sure experimental codecs meet the bar and require a
> >     standard API.
> >     An experiment can always be done using a proprietary API,
> >     available to browser extensions for instance.
> >
> >     As for special types of operations like detecting face movement,
> >     there might be alternatives using the raw image API:
> >     - Skip frames (say there is no head being detected)
> >     - Generate structured data (e.g. an image descriptor) and send it
> >     over a data channel
> >     - Transform an image before encoding/after decoding
> >  
> >>     Needed APIs, send side:
> >>
> >>      * Get raw frames from input device
> >>      * Insert encoded frames on output transmission channel
> >>      * Manipulate transmission setup so that normal encoder
> >>        resources are not needed
> >>
> >>
> >>     Needed APIs, receive side:
> >>
> >>      * Signalling access so that one knows what codec has been
> >>        agreed for use
> >>      * Get encoded frames from the input transmission channel
> >>      * Insert raw (decoded) frames into output device or path
> >>
> >>
> >>         Pre/post-transmission processing - Bring Your Own Encryption
> >>
> >>
> >>     This is the inverse of the situation above: One has a video
> >>     stream and wishes to encode it into a known codec, but process
> >>     the data further in some way before sending it. The example in
> >>     the title is one use case. The same APIs will also allow the
> >>     usage of different transmission media (media over the data
> >>     channel, or media over protobufs over QUIC streams, for instance).
> >>
> >
> >     I like this BYO encryption use case.
> >     Note though that it does not specifically require getting access
> >     to the encoded frames before doing the encryption.
> >     We could envision an API to provide the encryption parameters
> >     (e.g. keys) so that the browser does the encryption by itself.
> >     Of course, it has pros (simple to implement, simple to use) and
> >     cons (narrow scope).
> >
> >     I am not against adding support for scripting between encoding
> >     frames and sending the encoded frames.
> >     It seems like a powerful API.
> >     We must weigh, though, how much ground we gain versus how much
> >     complexity we add, and how much we resolve actual needs of the
> >     community...
> >
> >
> > It should be noted that mobile platforms currently provide this
> > level of access via MediaCodec (Android) and VideoToolbox (iOS).
> > But I agree that having compelling use cases is important.
> >
> >
> >     Also to be noted that getting the encoded frames, processing
> > them and sending them to the network is currently done off the main
> > thread. One general concern is that the more we add JavaScript at
> > various points of the pipeline, the more we might decrease the
> >     efficiency/stability/interoperability of the realtime pipeline.
> >
> >
> > These encoded frames are typically going to go over the network,
> > where unexpected delays are a fact of life, and so the system is
> > already prepared to deal with them, e.g., via jitter buffers. (Or,
> > the frames will be written to a file, and this issue is entirely
> > moot.)
> >
> > This is in contrast to the "raw image" cases, that will often be 
> > operating in a performance-critical part of the pipeline.
> >
> >  
> >>     Needed APIs, encode:
> >>
> >>      * Codec configuration - the stuff that usually happens at
> >>        offer/answer time
> >>      * Getting the encoded frames from the “output” channel
> >>      * Inserting the processed encoded frames into the real
> >>        “output” channel
> >>      * Reaction to congestion information from the output channel
> >>      * Feeding congestion signals into the encoder
> >>
> >>
> >>     Needed APIs, decode:
> >>
> >>      * Codec configuration information
> >>      * Getting the encoded frames from the input transport
> >>      * Inserting the processed encoded frames into the input
> >>        decoding process
> >>
> >>
> >>     The same APIs are needed for other functions, such as:
> >>
> >>      * ML-NetEq: Jitter buffer control in other ways than the
> >>        built-in browser
> >>          o This also needs the ability to turn off the built-in
> >>            jitter buffer, and therefore makes this API have the
> >>            same timing requirements as dealing with raw data
> >>      * ML-FEC: Application-defined strategies for recovering from
> >>        lost packets.
> >>      * Alternative transmission: Using something other than
> >>        browser’s built-in realtime transport (currently SRTP) to
> >>        move the media data
> >>
> >>
> >>
> >>     -- 
> >>     Surveillance is pervasive. Go Dark.
> >>     <Raw Data Access - Explainer.pdf>  
> >  
> 



-- 
I'm getting older but, unlike whisky, I'm not getting any better
https://twitter.com/elminiero
Received on Friday, 18 May 2018 08:39:43 UTC