- From: Lorenzo Miniero <lorenzo@meetecho.com>
- Date: Fri, 18 May 2018 10:39:13 +0200
- To: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
- Cc: public-webrtc@w3.org
On Fri, 18 May 2018 10:24:17 +0200 Sergio Garcia Murillo
<sergio.garcia.murillo@gmail.com> wrote:

> IMHO we (me first) are not providing use cases, but
> features/functionalities we would like to bring into the API, which is
> fine, but I think we are overlooking one fact: QUIC and RTP have one
> fundamental difference, UDP fragmentation/packetization.
>
> While in QUIC the packetization is codec agnostic and performed deep in
> the stack, in RTP the packetization is codec dependent and done before
> reaching the RTP stack.
>
> Why am I bringing that topic up? Because I feel that some use
> cases/features, for example raw access to encoded frames or
> bring-your-own crypto, make a lot of sense for QUIC (where you just need
> to pass the raw binary data as a whole) but much less sense for RTP.
>
> Don't get me wrong, I am not against those use cases at all (I like
> them), but as I see it, we should consider QUIC as an alternative to
> DataChannels from an API/use case point of view, and not as an
> alternative to RTP.
>
> My worry is that we try to create an API covering those use cases that
> works for both QUIC and RTP, which will create an awful experience for
> those willing to use RTP, or even worse, not consider the RTP specifics
> at all (since you already have the "raw" functionality and can implement
> it on your own in your javascript app), with RTP becoming a second-class
> citizen in WebRTC.
>

+1!

Lorenzo

> Best regards
> Sergio
>
>
> On 18/05/2018 7:17, Justin Uberti wrote:
> > On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com
> > <mailto:yfablet@apple.com>> wrote:
> >
> >> On May 16, 2018, at 10:06 AM, Harald Alvestrand <harald@alvestrand.no
> >> <mailto:harald@alvestrand.no>> wrote:
> >>
> >> This is a copy of a document we've been working on in order to collect
> >> thoughts about the need for new APIs in WebRTC "TNG".
> >>
> >> It should bring out certain requirements that make some API proposals
> >> obvious (or not). Please comment!
> >>
> >> (PDF version attached, so that the picture survives)
> >>
> >> Certain things are hard to do in the present WebRTC / MediaStreamTrack
> >> API. In particular, anything involving manipulation of raw data
> >> involves convoluted interfaces that impose burdens of format
> >> conversion and/or buffer copying on the user.
> >>
> >> This document sketches the use cases that can be made possible if this
> >> access is made a lot easier and with lower overhead.
> >>
> >> For reference, a model of the encoding / decoding pipeline in the
> >> communications use case:
> >>
> >> When doing other types of processing, the pipeline stages may be
> >> connected elsewhere; for instance, when saving to file (MediaRecorder),
> >> the “Encode” step links to a “Storage” step, not “Transport”.
> >>
> >> The “Decode” process will include alignment of media timing with real
> >> time (NetEq / jitter buffer); the process from raw data to display will
> >> happen “as fast as possible”.
> >>
> >>
> >> Raw Image Use Cases
> >>
> >> This set of use cases involves the manipulation of video after it
> >> comes from the camera, but before it goes out for transmission, or
> >> vice versa.
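(Purely as an illustration of what these use cases involve with today's
APIs, and not any proposed interface: the usual workaround is to round-trip
every frame through a canvas, roughly as in the sketch below. The function
names startFunnyHats and drawHat are made up; drawHat stands in for
whatever per-frame processing the application does. The per-frame copies
into and out of the canvas, all on the main thread, are the format-conversion
and buffer-copying overhead referred to above.)

    // Current workaround for "funny hats"-style processing: pull raw
    // frames out via <video> + canvas, push them back in via
    // canvas.captureStream(). Every frame is copied at least twice.
    async function startFunnyHats() {
      const camera = await navigator.mediaDevices.getUserMedia({ video: true });
      const video = document.createElement('video');
      video.srcObject = camera;
      await video.play();

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext('2d');

      function drawHat(context) {
        // application-specific overlay drawing goes here (hypothetical)
      }

      (function processFrame() {
        ctx.drawImage(video, 0, 0, canvas.width, canvas.height); // copy frame in
        drawHat(ctx);                                            // process it
        requestAnimationFrame(processFrame);                     // main-thread loop
      })();

      // The processed stream can then be added to an RTCPeerConnection as usual.
      return canvas.captureStream(30); // 30 fps, copies frames back out
    }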
> >> Examples of apps that consume raw data from a camera or other source,
> >> producing raw data that goes out for processing:
> >>
> >> * Funny hats
> >> * Background removal
> >> * In-browser compositing (merge video streams)
> >>
> >> Needed APIs:
> >>
> >> * Get raw frames from input device or path
> >> * Insert (processed) raw frames into output device or path
> >
> > This makes huge sense to me.
> > It would make sense to mirror the capabilities of web audio here:
> > - The API should be able to process any source (camera, peer
> >   connection, probably canvas as well, meaning handling of potentially
> >   different frame formats).
> > - The API should be able to produce a source consumable by peer
> >   connections and video elements.
> > - The API should allow doing as much processing as possible (ideally
> >   the whole processing) off the main thread.
> > - The API should allow leveraging existing APIs such as WASM, WebGL...
> >
> >> Non-Standard Encoders
> >>
> >> This set of tools can be useful either for special types of operations
> >> (like detecting face movement and sending only those movements for
> >> backprojection on a model head, rather than sending the picture of the
> >> face) or for testing out experimental codecs without involving browser
> >> changes (such as novel SVC or simulcast strategies).
> >
> > Given the potential complexity here and below, compelling use cases
> > seem really important to me.
> > I am not sure experimental codecs meet the bar and require a standard
> > API. An experiment can always be done using a proprietary API,
> > available to browser extensions for instance.
> >
> > As for special types of operations like detecting face movement, there
> > might be alternatives using the raw image API:
> > - Skip frames (say there is no head being detected)
> > - Generate structured data (an image descriptor, e.g.) and send it over
> >   a data channel
> > - Transform an image before encoding/after decoding
> >
> >> Needed APIs, send side:
> >>
> >> * Get raw frames from input device
> >> * Insert encoded frames on output transmission channel
> >> * Manipulate transmission setup so that normal encoder resources are
> >>   not needed
> >>
> >> Needed APIs, receive side:
> >>
> >> * Signalling access so that one knows what codec has been agreed for
> >>   use
> >> * Get encoded frames from the input transmission channel
> >> * Insert raw (decoded) frames into output device or path
> >>
> >> Pre/post-transmission processing - Bring Your Own Encryption
> >>
> >> This is the inverse of the situation above: one has a video stream and
> >> wishes to encode it into a known codec, but process the data further
> >> in some way before sending it. The example in the title is one use
> >> case. The same APIs will also allow the use of different transmission
> >> media (media over the data channel, or media over protobufs over QUIC
> >> streams, for instance).
> >
> > I like this BYO encryption use case.
> > Note though that it does not specifically require access to the encoded
> > frames before doing the encryption.
> > We could envision an API to provide the encryption parameters (keys,
> > e.g.) so that the browser does the encryption by itself.
> > Of course, this has pros (simple to implement, simple to use) and cons
> > (narrow scope).
> >
> > I am not against adding support for scripting between encoding frames
> > and sending the encoded frames.
> > It seems like a powerful API.
> > We must weigh though how much ground we gain versus how much complexity
> > we add, and how much we resolve actual needs of the community...
> >
> > It should be noted that mobile platforms currently provide this level
> > of access via MediaCodec (Android) and VideoToolbox (iOS). But I agree
> > that having compelling use cases is important.
> >
> > Also to be noted that getting the encoded frames, processing them and
> > sending them to the network is currently done off the main thread. One
> > general concern is that the more we add JavaScript at various points of
> > the pipeline, the more we might decrease the
> > efficiency/stability/interoperability of the realtime pipeline.
> >
> > These encoded frames are typically going to go over the network, where
> > unexpected delays are a fact of life, and so the system is already
> > prepared to deal with them, e.g., via jitter buffers. (Or the frames
> > will be written to a file, and this issue is entirely moot.)
> >
> > This is in contrast to the "raw image" cases, which will often be
> > operating in a performance-critical part of the pipeline.
> >
> >> Needed APIs, encode:
> >>
> >> * Codec configuration - the stuff that usually happens at offer/answer
> >>   time
> >> * Getting the encoded frames from the “output” channel
> >> * Inserting the processed encoded frames into the real “output”
> >>   channel
> >> * Reaction to congestion information from the output channel
> >> * Feeding congestion signals into the encoder
> >>
> >> Needed APIs, decode:
> >>
> >> * Codec configuration information
> >> * Getting the encoded frames from the input transport
> >> * Inserting the processed encoded frames into the input decoding
> >>   process
> >>
> >> The same APIs are needed for other functions, such as:
> >>
> >> * ML-NetEq: Jitter buffer control in ways other than the browser's
> >>   built-in one
> >>   o This also needs the ability to turn off the built-in jitter
> >>     buffer, and therefore gives this API the same timing requirements
> >>     as dealing with raw data
> >> * ML-FEC: Application-defined strategies for recovering from lost
> >>   packets.
> >> * Alternative transmission: Using something other than the browser's
> >>   built-in realtime transport (currently SRTP) to move the media data
> >>
> >> --
> >> Surveillance is pervasive. Go Dark.
> >> <Raw Data Access - Explainer.pdf>


-- 
I'm getting older but, unlike whisky, I'm not getting any better
https://twitter.com/elminiero
Received on Friday, 18 May 2018 08:39:43 UTC