- From: Lorenzo Miniero <lorenzo@meetecho.com>
- Date: Fri, 18 May 2018 23:54:05 +0200
- To: Peter Thatcher <pthatcher@google.com>, Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
- CC: public-webrtc@w3.org
On 18 May 2018 at 23:33:49 CEST, Peter Thatcher <pthatcher@google.com> wrote:

> On Fri, May 18, 2018 at 1:28 AM Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com> wrote:
>
>> IMHO we (me first) are not providing use cases, but features/functionalities we would like to bring into the API. That is fine, but I think we are overlooking one fact: QUIC and RTP have one fundamental difference, UDP fragmentation/packetization.
>>
>> While in QUIC the packetization is codec-agnostic and performed deep in the stack, in RTP the packetization is codec-dependent and done before reaching the RTP stack.
>>
>> Why am I bringing this topic up? Because I feel that some use cases/features, for example raw access to encoded frames or bring-your-own-crypto, make a lot of sense for QUIC (where you just need to pass the raw binary data as a whole) but much less sense for RTP.
>
> It makes just as much sense for RTP. Having access to encoded frames before RTP packetization is useful for e2ee over RTP as well. And if the RTP transport API is low-level enough, it would be just as easy to add arbitrary metadata (another use case mentioned) as it is for QUIC.
>
>> Don't take me wrong, I am not against those use cases at all (I like them), but as I see it, we should consider QUIC as an alternative to DataChannels from an API/use-case point of view, not as an alternative to RTP.
>
> QUIC is a transport, not a replacement for RTP. But you can build a replacement for RTP on top of QUIC (or on top of SCTP, for that matter), just as you could make an RTP data channel and build a replacement for RTP on top of RTP.
>
> We should separate transports from encoders (split the RtpSender in half) to give more flexibility to apps.
>
>> My worry is that we will try to create an API covering those use cases that works for both QUIC and RTP, which will create an awful experience for those willing to use RTP, or, even worse, not consider the RTP specifics at all (since you already have the "raw" functionality and can implement the rest yourself in your JavaScript app), with RTP becoming a second-class citizen in WebRTC.
>
> How would splitting an RtpSender into an encoder and a transport be an awful experience? You could do everything you can currently do, just more.
>
> I don't think this is a choice between RTP, QUIC, and SCTP. We can make them all work. But the first step is decoupling encoders/decoders, transports, and ICE. Then apps/developers can assemble the parts they want the way they want.
>
> But, as you say, let's start with use cases, not solutions.
>
> By the way, this is my use case: as a web/mobile developer, I want to send media over QUIC as my own replacement for RTP. QUIC isn't the replacement for RTP, but my own protocol on top of QUIC (or SCTP, or any data channel) is. Luckily, this comes "for free" as soon as you add QUIC data channels and split the RtpSender into encoder and transport. That's all I need.

Farewell interoperability...

L.
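Peter's "split the RtpSender in half" idea implies that an application could do its own framing of encoded media over whatever transport it likes. Below is a minimal TypeScript sketch of such app-level framing, assuming a plain RTCDataChannel as a stand-in for a QUIC stream; the EncodedChunk shape is hypothetical, since no browser exposed standalone encoder output at the time:

```typescript
// Hypothetical: a chunk as a split-out encoder might hand it to the app.
// Nothing here corresponds to a shipping API except RTCDataChannel.
interface EncodedChunk {
  timestamp: number;   // capture timestamp, microseconds
  isKeyFrame: boolean;
  data: ArrayBuffer;   // codec bitstream for one frame
}

// An app-defined "replacement for RTP": length-prefixed framing of
// encoded chunks over a reliable channel.
function sendChunk(channel: RTCDataChannel, chunk: EncodedChunk): void {
  const header = new DataView(new ArrayBuffer(13));
  header.setFloat64(0, chunk.timestamp);         // bytes 0-7: timestamp
  header.setUint8(8, chunk.isKeyFrame ? 1 : 0);  // byte 8: key-frame flag
  header.setUint32(9, chunk.data.byteLength);    // bytes 9-12: payload length
  const packet = new Uint8Array(13 + chunk.data.byteLength);
  packet.set(new Uint8Array(header.buffer), 0);
  packet.set(new Uint8Array(chunk.data), 13);
  channel.send(packet);
}
```

The header carries exactly the per-frame metadata (timestamp, key-frame flag, payload length) that RTP packetization normally provides, which is where Lorenzo's interoperability worry bites: every app would frame it differently.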
>> Best regards,
>> Sergio
>>
>> On 18/05/2018 7:17, Justin Uberti wrote:
>>
>> On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com> wrote:
>>
>>> On May 16, 2018, at 10:06 AM, Harald Alvestrand <harald@alvestrand.no> wrote:
>>>
>>>> This is a copy of a document we've been working on in order to collect thoughts about the need for new APIs in WebRTC "TNG". It should bring out certain requirements that make some API proposals obvious (or not). Please comment! (PDF version attached, so that the picture survives.)
>>>>
>>>> Certain things are hard to do in the present WebRTC / MediaStreamTrack API. In particular, anything involving manipulation of raw data requires convoluted interfaces that impose burdens of format conversion and/or buffer copying on the user. This document sketches the use cases that become possible if this access is made a lot easier and lower-overhead.
>>>>
>>>> For reference, a model of the encoding/decoding pipeline in the communications use case: when doing other types of processing, the pipeline stages may be connected elsewhere; for instance, when saving to a file (MediaRecorder), the "Encode" step links to a "Storage" step, not "Transport". The "Decode" process will include alignment of media timing with real time (NetEq / jitter buffer); the path from raw data to display will run "as fast as possible".
>>>>
>>>> Raw Image Use Cases
>>>>
>>>> This set of use cases involves the manipulation of video after it comes from the camera but before it goes out for transmission, or vice versa. Examples of apps that consume raw data from a camera or other source, producing raw data that goes out for processing:
>>>> - Funny hats
>>>> - Background removal
>>>> - In-browser compositing (merging video streams)
>>>>
>>>> Needed APIs:
>>>> - Get raw frames from the input device or path
>>>> - Insert (processed) raw frames into the output device or path
>>>
>>> This makes huge sense to me.
>>> It would make sense to mirror the capabilities of Web Audio here:
>>> - The API should be able to process any source (camera, peer connection, probably canvas, which means handling potentially different frame formats).
>>> - The API should be able to produce a source consumable by a peer connection or a video element.
>>> - The API should allow doing as much processing as possible (ideally all of it) off the main thread.
>>> - The API should allow leveraging existing APIs such as WASM, WebGL...
>>>
>>>> Non-Standard Encoders
>>>>
>>>> This set of tools can be useful either for special types of operations (like detecting face movement and sending only that, for backprojection onto a model head, rather than sending the picture of the face) or for testing out experimental codecs without involving browser changes (such as novel SVC or simulcast strategies).
>>>
>>> Given the potential complexity here and below, compelling use cases seem really important to me.
>>> I am not sure experimental codecs meet the bar and require a standard API. An experiment can always be done using a proprietary API, available to browser extensions for instance.
>>>
>>> As for special types of operation like detecting face movement, there might be alternatives using the raw image API:
>>> - Skip frames (say, when no head is detected)
>>> - Generate structured data (an image descriptor, e.g.) and send it over a data channel
>>> - Transform an image before encoding / after decoding
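The nearest approximation of these raw-image use cases with APIs shipping in 2018 is the canvas round-trip sketched below, which also illustrates the per-frame copying overhead the explainer wants to eliminate. Only standard APIs (getUserMedia, 2D canvas, captureStream) are used; the overlay is a trivial stand-in for real processing:

```typescript
// "Funny hats" today: pull camera frames through a canvas, draw an
// overlay, and re-emit the result as a new MediaStream. Each frame is
// copied on the main thread, which is the overhead being criticized.
async function funnyHats(): Promise<MediaStream> {
  const camera = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement("video");
  video.srcObject = camera;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d")!;

  const drawFrame = () => {
    ctx.drawImage(video, 0, 0);                    // copy the raw frame
    ctx.font = "48px serif";
    ctx.fillText("🎩", canvas.width / 2 - 24, 48); // stand-in for processing
    requestAnimationFrame(drawFrame);
  };
  requestAnimationFrame(drawFrame);

  return canvas.captureStream(30); // attachable to an RTCPeerConnection
}
```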
>>>> Needed APIs, send side:
>>>> - Get raw frames from the input device
>>>> - Insert encoded frames into the output transmission channel
>>>> - Manipulate the transmission setup so that the normal encoder resources are not needed
>>>>
>>>> Needed APIs, receive side:
>>>> - Signalling access, so that one knows which codec has been agreed for use
>>>> - Get encoded frames from the input transmission channel
>>>> - Insert raw (decoded) frames into the output device or path
>>>>
>>>> Pre/Post-Transmission Processing - Bring Your Own Encryption
>>>>
>>>> This is the inverse of the situation above: one has a video stream and wishes to encode it into a known codec, but then process the data further in some way before sending it. The example in the title is one use case. The same APIs would also allow the use of different transmission media (media over the data channel, or media over protobufs over QUIC streams, for instance).
>>>
>>> I like this BYO encryption use case.
>>> Note, though, that it does not specifically require access to the encoded frames before doing the encryption. We could envision an API to provide the encryption parameters (keys, e.g.) so that the browser does the encryption by itself. Of course, this has pros (simple to implement, simple to use) and cons (narrow scope).
>>>
>>> I am not against adding support for scripting between encoding frames and sending the encoded frames. It seems like a powerful API. We must weigh, though, how much ground we gain versus how much complexity we add, and how much we resolve actual needs of the community...
>>
>> It should be noted that mobile platforms currently provide this level of access via MediaCodec (Android) and VideoToolbox (iOS). But I agree that having compelling use cases is important.
>>
>>> Also to be noted: getting the encoded frames, processing them, and sending them to the network is currently done off the main thread. One general concern is that the more JavaScript we add at various points of the pipeline, the more we might decrease the efficiency/stability/interoperability of the realtime pipeline.
>>
>> These encoded frames are typically going to go over the network, where unexpected delays are a fact of life, so the system is already prepared to deal with them, e.g., via jitter buffers. (Or the frames will be written to a file, and this issue is entirely moot.) This is in contrast to the "raw image" cases, which will often operate in a performance-critical part of the pipeline.
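A sketch of what "scripting between encoding frames and sending the encoded frames" might look like for the BYO-encryption case. The EncodedStreams interface is hypothetical (nothing like it existed at the time); only the WebCrypto and Streams calls are real APIs:

```typescript
// Hypothetical: a sender exposing its encoded output as a stream pair
// the app can splice a transform into before packetization.
interface EncodedFrame {
  data: ArrayBuffer;
}
interface EncodedStreams {
  readable: ReadableStream<EncodedFrame>;
  writable: WritableStream<EncodedFrame>;
}

// BYO encryption: AES-GCM over each encoded frame via WebCrypto, with
// the per-frame IV prepended so the receiver can decrypt. Key exchange
// stays out of band.
function encryptFrames(streams: EncodedStreams, key: CryptoKey): Promise<void> {
  const transform = new TransformStream<EncodedFrame, EncodedFrame>({
    async transform(frame, controller) {
      const iv = crypto.getRandomValues(new Uint8Array(12));
      const ciphertext = await crypto.subtle.encrypt(
        { name: "AES-GCM", iv }, key, frame.data);
      const out = new Uint8Array(12 + ciphertext.byteLength);
      out.set(iv, 0);
      out.set(new Uint8Array(ciphertext), 12);
      frame.data = out.buffer;
      controller.enqueue(frame);
    },
  });
  return streams.readable.pipeThrough(transform).pipeTo(streams.writable);
}
```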
>>>> Needed APIs, encode:
>>>> - Codec configuration - the stuff that usually happens at offer/answer time
>>>> - Getting the encoded frames from the "output" channel
>>>> - Inserting the processed encoded frames into the real "output" channel
>>>> - Reaction to congestion information from the output channel
>>>> - Feeding congestion signals into the encoder
>>>>
>>>> Needed APIs, decode:
>>>> - Codec configuration information
>>>> - Getting the encoded frames from the input transport
>>>> - Inserting the processed encoded frames into the input decoding process
>>>>
>>>> The same APIs are needed for other functions, such as:
>>>> - ML-NetEq: jitter buffer control in ways other than the browser's built-in one. This also needs the ability to turn off the built-in jitter buffer, and therefore gives this API the same timing requirements as dealing with raw data.
>>>> - ML-FEC: application-defined strategies for recovering from lost packets.
>>>> - Alternative transmission: using something other than the browser's built-in realtime transport (currently SRTP) to move the media data.
>>>>
>>>> --
>>>> Surveillance is pervasive. Go Dark.
>>>>
>>>> <Raw Data Access - Explainer.pdf>

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
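To make the congestion items in the "Needed APIs, encode" list concrete, here is a hypothetical sketch of the feedback loop they imply; none of these interface names correspond to any real API:

```typescript
// Hypothetical decoupled encoder with an explicit congestion hook, as
// the "Reaction to congestion information" / "Feeding congestion
// signals into the encoder" items suggest.
interface StandaloneVideoEncoder {
  configure(config: { codec: string; bitrateBps: number }): void;
  onChunk: (encodedFrame: ArrayBuffer) => void; // output to app framing
}
interface CongestionSignal {
  availableBitrateBps: number; // transport's current bandwidth estimate
  rttMs: number;               // round-trip time
}

// Feed transport estimates back into the encoder target, clamped so a
// single noisy estimate can neither stall nor overload the encoder.
function onCongestion(encoder: StandaloneVideoEncoder, s: CongestionSignal): void {
  const target = Math.max(100_000, Math.min(s.availableBitrateBps * 0.9, 2_500_000));
  encoder.configure({ codec: "VP8", bitrateBps: Math.round(target) });
}
```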
Received on Friday, 18 May 2018 21:55:15 UTC