- From: Justin Uberti <juberti@google.com>
- Date: Thu, 17 May 2018 22:17:28 -0700
- To: yfablet@apple.com
- Cc: Harald Alvestrand <harald@alvestrand.no>, public-webrtc@w3.org
- Message-ID: <CAOJ7v-1bhrKHNyj0bz3m60a_3LQXAYRkZKegqXDu+Yw4zQJtBw@mail.gmail.com>
On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com> wrote:

> Thanks Harald for writing all of this,
> Some early feedback below.
> Y
>
> On May 16, 2018, at 10:06 AM, Harald Alvestrand <harald@alvestrand.no>
> wrote:
>
> > This is a copy of a document we've been working on in order to collect
> > thoughts about the need for new APIs in WebRTC "TNG". It should bring
> > out certain requirements that make some API proposals obvious (or not).
> > Please comment! (PDF version attached, so that the picture survives.)
> >
> > Certain things are hard to do in the present WebRTC / MediaStreamTrack
> > API. In particular, anything involving manipulation of raw data
> > involves convoluted interfaces that impose burdens of format conversion
> > and/or buffer copying on the user. This document sketches the use cases
> > that become possible if this access is made a lot easier and with lower
> > overhead.
> >
> > For reference, a model of the encoding/decoding pipeline in the
> > communications use case: [pipeline diagram in the attached PDF]
> >
> > When doing other types of processing, the pipeline stages may be
> > connected elsewhere; for instance, when saving to a file
> > (MediaRecorder), the “Encode” step links to a “Storage” step, not
> > “Transport”. The “Decode” process will include alignment of media
> > timing with real time (NetEq / jitter buffer); the process from raw
> > data to display will happen “as fast as possible”.
> >
> > Raw Image Use Cases
> >
> > This set of use cases involves the manipulation of video after it comes
> > from the camera but before it goes out for transmission, or vice versa.
> > Examples of apps that consume raw data from a camera or other source,
> > producing raw data that goes out for processing:
> > - Funny hats
> > - Background removal
> > - In-browser compositing (merging video streams)
> >
> > Needed APIs:
> > - Get raw frames from an input device or path
> > - Insert (processed) raw frames into an output device or path
>
> This makes huge sense to me.
> It would make sense to mirror the capabilities of Web Audio here:
> - The API should be able to process any source (camera, peer connection,
>   probably canvas too, which means handling potentially different frame
>   formats).
> - The API should be able to produce a source consumable by a peer
>   connection or a video element.
> - The API should allow doing as much processing as possible (ideally all
>   of it) off the main thread.
> - The API should allow leveraging existing APIs such as WASM, WebGL...
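To make the shape of such an API concrete, here is one purely
illustrative strawman. Every name in it (VideoTrackReader,
VideoTrackWriter, drawHatOn) is invented for this sketch; nothing like it
is specified today. It assumes raw frames are exposed as a
readable/writable pair in the spirit of the Streams API, so that the
per-frame work can run in a worker:

    // Strawman only: VideoTrackReader/VideoTrackWriter are invented names.
    // Intended to run in a worker, keeping per-frame work off the main
    // thread. cameraTrack: a MediaStreamTrack, e.g. from getUserMedia().
    const reader = new VideoTrackReader(cameraTrack);  // raw frames in
    const writer = new VideoTrackWriter();             // raw frames out

    reader.readable
      .pipeThrough(new TransformStream({
        transform(frame, controller) {
          // Funny hats / background removal: "frame" would carry the raw
          // pixel buffer in whatever format the source produced.
          controller.enqueue(drawHatOn(frame));  // app code, e.g. WebGL/WASM
        }
      }))
      .pipeTo(writer.writable);

    // writer.track would then behave like any other MediaStreamTrack:
    // attach it to a <video> element, or pass it to
    // RTCPeerConnection.addTrack().

One appeal of a streams shape is that backpressure comes for free: a slow
transform propagates back to the source instead of queueing frames
without bound.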
> > Non-Standard Encoders
> >
> > This set of tools can be useful either for special types of operations
> > (like detecting face movements and sending only those, for
> > backprojection onto a model head, rather than sending the picture of
> > the face) or for testing out experimental codecs without involving
> > browser changes (such as novel SVC or simulcast strategies).
>
> Given the potential complexity here and below, compelling use cases seem
> really important to me.
> I am not sure experimental codecs meet the bar and require a standard
> API. An experiment can always be done using a proprietary API, available
> to browser extensions for instance.
>
> As for special types of operation like detecting face movement, there
> might be alternatives using the raw image API:
> - Skip frames (say, when no head is detected)
> - Generate structured data (an image descriptor, e.g.) and send it over
>   the data channel
> - Transform an image before encoding / after decoding
>
> > Needed APIs, send side:
> > - Get raw frames from the input device
> > - Insert encoded frames on the output transmission channel
> > - Manipulate the transmission setup so that normal encoder resources
> >   are not needed
> >
> > Needed APIs, receive side:
> > - Signalling access, so that one knows what codec has been agreed for
> >   use
> > - Get encoded frames from the input transmission channel
> > - Insert raw (decoded) frames into the output device or path
> >
> > Pre/post-transmission processing - Bring Your Own Encryption
> >
> > This is the inverse of the situation above: one has a video stream and
> > wishes to encode it into a known codec, but to process the data further
> > in some way before sending it. The example in the title is one use
> > case. The same APIs will also allow the use of different transmission
> > media (media over the data channel, or media over protobufs over QUIC
> > streams, for instance).
>
> I like this BYO encryption use case.
> Note though that it does not specifically require access to the encoded
> frames before doing the encryption.
> We could envision an API that provides the encryption parameters (keys,
> e.g.) so that the browser does the encryption by itself.
> Of course, it has pros (simple to implement, simple to use) and cons
> (narrow scope).
>
> I am not against adding support for scripting between encoding frames
> and sending the encoded frames.
> It seems like a powerful API.
> We must weigh, though, how much ground we gain versus how much
> complexity we add, and how much we resolve actual needs of the
> community...

It should be noted that mobile platforms currently provide this level of
access via MediaCodec (Android) and VideoToolbox (iOS). But I agree that
having compelling use cases is important.

> Also to be noted: getting the encoded frames, processing them and
> sending them to the network is currently done off the main thread.
> One general concern is that the more JavaScript we add at various points
> of the pipeline, the more we might decrease the
> efficiency/stability/interoperability of the realtime pipeline.

These encoded frames are typically going to go over the network, where
unexpected delays are a fact of life, and so the system is already
prepared to deal with them, e.g., via jitter buffers. (Or the frames will
be written to a file, and this issue is entirely moot.) This is in
contrast to the "raw image" cases, which will often be operating in a
performance-critical part of the pipeline.
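As a concrete (and entirely hypothetical) strawman for the "scripting
between encoding and sending" hook discussed above: the access could be
as small as a per-frame transform sitting between the encoder and the
packetizer. createEncodedFrameTransform() is an invented name, and
e2eKey/makeIv stand in for app-provisioned key material; only the
WebCrypto call below is a real API today:

    // Strawman only: createEncodedFrameTransform() is an invented hook
    // that hands the app each encoded frame before it reaches the
    // transport. pc: an existing RTCPeerConnection; localTrack: a
    // MediaStreamTrack.
    const sender = pc.addTrack(localTrack);

    sender.createEncodedFrameTransform(async (frame) => {
      // frame.data: ArrayBuffer holding the codec bitstream for one frame.
      // App-supplied end-to-end encryption; hop-by-hop SRTP still applies.
      frame.data = await crypto.subtle.encrypt(
        { name: "AES-GCM", iv: makeIv(frame.timestamp) }, // makeIv: app-defined
        e2eKey,  // app-provisioned CryptoKey, e.g. from an out-of-band exchange
        frame.data
      );
      return frame;  // handed back for packetization and transmission
    });

    // The receive side would mirror this with a decrypting transform on
    // the receiver, running before the frame enters the decoder.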
> > Needed APIs, encode:
> > - Codec configuration - the stuff that usually happens at offer/answer
> >   time
> > - Getting the encoded frames from the “output” channel
> > - Inserting the processed encoded frames into the real “output” channel
> > - Reaction to congestion information from the output channel
> > - Feeding congestion signals into the encoder
> >
> > Needed APIs, decode:
> > - Codec configuration information
> > - Getting the encoded frames from the input transport
> > - Inserting the processed encoded frames into the input decoding
> >   process
> >
> > The same APIs are needed for other functions, such as:
> > - ML-NetEq: jitter buffer control in ways other than the browser's
> >   built-in behavior. This also needs the ability to turn off the
> >   built-in jitter buffer, and therefore gives this API the same timing
> >   requirements as dealing with raw data.
> > - ML-FEC: application-defined strategies for recovering from lost
> >   packets.
> > - Alternative transmission: using something other than the browser’s
> >   built-in realtime transport (currently SRTP) to move the media data.
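On the "alternative transmission" bullet, the transport side is mostly
within reach already; what is missing is the tap for encoded frames. A
hypothetical sketch, where readEncodedFrames() and packFrame() are
invented and only the data channel API is real:

    // Strawman: readEncodedFrames() is invented; it would yield each
    // encoded frame instead of handing it to the built-in RTP packetizer.
    // An unreliable, unordered channel approximates realtime transport.
    const dc = pc.createDataChannel("media", {
      ordered: false,
      maxRetransmits: 0
    });

    for await (const frame of sender.readEncodedFrames()) {
      // App-defined framing: prepend timestamp and size so the receiver
      // can reassemble frames and schedule them with its own jitter
      // buffer (the ML-NetEq case above). A real deployment would also
      // need to chunk frames larger than the SCTP message limit.
      dc.send(packFrame(frame));  // packFrame: app-defined serialization
    }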
> > --
> > Surveillance is pervasive. Go Dark.
> >
> > <Raw Data Access - Explainer.pdf>

Received on Friday, 18 May 2018 05:18:11 UTC