
Re: Use cases / requirements for raw data access functions

From: Lorenzo Miniero <lorenzo@meetecho.com>
Date: Fri, 18 May 2018 23:54:05 +0200
To: Peter Thatcher <pthatcher@google.com>, Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
CC: public-webrtc@w3.org
Message-ID: <DB03B601-4ED5-4E30-A22B-E63689B55057@meetecho.com>

On 18 May 2018 at 23:33:49 CEST, Peter Thatcher <pthatcher@google.com> wrote:
>On Fri, May 18, 2018 at 1:28 AM Sergio Garcia Murillo <
>sergio.garcia.murillo@gmail.com> wrote:
>> IMHO we (me first) are not providing use cases, but
>> features/functionalities we'd like to bring into the API, which is
>> fine, but I think we are overlooking one fact: QUIC and RTP have one
>> difference: UDP fragmentation/packetization.
>> While in QUIC the packetization is codec agnostic and performed deep
>> in the stack, in RTP the packetization is codec dependent and done
>> before reaching the RTP stack.
>> Why am I bringing that topic up? Because I feel that some use
>> cases/features, for example raw access to encoded frames or
>> bring-your-own crypto, make a lot of sense for QUIC (where you just
>> need to pass raw binary data as a whole) but much less sense for RTP.
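
Sergio's point about where packetization happens can be made concrete with a toy sketch. The 0x00-delimited payload standing in for codec structure is invented purely for illustration, not any real payload format:

```javascript
// QUIC-style fragmentation is codec agnostic: split any byte stream
// into MTU-sized pieces, no knowledge of the payload needed.
function fragmentGeneric(frame, mtu) {
  const packets = [];
  for (let i = 0; i < frame.length; i += mtu) {
    packets.push(frame.slice(i, i + mtu));
  }
  return packets;
}

// RTP-style packetization is codec dependent: the packetizer must
// understand the payload format (here: units separated by a 0x00
// byte) in order to cut at valid boundaries.
function packetizeCodecAware(frame, delimiter = 0x00) {
  const packets = [];
  let start = 0;
  for (let i = 0; i < frame.length; i++) {
    if (frame[i] === delimiter && i > start) {
      packets.push(frame.slice(start, i));
      start = i + 1;
    }
  }
  if (start < frame.length) packets.push(frame.slice(start));
  return packets;
}

const frame = Uint8Array.from([1, 2, 3, 0, 4, 5, 0, 6]);
console.log(fragmentGeneric(frame, 3).length);  // 3 chunks of <= 3 bytes
console.log(packetizeCodecAware(frame).length); // 3 codec-defined units
```

The generic fragmenter never needs to change when the codec changes; the codec-aware one does, which is exactly the asymmetry being discussed.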
>It makes just as much sense for RTP.  Having access to encoded frames
>before RTP packetization is useful for e2ee for RTP as well.  And if the
>RTP transport API is low-level enough, it would be just as easy to add
>arbitrary metadata (another use case mentioned) as it is for QUIC.
>> Don't get me wrong, I am not against those use cases at all (I like
>> them), but as I see it, we should consider QUIC as an alternative to
>> WebRTC from an API/use case point of view, and not as an alternative
>> to RTP.
>QUIC is a transport, not a replacement for RTP.  But you can build a
>replacement for RTP on top of QUIC (or on top of SCTP, for that matter).
>Just as you could make an RTP data channel and build a replacement for
>SCTP on top of RTP.
>We should separate transports from encoders (split the RtpSender in two)
>to give more flexibility to apps.
>> My worries are that we try to create an API that covers those use
>> cases and works for both QUIC and RTP, which will create an awful
>> experience for those willing to use RTP, or even worse, not even
>> consider the RTP specifics at all (as you already have the "raw"
>> functionality and you implement that on your own in your javascript
>> app), and RTP becoming a second-class citizen in webrtc.
>How would splitting an RtpSender into an encoder and transport be an
>awful experience?  You can do everything you can currently do, just more.
>I don't think this is a choice between RTP, QUIC, and SCTP.  We can make
>them all work.  But the first step is decoupling encoders/decoders,
>transports, and ICE.  Then apps/developers can assemble the parts they
>want the way they want.
>But, as you say, let's start with use cases, not solutions.
>By the way, this is my use case: as a web/mobile developer, I want to
>send media over QUIC for my own replacement for RTP.  QUIC isn't the
>replacement for RTP, but my own protocol on top of QUIC (or SCTP, or any
>data channel) is the replacement for RTP.  Luckily, this is "free" as
>soon as you have QUIC data channels and split the RtpSender into encoder
>and transport.
>That's all I need.

Farewell interoperability...
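
For concreteness, the encoder/transport split Peter describes might look roughly like the toy sketch below. All class and method names here are hypothetical, not any proposed API; it only illustrates the shape of the decoupling:

```javascript
// Hypothetical decoupling of today's RtpSender into an encoder stage
// and an interchangeable transport stage.

class ToyEncoder {
  constructor(codec) { this.codec = codec; }
  // Stand-in for real encoding: tag the payload with the codec name.
  encode(rawFrame) {
    return { codec: this.codec, payload: rawFrame };
  }
}

// Any object with send() can be plugged in: RTP, QUIC, SCTP, ...
class RecordingTransport {
  constructor(name) { this.name = name; this.sent = []; }
  send(encodedFrame) { this.sent.push(encodedFrame); }
}

// The app, not the browser, wires encoder to transport.
function pipe(encoder, transport, frames) {
  for (const f of frames) transport.send(encoder.encode(f));
}

const encoder = new ToyEncoder('vp8');
const quicLike = new RecordingTransport('quic');
pipe(encoder, quicLike, ['frame0', 'frame1']);
console.log(quicLike.sent.length); // 2
```

Swapping `RecordingTransport` for an RTP-shaped or SCTP-shaped object requires no change to the encoder, which is the flexibility being argued for, and also the interoperability concern: nothing forces two such apps to speak the same protocol.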


>> Best regards
>> Sergio
>> On 18/05/2018 7:17, Justin Uberti wrote:
>> On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com> wrote:
>>> On May 16, 2018, at 10:06 AM, Harald Alvestrand wrote:
>>> *This is a copy of a document we've been working on in order to
>>> collect thoughts about the need for new APIs in WebRTC "TNG".
>>> It should bring out certain requirements that make some API choices
>>> obvious (or not). Please comment! (PDF version attached, so that the
>>> picture survives.)
>>> Certain things are hard to do in the present WebRTC /
>>> MediaStreamTrack API. In particular, anything involving manipulation
>>> of raw data involves convoluted interfaces that impose burdens of
>>> format conversion and/or buffer copying on the user. This document
>>> sketches some use cases that can be made possible if this access is
>>> made a lot easier and with lower overhead.
>>> For reference, a model of the encoding / decoding pipeline in the
>>> communications use case: When doing other types of processing, the
>>> pipeline stages may be connected elsewhere; for instance, when saving
>>> to a file (MediaRecorder), the “Encode” step links to a “Store” step,
>>> not “Transport”. The “Decode” process will include alignment of media
>>> timing with real time (NetEq / jitter buffer); the process from raw
>>> data to display will happen “as fast as possible”.
>>> Raw Image Use Cases
>>> This set of use cases involves the manipulation of video after it
>>> comes from the camera, but before it goes out for transmission, or
>>> vice versa. Examples of apps that consume raw data from a camera or
>>> other source, producing raw data that goes out for processing:
>>> - Funny hats
>>> - Background removal
>>> - In-browser compositing (merge video streams)
>>> Needed APIs:
>>> - Get raw frames from input device or path
>>> - Insert (processed) raw frames into output device or path *
>>> This makes huge sense to me.
>>> It would make sense to mirror the capabilities of web audio here:
>>> - The API should be able to process any source (camera, peer
>>> connection, canvas probably), meaning handling of potentially
>>> different frame rates.
>>> - The API should be able to produce a source consumable by peer
>>> connection, video elements.
>>> - The API should allow doing as much processing (ideally the whole
>>> processing) off the main thread.
>>> - The API should allow leveraging existing APIs such as WASM.
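
A toy example of the kind of raw-frame processing those bullets imply: a pure per-pixel transform over an RGBA buffer, the sort of function that could run off the main thread in a Worker. Grayscale conversion stands in here for heavier processing like background removal; nothing in it is a proposed API:

```javascript
// Pure function: RGBA in, RGBA out, no DOM or main-thread dependency.
function toGrayscale(rgba) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Standard luma weights for RGB -> gray.
    const y = Math.round(
      0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]
    );
    out[i] = out[i + 1] = out[i + 2] = y;
    out[i + 3] = rgba[i + 3]; // keep alpha untouched
  }
  return out;
}

// One red pixel, fully opaque.
const pixel = Uint8ClampedArray.from([255, 0, 0, 255]);
console.log(Array.from(toGrayscale(pixel))); // [ 76, 76, 76, 255 ]
```

Because the function touches only typed arrays, moving it into a Worker costs nothing but a postMessage (or a transferred buffer), which is the "off the main thread" requirement above.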
>>> * Non-Standard Encoders
>>> This set of tools can be useful for either special types of
>>> operations (like detecting face movement and sending only those for
>>> backprojection on a model head rather than sending the picture of
>>> the face) or for testing out experimental codecs without involving
>>> browser changes (such as novel SVC or simulcast strategies). *
>>> Given the potential complexity here and below, compelling use cases
>>> are really important to me.
>>> I am not sure experimental codecs meet the bar and require a
>>> standard API. An experiment can always be done using a proprietary
>>> API, available via browser extensions for instance.
>>> As for special types of operation like detecting face movement,
>>> there might be alternatives using the raw image API:
>>> - Skip frames (say there is no head being detected)
>>> - Generate structured data (an image descriptor, e.g.) and send it
>>> over the data channel
>>> - Transform an image before encoding/after decoding
>>> * Needed APIs, send side:
>>> - Get raw frames from input device
>>> - Put encoded frames on output transmission channel
>>> - Manipulate codec setup so that normal encoder resources are not
>>> needed
>>> Needed APIs, receive side:
>>> - Signalling access, so that one knows what codec has been agreed
>>> for use
>>> - Get encoded frames from the input transmission channel
>>> - Insert raw (decoded) frames into output device or path
>>> Pre/post-transmission processing - Bring Your Own Encryption
>>> This is the inverse of the cases above: One has a video stream and
>>> wishes to encode it into a known format, but process the data
>>> further in some way before sending it. The example in the title is
>>> one use case. The same APIs will also allow the usage of different
>>> transmission media (media over the data channel, or media in
>>> protobufs over QUIC streams, for instance). *
>>> I like this BYO encryption use case.
>>> Note though that it does not specifically require getting access to
>>> encoded frames before doing the encryption.
>>> We could envision an API to provide the encryption parameters (keys,
>>> etc.) so that the browser does the encryption by itself.
>>> Of course, it has pros (simple to implement, simple to use) and cons
>>> (narrow scope).
>>> I am not against adding support for scripting between encoding
>>> frames and sending the encoded frames.
>>> It seems like a powerful API.
>>> We must weigh, though, how much ground we gain versus how much
>>> complexity we add, and how much we resolve actual needs of the
>>> community...
>> It should be noted that mobile platforms currently provide this level
>> of access via MediaCodec (Android) and VideoToolbox (iOS). But I
>> agree that having compelling use cases is important.
>>> Also to be noted that getting the encoded frames, processing them
>>> and sending them to the network is currently done off the main
>>> thread.
>>> One general concern is that the more we add JavaScript at various
>>> stages of the pipeline, the more we might decrease the
>>> efficiency/stability/interoperability of the realtime pipeline.
>> These encoded frames are typically going to go over the network, where
>> unexpected delays are a fact of life, and so the system is already
>> designed to deal with them, e.g., via jitter buffers. (Or, the frames
>> will be written to a file, and this issue is entirely moot.)
>> This is in contrast to the "raw image" cases, which will often be
>> used in a performance-critical part of the pipeline.
>>> * Needed APIs, encode:
>>> - Codec configuration - the stuff that normally happens at
>>> offer/answer time
>>> - Getting the encoded frames from the encoder
>>> - Inserting the processed encoded frames into the real transmission
>>> channel
>>> - Reaction to congestion information from the output channel
>>> - Feeding congestion signals into the encoder
>>> Needed APIs, decode:
>>> - Codec configuration information
>>> - Getting the encoded frames from the input transport
>>> - Inserting the processed encoded frames into the decoding process
>>> The same APIs are needed for other functions, such as:
>>> - Jitter buffer control in other ways than the built-in browser one
>>> - This needs the ability to turn off the built-in jitter buffer, and
>>> makes this API have the same timing requirements as dealing with raw
>>> data
>>> - ML-FEC: Application-defined strategies for recovering from lost
>>> packets.
>>> - Alternative transmission: Using something other than the browser's
>>> realtime transport (currently SRTP) to move the media data *
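
As a minimal illustration of what application-defined jitter-buffer control could mean, here is a toy reorder buffer. Real jitter buffers such as NetEq also adapt playout timing and conceal losses; this sketch only shows the in-order-release core:

```javascript
// A toy jitter buffer: frames arrive out of order with sequence
// numbers and are released strictly in order, stalling at gaps.
class ToyJitterBuffer {
  constructor(firstSeq) {
    this.nextSeq = firstSeq;
    this.pending = new Map(); // seq -> frame
  }
  push(seq, frame) {
    this.pending.set(seq, frame);
  }
  // Pop every frame that is now in order; stop at the first gap.
  popReady() {
    const ready = [];
    while (this.pending.has(this.nextSeq)) {
      ready.push(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
    }
    return ready;
  }
}

const jb = new ToyJitterBuffer(0);
jb.push(1, 'b');            // arrives early: held back
console.log(jb.popReady()); // []
jb.push(0, 'a');            // gap filled
console.log(jb.popReady()); // [ 'a', 'b' ]
```

An app-defined ML-FEC strategy would slot in at exactly the gap-detection point: instead of stalling on a missing sequence number, it could synthesize or recover the frame and keep releasing.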
>>> --
>>> Surveillance is pervasive. Go Dark.
>>> <Raw Data Access - Explainer.pdf>

Sent from my Android device with K-9 Mail. Please excuse my brevity.
Received on Friday, 18 May 2018 21:55:15 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:18:41 UTC