
Re: Use cases / requirements for raw data access functions

From: Lorenzo Miniero <lorenzo@meetecho.com>
Date: Fri, 18 May 2018 23:54:05 +0200
To: Peter Thatcher <pthatcher@google.com>,Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
CC: public-webrtc@w3.org
Message-ID: <DB03B601-4ED5-4E30-A22B-E63689B55057@meetecho.com>


On 18 May 2018 at 23:33:49 CEST, Peter Thatcher <pthatcher@google.com> wrote:
>On Fri, May 18, 2018 at 1:28 AM Sergio Garcia Murillo <
>sergio.garcia.murillo@gmail.com> wrote:
>
>> IMHO we (me first) are not providing use cases, but
>> features/functionalities we'd like to bring into the API, which is
>> fine, but I think we are overlooking one fact: QUIC and RTP have one
>> fundamental difference: UDP fragmentation/packetization.
>>
>> While in QUIC the packetization is codec-agnostic and performed deep
>> in the stack, in RTP the packetization is codec-dependent and done
>> before reaching the RTP stack.
>>
>> Why am I bringing that topic in? Because I feel that some use
>> cases/features, for example raw access to encoded frames or
>> bring-your-own crypto, make a lot of sense for QUIC (where you just
>> need to pass the raw binary data as a whole) but much less sense for
>> RTP.
>>
>>
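[To make the packetization difference concrete, here is a toy sketch; every function is invented for illustration, not a real API. A QUIC-style transport can slice an encoded frame into MTU-sized chunks without knowing the codec, while an RTP-style packetizer must prepend a codec-specific payload descriptor to every packet (cf. the VP8 payload format in RFC 7741).]

```typescript
const MTU = 1200;

// Codec-agnostic: split the frame into MTU-sized chunks; the receiver
// reassembles by stream offset, no codec knowledge required.
function quicStyleChunks(frame: Uint8Array): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < frame.length; off += MTU) {
    chunks.push(frame.subarray(off, off + MTU));
  }
  return chunks;
}

// Codec-dependent: a simplified VP8-like payload descriptor is
// prepended to every packet, with a start-of-partition bit on the
// first one -- the packetizer has to know the payload format.
function rtpStylePackets(frame: Uint8Array): Uint8Array[] {
  return quicStyleChunks(frame).map((chunk, i) => {
    const descriptor = i === 0 ? 0x10 : 0x00; // S bit set on first packet
    const pkt = new Uint8Array(1 + chunk.length);
    pkt[0] = descriptor;
    pkt.set(chunk, 1);
    return pkt;
  });
}
```

[A 3000-byte frame yields three chunks either way, but only the RTP-style packets carry codec-specific framing that the packetizer must understand.]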
>It makes just as much sense for RTP.  Having access to encoded frames
>before RTP packetization is useful for e2ee for RTP as well.  And if
>the RTP transport API is low-level enough, it would be just as easy to
>add arbitrary metadata (another use case mentioned) as it is for QUIC.
>
>> Don't get me wrong, I am not against those use cases at all (I like
>> them), but as I see it, we should consider QUIC as an alternative to
>> DataChannels from an API/use case point of view, and not as an
>> alternative to RTP.
>>
>
>QUIC is a transport, not a replacement for RTP.  But you can build a
>replacement for RTP on top of QUIC (or on top of SCTP, for that
>matter).  Just as you could make an RTP data channel and build a
>replacement for RTP on top of RTP.
>
>We should separate transports from encoders (split the RtpSender in
>half) to give more flexibility to apps.
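[As a rough illustration of that split, here is one possible shape of the decoupling. All interface names are hypothetical; this is not an existing or proposed API, just the structure the thread is arguing about.]

```typescript
interface EncodedFrame { timestamp: number; data: Uint8Array; }

// The two halves of today's RtpSender, pulled apart.
interface Encoder { encode(raw: Uint8Array, timestamp: number): EncodedFrame; }
interface Transport { send(frame: EncodedFrame): void; }

// Once split, the app owns the wiring and can put arbitrary processing
// between the halves (e2ee, extra metadata, ...).
function wire(
  encoder: Encoder,
  transport: Transport,
  transform?: (f: EncodedFrame) => EncodedFrame,
) {
  return (raw: Uint8Array, timestamp: number) => {
    let frame = encoder.encode(raw, timestamp);
    if (transform) frame = transform(frame); // app-owned middle step
    transport.send(frame);
  };
}

// Mock halves, standing in for the browser's encoder and transport.
const mockEncoder: Encoder = {
  encode: (raw, timestamp) => ({ timestamp, data: raw.slice() }),
};
const sent: EncodedFrame[] = [];
const mockTransport: Transport = { send: f => sent.push(f) };

const push = wire(mockEncoder, mockTransport, f => {
  const data = new Uint8Array(f.data.length + 1);
  data[0] = 0xee; // e.g. one byte of app-defined metadata
  data.set(f.data, 1);
  return { timestamp: f.timestamp, data };
});
```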
>
>
>> My worry is that we try to create an API covering those use cases
>> that works for both QUIC and RTP, which will create an awful
>> experience for those willing to use RTP, or, even worse, not even
>> consider the RTP specifics at all (as you already have the "raw"
>> functionality and you can implement that on your own in your
>> JavaScript app), with RTP becoming a second-class citizen in WebRTC.
>>
>
>How would splitting an RtpSender into an encoder and transport be an
>awful experience?  You can do everything you can currently do, just
>more.
>
>I don't think this is a choice between RTP, QUIC, and SCTP.  We can
>make them all work.  But the first step is decoupling
>encoders/decoders, transports, and ICE.  Then apps/developers can
>assemble the parts they want the way they want.
>
>
>But, as you say, let's start with use cases, not solutions.
>
>By the way, this is my use case: as a web/mobile developer, I want to
>send media over QUIC for my own replacement for RTP.  QUIC isn't the
>replacement for RTP, but my own protocol on top of QUIC (or SCTP, or
>any data channel) is the replacement for RTP.  Luckily, this is "free"
>as soon as you add QUIC data channels and split the RtpSender into
>encoder and transport.  That's all I need.
>
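[In miniature, such a "replacement for RTP on top of a data channel" is just an app-defined header in front of each encoded frame, carrying whatever RTP used to carry for you. The field layout below is invented purely for illustration.]

```typescript
// Pack an app-defined header (sequence number + timestamp, both
// 32-bit big-endian) in front of an encoded-frame payload.
function packFrame(seq: number, timestampMs: number, payload: Uint8Array): Uint8Array {
  const buf = new Uint8Array(8 + payload.length);
  const view = new DataView(buf.buffer);
  view.setUint32(0, seq);
  view.setUint32(4, timestampMs);
  buf.set(payload, 8);
  return buf;
}

// The receiving side parses the header back out before decoding.
function unpackFrame(msg: Uint8Array): { seq: number; timestampMs: number; payload: Uint8Array } {
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  return {
    seq: view.getUint32(0),
    timestampMs: view.getUint32(4),
    payload: msg.subarray(8),
  };
}
```

[Each packed message would then be handed to a QUIC stream or data channel, with reliability/ordering chosen per the app's own loss policy rather than RTP's.]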


Farewell interoperability...

L.


>> Best regards
>>
>> Sergio
>>
>>
>>
>>
>> On 18/05/2018 7:17, Justin Uberti wrote:
>>
>> On Thu, May 17, 2018 at 10:20 AM youenn fablet <yfablet@apple.com>
>> wrote:
>>
>>> On May 16, 2018, at 10:06 AM, Harald Alvestrand <harald@alvestrand.no>
>>> wrote:
>>>
>>> *This is a copy of a document we've been working on in order to
>>> collect thoughts about the need for new APIs in WebRTC "TNG".*
>>>
>>> * It should bring out certain requirements that make some API
>>> proposals obvious (or not). Please comment! (PDF version attached,
>>> so that the picture survives)
>>>
>>> Certain things are hard to do in the present WebRTC /
>>> MediaStreamTrack API. In particular, anything involving manipulation
>>> of raw data involves convoluted interfaces that impose burdens of
>>> format conversion and/or buffer copying on the user. This document
>>> sketches the use cases that can be made possible if this access is
>>> made a lot easier and with lower overhead.
>>>
>>> For reference, a model of the encoding / decoding pipeline in the
>>> communications use case: When doing other types of processing, the
>>> pipeline stages may be connected elsewhere; for instance, when
>>> saving to file (MediaRecorder), the “Encode” step links to a
>>> “Storage” step, not “Transport”. The “Decode” process will include
>>> alignment of media timing with real time (NetEq / jitter buffer);
>>> the process from raw data to display will happen “as fast as
>>> possible”.
>>>
>>> Raw Image Use Cases
>>>
>>> This set of use cases involves the manipulation of video after it
>>> comes from the camera, but before it goes out for transmission, or
>>> vice versa. Examples of apps that consume raw data from a camera or
>>> other source, producing raw data that goes out for processing:
>>> - Funny hats
>>> - Background removal
>>> - In-browser compositing (merge video streams)
>>>
>>> Needed APIs:
>>> - Get raw frames from input device or path
>>> - Insert (processed) raw frames into output device or path *
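[A toy version of this pull-transform-push pipeline, with invented frame types; no real capture or encode API is assumed here.]

```typescript
// Single-channel 8-bit grayscale frame, invented for illustration.
type RawFrame = { width: number; height: number; data: Uint8Array };

// Stand-in for "get raw frames from input device or path".
function* source(n: number, width: number, height: number): Generator<RawFrame> {
  for (let i = 0; i < n; i++) {
    yield { width, height, data: new Uint8Array(width * height).fill(128) };
  }
}

// "Background removal" stand-in: zero every pixel below a threshold.
function removeBackground(frame: RawFrame, threshold: number): RawFrame {
  const out = frame.data.map(v => (v < threshold ? 0 : v));
  return { width: frame.width, height: frame.height, data: out };
}

// Stand-in for "insert (processed) raw frames into output device or path".
function run(frames: Iterable<RawFrame>, sink: (f: RawFrame) => void) {
  for (const f of frames) sink(removeBackground(f, 64));
}
```

[The point of the document's complaint is that today each arrow in this pipeline forces a format conversion or buffer copy; the sketch shows only the desired data flow.]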
>>>
>>>
>>>
>>> This makes huge sense to me.
>>> It would make sense to mirror the capabilities of Web Audio here:
>>> - The API should be able to process any source (camera, peer
>>>   connection, probably canvas, meaning handling of potentially
>>>   different frame formats)
>>> - The API should be able to produce a source consumable by peer
>>>   connection, video elements.
>>> - The API should allow doing as much processing (ideally the whole
>>>   processing) off the main thread.
>>> - The API should allow leveraging existing APIs such as WASM,
>>>   WebGL...
>>>
>>>
>>>
>>>
>>> * Non-Standard Encoders
>>>
>>> This set of tools can be useful for either special types of
>>> operations (like detecting face movement and sending only those for
>>> backprojection on a model head rather than sending the picture of
>>> the face) or for testing out experimental codecs without involving
>>> browser changes (such as novel SVC or simulcast strategies). *
>>>
>>>
>>> Given the potential complexity here and below, compelling use cases
>>> seem really important to me.
>>> I am not sure experimental codecs meet the bar and require a
>>> standard API. An experiment can always be done using a proprietary
>>> API, available to browser extensions for instance.
>>>
>>> As for special types of operation like detecting face movement,
>>> there might be alternatives using the raw image API:
>>> - Skip frames (say there is no head being detected)
>>> - Generate structured data (an image descriptor, e.g.) and send it
>>>   over a data channel
>>> - Transform an image before encoding/after decoding
>>>
>>>
>>> * Needed APIs, send side:
>>> - Get raw frames from input device
>>> - Insert encoded frames on output transmission channel
>>> - Manipulate transmission setup so that normal encoder resources
>>>   are not needed
>>>
>>> Needed APIs, receive side:
>>> - Signalling access so that one knows what codec has been agreed
>>>   for use
>>> - Get encoded frames from the input transmission channel
>>> - Insert raw (decoded) frames into output device or path
>>>
>>> Pre/post-transmission processing - Bring Your Own Encryption
>>>
>>> This is the inverse of the situation above: One has a video stream
>>> and wishes to encode it into a known codec, but process the data
>>> further in some way before sending it. The example in the title is
>>> one use case. The same APIs will also allow the usage of different
>>> transmission media (media over the data channel, or media over
>>> protobufs over QUIC streams, for instance). *
>>>
>>>
>>> I like this BYO encryption use case.
>>> Note though that it does not specifically require getting access to
>>> the encoded frames before doing the encryption.
>>> We could envision an API to provide the encryption parameters
>>> (keys, e.g.) so that the browser does the encryption by itself.
>>> Of course, it has pros (simple to implement, simple to use) and
>>> cons (narrow scope).
>>>
>>> I am not against adding support for scripting between encoding
>>> frames and sending the encoded frames.
>>> It seems like a powerful API.
>>> We must weigh, though, how much ground we gain versus how much
>>> complexity we add, and how much we resolve actual needs of the
>>> community...
>>>
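[As a sketch of the data flow only: the BYO-encryption step is a transform inserted between "encode" and "send". A real design would use an AEAD cipher and proper key management; the XOR keystream below exists only to make the "leave some framing in the clear, scramble the payload" idea concrete, and every name is invented.]

```typescript
// Symmetric toy transform over an encoded frame: the first
// clearHeaderBytes are left untouched (so codec-agnostic elements such
// as SFUs can still parse the framing), the rest is XORed with a
// repeating keystream. Applying it twice with the same key restores
// the original frame.
function xorTransform(frame: Uint8Array, key: Uint8Array, clearHeaderBytes: number): Uint8Array {
  const out = frame.slice();
  for (let i = clearHeaderBytes; i < out.length; i++) {
    out[i] ^= key[(i - clearHeaderBytes) % key.length];
  }
  return out;
}
```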
>>
>> It should be noted that mobile platforms currently provide this
>> level of access via MediaCodec (Android) and VideoToolbox (iOS). But
>> I agree that having compelling use cases is important.
>>
>>>
>>> Also to be noted that getting the encoded frames, processing them
>>> and sending them to the network is currently done off the main
>>> thread.
>>> One general concern is that the more we add JavaScript at various
>>> points of the pipeline, the more we might decrease the
>>> efficiency/stability/interoperability of the realtime pipeline.
>>>
>>
>> These encoded frames are typically going to go over the network,
>> where unexpected delays are a fact of life, and so the system is
>> already prepared to deal with them, e.g., via jitter buffers. (Or,
>> the frames will be written to a file, and this issue is entirely
>> moot.)
>>
>> This is in contrast to the "raw image" cases, which will often be
>> operating in a performance-critical part of the pipeline.
>>
>>>
>>> * Needed APIs, encode:
>>> - Codec configuration - the stuff that usually happens at
>>>   offer/answer time
>>> - Getting the encoded frames from the “output” channel
>>> - Inserting the processed encoded frames into the real “output”
>>>   channel
>>> - Reaction to congestion information from the output channel
>>> - Feeding congestion signals into the encoder
>>>
>>> Needed APIs, decode:
>>> - Codec configuration information
>>> - Getting the encoded frames from the input transport
>>> - Inserting the processed encoded frames into the input decoding
>>>   process
>>>
>>> The same APIs are needed for other functions, such as:
>>> - ML-NetEq: Jitter buffer control in other ways than the built-in
>>>   browser one - This also needs the ability to turn off the
>>>   built-in jitter buffer, and therefore makes this API have the
>>>   same timing requirements as dealing with raw data
>>> - ML-FEC: Application-defined strategies for recovering from lost
>>>   packets.
>>> - Alternative transmission: Using something other than the
>>>   browser’s built-in realtime transport (currently SRTP) to move
>>>   the media data *
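[The ML-FEC item presupposes that an app can emit and consume its own repair packets. The simplest possible application-defined scheme, one XOR parity packet per group of equal-sized packets (able to recover any single loss), shows the hooks such an API would need; function names are invented for illustration.]

```typescript
// Build one XOR parity packet over a group of packets.
function makeParity(packets: Uint8Array[]): Uint8Array {
  const len = Math.max(...packets.map(p => p.length));
  const parity = new Uint8Array(len);
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// Recover the single missing packet of a group: XORing the surviving
// packets with the parity cancels everything except the lost one.
function recover(survivors: Uint8Array[], parity: Uint8Array): Uint8Array {
  return makeParity([...survivors, parity]);
}
```

[Real schemes (including ML-driven ones) vary the amount and placement of redundancy; the API requirement is only the ability to inject repair packets on send and intercept them on receive.]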
>>>
>>> --
>>> Surveillance is pervasive. Go Dark.
>>>
>>> <Raw Data Access - Explainer.pdf>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Received on Friday, 18 May 2018 21:55:15 UTC
