Re: Use cases / requirements for raw data access functions from Göran Eriksson AP on 2018-06-06 (public-webrtc@w3.org from June 2018)

From: Göran Eriksson AP <goran.ap.eriksson@ericsson.com>
Date: Wed, 6 Jun 2018 11:19:53 +0000
To: Peter Thatcher <pthatcher@google.com>, Harald Alvestrand <harald@alvestrand.no>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <B5B5BF04-BCAD-482E-8E4A-97815CD99D83@ericsson.com>

I have a few:

- Terminating media (audio, video, data) on a server is a pain in the neck with DTLS/RTP/RTCP/SCTP. I would like that to be much easier. Sending a serialized frame object (say, CBOR or protobuf) over a QUIC stream is way easier to terminate.
- Including real-time data that's synchronized with audio and video, or including metadata about a particular audio or video frame, is much easier if I can just attach it to a serialized frame object as mentioned above.
- Taking more direct control over stream control signaling (mute state, key frame requests, end-of-stream, etc.) is much easier if the control plane is controlled by the application and can be integrated with the flow of media, unlike today with RTCP (not under control of the app) or data channels (not integrated with the flow of media).
- E2EE without the complexities of double SRTP.
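A minimal sketch of the serialized frame object idea, assuming a hypothetical `Frame` record and a simple length-prefixed encoding. JSON stands in for CBOR or protobuf so the example needs only the Python standard library, and a plain byte buffer stands in for a QUIC stream. It shows media, per-frame metadata, and control signals travelling together in one record that a server can parse without any DTLS/RTP/RTCP/SCTP machinery.

```python
import json
import struct
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Frame:
    """Hypothetical serialized frame object (not a standardized format)."""
    kind: str                 # "audio", "video" or "data"
    timestamp_us: int         # capture time on a shared clock, for synchronization
    payload: bytes            # encoded media, opaque to the framing layer
    metadata: dict = field(default_factory=dict)  # app-defined, e.g. detections
    control: dict = field(default_factory=dict)   # e.g. {"mute": True}


def serialize(frame: Frame) -> bytes:
    """Encode one frame as: 4-byte big-endian header length, JSON header, payload."""
    header = json.dumps({
        "kind": frame.kind,
        "ts": frame.timestamp_us,
        "meta": frame.metadata,
        "ctl": frame.control,
        "len": len(frame.payload),
    }).encode("utf-8")
    return struct.pack("!I", len(header)) + header + frame.payload


def parse_stream(buf: bytes) -> Iterator[Frame]:
    """Yield frames from back-to-back records, as a server reading one stream might."""
    offset = 0
    while offset < len(buf):
        (header_len,) = struct.unpack_from("!I", buf, offset)
        offset += 4
        header = json.loads(buf[offset:offset + header_len])
        offset += header_len
        payload = buf[offset:offset + header["len"]]
        offset += header["len"]
        yield Frame(header["kind"], header["ts"], payload,
                    header["meta"], header["ctl"])


if __name__ == "__main__":
    sent = [
        Frame("video", 1000, b"\x00\x01", metadata={"objects": ["cat"]}),
        Frame("audio", 1000, b"\x02", control={"mute": True}),
    ]
    stream = b"".join(serialize(f) for f in sent)
    received = list(parse_stream(stream))
    print(received == sent)  # True
```

A real server would also have to handle records split across reads; this sketch assumes the whole buffer is already available.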

All of these have already been brought up as use cases by developers in responses to Sergio's survey (https://docs.google.com/forms/d/1YVKqVU_ziCYtp8RGGnwB8WcQWDhkXe-mOmaSkFTdJm8/viewanalytics).

More generally, RTP is needlessly complex. It's hard to add things, it's hard to change things, it's hard to debug things, and it's hard to understand things. So in a sense, my use case is "cause me less complexity and pain".

Hi,

Before we close the use case discussion, I want to +1 Peter's use cases and his thoughts on a (partly) custom protocol suite on QUIC (which, by the way, is designed to fit a server side that does not use RTP):

1) I would like to inject my own object detection into the media stream, operating on the raw data before the encoder. The object detection would guide the encoder and also add metadata intended for my server-side worker, either inline in CBOR or out of band on a data channel, using an x-on-QUIC transport or gRPC/HTTP. I may add another data processing function before the transport endpoint, e.g. for encryption, integrity protection, or data set tagging.

All three custom functions above (object detection, encoder, encryption) would preferably be able to use WebAssembly, each accelerated by WebGL.

Apart from the changes needed to redirect the media stream(s) to the various functions, a handle to the chain of functions could come in handy.
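A minimal sketch of such a chain and a handle to it, assuming hypothetical names throughout (`ProcessingChain`, `detect_objects`, `encode`, `protect`). The stages are trivial placeholders standing in for the WebAssembly/WebGL functions described above: the "detector" writes metadata plus an encoder hint, the "encoder" honours the hint, and the last stage adds an HMAC-SHA256 integrity tag (encryption itself is not shown).

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RawFrame:
    pixels: bytes                                 # raw (pre-encoder) data
    metadata: dict = field(default_factory=dict)  # for the server-side worker
    encoded: bytes = b""


Stage = Callable[[RawFrame], RawFrame]


class ProcessingChain:
    """Hypothetical handle to an ordered chain of application-supplied stages."""

    def __init__(self, stages: List[Stage]):
        self._stages = list(stages)

    def stage_names(self) -> List[str]:
        return [stage.__name__ for stage in self._stages]

    def insert(self, index: int, stage: Stage) -> None:
        """Add a stage, e.g. data set tagging before the transport endpoint."""
        self._stages.insert(index, stage)

    def process(self, frame: RawFrame) -> RawFrame:
        for stage in self._stages:
            frame = stage(frame)
        return frame


def detect_objects(frame: RawFrame) -> RawFrame:
    # Placeholder "detector": treats any nonzero byte as a detected object.
    found = any(frame.pixels)
    frame.metadata["objects"] = ["object"] if found else []
    # Guide the encoder: spend more bits when something was detected.
    frame.metadata["encoder_hint"] = "high" if found else "low"
    return frame


def encode(frame: RawFrame) -> RawFrame:
    # Placeholder "encoder": keeps every byte on "high", every other byte on "low".
    step = 1 if frame.metadata.get("encoder_hint") == "high" else 2
    frame.encoded = frame.pixels[::step]
    return frame


DEMO_KEY = b"hypothetical-shared-key"  # assumption: key management is out of scope


def protect(frame: RawFrame) -> RawFrame:
    # Integrity protection only: HMAC-SHA256 over the encoded bytes.
    frame.metadata["tag"] = hmac.new(DEMO_KEY, frame.encoded, hashlib.sha256).hexdigest()
    return frame


if __name__ == "__main__":
    chain = ProcessingChain([detect_objects, encode, protect])
    out = chain.process(RawFrame(b"\x00\x05\x00\x07"))
    print(chain.stage_names())  # ['detect_objects', 'encode', 'protect']
    print(out.metadata["objects"], out.encoded)  # ['object'] b'\x00\x05\x00\x07'
```

Keeping the chain as an explicit object is what makes a "handle" possible: the application can inspect or extend the stage list without rewiring the media stream itself.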

2) In some cases, I also see a need to clone the media stream(s) and process the two resulting groups of streams in parallel. This is already possible in the API, but I wanted to mention it to complete the picture of what the use case could mean for the evolution of the API surface.
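A minimal sketch of the clone-and-process-in-parallel pattern, with frames modelled as plain dicts and two hypothetical branch functions (`analyze` and `downscale`). Each branch receives its own deep copy of every frame, so one branch cannot affect the other, and a two-worker thread pool runs the branches concurrently. In the browser, the cloning step corresponds to what `MediaStreamTrack.clone()` already provides; the parallel processing stages are the assumed part.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

Branch = Callable[[dict], dict]


def clone_and_process(frames: Iterable[dict], branch_a: Branch,
                      branch_b: Branch) -> List[Tuple[dict, dict]]:
    """Run two processing branches concurrently on independent copies of each frame."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for frame in frames:
            future_a = pool.submit(branch_a, copy.deepcopy(frame))
            future_b = pool.submit(branch_b, copy.deepcopy(frame))
            # Waiting per frame keeps the two outputs paired and in order.
            results.append((future_a.result(), future_b.result()))
    return results


def analyze(frame: dict) -> dict:
    # Hypothetical ML branch: annotates the frame.
    frame["labels"] = ["bright"] if sum(frame["pixels"]) > 10 else []
    return frame


def downscale(frame: dict) -> dict:
    # Hypothetical preview branch: halves the resolution by dropping samples.
    frame["pixels"] = frame["pixels"][::2]
    return frame


if __name__ == "__main__":
    source = [{"pixels": [1, 2, 3, 4]}, {"pixels": [9, 9, 9, 9]}]
    for a, b in clone_and_process(source, analyze, downscale):
        print(a["labels"], b["pixels"])
    # [] [1, 3]
    # ['bright'] [9, 9]
```

Waiting for both results before moving on trades some throughput for keeping the branch outputs paired per frame; a real pipeline would more likely let each cloned stream run independently.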

Yeah, I know, the above may not be there on Day 1, but it gives us something to discuss. The design pattern is a pretty straightforward machine learning one, quite feasible on some device types, if not all.

I think enabling real-time ML in JavaScript, with access to HW acceleration for site-specific custom ML processing functions and custom-designed protocols on a QUIC data channel, would make a nice playground for cool future smart web sites :-).

I also expect that the security considerations that come with all this “AI” stuff in web apps will be discussed at the NV.

Regards

Göran

Received on Wednesday, 6 June 2018 11:20:33 UTC