
Re: Use cases / requirements for raw data access functions

From: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Date: Mon, 21 May 2018 23:39:09 +0200
To: public-webrtc@w3.org
Message-ID: <c8d8664e-8d56-0f13-5173-7a7ea06106f9@gmail.com>
On 21/05/2018 23:10, Peter Thatcher wrote:
> On Mon, May 21, 2018 at 1:54 PM Harald Alvestrand 
> <harald@alvestrand.no <mailto:harald@alvestrand.no>> wrote:
>
>     On 05/21/2018 08:35 PM, Peter Thatcher wrote:
>>     On Sun, May 20, 2018 at 9:08 PM Harald Alvestrand
>>     <harald@alvestrand.no <mailto:harald@alvestrand.no>> wrote:
>>
>>         On 05/18/2018 11:33 PM, Peter Thatcher wrote:
>>         > By the way, this is my use case: as a web/mobile developer,
>>         I want to
>>         > send media over QUIC for my own replacement for RTP.  QUIC
>>         isn't the
>>         > replacement for RTP, but my own protocol on top of QUIC (or
>>         SCTP, or
>>         > any data channel) is the replacement for RTP.    Luckily,
>>         this is
>>         > "free" as soon as you add QUIC data channels and split the
>>         RtpSender
>>         > into encoder and transport.  That's all I need.
>>         >
>>         Peter,
>>
>>         since I'm asking this of others:
>>
>>         can you please expand this into a use case - describe a
>>         function or
>>         application that is possible / easy when you have that
>>         functionality
>>         available, and is impossible / hard to do today?
>>
>>
>>     I have a few:
>>
>>     - Terminating media (audio, video, data) on a server is a pain in
>>     the neck with DTLS/RTP/RTCP/SCTP.   I would like that to be much
>>     easier.  Sending a serialized frame object (say, CBOR or
>>     protobuf) over a QUIC stream is way easier to terminate.
>
>     Pushing it up one more layer: You want a use case where you send
>     media to a server, and because the server architecture is such
>     that DTLS/RTP/RTCP/SCTP is a pain to terminate, and you can't
>     change the server architecture, you want a different wrapping of
>     the data?
>
>
> Yep.

You can also do this easily with WebSocket, so IMHO the interesting
part of this use case is not how you would send the encoded frame, but
the audio/video encoder part. In that regard we have to take care and
check what is already specified in MSE/EME, as we may have several
elements in common, or very close.
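The "serialized frame object" half of the use case can be sketched with no transport at all, since the framing is just an app-defined binary layout. The field names and sizes below are assumptions for illustration, not anything specified; the same bytes could equally go over a WebSocket, a data channel, or a QUIC stream:

```typescript
// Hypothetical wire format for an encoded media frame: a fixed header
// followed by the codec payload. This is one possible app-defined
// replacement for RTP framing, invented here for illustration.
interface Frame {
  timestampUs: number;  // capture timestamp, microseconds
  keyFrame: boolean;
  payload: Uint8Array;  // opaque encoder output bytes
}

function encodeFrame(f: Frame): Uint8Array {
  const buf = new ArrayBuffer(13 + f.payload.length);
  const view = new DataView(buf);
  view.setFloat64(0, f.timestampUs);     // 8 bytes
  view.setUint8(8, f.keyFrame ? 1 : 0);  // 1 byte
  view.setUint32(9, f.payload.length);   // 4 bytes
  new Uint8Array(buf, 13).set(f.payload);
  return new Uint8Array(buf);
}

function decodeFrame(bytes: Uint8Array): Frame {
  const view = new DataView(bytes.buffer, bytes.byteOffset);
  const len = view.getUint32(9);
  return {
    timestampUs: view.getFloat64(0),
    keyFrame: view.getUint8(8) === 1,
    payload: bytes.slice(13, 13 + len),
  };
}
```

Because the framing is transport-agnostic, the genuinely new piece the platform would have to provide is the encoder that produces `payload` - which is exactly the encoder part flagged above.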

>
>>     - Including real-time data that's synchronized with audio and
>>     video, or including metadata about a particular audio or video
>>     frame is much easier if I can just attach it to a serialized
>>     frame object as mentioned above.
>
>     I can think of some example use cases here - subtitles are one of
>     them. Do you want to expand with some more specific ones?
>
>
> Some ideas:
>
> - Something like "this frame is interesting" computed from the 
> sender.  Could be a reason to do something special on a server (switch 
> users in a conference, start recording, ...).
> - Attaching a replacement for video onto audio frames, like info 
> necessary to draw an avatar.
> -  An SFU provides metadata about the person speaking when it switches 
> in a virtual stream of the current speaker.

In IoT it is quite common to have telemetry that needs to be in sync 
with the media. Note that in this use case the metadata should be 
attached to the media frame before it is encoded, not after.
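The ordering matters: if the telemetry is bound to the raw frame before the encoder sees it, the encoded output and the metadata always describe the same instant and cannot drift apart. A minimal sketch of that ordering (the encoder here is a stand-in stub, and all type names are assumptions):

```typescript
// Telemetry is sampled and attached at capture time, *before* encoding,
// so metadata and encoded media stay in sync by construction.
// stubEncode is a placeholder; the point is the ordering, not the codec.
interface RawFrame { timestampUs: number; pixels: Uint8Array; }
interface Telemetry { temperatureC: number; headingDeg: number; }
interface TaggedFrame {
  timestampUs: number;
  telemetry: Telemetry;
  encoded: Uint8Array;
}

function stubEncode(pixels: Uint8Array): Uint8Array {
  return pixels.slice(0, 4); // pretend compression
}

function captureWithTelemetry(
  raw: RawFrame,
  sensors: () => Telemetry,
): TaggedFrame {
  const telemetry = sensors(); // read sensors at capture time, pre-encode
  return {
    timestampUs: raw.timestampUs,
    telemetry,
    encoded: stubEncode(raw.pixels),
  };
}
```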

>
>>     - Taking more direct control over stream control signaling (mute
>>     state, key frame requests, end-of-stream, etc.) is much easier
>>     if the control plane is controlled by the application and can be
>>     integrated with the flow of media, unlike today with RTCP (not
>>     under control of the app) or data channels (not integrated with
>>     the flow of media).
>
>     What's the application that would need more direct control over
>     the stream's state?
>
>
> An SFU might want to say to a receiver "there is no audio here right 
> now; no comfort noise, no nothing; don't mix it until you hear 
> differently from me" instead of having a jitter buffer actively mixing 
> silence for all silent streams.

That's the RTCP PAUSED indication. I have already tried a couple of 
times to get it supported.

>     Is this a requirement that would equally well be satisfied by a
>     direct API to the RTCP?
>
>
> Yes, especially if new RTCP message types can be added.

The only benefit of using RTCP is interoperability. I don't see any 
benefit of adding custom app RTCP messages compared to sending an app 
message via a data channel or adding metadata to the media frames.
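For comparison, the app-message route needs no protocol extension at all: a control indication such as the SFU's "no audio here right now" is just an app-defined message on a data channel. The message shape below is invented for illustration:

```typescript
// An app-defined control message - e.g. a paused/resumed indication -
// as it might travel over a data channel instead of a custom RTCP
// message type. The schema is hypothetical.
type ControlMessage =
  | { kind: "paused"; streamId: string }
  | { kind: "resumed"; streamId: string }
  | { kind: "keyFrameRequest"; streamId: string };

function encodeControl(msg: ControlMessage): string {
  // On a real connection: dataChannel.send(encodeControl(msg))
  return JSON.stringify(msg);
}

function decodeControl(wire: string): ControlMessage {
  return JSON.parse(wire) as ControlMessage;
}
```

Adding the same indication as a new RTCP message type would require standardization before anyone else could interoperate with it, which is the trade-off described above.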

Best regards
Sergio
Received on Monday, 21 May 2018 21:38:53 UTC
