Re: To Stream or not to Stream from Sergio Garcia Murillo on 2018-06-14 (public-webrtc@w3.org from June 2018)

From: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Date: Thu, 14 Jun 2018 15:10:11 +0200
To: public-webrtc@w3.org
Message-ID: <c1db114c-a66b-a7df-c608-2bd11a54df0e@gmail.com>

Hi Harald,

Fair points. Let's take a few steps back.

I think we have a fairly broad consensus that WebRTC NV should provide
lower-level components than the current ones in WebRTC/ORTC, with more
fine-grained control. In order to decide which components we will need to
implement, we have gathered the use cases that we want to support and
that cannot be done (at least not easily) with the current APIs.

There also seems to be consensus that we should split the current
senders/receivers into encoders/decoders and transports. Due to some
of the use cases, we have jumped to deciding that we need raw access
to media frames and RTP packets, and jumped even further into requesting
and providing API proposals for how to implement that.

In his encoders/decoders proposal, Peter described a
"track-in/frame-out" model, which we later extended to a
"media-frame-or-track-in/frame-out" model. Also, on the transport he
proposed direct raw RTP and RTCP access. I am challenging these models
and this direct access, not proposing an alternative API yet.

In this regard, I am advocating for an API model that doesn't require
the application to process the individual frames (RTP/RTCP/media frames)
while still allowing this individual frame manipulation to some extent.
I don't really care whether it is a source-sink model or a stream-like
model, as long as it is as easy to set up as in the example (consider it
pseudocode), and all standard components are provided by the browser to
ensure compatibility.

WHATWG streams were just an example of the kind of API that potentially
could implement that model and provide the required functionality to
cover all the use cases (including individual frame manipulation). We
may find that it has technical restrictions which make it not viable,
or decide that we prefer a source-sink approach, if that is the
API model we decide to follow.
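
For illustration only, here is a minimal sketch of how such a chain might
look using the WHATWG Streams primitives as they exist today
(ReadableStream, TransformStream, WritableStream). Everything else in it
is an assumption: the frame objects and the makeSource/makeStage/makeSink
helpers are hypothetical stand-ins, not proposed WebRTC components. Note
that every intermediate step uses pipeThrough and only the last one uses
pipeTo, since pipeTo returns a promise rather than a stream.

```javascript
// Hypothetical stand-ins only: none of these are real WebRTC components.

// Source: emits three fake "raw frames" as plain objects, then closes.
function makeSource() {
  let seq = 0;
  return new ReadableStream({
    pull(controller) {
      if (seq < 3) {
        controller.enqueue({ seq: seq++, kind: "raw" });
      } else {
        controller.close();
      }
    },
  });
}

// Intermediate stage: a TransformStream that maps each chunk through fn.
function makeStage(fn) {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(fn(chunk));
    },
  });
}

// Sink: a WritableStream that collects whatever reaches the end of the chain.
function makeSink(out) {
  return new WritableStream({
    write(chunk) {
      out.push(chunk);
    },
  });
}

// Wires source -> "funny hats" -> "encoder" -> "packetizer" -> "transport".
async function runPipeline() {
  const sent = [];
  await makeSource()
    .pipeThrough(makeStage((f) => ({ ...f, hat: true })))        // per-frame manipulation
    .pipeThrough(makeStage((f) => ({ ...f, kind: "encoded" })))  // stand-in encoder
    .pipeThrough(makeStage((f) => ({ ...f, kind: "rtp" })))      // stand-in packetizer
    .pipeTo(makeSink(sent));                                     // stand-in transport
  return sent;
}
```

The application only writes the one stage it cares about (the "funny
hats" function here); it never has to pull frames through the rest of the
chain itself. Whether real encoders and transports could be exposed this
way is exactly the open question.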

IMHO the questions we have to answer are which components we need, how
we link them together, and how low-level is low enough. But, as good
technicians, we have jumped too soon into "show-me-the-code" mode ;)

Best regards
Sergio

On 14/06/2018 13:55, Harald Alvestrand wrote:
> Part of my frustration with the streams discussion is that the people
> saying "use Streams" haven't been able to tell me exactly what they mean
> when they say that.
>
> Part of it is my lack of understanding - until a month or two ago, I
> thought streams were still byte-streams only, but now it seems that they
> have finally gotten around to passing objects between them, and with the
> advent of the TransformStream, there's explicit acknowledgement that
> processing using a stream model can cause different things to come out
> than what comes in.
>
> But when Sergio says something like this:
>
>> Using the whatwg-like api, it could be possible to do
>>
>> source.pipeThrough(funnyHatsWebWorker)
>>              .pipeTo(encoder)
>>              .pipeThrough(rtpPacketizer)
>>              .pipeTo(rtpSender)
>>              .pipeTo(rtpTransport)
> I don't know what I'm seeing, and I have dozens of questions that I
> don't know where to go to answer.
>
> Back in the Dawn of Time, we had two possible models of how we wired
> things together: Explicit links (like MediaStream{track}) or implicit
> links (like source-to-sink connections in WebAudio). We chose the
> explicit-link model, and made the links into control surfaces, with
> functions like ApplyConstraints.
>
> Now, with Streams, I'm not sure if I'm looking at source-to-sink
> couplings (where all the controls are on the sources and the sinks) or
> explicit-link objects (where there are controls on the connections). So
> before I can understand that, I need a proposal in front of me that
> actually calls out these things - and so far, none of the comments I've
> seen from people who claim to like streams have contained enough
> information for me to build one.
>
> In the seemingly simple example above, I can assume that each object
> that is mentioned in "pipeThrough()" implements the TransformStream
> interface, which consists (effectively) of getting a WritableStream and
> a ReadableStream. (But the inline .pipeTo confuses me, since .pipeTo
> seems to return a promise that resolves when the stream terminates -
> should they have been .pipeThrough also?)
>
> So there's backpressure travelling up the chain - how is this handled?
> Just using "available buffer size", which is what
> WritableStreamDefaultWriterGetDesiredSize seems to be describing in the
> spec, isn't appropriate for video, because we want the rate of the
> encoder (4 steps back the chain) to be adjusted to a lower number, not
> just doing a "stop/go" signal. We could imagine lots of solutions,
> including having the encoder take the transport as a parameter so that
> it knows what it's encoding for - but if intermediate steps of the chain
> take actions that invalidate the assumptions (like throwing away frames)
> - what happens?
>
> I would like to see a proposal for using streams. But:
>
> a) I know I haven't seen one
>
> b) like Peter, I think we can make a lot of decisions without answering
> this one
>
> c) I don't know how to make one.
>
>
> That's the trouble I have with Streams at the moment.
>
>
>
>
Received on Thursday, 14 June 2018 13:09:36 UTC