Re: To Stream or not to Stream from Harald Alvestrand on 2018-06-14 (public-webrtc@w3.org from June 2018)

From: Harald Alvestrand <harald@alvestrand.no>
Date: Thu, 14 Jun 2018 15:58:28 +0200
To: public-webrtc@w3.org
Message-ID: <3c6905ce-0f2c-3e86-5d3a-7ea243b4b1b6@alvestrand.no>
Den 14. juni 2018 15:10, skrev Sergio Garcia Murillo:
> Hi Harald,
> 
> Fair points. Let's take a few steps back.
> 
> I think we have broad consensus that WebRTC NV should provide lower
> level components than the current ones in WebRTC/ORTC, with more
> fine-grained control. To decide which components need to be
> implemented, we have gathered the use cases we want to support that
> can't be done with the current APIs (at least not easily).
> 
> There also seems to be consensus that we should split the current
> senders/receivers into encoders/decoders and transports. Driven by some
> of the use cases, we have jumped to deciding that we need raw access
> to media frames and rtp packets, and jumped even further into requesting
> and providing API proposals for how to implement that.

Actually I see the requirements for raw access coming from a different
set of use cases than the ones that need "only" lower level components.
Face recognition in video (or voice recognition in audio for that
matter) can't be done without access to raw data, and
bring-your-own-encryption can't be done without access to the packets -
other use cases speak to other concerns.

> 
> In his encoder/decoders proposal, Peter described a
> "track-in/frame-out", which later we extended to
> "media-frame-or-track-in/frame-out" model. Also, on the transport he
> proposed direct raw rtp and rtcp access. I am challenging these models
> and direct accesses, not proposing an alternative API yet.
> 
> In this regard, I am advocating for an API model that doesn't require
> processing the individual frames (rtp/rtcp/media-frames) while still
> allowing this individual frame manipulation to some extent. I don't
> really care if it is a source-sink model or a stream-like model, as
> long as it is easy to set up as in the example (consider it
> pseudocode), and all standard components are provided by the browser to
> ensure compatibility.

I

> The whatwg streams API was just an example of a kind of API that could
> potentially implement that model and provide the required functionality
> to cover all the use cases (including individual frame manipulation). We
> may find that it has technical restrictions which make it not viable,
> or decide that we like a source-sink approach better, if that is the
> API model we decide to follow.

That's what bothers me with it - the whatwg streams isn't "just an
example" - it's a specific set of tools, with a specific set of
properties - and that's part of what makes it attractive.

Then I need to know enough to make a proposal out of it - and that
doesn't require "an example of a kind of API" - that requires a proposal.
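
As one illustration of those specific properties, here is a minimal
sketch, assuming a runtime that exposes the WHATWG Streams globals (a
current browser or Node 18+). The stage names and string "frames" are
hypothetical stand-ins for the pseudocode's components; nothing here is
a WebRTC API or a proposal. It only shows that pipeThrough() hands back
a readable stream and so can be chained, while pipeTo() hands back a
promise and so can only end a chain.

```javascript
// Sketch only: stage labels are hypothetical stand-ins for the pseudocode's
// funnyHatsWebWorker / encoder / rtpPacketizer. No WebRTC API is used.
function makeStage(label) {
  // A TransformStream exposes a writable side and a readable side.
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(`${label}(${chunk})`);
    },
  });
}

async function runChain() {
  // Stand-in source that emits two "frames" and then closes.
  const source = new ReadableStream({
    start(controller) {
      controller.enqueue("frame0");
      controller.enqueue("frame1");
      controller.close();
    },
  });

  // Stand-in transport that records whatever reaches the end of the chain.
  const received = [];
  const transport = new WritableStream({
    write(chunk) {
      received.push(chunk);
    },
  });

  // pipeThrough() returns the readable side of each transform, so it chains.
  const packets = source
    .pipeThrough(makeStage("hats"))
    .pipeThrough(makeStage("encode"))
    .pipeThrough(makeStage("packetize"));

  // pipeTo() returns a promise that settles when piping finishes,
  // so nothing further can be piped from its result.
  const done = packets.pipeTo(transport);
  const isThenable = typeof done.then === "function";
  await done;
  return { received, isThenable };
}
```

Under these assumptions, every stage before the final sink has to be a
transform (a writable/readable pair) for the chain to compose.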

> 
> IMHO the questions we have to answer are what components do we need, how
> do we link them together and how much low-level is enough, but, as good
> technicians we have jumped too soon into "show-me-the-code" mode ;)

It seems to me that you're singing from the same page as Peter, when
he says "wasting time on streams vs non-streams is useless", which is
exactly the opposite of the viewpoint that I read out of your earlier
message.

Thanks for clarifying - I was confused!


> 
> Best regards
> Sergio
> 
> On 14/06/2018 13:55, Harald Alvestrand wrote:
>> Part of my frustration with the streams discussion is that the people
>> saying "use Streams" haven't been able to tell me exactly what they mean
>> when they say that.
>>
>> Part of it is my lack of understanding - until a month or two ago, I
>> thought streams were still byte-streams only, but now it seems that they
>> have finally gotten around to passing objects between them, and with the
>> advent of the TransformStream, there's explicit acknowledgement that
>> processing using a stream model can cause different things to come out
>> than what comes in.
>>
>> But when Sergio says something like this:
>>
>>> Using the whatwg-like api, it could be possible to do
>>>
>>> source.pipeThrough(funnyHatsWebWorker)
>>>              .pipeTo(encoder)
>>>              .pipeThrough(rtpPacketizer)
>>>              .pipeTo(rtpSender)
>>>              .pipeTo(rtpTransport)
>> I don't know what I'm seeing, and I have dozens of questions that I
>> don't know where to go to answer.
>>
>> Back in the Dawn of Time, we had two possible models of how we wired
>> things together: Explicit links (like MediaStream{track}) or implicit
>> links (like source-to-sink connections in WebAudio). We chose the
>> explicit-link model, and made the links into control surfaces, with
>> functions like applyConstraints().
>>
>> Now, with Streams, I'm not sure if I'm looking at source-to-sink
>> couplings (where all the controls are on the sources and the sinks) or
>> explicit-link objects (where there are controls on the connections). So
>> before I can understand that, I need a proposal in front of me that
>> actually calls out these things - and so far, none of the comments I've
>> seen from people who claim to like streams have contained enough
>> information for me to build one.
>>
>> In the seemingly simple example above, I can assume that each object
>> that is mentioned in "pipeThrough()" implements the TransformStream
>> interface, which consists (effectively) of getting a WritableStream and
>> a ReadableStream. (But the inline .pipeTo confuses me, since .pipeTo
>> seems to return a promise that resolves when the stream terminates -
>> should they have been .pipeThrough also?)
>>
>> So there's backpressure travelling up the chain - how is this handled?
>> Just using "available buffer size", which is what
>> WritableStreamDefaultWriterGetDesiredSize seems to be describing in the
>> spec, isn't appropriate for video, because we want the rate of the
>> encoder (4 steps back the chain) to be adjusted to a lower number, not
>> just doing a "stop/go" signal. We could imagine lots of solutions,
>> including having the encoder take the transport as a parameter so that
>> it knows what it's encoding for - but if intermediate steps of the chain
>> take actions that invalidate the assumptions (like throwing away frames)
>> - what happens?
>>
>> I would like to see a proposal for using streams. But:
>>
>> a) I know I haven't seen one
>>
>> b) like Peter, I think we can make a lot of decisions without answering
>> this one
>>
>> c) I don't know how to make one.
>>
>>
>> That's the trouble I have with Streams at the moment.
>>
>>
>>
>>
> 
>
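
On the backpressure question in the quoted message above, the
following is a minimal sketch of what a writer's desiredSize reports,
again assuming WHATWG Streams globals (a current browser or Node 18+).
The slow sink and the "frame" strings are hypothetical. The value is
the queue's high-water mark minus what is currently queued, so it says
how much room is left, not what rate a producer further up the chain
should target.

```javascript
// Sketch only: a deliberately slow sink, used to watch desiredSize change.
async function observeDesiredSize() {
  let release;
  const gate = new Promise((resolve) => {
    release = resolve;
  });

  const slowSink = new WritableStream(
    {
      // Each write stays pending until the gate is released.
      write() {
        return gate;
      },
    },
    // Count chunks rather than bytes; the queue is "full" at 2 chunks.
    new CountQueuingStrategy({ highWaterMark: 2 })
  );

  const writer = slowSink.getWriter();
  const sizes = [writer.desiredSize]; // room available before any write

  const pending = [];
  for (let i = 0; i < 3; i++) {
    pending.push(writer.write(`frame${i}`));
    sizes.push(writer.desiredSize); // shrinks as chunks pile up
  }

  release(); // let the sink drain
  await Promise.all(pending);
  sizes.push(writer.desiredSize); // room available again after draining
  return sizes;
}
```

The recorded values start at the high-water mark, fall to zero or below
once more chunks are queued than the mark allows, and recover after the
sink drains. That is the "stop/go" style of signal; any rate hint for
an encoder would have to travel by some other, yet-to-be-proposed path.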
Received on Thursday, 14 June 2018 13:58:56 UTC