Re: MediaRecorder and using Streams

Stream is a work in progress, but eventually it should be possible to
attach a Stream directly where you can now attach a Blob, and things Just
Work. (If you read public-webapps, there's some feeling that Stream is what
we now wish Blob had been.)

My understanding is that the main advantage of MediaRecorder for speech
recognition is that the resulting stream is high-fidelity -- while adaptive
encoding may end up lowering fidelity dramatically to adapt to network
conditions, MediaRecorder should buffer in this case (although of course
not indefinitely), and thus should have better properties for
non-real-time speech recognition apps (and this can be
pretty-near-real-time given good network conditions).
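To make the buffering point concrete: a page using the timesliced form could
simply let chunks queue while the socket is backed up, instead of losing
fidelity. A minimal sketch, assuming chunks are sent over a WebSocket; the
helper name and the 1 MB threshold are illustrative, not from any spec:

```javascript
// Illustrative only: chunks that can't be sent immediately are queued,
// preserving full fidelity, rather than being down-coded for the network.
var MAX_BUFFERED = 1 << 20; // back off once ~1 MB sits in the socket buffer

function makeChunkSender(ws, queue) {
  return function onChunk(blob) {
    queue.push(blob);
    // Drain the queue only while the socket's own buffer is small;
    // otherwise the chunks simply wait (bounded in a real app).
    while (queue.length > 0 && ws.bufferedAmount < MAX_BUFFERED) {
      ws.send(queue.shift());
    }
  };
}
```

A recorder would feed it with something like
`recorder.ondataavailable = function (e) { sendChunk(e.data); };` where
`sendChunk = makeChunkSender(ws, [])`.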

The proposal Rachel and I have made for using Stream is basically one of
data representation. The content of the encoded output would be the same,
but transmitted via Stream instead of sequence-of-Blobs. You'd read via the
Stream mechanisms in the File API. Stream is presumed to go the direction
that's been discussed in public-webapps, toward single-use.
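As a rough illustration of "same content, different representation": the
first function below uses the sequence-of-Blobs shape from the current draft
(record() with a timeslice), while the second guesses at the Stream shape --
`StreamReader`, `readAsBlob`, and `recorder.stream` are assumed names from
the in-flux Streams draft, not settled API:

```javascript
// Today's shape: the encoder's output arrives as a sequence of Blobs.
function recordAsBlobs(recorder, onChunk) {
  // Each dataavailable event carries the bytes encoded since the last
  // event; the chunks are only guaranteed meaningful once concatenated.
  recorder.ondataavailable = function (e) { onChunk(e.data); };
  recorder.record(500); // timeslice: request a chunk roughly every 500 ms
}

// Proposed shape (hypothetical API): the same encoded bytes exposed as a
// single Stream object, read via the Stream mechanisms in the File API.
function recordAsStream(recorder, onDone) {
  var reader = new StreamReader(); // name assumed from the Streams draft
  reader.onload = function (e) { onDone(e.target.result); };
  reader.readAsBlob(recorder.stream);
}
```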




On Mon, Sep 9, 2013 at 8:00 PM, Mandyam, Giridhar <mandyam@quicinc.com> wrote:

> > MediaRecorder and its API are _independent_ of the transport
> > protocol chosen.
>
>
> I do not disagree with you, but (in my recollection) the editors justified
> the introduction of the timeslice parameter based on the needs of RT speech
> recognition (see
> http://lists.w3.org/Archives/Public/public-media-capture/2012Nov/0076.html).
> Based on this justification, I think it is reasonable to examine whether
> any proposed changes to the API have any impact on meeting this use case.
> *If* the group feels that it is not a good idea to leverage
> MediaRecorder for RT streaming over WS or XHR (which are pretty much the
> two primary mechanisms available to web developers for networked
> communications other than RTCPeerConnection), then we should justify the
> use of timeslicing in the recording API based on other use cases.
>
>
> > Should you choose WS to transmit blobs, it is up to the receiving
> > end to re-concatenate the blobs and extract meaningful data.
>
>
> I thought your proposed modification leveraged streams, not blobs.
> According to
> https://dvcs.w3.org/hg/streams-api/raw-file/tip/Overview.htm#stream-interface,
> a stream has unspecified length.  The current send() methods for WS do not
> seem to support data types of unspecified length (unless a Blob can have an
> unspecified size attribute).
>
>
> > Should you choose WS to transmit blobs, it is up to the receiving
> > end to re-concatenate the blobs and extract meaningful data.
>
>
> Right now, as far as I can tell, there is no direct way in the WS API to
> transmit a stream object.  I was interested in what changes may need to
> happen to the WS API and WS protocol (if any) to make this possible.  This
> may be something that is simply not addressable until the Streams API is
> more mature, if ever.
>
>
> I actually agree with Greg Billock’s statement from a previous email: "if
> you're using MediaRecorder to essentially write your own PeerConnection,
> you're doing it wrong."  If there is no other compelling use case for
> returning time-sliced data, then I have trouble seeing the point in
> specifying a timeslice parameter on record().
>
>
> *From:* groby@google.com [mailto:groby@google.com] *On Behalf Of *Rachel
> Blum
> *Sent:* Monday, September 09, 2013 4:09 PM
>
> *To:* Mandyam, Giridhar
> *Cc:* Jim Barnett; Robert O'Callahan; public-media-capture@w3.org
> *Subject:* Re: MediaRecorder and using Streams
>
> MediaRecorder and its API are _independent_ of the transport protocol
> chosen.
>
>
> As far as I understand, MediaRecorder creates an encoded stream of bytes
> in a specific format. The Blob API will simply give a chunk of "bytes
> generated since last callback" to the consumer. The Streams API makes it
> clearer that the underlying data source is a stream of data, but doesn't
> change the behavior of MediaRecorder.
>
>
> Should you choose WS to transmit blobs, it is up to the receiving end to
> re-concatenate the blobs and extract meaningful data.
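A sketch of that receiving end, for concreteness: assume the WS messages
arrive server-side as byte arrays; the helper below is illustrative, not
part of any spec.

```javascript
// Receiver-side sketch: individual timeslice chunks are not guaranteed
// to be self-contained media, so the server reassembles them in arrival
// order before handing the whole buffer to a decoder or recognizer.
function concatChunks(chunks) {
  var total = chunks.reduce(function (n, c) { return n + c.length; }, 0);
  var out = new Uint8Array(total);
  var offset = 0;
  chunks.forEach(function (c) {
    out.set(c, offset);
    offset += c.length;
  });
  return out;
}
```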
>
> If you want blobs to be truly self-contained, you need to call record()
> for each blob you are planning to send. (We've discussed extending the API
> to allow seamlessly stopping recording and starting a new recording
> session, I think.)
>
>  - rachel
>
>
> On Mon, Sep 9, 2013 at 4:03 PM, Mandyam, Giridhar <mandyam@quicinc.com>
> wrote:
>
> If we use the latest version of WebSockets as an example for transport
> (see http://www.w3.org/TR/2011/WD-websockets-20110929/), currently all
> the data types using send() can be encapsulated in the WS protocol
> (http://tools.ietf.org/html/rfc6455#section-5.6) – including a
> self-contained Blob.  It is unclear to me how the Streams API would work
> in this situation (RTCPeerConnection is a different matter).  Do you have
> any insight as to how streaming would work over WS leveraging
> StreamBuilder?
>
> Thanks,
> -Giri
>
>
> *From:* groby@google.com [mailto:groby@google.com] *On Behalf Of *Rachel
> Blum
> *Sent:* Monday, September 09, 2013 4:02 PM
> *To:* Mandyam, Giridhar
> *Cc:* Jim Barnett; Robert O'Callahan; public-media-capture@w3.org
>
> *Subject:* Re: MediaRecorder and using Streams
>
>
> There are more use cases than speech recognition :)
>
>
> As Jim pointed out, what matters for this specific use case though is that
> the format is streamable - even for speech recognition, the individual
> blobs are *not*, as far as I understand, guaranteed to be self-contained.
> I.e. taking "any given blob" might well result in useless data - hence the
> point that a MIME type on individual blobs is not meaningful (and the
> discussion on p-m-c).
>
> - rachel
>
>
> On Mon, Sep 9, 2013 at 3:28 PM, Mandyam, Giridhar <mandyam@quicinc.com>
> wrote:
>
> Regarding what is written on the blog under MIME type:
>
> “One of the issues with the current API discussed on public-media-capture
> was that of blob mime types.[3] Each blob should have a mime type, but it's
> not entirely clear what type it should be. The mime type of the final
> encoding result makes only sense for the entire data set, not for
> individual chunks of data.”
>
>
> Isn’t the point of returning time-sliced Blobs for the speech recognition
> use case that the Blobs be self-contained and have a MIME type associated
> with them?  For instance, assume that an implementation is returning a
> time-slice of encoded audio data every 500 ms.  If we also assume that the
> 500 ms chunk is being sent to a server somewhere for what is hopefully
> real-time processing, wouldn’t the server have to know the MIME type for
> each Blob it receives?  The text above seems to indicate that the server
> would have to wait for the “final encoding result”, which would imply to me
> that RT speech recognition using MediaRecorder is out if we have this
> interpretation.  And I don’t see how the Streams API helps in this regard.
>
> My interpretation could be totally incorrect of course – wouldn’t be the
> first time.
>
> -Giri
>
>
> *From:* Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
> *Sent:* Monday, September 09, 2013 3:18 PM
> *To:* Rachel Blum; Robert O'Callahan
> *Cc:* public-media-capture@w3.org
> *Subject:* RE: MediaRecorder and using Streams
>
>
> We can certainly look at this once the Streams API stabilizes.  The plan
> is for MediaRecorder to lag behind Media Capture (a.k.a. getUserMedia).
> We’re in a hurry to get gUM out, but there’s not so much of a rush for
> MediaRecorder.
>
> - Jim
>
>
> *From:* groby@google.com [mailto:groby@google.com] *On
> Behalf Of *Rachel Blum
> *Sent:* Monday, September 09, 2013 6:05 PM
> *To:* Robert O'Callahan
> *Cc:* Jim Barnett; public-media-capture@w3.org
> *Subject:* Re: MediaRecorder and using Streams
>
> Just to keep the list informed: Greg and I have summarized the pros/cons
> we can see around MediaRecorder using a Streams API. (attached)
>
>
> Agreeing with Robert though that we shouldn't block MediaRecorder on
> Streams stabilizing - but it seems an idea worth exploring.
>
>  - rachel
>
>
> On Thu, Aug 1, 2013 at 2:57 PM, Robert O'Callahan <robert@ocallahan.org>
> wrote:
>
> I think the basic idea makes sense, but we can do it in addition to the
> Blob-only API once the Streams situation has stabilized. I don't want to
> block MediaRecorder's streaming functionality on that.
>
>
> Rob
> --
>

Received on Tuesday, 10 September 2013 04:00:43 UTC