RE: Describing recording by means of the Media Source interface

> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Young, Milan wrote:
> > The Web Audio API only provides access to the raw samples, which presents
> a practical barrier to transmission.  Quality ASR requires encoding, which
> would be best performed in browser space.
> 
> I agree encoding would be an important consideration for data transmission
> purposes.
> 
> The natural place to have that discussion would be in relation to the Web
> Audio API though.

[Milan] A few months back I submitted a proposal to the Audio API group asking that they add an "encoding filter" to their pipeline, which would provide access to the encoded representation.  Their response (I can dig it up if necessary) was that I should take the request to the Media Capture Task Force :).

My opinion is that they are correct: the recording interface developed by this group is the right place for this functionality.
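
For concreteness, the proposal was roughly the following shape.  To be clear, createEncoder() and onencodeddata are names I am inventing here for illustration - nothing like them exists in the Web Audio draft:

// Hypothetical sketch only. createEncoder()/onencodeddata do not exist;
// the input node assumes the draft's MediaStreamAudioSourceNode, and
// mediaStream is a MediaStream previously obtained from getUserMedia().
var socket = new WebSocket('ws://asr.example.com/recognize'); // placeholder
socket.binaryType = 'arraybuffer';

var context = new webkitAudioContext();
var source = context.createMediaStreamSource(mediaStream);

// Imagined "encoding filter": raw PCM samples in, encoded frames out.
var encoder = context.createEncoder('audio/ogg; codecs=speex');
encoder.onencodeddata = function (event) {
  // event.data would be an ArrayBuffer of encoded audio, small enough
  // to ship to a network ASR service as-is.
  socket.send(event.data);
};
source.connect(encoder);

The point being that the codec itself runs in browser space; script only ever sees the (much smaller) encoded frames.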


> 
> ...Or we can duplicate Web Audio API functionality in our own Media
> Recording API - albeit with encoding included - which strikes me as a
> duplication of effort across different W3C groups and a conflict of intended
> functionality for the Web Platform. It's always nice to do the work required
> only once.

[Milan] Fully agree.

> 
> >
> > As for other standards-based pipelines, I haven't found any.
> 
> As Josh pointed out, there is also a pipeline that allows the same kind of
> real-time access to MediaStream video byte-level data:

[Milan] My comment was directed at audio capture.  If there are alternatives that you know of, I would sincerely appreciate the reference.

Thanks


> 
> http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0109.html
> 
> >
> > Thanks
> >
> > -----Original Message-----
> > From: Rich Tibbett [mailto:richt@opera.com]
> > Sent: Thursday, August 23, 2012 8:46 AM
> > To: Jim Barnett
> > Cc: Harald Alvestrand; public-media-capture@w3.org
> > Subject: Re: Describing recording by means of the Media Source
> > interface
> >
> > Jim Barnett wrote:
> >> Rich,
> >>     One use case for real-time access to media data is speech recognition.
> >> We would like to be able to use media obtained through getUserMedia
> >> to talk to an ASR system.  It would be nice if we could just set up a
> >> PeerConnection to the ASR system, but ASR engines don't handle UDP
> >> very well (they can handle delays, but not lost packets).  So either
> >> we need to be able to set up a PeerConnection using TCP, or we need
> >> to give the app access to the audio in real time (and let it set up
> >> the TCP connection to the ASR engine).
> >
> > How is this not possible with the following existing pipeline:
> >
> >
> >
> > MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service
> >
> > ?
> >
> > By going through the Web Audio API [1] via an <audio> element to obtain
> > ongoing AudioBuffer data from a MediaStream object, and then sending that
> > on to a third-party ASR engine via e.g. a WebSocket connection, you could
> > achieve the same thing.
> >
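
[Milan] For the record, here's roughly how I read that pipeline in code.  The buffer size, the endpoint URL and the ScriptProcessorNode tap (createJavaScriptNode in today's implementations) are my own illustrative assumptions:

// mediaStream: a MediaStream previously obtained from getUserMedia().
var audio = document.createElement('audio');
audio.src = window.URL.createObjectURL(mediaStream);
audio.play();

var socket = new WebSocket('ws://asr.example.com/recognize'); // placeholder
socket.binaryType = 'arraybuffer';

var context = new webkitAudioContext();
var source = context.createMediaElementSource(audio);
var tap = context.createJavaScriptNode(4096, 1, 1); // tap the graph for raw samples

tap.onaudioprocess = function (event) {
  // These are raw Float32 PCM samples, not encoded audio.
  var samples = event.inputBuffer.getChannelData(0);
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(new Float32Array(samples).buffer); // copy, then send the bytes
  }
};
source.connect(tap);
tap.connect(context.destination);

That works, but note that everything crossing onaudioprocess is raw PCM - which is exactly the bandwidth/encoding gap I raised above.
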
> > There may be other existing pipelines that could be used here too.
> >
> > - Rich
> >
> > [1]
> > https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section
> >
> >> - Jim
> >>
> >> -----Original Message-----
> >> From: Rich Tibbett [mailto:richt@opera.com]
> >> Sent: Thursday, August 23, 2012 10:56 AM
> >> To: Harald Alvestrand
> >> Cc: public-media-capture@w3.org
> >> Subject: Re: Describing recording by means of the Media Source
> >> interface
> >>
> >> Harald Alvestrand wrote:
> >>> I'm scanning the Media Source interface, and seeing how it describes
> >>> data formats for the buffers it uses.
> >>>
> >>> It seems to me logical to describe the recording interface in such a
> >>> way
> >>> that:
> >>>
> >>> If there exists a video element v, a media source msrc and a media
> >>> stream ms, and (conceptually) msProducesData(data) is called every
> >>> time data is available at the recording interface, then the following
> >>> code:
> >>>
> >>> // Setup
> >>> v.src = window.URL.createObjectURL(msrc);
> >>> buffer = msrc.addSourceBuffer(mimetype);
> >>> // (setup for the recorder interface itself is so far unknown)
> >>>
> >>> // Playback: called every time recorded data becomes available
> >>> function msProducesData(data) {
> >>>   buffer.append(data);
> >>> }
> >>>
> >>>
> >>> should produce the same display (possibly somewhat delayed due to
> >>> buffering) as
> >>>
> >>> v.src = window.URL.createObjectURL(ms)
> >>>
> >>> The media source definition is available here:
> >>>
> >>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
> >>>
> >>>
> >>> It seems to me that if we can make sure this actually works, we'll
> >>> have achieved a little consistency across the media handling platform.
> >>>
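
[Milan] One side note on Harald's sketch above: if I'm reading the Media Source draft correctly, addSourceBuffer() throws unless the MediaSource is in the "open" readyState, so the setup would need to wait for the sourceopen event, roughly:

// Assumes the draft's sourceopen event; error handling omitted.
msrc.addEventListener('sourceopen', function () {
  buffer = msrc.addSourceBuffer(mimetype);
});
v.src = window.URL.createObjectURL(msrc);
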
> >> I've been trying to figure out exactly what the purpose is of having
> >> access to _real-time_ buffer data via any type of MediaStream Recording
> >> API. It is fairly clear that byte-level access to recorded data could
> >> be solved with existing interfaces; not in real time while the media is
> >> being recorded to a file, but once a file has been recorded in its
> >> entirety and returned to the web app.
> >>
> >> If we could simply start recording of a MediaStream with e.g.
> >> .start(), then stop it at some arbitrary point, thereby returning a
> >> File object [1], we could then pass that object through the existing
> >> FileReader API [2] to chunk it and apply anything we wish at the
> >> byte level after the recording has completed.
> >>
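
[Milan] For what it's worth, that post-hoc path would look roughly like this.  The recorder's constructor, start()/stop() behavior and the callback delivering the File are assumptions taken from the text above, not an agreed API; processBytes() is an app-defined placeholder:

// All recorder names here are assumptions for illustration only.
// mediaStream: a MediaStream previously obtained from getUserMedia().
var recorder = new MediaRecorder(mediaStream); // assumed constructor
recorder.onrecorded = function (file) {        // assume stop() yields a File [1]
  var CHUNK = 64 * 1024;
  var offset = 0;
  var reader = new FileReader();               // FileReader API [2]
  reader.onload = function (e) {
    processBytes(e.target.result);             // app-defined byte-level work
    offset += CHUNK;
    if (offset < file.size) {
      reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK));
    }
  };
  reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK));
};
recorder.start();
// ... some time later ...
recorder.stop();
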
> >> MediaStream content that is currently being recorded via a
> >> MediaRecorder API could be simultaneously displayed to the user in the
> >> browser via a <video> or <audio> tag, so ongoing playback of a stream
> >> being recorded seems to be a problem that is already solved in most
> >> respects.
> >>
> >> If we could simply return a File object once recording has been
> >> stopped, then we'd save an exceptional amount of complexity in
> >> MediaStream recording (by not having to implement media buffers for
> >> ongoing recording data and not having to rely on the draft Stream API
> >> proposal, which offers a lot of the functionality already available in
> >> FileReader - albeit in real time).
> >>
> >> We wouldn't lose the ability to subsequently apply modifications at
> >> the byte level (via FileReader) - we just wouldn't have real-time
> >> access to ongoing media recording data.
> >>
> >> I could live with this - unless there are some compelling use cases
> >> for reading ongoing MediaStream data in real time, as opposed to
> >> simply being able to read that data once it has been collected into
> >> a recording in its entirety.
> >>
> >> Any use cases brought forward here requiring real-time access to
> >> ongoing recorded byte-data would be welcome. Otherwise, I'm in favor
> >> of greatly reducing the complexity involved with recording a MediaStream.
> >>
> >> [1] http://www.w3.org/TR/FileAPI/#dfn-file
> >>
> >> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
> >>
> >> --
> >> Rich Tibbett (richt)
> >> CORE Platform Architect - Opera Software ASA
> >>

Received on Thursday, 23 August 2012 17:57:06 UTC