- From: Young, Milan <Milan.Young@nuance.com>
- Date: Thu, 23 Aug 2012 16:53:05 +0000
- To: Rich Tibbett <richt@opera.com>, Jim Barnett <Jim.Barnett@genesyslab.com>
- CC: Harald Alvestrand <harald@alvestrand.no>, "public-media-capture@w3.org" <public-media-capture@w3.org>
The Web Audio API only provides access to the raw samples, which presents a practical barrier to transmission. Quality ASR requires encoding, which would be best performed in browser space.

As for other standards-based pipelines, I haven't found any.

Thanks

-----Original Message-----
From: Rich Tibbett [mailto:richt@opera.com]
Sent: Thursday, August 23, 2012 8:46 AM
To: Jim Barnett
Cc: Harald Alvestrand; public-media-capture@w3.org
Subject: Re: Describing recording by means of the Media Source interface

Jim Barnett wrote:
> Rich,
> One use case for real-time access to media data is speech recognition.
> We would like to be able to use media obtained through getUserMedia
> to talk to an ASR system. It would be nice if we could just set up a
> PeerConnection to the ASR system, but ASR engines don't handle UDP
> very well (they can handle delays, but not lost packets). So either
> we need to be able to set up a PeerConnection using TCP, or we need to
> give the app access to the audio in real time (and let it set up the
> TCP connection to the ASR engine).

How is this not possible with the following existing pipeline:

MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service?

By going through the Web Audio API [1] via an <audio> element to obtain ongoing AudioBuffer data from a MediaStream object, and then sending that data on to a third-party ASR engine via e.g. a WebSocket connection, you could achieve the same thing.

There may be other existing pipelines that could be used here too.

- Rich

[1] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section
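For concreteness, here is roughly what that pipeline looks like in script. Treat it as a sketch, not a recipe: the vendor-prefix shims reflect current builds, the direct createMediaStreamSource() tap stands in for the <audio> element hop Rich describes, and the wss://asr.example.com endpoint is invented. Note what actually goes over the wire: raw Float32 PCM samples, with no encoder anywhere in the chain, which is exactly the barrier I mention above.

// Sketch of the MediaStream -> Web Audio -> WebSocket -> ASR pipeline.
// Prefix shims for current builds; the wss:// endpoint is made up.
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;
var AudioContext = window.AudioContext || window.webkitAudioContext;

navigator.getUserMedia({ audio: true }, function (stream) {
  var ctx = new AudioContext();
  // Tap the MediaStream directly (the <audio> element hop also works).
  var source = ctx.createMediaStreamSource(stream);
  var node = ctx.createScriptProcessor ?
             ctx.createScriptProcessor(4096, 1, 1) :   // 4096 samples, mono
             ctx.createJavaScriptNode(4096, 1, 1);     // older name

  var socket = new WebSocket('wss://asr.example.com/recognize');
  socket.binaryType = 'arraybuffer';

  node.onaudioprocess = function (e) {
    // Raw Float32 PCM; no codec is applied anywhere in this chain.
    var samples = e.inputBuffer.getChannelData(0);
    if (socket.readyState === WebSocket.OPEN) {
      // Copy first: the node reuses the underlying buffer.
      socket.send(new Float32Array(samples).buffer);
    }
  };

  source.connect(node);
  node.connect(ctx.destination); // node must be connected to run
}, function (err) { /* getUserMedia failure */ });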
>
> - Jim
>
> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Sent: Thursday, August 23, 2012 10:56 AM
> To: Harald Alvestrand
> Cc: public-media-capture@w3.org
> Subject: Re: Describing recording by means of the Media Source
> interface
>
> Harald Alvestrand wrote:
>> I'm scanning the Media Source interface, and seeing how it describes
>> data formats for the buffers it uses.
>>
>> It seems to me logical to describe the recording interface in such a
>> way that:
>>
>> If there exists a video stream v, a media source msrc and a media
>> stream ms, and (conceptually) msProducesData(buffer) is called every
>> time data is available at the recording interface, then the following
>> code:
>>
>> // Setup
>> v.src = window.URL.createObjectURL(msrc);
>> buffer = msrc.addSourceBuffer(mimetype);
>> // So far unknown setup for the recorder interface
>>
>> // Playback
>> function msProducesData(data) {
>>   buffer.append(data);
>> }
>>
>> should produce the same display (possibly somewhat delayed due to
>> buffering) as:
>>
>> v.src = window.URL.createObjectURL(ms)
>>
>> The media source definition is available here:
>>
>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>>
>> It seems to me that if we can make sure this actually works, we'll
>> have achieved a little consistency across the media handling platform.
>>
>
> I've been trying to figure out exactly the purpose of having access to
> _real-time_ buffer data via any type of MediaStream Recording API. It
> is fairly clear that byte-level access to recorded data could be
> solved with existing interfaces, albeit not in real time as the media
> is being recorded to a file, but once a file has already been recorded
> in its entirety and returned to the web app.
>
> If we could simply start recording of a MediaStream with e.g. a
> .start() call, then stop it at some arbitrary point, thereby returning
> a File object [1], we could then pass that object through the existing
> FileReader API [2] to chunk it and apply anything we wish at the byte
> level after the recording has been completed.
>
> MediaStream content that is currently being recorded via a
> MediaRecorder API could be simultaneously displayed to the user in the
> browser via a <video> or <audio> tag, so ongoing playback of a stream
> being recorded seems to be a problem that is already solved in most
> respects.
>
> If we could simply return a File object once recording has been
> stopped, then we've removed an exceptional amount of complexity from
> MediaStream recording (by not having to implement media buffers for
> ongoing recording data and not having to rely on the draft Stream API
> proposal, which offers a lot of the functionality already available in
> FileReader - albeit in real time).
>
> We wouldn't lose any of the ability to subsequently apply
> modifications at the byte level (via FileReader) - just that we
> wouldn't have real-time access to ongoing media recording data.
>
> I could live with this - unless there are some compelling use cases
> for reading ongoing MediaStream data in real time, as opposed to
> simply being able to read that data once it has already been collected
> into a recording in its entirety.
>
> Any use cases brought forward here requiring real-time access to
> ongoing recorded byte data would be welcome. Otherwise, I'm in favor
> of greatly reducing the complexity involved in recording a MediaStream.
>
> [1] http://www.w3.org/TR/FileAPI/#dfn-file
>
> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
>
> --
> Rich Tibbett (richt)
> CORE Platform Architect - Opera Software ASA
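And for comparison, a sketch of the record-then-read flow Rich outlines above. The recorder itself (HypotheticalStreamRecorder, its start()/stop() methods, and the onrecordingdone callback) is a placeholder, since no recording interface has been agreed on; only File [1] and FileReader [2] are existing, specified APIs. Assume stream is a MediaStream from getUserMedia.

// "HypotheticalStreamRecorder" and onrecordingdone are placeholders for
// whatever recording interface gets specified; File [1] and FileReader
// [2] are the existing APIs.
var video = document.querySelector('video');
var recorder = new HypotheticalStreamRecorder(stream);
recorder.start();

// Ongoing playback of the stream being recorded is already solved:
video.src = window.URL.createObjectURL(stream);

// ... at some arbitrary later point ...
recorder.stop();

recorder.onrecordingdone = function (file) { // delivers a File [1]
  var CHUNK = 64 * 1024; // process the recording 64 KiB at a time
  var offset = 0;
  var reader = new FileReader(); // [2]

  reader.onload = function () {
    var bytes = new Uint8Array(reader.result);
    // ... byte-level work on this chunk goes here ...
    offset += CHUNK;
    if (offset < file.size) {
      reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK));
    }
  };

  reader.readAsArrayBuffer(file.slice(0, CHUNK));
};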
Received on Thursday, 23 August 2012 16:53:33 UTC