- From: Young, Milan <Milan.Young@nuance.com>
- Date: Thu, 23 Aug 2012 16:53:05 +0000
- To: Rich Tibbett <richt@opera.com>, Jim Barnett <Jim.Barnett@genesyslab.com>
- CC: Harald Alvestrand <harald@alvestrand.no>, "public-media-capture@w3.org" <public-media-capture@w3.org>
The Web Audio API only provides access to the raw samples, which presents a practical barrier to transmission. Quality ASR requires encoding, which would be best performed in browser space.
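To put numbers on that barrier, here is a back-of-the-envelope sketch (the figures are my own assumptions for a typical capture setup, not from any spec):

  // Raw Float32 PCM as exposed by the Web Audio API, mono capture:
  var sampleRate = 44100;                    // Hz (a common default)
  var bytesPerSample = 4;                    // Float32
  var rawBytesPerSecond = sampleRate * bytesPerSample;  // ~172 KiB/s
  // A speech codec running in browser space at, say, 16 kbit/s would
  // put only ~2 KiB/s on the wire before it reaches the ASR service:
  var encodedBytesPerSecond = 16000 / 8;     // 2000 bytes/s
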
As for other standards-based pipelines, I haven't found any.
Thanks
-----Original Message-----
From: Rich Tibbett [mailto:richt@opera.com]
Sent: Thursday, August 23, 2012 8:46 AM
To: Jim Barnett
Cc: Harald Alvestrand; public-media-capture@w3.org
Subject: Re: Describing recording by means of the Media Source interface
Jim Barnett wrote:
> Rich,
> One use case for real-time access to media data is speech recognition.
> We would like to be able to use media obtained through getUserMedia
> to talk to an ASR system. It would be nice if we could just set up a
> PeerConnection to the ASR system, but ASR engines don't handle UDP
> very well (they can handle delays, but not lost packets). So either
> we need to be able to set up a PeerConnection using TCP, or we need to
> give the app access to the audio in real time (and let it set up the
> TCP to the ASR engine).
How is this not possible with the following existing pipeline?

MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service
By routing the MediaStream through an <audio> element into the Web Audio API [1] to obtain ongoing AudioBuffer data, and then sending that data on to a third-party ASR engine via e.g. a WebSocket connection, you could achieve the same thing.
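As a rough sketch of that pipeline (API names follow the current drafts and may be vendor-prefixed in practice; the ASR endpoint URL is hypothetical):

  navigator.getUserMedia({ audio: true }, function (stream) {
    // MediaStream -> HTMLAudioElement
    var audio = new Audio();
    audio.src = window.URL.createObjectURL(stream);

    // HTMLAudioElement -> Web Audio API
    var ctx = new AudioContext();
    var source = ctx.createMediaElementSource(audio);
    var processor = ctx.createScriptProcessor(4096, 1, 1);

    // Web Audio API -> WebSockets -> ASR Service
    var ws = new WebSocket('wss://asr.example.com/stream');
    ws.binaryType = 'arraybuffer';

    processor.onaudioprocess = function (e) {
      // Copy the raw Float32 PCM out, since the engine reuses its buffer
      var pcm = new Float32Array(e.inputBuffer.getChannelData(0));
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(pcm.buffer);
      }
    };

    source.connect(processor);
    processor.connect(ctx.destination);
  }, function (err) { console.error(err); });
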
There may be other existing pipelines that could be used here too.
- Rich
[1] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section
>
> - Jim
>
> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Sent: Thursday, August 23, 2012 10:56 AM
> To: Harald Alvestrand
> Cc: public-media-capture@w3.org
> Subject: Re: Describing recording by means of the Media Source
> interface
>
> Harald Alvestrand wrote:
>> I'm scanning the Media Source interface, and seeing how it describes
>> data formats for the buffers it uses.
>>
>> It seems to me logical to describe the recording interface in such a
>> way
>> that:
>>
>> If there exists a video element v, a media source msrc and a media
>> stream ms, and (conceptually) msProducesData(data) is called every
>> time data is available at the recording interface, then the following
>> code:
>>
>> // Setup
>> v.src = window.URL.createObjectURL(msrc);
>> buffer = msrc.addSourceBuffer(mimetype);
>> // So far unknown setup for the recorder interface
>>
>> // Playback
>> function msProducesData(data) {
>>   buffer.append(data);
>> }
>>
>>
>> should produce the same display (possibly somewhat delayed due to
>> buffering) as
>>
>> v.src = window.URL.createObjectURL(ms)
>>
>> The media source definition is available here:
>>
>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>>
>>
>> It seems to me that if we can make sure this actually works, we'll
>> have achieved a little consistency across the media handling platform.
>>
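>> A hypothetical sketch of that "so far unknown" recorder setup, with
>> purely illustrative names:
>>
>> var recorder = ms.record(mimetype);   // assumed recorder API shape
>> recorder.ondata = function (chunk) {  // assumed chunk-delivery event
>>   msProducesData(chunk);
>> };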
>
> I've been trying to figure out exactly the purpose of having access to
> _real-time_ buffer data via any type of MediaStream Recording API. It
> is fairly clear that byte-level access to recorded data could be
> achieved with existing interfaces - albeit not in real time while the
> media is being recorded to a file, but only once a file has been
> recorded in its entirety and returned to the web app.
>
> If we could simply start recording a MediaStream with e.g. .start(),
> then stop it at some arbitrary point, thereby returning a File object
> [1], we could pass that object through the existing FileReader API [2]
> to chunk it and apply anything we wish at the byte level after the
> recording has completed.
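>
> A minimal sketch of that flow (the recorder's start/stop shape is
> hypothetical; the FileReader and Blob.slice calls are the existing
> APIs):
>
> recorder.start();
> // ... some time later ...
> var file = recorder.stop();            // assume stop() returns a File
>
> // Read the File in 64 KiB chunks via Blob.slice + FileReader
> var CHUNK = 64 * 1024;
> function readChunk(offset) {
>   if (offset >= file.size) return;
>   var reader = new FileReader();
>   reader.onload = function () {
>     var bytes = new Uint8Array(reader.result);
>     // ... apply any byte-level processing here ...
>     readChunk(offset + CHUNK);
>   };
>   reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK));
> }
> readChunk(0);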
>
> MediaStream content that is currently being recorded via a
> MediaRecorder API could be simultaneously displayed to the user in the
> browser via a <video> or <audio> tag, so ongoing playback of a stream
> being recorded seems to be a problem that is already solved in most
> respects.
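>
> For example, assuming a <video> element called previewVideo (the name
> is illustrative only):
>
> previewVideo.src = window.URL.createObjectURL(mediaStreamBeingRecorded);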
>
> If we could simply return a File object once recording has been
> stopped, we would save an exceptional amount of complexity in
> MediaStream recording (by not having to implement media buffers for
> ongoing recording data, and by not having to rely on the draft Stream
> API proposal, which offers much of the functionality already available
> in FileReader - albeit in real time).
>
> We wouldn't lose any of the ability to subsequently apply
> modifications at the byte level (via FileReader); we just wouldn't
> have real-time access to ongoing media recording data.
>
> I could live with this - unless there are some compelling use cases
> for reading ongoing MediaStream data in real time, as opposed to
> simply being able to read that data once it has already been collected
> into a recording in its entirety.
>
> Any use cases brought forward here requiring real-time access to
> ongoing recorded byte data would be welcome. Otherwise, I'm in favor
> of greatly reducing the complexity involved with recording a MediaStream.
>
> [1] http://www.w3.org/TR/FileAPI/#dfn-file
>
> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
>
> --
> Rich Tibbett (richt)
> CORE Platform Architect - Opera Software ASA
>
Received on Thursday, 23 August 2012 16:53:33 UTC