- From: Young, Milan <Milan.Young@nuance.com>
- Date: Thu, 23 Aug 2012 16:53:05 +0000
- To: Rich Tibbett <richt@opera.com>, Jim Barnett <Jim.Barnett@genesyslab.com>
- CC: Harald Alvestrand <harald@alvestrand.no>, "public-media-capture@w3.org" <public-media-capture@w3.org>
The Web Audio API only provides access to the raw samples, which presents a practical barrier to transmission. Quality ASR requires encoding, which would be best performed in browser space.
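To put numbers on that barrier, here is a back-of-the-envelope sketch (the figures are my own assumptions for a typical capture setup, not from any spec):

  // Raw Float32 PCM as exposed by the Web Audio API, mono capture:
  var sampleRate = 44100;                    // Hz (a common default)
  var bytesPerSample = 4;                    // Float32
  var rawBytesPerSecond = sampleRate * bytesPerSample;  // ~172 KiB/s
  // A speech codec running in browser space at, say, 16 kbit/s would
  // put only ~2 KiB/s on the wire before it reaches the ASR service:
  var encodedBytesPerSecond = 16000 / 8;     // 2000 bytes/s
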
As for other standards-based pipelines, I haven't found any.
Thanks
-----Original Message-----
From: Rich Tibbett [mailto:richt@opera.com]
Sent: Thursday, August 23, 2012 8:46 AM
To: Jim Barnett
Cc: Harald Alvestrand; public-media-capture@w3.org
Subject: Re: Describing recording by means of the Media Source interface
Jim Barnett wrote:
> Rich,
> One use case for real-time access to media data is speech recognition.
> We would like to be able to use media obtained through getUserMedia
> to talk to an ASR system. It would be nice if we could just set up a
> PeerConnection to the ASR system, but ASR engines don't handle UDP
> very well (they can handle delays, but not lost packets). So either
> we need to be able to set up a PeerConnection using TCP, or we need to
> give the app access to the audio in real time (and let it set up the
> TCP to the ASR engine).
How is this not possible with the following existing pipeline?

MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service
By routing the MediaStream through an <audio> element into the Web Audio API [1] to obtain ongoing AudioBuffer data, and then sending that data on to a third-party ASR engine via e.g. a WebSocket connection, you could achieve the same thing.
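As a rough sketch of that pipeline (API names follow the current drafts and may be vendor-prefixed in practice; the ASR endpoint URL is hypothetical):

  navigator.getUserMedia({ audio: true }, function (stream) {
    // MediaStream -> HTMLAudioElement
    var audio = new Audio();
    audio.src = window.URL.createObjectURL(stream);

    // HTMLAudioElement -> Web Audio API
    var ctx = new AudioContext();
    var source = ctx.createMediaElementSource(audio);
    var processor = ctx.createScriptProcessor(4096, 1, 1);

    // Web Audio API -> WebSockets -> ASR Service
    var ws = new WebSocket('wss://asr.example.com/stream');
    ws.binaryType = 'arraybuffer';

    processor.onaudioprocess = function (e) {
      // Copy the raw Float32 PCM out, since the engine reuses its buffer
      var pcm = new Float32Array(e.inputBuffer.getChannelData(0));
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(pcm.buffer);
      }
    };

    source.connect(processor);
    processor.connect(ctx.destination);
  }, function (err) { console.error(err); });
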
There may be other existing pipelines that could be used here too.
- Rich
[1] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section
>
> - Jim
>
> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Sent: Thursday, August 23, 2012 10:56 AM
> To: Harald Alvestrand
> Cc: public-media-capture@w3.org
> Subject: Re: Describing recording by means of the Media Source
> interface
>
> Harald Alvestrand wrote:
>> I'm scanning the Media Source interface, and seeing how it describes
>> data formats for the buffers it uses.
>>
>> It seems to me logical to describe the recording interface in such a
>> way
>> that:
>>
>> If there exists a video element v, a media source msrc and a media
>> stream ms, and (conceptually) msProducesData(data) is called every
>> time data is available at the recording interface, then the following
>> code:
>>
>> // Setup
>> v.src = window.URL.createObjectURL(msrc);
>> buffer = msrc.addSourceBuffer(mimetype);
>> // So far unknown setup for the recorder interface
>>
>> // Playback
>> function msProducesData(data) {
>>   buffer.append(data);
>> }
>>
>>
>> should produce the same display (possibly somewhat delayed due to
>> buffering) as
>>
>> v.src = window.URL.createObjectURL(ms)
>>
>> The media source definition is available here:
>>
>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>>
>>
>> It seems to me that if we can make sure this actually works, we'll
>> have achieved a little consistency across the media handling platform.
>>
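>> A hypothetical sketch of that "so far unknown" recorder setup, with
>> purely illustrative names:
>>
>> var recorder = ms.record(mimetype);   // assumed recorder API shape
>> recorder.ondata = function (chunk) {  // assumed chunk-delivery event
>>   msProducesData(chunk);
>> };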
>
> I've been trying to figure out exactly the purpose of having access to
> _real-time_ buffer data via any type of MediaStream Recording API. It
> is fairly clear that byte-level access to recorded data could be
> achieved with existing interfaces - albeit not in real time while the
> media is being recorded to a file, but only once a file has been
> recorded in its entirety and returned to the web app.
>
> If we could simply start recording a MediaStream with e.g. .start(),
> then stop it at some arbitrary point, thereby returning a File object
> [1], we could pass that object through the existing FileReader API [2]
> to chunk it and apply anything we wish at the byte level after the
> recording has completed.
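>
> A minimal sketch of that flow (the recorder's start/stop shape is
> hypothetical; the FileReader and Blob.slice calls are the existing
> APIs):
>
> recorder.start();
> // ... some time later ...
> var file = recorder.stop();            // assume stop() returns a File
>
> // Read the File in 64 KiB chunks via Blob.slice + FileReader
> var CHUNK = 64 * 1024;
> function readChunk(offset) {
>   if (offset >= file.size) return;
>   var reader = new FileReader();
>   reader.onload = function () {
>     var bytes = new Uint8Array(reader.result);
>     // ... apply any byte-level processing here ...
>     readChunk(offset + CHUNK);
>   };
>   reader.readAsArrayBuffer(file.slice(offset, offset + CHUNK));
> }
> readChunk(0);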
>
> MediaStream content that is currently being recorded via a
> MediaRecorder API could be simultaneously displayed to the user in the
> browser via a <video> or <audio> tag, so ongoing playback of a stream
> being recorded seems to be a problem that is already solved in most
> respects.
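>
> For example, assuming a <video> element called previewVideo (the name
> is illustrative only):
>
> previewVideo.src = window.URL.createObjectURL(mediaStreamBeingRecorded);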
>
> If we could simply return a File object once recording has been
> stopped, we would save an exceptional amount of complexity in
> MediaStream recording (by not having to implement media buffers for
> ongoing recording data, and by not having to rely on the draft Stream
> API proposal, which offers much of the functionality already available
> in FileReader - albeit in real time).
>
> We wouldn't lose any of the ability to subsequently apply
> modifications at the byte level (via FileReader); we just wouldn't
> have real-time access to ongoing media recording data.
>
> I could live with this - unless there are some compelling use cases
> for reading ongoing MediaStream data in real time, as opposed to
> simply being able to read that data once it has already been collected
> into a recording in its entirety.
>
> Any use cases brought forward here requiring real-time access to
> ongoing recorded byte data would be welcome. Otherwise, I'm in favor
> of greatly reducing the complexity involved with recording a MediaStream.
>
> [1] http://www.w3.org/TR/FileAPI/#dfn-file
>
> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
>
> --
> Rich Tibbett (richt)
> CORE Platform Architect - Opera Software ASA
>
Received on Thursday, 23 August 2012 16:53:33 UTC