Re: Describing recording by means of the Media Source interface from Rich Tibbett on 2012-08-23 (public-media-capture@w3.org from August 2012)

From: Rich Tibbett <richt@opera.com>
Date: Thu, 23 Aug 2012 17:45:43 +0200
To: Jim Barnett <Jim.Barnett@genesyslab.com>
CC: Harald Alvestrand <harald@alvestrand.no>, public-media-capture@w3.org
Message-ID: <50365027.60000@opera.com>

Jim Barnett wrote:
> Rich,
>    One use case for real-time access to media data is speech recognition.
> We would like to be able to use media obtained  through getUserMedia to
> talk to an ASR system.  It would be nice if we could just set up a
> PeerConnection to the ASR system, but ASR engines don't handle UDP very
> well (they can handle delays, but not lost packets.)  So either we need
> to be able to set up a PeerConnection using TCP, or we need to give the
> app access to the audio in real time (and let it set up the TCP to the
> ASR engine.)

How is this not possible with the following existing pipeline:

MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service

?

You could achieve the same thing by routing the MediaStream object
through an <audio> element into the Web Audio API [1] to obtain ongoing
AudioBuffer data, and then sending that data on to a third-party ASR
engine via, e.g., a WebSocket connection.
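
For illustration, here is a rough sketch of that pipeline in JavaScript.
It is not taken from any spec text. It assumes the ASR service accepts raw
16-bit little-endian PCM over a WebSocket, takes a hypothetical endpoint
URL as asrUrl, and connects the MediaStream to the Web Audio graph
directly with createMediaStreamSource() rather than going through an
<audio> element. It also uses ScriptProcessorNode, which current browsers
deprecate in favour of AudioWorklet, only because it keeps the example
short.

```javascript
// Convert Web Audio float samples (range -1..1) to 16-bit little-endian PCM.
// Assumption: the ASR service wants PCM16; a real service may want another format.
function floatTo16BitPCM(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return view.buffer;
}

// Browser-only wiring: MediaStream -> Web Audio -> WebSocket.
// asrUrl is a hypothetical endpoint supplied by the caller.
function streamToAsr(mediaStream, asrUrl) {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(mediaStream);
  // 4096-frame blocks, one input channel, one output channel.
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  const socket = new WebSocket(asrUrl);
  socket.binaryType = "arraybuffer";

  processor.onaudioprocess = (event) => {
    // Drop audio blocks until the socket is open.
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(floatTo16BitPCM(event.inputBuffer.getChannelData(0)));
    }
  };

  source.connect(processor);
  // Connected to the destination so the node is kept processing.
  processor.connect(ctx.destination);
  return { ctx, socket };
}
```

In a page, streamToAsr(stream, url) would be called from the getUserMedia
success callback. Since WebSockets run over TCP, this avoids the lost
packets that the ASR engine cannot handle, at the cost of added latency.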

There may be other existing pipelines that could be used here too.

- Rich

[1] https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section

>
> - Jim
>
> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Sent: Thursday, August 23, 2012 10:56 AM
> To: Harald Alvestrand
> Cc: public-media-capture@w3.org
> Subject: Re: Describing recording by means of the Media Source interface
>
> Harald Alvestrand wrote:
>> I'm scanning the Media Source interface, and seeing how it describes
>> data formats for the buffers it uses.
>>
>> It seems to me logical to describe the recording interface in such a
>> way that:
>>
>> If there exists a video stream v, a media source msrc and a media
>> stream ms, and (conceptually) msProducesData(buffer) is called every
>> time data is available at the recording interface, then the following
>> code:
>>
>> // Setup
>> v.src = window.URL.createObjectURL(msrc);
>> buffer = msrc.addSourceBuffer(mimetype);
>> // So far unknown setup for the recorder interface
>>
>> // playback
>> msProducesData(data) {
>> buffer.append(data)
>> }
>>
>>
>> should produce the same display (possibly somewhat delayed due to
>> buffering) as
>>
>> v.src = window.URL.createObjectURL(ms)
>>
>> The media source definition is available here:
>>
>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>>
>>
>> It seems to me that if we can make sure this actually works, we'll
>> have achieved a little consistency across the media handling platform.
>>
>
> I've been trying to figure out exactly the purpose of having access to
> _real-time_ buffer data via any type of MediaStream Recording API. It is
> fairly clear that byte-level access to recorded data could be solved
> with existing interfaces, albeit not in real-time while the media is
> being recorded, but only once a file has been recorded in its
> entirety and returned to the web app.
>
> If we could simply start recording of a MediaStream with e.g. .start(),
> then stop it at some arbitrary point, thereby returning a File object
> [1], we could then pass that object through the existing FileReader
> API [2] to chunk it and apply anything we wish to at the byte-level
> after the recording has been completed.
>
> MediaStream content that is currently being recorded via a MediaRecorder
> API could be simultaneously displayed to the user in the browser via a
> <video> or <audio> tag so ongoing playback of a stream being recorded
> seems to be a problem that is already solved in most respects.
>
> If we could simply return a File object once recording has been stopped
> then we've saved an exceptional amount of complexity from MediaStream
> recording (by not having to implement media buffers for ongoing
> recording data and not having to rely on the draft Stream API proposal
> which offers a lot of the functionality already available in FileReader
> - albeit in real-time).
>
> We wouldn't lose any of the ability to subsequently apply any
> modifications at the byte-level (via FileReader) - just that we wouldn't
> have real-time access to ongoing media recording data.
>
> I could live with this - unless there are some compelling use cases for
> reading ongoing MediaStream data in real-time as opposed to simply being
> able to read that data once it has already been collected into a
> recording, in its entirety.
>
> Any use cases brought forward here requiring real-time access to ongoing
> recorded byte-data would be welcome. Otherwise, I'm in favor of greatly
> reducing the complexity involved with recording a MediaStream.
>
> [1] http://www.w3.org/TR/FileAPI/#dfn-file
>
> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
>
> --
> Rich Tibbett (richt)
> CORE Platform Architect - Opera Software ASA
>
Received on Thursday, 23 August 2012 15:46:19 UTC