Re: Describing recording by means of the Media Source interface from Rich Tibbett on 2012-08-23 (public-media-capture@w3.org from August 2012)

From: Rich Tibbett <richt@opera.com>
Date: Thu, 23 Aug 2012 19:07:27 +0200
To: "Young, Milan" <Milan.Young@nuance.com>
CC: Jim Barnett <Jim.Barnett@genesyslab.com>, Harald Alvestrand <harald@alvestrand.no>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <5036634F.9040201@opera.com>
Young, Milan wrote:
> The Web Audio API only provides access to the raw samples, which presents a practical barrier to transmission.  Quality ASR requires encoding, which would be best performed in browser space.

I agree encoding would be an important consideration for data 
transmission purposes.

The natural place to have that discussion would be in relation to the 
Web Audio API though.

Or we can duplicate Web Audio API functionality in our own Media 
Recording API - albeit with encoding included - which strikes me as a 
duplication of effort across different W3C groups and a conflict of 
intended functionality for the Web Platform. It's always nice to do the 
work required only once.
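
For concreteness, here is a rough sketch of the pipeline quoted further down (MediaStream -> Web Audio API -> WebSockets -> ASR service). It is only an illustration under several assumptions: it uses the later Web Audio method names createMediaStreamSource() and createScriptProcessor() rather than going through an <audio> element, the function names floatTo16BitPcm and streamToAsr are made up for this example, the socket is assumed to be already open, and the 16-bit PCM conversion is not a codec (it only halves the payload compared with raw Float32 samples).

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
// Assumption: the ASR service accepts raw PCM; this is not real encoding.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Forward audio from a MediaStream to an already-open WebSocket.
// audioContext is a Web Audio AudioContext, socket is a WebSocket
// (or anything with a send() method). Returns a function that stops forwarding.
function streamToAsr(audioContext, mediaStream, socket, bufferSize = 4096) {
  const source = audioContext.createMediaStreamSource(mediaStream);
  const processor = audioContext.createScriptProcessor(bufferSize, 1, 1);

  processor.onaudioprocess = (event) => {
    // Channel 0 only: a mono signal is assumed to be enough for ASR.
    const samples = event.inputBuffer.getChannelData(0);
    socket.send(floatTo16BitPcm(samples).buffer);
  };

  source.connect(processor);
  // Some browsers only fire onaudioprocess when the node reaches a destination.
  processor.connect(audioContext.destination);

  return () => {
    processor.disconnect();
    source.disconnect();
  };
}
```

In a page this might be called as streamToAsr(new AudioContext(), stream, socket), where stream comes from getUserMedia and socket points at a hypothetical ASR endpoint.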

>
> As for other standards-based pipelines, I haven't found any.

As Josh pointed out, there is also a pipeline that allows the same thing 
for real-time access to MediaStream video byte-level data:

http://lists.w3.org/Archives/Public/public-media-capture/2012Aug/0109.html

>
> Thanks
>
> -----Original Message-----
> From: Rich Tibbett [mailto:richt@opera.com]
> Sent: Thursday, August 23, 2012 8:46 AM
> To: Jim Barnett
> Cc: Harald Alvestrand; public-media-capture@w3.org
> Subject: Re: Describing recording by means of the Media Source interface
>
> Jim Barnett wrote:
>> Rich,
>>     One use case for real-time access to media data is speech recognition.
>> We would like to be able to use media obtained  through getUserMedia
>> to talk to an ASR system.  It would be nice if we could just set up a
>> PeerConnection to the ASR system, but ASR engines don't handle UDP
>> very well (they can handle delays, but not lost packets.)  So either
>> we need to be able to set up a PeerConnection using TCP, or we need to
>> give the app access to the audio in real time (and let it set up the
>> TCP to the ASR engine.)
>
> How is this not possible with the following existing pipeline:
>
>
>
> MediaStream -> HTMLAudioElement -> Web Audio API [1] -> WebSockets -> ASR Service
>
> ?
>
> By going through the Web Audio API [1] via an <audio> element to obtain ongoing AudioBuffer data from a MediaStream object, and then sending that on to a 3rd-party ASR engine via e.g. a WebSocket connection, you could achieve the same thing.
>
> There may be other existing pipelines that could be used here too.
>
> - Rich
>
> [1]
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#AudioBuffer-section
>
>> - Jim
>>
>> -----Original Message-----
>> From: Rich Tibbett [mailto:richt@opera.com]
>> Sent: Thursday, August 23, 2012 10:56 AM
>> To: Harald Alvestrand
>> Cc: public-media-capture@w3.org
>> Subject: Re: Describing recording by means of the Media Source
>> interface
>>
>> Harald Alvestrand wrote:
>>> I'm scanning the Media Source interface, and seeing how it describes
>>> data formats for the buffers it uses.
>>>
>>> It seems to me logical to describe the recording interface in such a
>>> way that:
>>>
>>> If there exists a video stream v, a media source msrc and a media
>>> stream ms, and (conceptually) msProducesData(buffer) is called every
>>> time data is available at the recording interface, then the following
>>> code:
>>>
>>> // Setup
>>> v.src = window.URL.createObjectURL(msrc);
>>> buffer = msrc.addSourceBuffer(mimetype);
>>> // So far unknown setup for the recorder interface
>>>
>>> // playback
>>> function msProducesData(data) {
>>>   buffer.append(data);
>>> }
>>>
>>>
>>> should produce the same display (possibly somewhat delayed due to
>>> buffering) as
>>>
>>> v.src = window.URL.createObjectURL(ms)
>>>
>>> The media source definition is available here:
>>>
>>> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>>>
>>>
>>> It seems to me that if we can make sure this actually works, we'll
>>> have achieved a little consistency across the media handling platform.
>>>
>> I've been trying to figure out exactly the purpose of having access to
>> _real-time_ buffer data via any type of MediaStream Recording API. It
>> is fairly clear that byte-level access to recorded data could be
>> solved with existing interfaces, albeit not in real-time as the media
>> is being recorded to a file but once a file has already been recorded
>> in its entirety and returned to the web app.
>>
>> If we could simply start recording of a MediaStream with e.g.
>> .start(), then stop it at some arbitrary point, thereby returning a
>> File object [1], we could then pass that object through the
>> existing FileReader API [2] to chunk it and apply anything we wish to
>> at the byte-level after the recording has been completed.
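
As an illustration of the record-then-read idea in the paragraph above, a minimal sketch of the chunking step might look like this. The helper name sliceIntoChunks is made up for the example, and the File is assumed to be whatever the hypothetical recorder returns when stopped; only the standard size property and slice() method of File/Blob are relied on.

```javascript
// Yield successive byte ranges of a recorded File (or any Blob) as smaller Blobs.
// slice() does not copy or read data; it only describes a range.
function* sliceIntoChunks(file, chunkSize) {
  for (let start = 0; start < file.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, file.size);
    yield file.slice(start, end);
  }
}
```

In a browser, each yielded chunk could then be handed to FileReader.readAsArrayBuffer() to get at the bytes once the recording has completed.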
>>
>> MediaStream content that is currently being recorded via a
>> MediaRecorder API could be simultaneously displayed to the user in the
>> browser via a <video> or <audio> tag so ongoing playback of a stream
>> being recorded seems to be a problem that is already solved in most respects.
>>
>> If we could simply return a File object once recording has been
>> stopped then we've saved an exceptional amount of complexity from
>> MediaStream recording (by not having to implement media buffers for
>> ongoing recording data and not having to rely on the draft Stream API
>> proposal which offers a lot of the functionality already available in
>> FileReader - albeit in real-time).
>>
>> We wouldn't lose any of the ability to subsequently apply any
>> modifications at the byte-level (via FileReader) - just that we
>> wouldn't have real-time access to ongoing media recording data.
>>
>> I could live with this - unless there are some compelling use cases
>> for reading ongoing MediaStream data in real-time as opposed to simply
>> being able to read that data once it has already been collected into
>> a recording, in its entirety.
>>
>> Any use cases brought forward here requiring real-time access to
>> ongoing recorded byte-data would be welcome. Otherwise, I'm in favor
>> of greatly reducing the complexity involved with recording a MediaStream.
>>
>> [1] http://www.w3.org/TR/FileAPI/#dfn-file
>>
>> [2] http://www.w3.org/TR/FileAPI/#dfn-filereader
>>
>> --
>> Rich Tibbett (richt)
>> CORE Platform Architect - Opera Software ASA
>>
Received on Thursday, 23 August 2012 17:08:07 UTC