RE: Describing recording by means of the Media Source interface

Rich,
  One use case for real-time access to media data is speech recognition.
We would like to be able to use media obtained through getUserMedia to
talk to an ASR system.  It would be nice if we could just set up a
PeerConnection to the ASR system, but ASR engines don't handle UDP very
well (they can handle delays, but not lost packets.)  So either we need
to be able to set up a PeerConnection using TCP, or we need to give the
app access to the audio in real time (and let it set up the TCP to the
ASR engine.)
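
A rough sketch of the second option, assuming the recorder hands the app
audio chunks as they are produced and the app relays them over a
WebSocket (which runs over TCP). createAsrForwarder is a made-up helper
name, and the socket only needs to look like a WebSocket (readyState
and send()); nothing here is a proposed API.

```javascript
// Hypothetical helper: relay audio chunks to an ASR service over a
// WebSocket-like object. Chunks that arrive before the socket is open
// are queued and sent in order once it opens.
function createAsrForwarder(socket) {
  const OPEN = 1; // same value as WebSocket.OPEN
  const pending = [];

  // Send queued chunks in arrival order while the socket is open.
  function flush() {
    while (pending.length > 0 && socket.readyState === OPEN) {
      socket.send(pending.shift());
    }
  }

  return {
    // Call this from whatever callback delivers real-time audio data.
    push(chunk) {
      pending.push(chunk);
      flush();
    },
    flush,
    // Number of chunks still waiting for the socket to open.
    get queued() {
      return pending.length;
    },
  };
}
```

In a page this would be wired up by creating a WebSocket to the ASR
server, setting its onopen handler to the forwarder's flush, and calling
push(data) each time the recording interface produces data.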

- Jim

-----Original Message-----
From: Rich Tibbett [mailto:richt@opera.com] 
Sent: Thursday, August 23, 2012 10:56 AM
To: Harald Alvestrand
Cc: public-media-capture@w3.org
Subject: Re: Describing recording by means of the Media Source interface

Harald Alvestrand wrote:
> I'm scanning the Media Source interface, and seeing how it describes 
> data formats for the buffers it uses.
>
> It seems to me logical to describe the recording interface in such a
> way that:
>
> If there exists a video stream v, a media source msrc and a media 
> stream ms, and (conceptually) msProducesData(buffer) is called every 
> time data is available at the recording interface, then the following 
> code:
>
> // Setup
> v.src = window.URL.createObjectURL(msrc);
> buffer = msrc.addSourceBuffer(mimetype);
> // So far unknown setup for the recorder interface
>
> // playback
> msProducesData(data) {
> buffer.append(data)
> }
>
>
> should produce the same display (possibly somewhat delayed due to
> buffering) as
>
> v.src = window.URL.createObjectURL(ms)
>
> The media source definition is available here:
>
> http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html
>
>
> It seems to me that if we can make sure this actually works, we'll 
> have achieved a little consistency across the media handling platform.
>

I've been trying to figure out exactly the purpose of having access to
_real-time_ buffer data via any type of MediaStream Recording API. It is
fairly clear that byte-level access to recorded data could be solved
with existing interfaces, albeit not in real-time while the media is
being recorded, but only once a file has been recorded in its entirety
and returned to the web app.

If we could simply start recording of a MediaStream with e.g. .start(),
then stop it at some arbitrary point, thereby returning a File object
[1], we could then pass that object through the existing FileReader
API [2] to chunk it and apply anything we wish to at the byte-level
after the recording has been completed.
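
As a minimal sketch of that post-recording step, assuming the recorder
has already returned a File: the code below slices it into fixed-size
pieces and reads each one with FileReader. chunkRanges and readInChunks
are hypothetical helper names, not part of any spec, and readInChunks
only runs where FileReader is available (i.e. in a browser).

```javascript
// Split [0, totalSize) into consecutive [start, end) byte ranges.
function chunkRanges(totalSize, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalSize)]);
  }
  return ranges;
}

// Read a File (or Blob) one slice at a time, passing each ArrayBuffer
// and its byte offset to onChunk in order. onDone is called with no
// argument on success, or with the FileReader error on failure.
function readInChunks(file, chunkSize, onChunk, onDone) {
  const ranges = chunkRanges(file.size, chunkSize);
  let index = 0;

  function next() {
    if (index >= ranges.length) {
      onDone();
      return;
    }
    const [start, end] = ranges[index++];
    const reader = new FileReader();
    reader.onload = () => {
      onChunk(reader.result, start);
      next();
    };
    reader.onerror = () => onDone(reader.error);
    reader.readAsArrayBuffer(file.slice(start, end));
  }

  next();
}
```

Any byte-level processing would then live in the onChunk callback; none
of it can start until the recording has been stopped.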

MediaStream content that is currently being recorded via a MediaRecorder
API could be simultaneously displayed to the user in the browser via a
<video> or <audio> tag so ongoing playback of a stream being recorded
seems to be a problem that is already solved in most respects.

If we could simply return a File object once recording has been stopped
then we've saved an exceptional amount of complexity from MediaStream
recording (by not having to implement media buffers for ongoing
recording data and not having to rely on the draft Stream API proposal
which offers a lot of the functionality already available in FileReader
- albeit in real-time).

We wouldn't lose any of the ability to subsequently apply any
modifications at the byte-level (via FileReader) - just that we wouldn't
have real-time access to ongoing media recording data.

I could live with this - unless there are some compelling use cases for
reading ongoing MediaStream data in real-time as opposed to simply being
able to read that data once it has already been collected into a
recording, in its entirety.

Any use cases brought forward here requiring real-time access to ongoing
recorded byte-data would be welcome. Otherwise, I'm in favor of greatly
reducing the complexity involved with recording a MediaStream.

[1] http://www.w3.org/TR/FileAPI/#dfn-file

[2] http://www.w3.org/TR/FileAPI/#dfn-filereader

--
Rich Tibbett (richt)
CORE Platform Architect - Opera Software ASA

Received on Thursday, 23 August 2012 15:01:26 UTC