RE: Lossless modes (Re: approaches to recording) from Jim Barnett on 2012-10-15 (public-media-capture@w3.org from October 2012)

From: Jim Barnett <Jim.Barnett@genesyslab.com>
Date: Mon, 15 Oct 2012 06:24:12 -0700
To: "Harald Alvestrand" <harald@alvestrand.no>
Cc: <public-media-capture@w3.org>
Message-ID: <E17CAD772E76C742B645BD4DC602CD8106CE0522@NAHALD.us.int.genesyslab.com>

Harald,

Yes, if this complicates the stack, it won't be worth the effort.  We
will use an async API that delivers buffers of data (of configurable
size) as they become available.  That's somewhat different from the
common recording case, where you just want the whole Blob when it's done
(or maybe you want it written out to a file without the JS code ever
seeing it).  This async API is the same one we'd use for real-time
media processing as well (for example, drawing a box around the
bouncing ball).  There seems to be some disagreement about whether this
API is part of the recording API or a separate one.
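
To make the buffer-delivery idea concrete, here is a rough sketch of
handing out fixed-size buffers as data arrives.  It is purely
illustrative: BufferChunker, push, flush and onBuffer are made-up names,
not anything proposed for the spec, and the callback here fires inside
push rather than from a real capture pipeline.

```typescript
// Hypothetical helper (not part of any proposed API): re-chunks incoming
// media bytes into buffers of a configurable size and hands each complete
// buffer to a callback as soon as enough data has arrived.
type BufferCallback = (buffer: Uint8Array) => void;

class BufferChunker {
  private pending: Uint8Array = new Uint8Array(0);
  private readonly chunkSize: number;
  private readonly onBuffer: BufferCallback;

  constructor(chunkSize: number, onBuffer: BufferCallback) {
    if (chunkSize <= 0) {
      throw new RangeError("chunkSize must be positive");
    }
    this.chunkSize = chunkSize; // configurable buffer size in bytes
    this.onBuffer = onBuffer;
  }

  // Called whenever the capture side has new data available.
  push(data: Uint8Array): void {
    // Join leftover bytes from earlier calls with the new data.
    const merged = new Uint8Array(this.pending.length + data.length);
    merged.set(this.pending, 0);
    merged.set(data, this.pending.length);

    // Deliver every complete buffer; keep the remainder for later.
    let offset = 0;
    while (merged.length - offset >= this.chunkSize) {
      this.onBuffer(merged.slice(offset, offset + this.chunkSize));
      offset += this.chunkSize;
    }
    this.pending = merged.slice(offset);
  }

  // Deliver whatever is left (possibly a short buffer) when capture stops.
  flush(): void {
    if (this.pending.length > 0) {
      this.onBuffer(this.pending);
      this.pending = new Uint8Array(0);
    }
  }
}
```

The "whole Blob when it's done" case would then just be a consumer that
collects every delivered buffer and joins them after flush, while a
speech recognizer or a real-time processor would act on each buffer as
it arrives.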

 

- Jim

 

From: Harald Alvestrand [mailto:harald@alvestrand.no] 
Sent: Monday, October 15, 2012 5:23 AM
To: Jim Barnett
Cc: public-media-capture@w3.org
Subject: Lossless modes (Re: approaches to recording)

 

On 10/12/2012 05:39 PM, Jim Barnett wrote:

	Harald,

	The lack of real-time delivery is not normally an issue for
	speech recognition systems, because they run many times faster
	than real time, and can catch up quickly once the data is
	available.  So if the delays are short enough, the user will not
	perceive them.  And if the delays are longer, well... then speech
	recognition will take a long time.  People are used to stuff
	being slow on the internet, aren't they?


Changing the subject, because this is a very different subject:

The problem with "just" adding "lossless mode" to a MediaStream
attachment to a PeerConnection is that it requires replacing the whole
protocol stack underneath that transport - the idea of
somewhat-unreliable, but always-reasonably-fast, transmission is deeply
embedded into the RTP/UDP protocol suite.

I don't even want to propose that the IETF take on defining a
corresponding protocol suite at this time. It's MUCH simpler (seen from
my side as running-back-and-forth-between-W3C-and-IETF) to define a
local recording format that doesn't lose any bits, but also has no
fast-delivery expectations.

What does the current contact center and speech industry do when faced
with SIP telephone systems?

- Jim

 

From: Harald Alvestrand [mailto:harald@alvestrand.no] 
Sent: Friday, October 12, 2012 11:35 AM
To: public-media-capture@w3.org
Subject: Re: approaches to recording

 

On 10/11/2012 12:50 AM, Jim Barnett wrote:

	I just want to observe that lossless streaming is what we (= the
	contact center and speech industry) want for talking to a speech
	recognition system.  It would be ideal if PeerConnection
	supported it.  Failing that, it would be nice if the Recorder
	supported it, but in a pinch we figure that we can use the
	track-level API to deliver buffers of speech data and let the JS
	code set up the TCP/IP connection.

	 

Of course lossless streaming (truly guaranteed delivery) implies
non-real-time streaming (or, more formally, having to deal with the
possibility that delivery will be delayed beyond real-time), given that
the Internet is a lossy medium.

To another thread: Yes, having the constructor for the recorder take a
MIME type parameter would imply that you set the codec to be used. I
think we all agree that the data coming out of a recording interface is
encoded.

           Harald

Received on Monday, 15 October 2012 13:25:54 UTC