RE: terminology (was: updates to requirements document) from Jim Barnett on 2012-07-13 (public-media-capture@w3.org from July 2012)

From: Jim Barnett <Jim.Barnett@genesyslab.com>
Date: Fri, 13 Jul 2012 12:40:25 -0700
To: "Travis Leithead" <travis.leithead@microsoft.com>, "Stefan Hakansson LK" <stefan.lk.hakansson@ericsson.com>, <public-media-capture@w3.org>
Message-ID: <E17CAD772E76C742B645BD4DC602CD810671B5E0@NAHALD.us.int.genesyslab.com>

I think that Milan has a different use case in mind than either Stefan
or Travis does. He wants to do speech recognition on the audio. For
that he needs to capture the audio in chunks in real time (he can't
wait until the user is done speaking). He also wants to select a
different, reliable transport, not UDP. The app will be grabbing
buffers of audio data and shipping them off to a remote recognition
engine. I'll let Milan explain in detail, but this is rather different
from recording to a file. (It's also very useful: Genesys and other
contact center companies will be interested in using WebRTC for this,
since it lets us build speech recognition into a web page.)
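To make the contrast with recording to a file concrete, here is a minimal sketch of the streaming pattern described above, written in Python rather than browser JavaScript. Capture buffers of whatever size arrive are re-sliced into fixed-duration chunks, and each chunk goes to the recognizer as soon as it is full. The 16 kHz, 16-bit mono PCM format, the 100 ms chunk length, and the `send` callback standing in for the reliable transport are illustrative assumptions, not anything proposed in this thread.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical capture format: 16 kHz, 16-bit mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100  # illustrative chunk duration

# 16000 samples/s * 0.1 s * 2 bytes/sample = 3200 bytes per chunk.
CHUNK_BYTES = SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE


def chunk_stream(buffers: Iterable[bytes], chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Re-slice capture buffers of arbitrary size into fixed-size chunks.

    Each full chunk is yielded as soon as enough audio has arrived, so the
    recognizer can start before the user stops speaking. A trailing partial
    chunk is flushed when the capture ends.
    """
    pending = bytearray()
    for buf in buffers:
        pending.extend(buf)
        while len(pending) >= chunk_bytes:
            yield bytes(pending[:chunk_bytes])
            del pending[:chunk_bytes]
    if pending:
        yield bytes(pending)


def stream_to_recognizer(buffers: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Ship each chunk through `send` (a stand-in for a reliable transport)."""
    count = 0
    for chunk in chunk_stream(buffers):
        send(chunk)
        count += 1
    return count
```

Feeding seven 1000-byte capture buffers through `stream_to_recognizer` sends three chunks of 3200, 3200 and 600 bytes. None of them is a playable file on its own; the remote engine consumes them as one continuous stream.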

- Jim

-----Original Message-----
From: Travis Leithead [mailto:travis.leithead@microsoft.com] 
Sent: Friday, July 13, 2012 2:59 PM
To: Jim Barnett; Stefan Hakansson LK; public-media-capture@w3.org
Subject: RE: terminology (was: updates to requirements document)

I also treat "record" and "capture" as synonyms. In general, there seem
to be more precise words we could use; we may be misunderstanding each
other because of terminology, which would be unfortunate.

My understanding of the original proposal for recording (see
http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-3) was that you
could call a record() API to start _encoding_ the camera/mic's raw data
into some binary format. Here I think the words "capture" and "record"
both refer to this process. At some point in the future you could call
getRecordedData() (see
http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-5), which would
then asynchronously create a Blob object containing the encoded binary
data in some known format (blob.type would indicate the MIME type of
whatever encoding the UA decided to use; the API offered no way to
control or hint at the encoded format). I believe the returned Blob was
supposed to be a "complete" file, meaning that its encoding contained a
definitive start and end point, and was *not* a binary slice of some
larger file. In other words, the returned Blob could be played directly
in an HTML audio or video element, saved to a file system for storage,
or sent to a server via XHR.

So, when you mentioned the word "chunks" below, were you referring to
the idea of calling getRecordedData() multiple times (assuming that each
call reset the start point of the next recording, which is *not* how
that API was specified)? Rather than "chunks", I think of these as
completely separate "capture" sessions: each one is a complete,
end-to-end capture.
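The two readings of calling getRecordedData() repeatedly can be contrasted with a toy model. This is a hypothetical Python sketch, not the 2011 draft API. `ToyRecorder`, its `push` method, its `reset` flag, and the fake `HDR`/`END` markers that make each returned value a self-contained "file" are all invented for illustration.

```python
HDR = b"HDR"  # hypothetical container header: marks the start of a file
END = b"END"  # hypothetical trailer: marks the end of a file


class ToyRecorder:
    """Toy model of record()/getRecordedData() semantics (hypothetical)."""

    def __init__(self) -> None:
        self._encoded = bytearray()
        self._recording = False

    def record(self) -> None:
        """Start accepting encoded data."""
        self._recording = True

    def push(self, encoded: bytes) -> None:
        """Simulate the UA appending freshly encoded media data."""
        if self._recording:
            self._encoded.extend(encoded)

    def get_recorded_data(self, reset: bool = False) -> bytes:
        """Return a *complete* file: header, all data so far, trailer.

        reset=False: every call covers the whole capture since record().
        reset=True:  every call ends one capture session and starts a new
                     one (the variant the draft did *not* specify).
        """
        blob = HDR + bytes(self._encoded) + END
        if reset:
            self._encoded.clear()
        return blob
```

With `reset=False`, pushing `ab`, calling `get_recorded_data()`, pushing `cd` and calling it again returns `b"HDRabEND"` and then `b"HDRabcdEND"`. With `reset=True`, the two calls return `b"HDRabEND"` and `b"HDRcdEND"`. In both cases every result is a complete file with its own start and end, so neither variant produces "chunks" in the sense of slices of one in-progress encoding.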

When I think of "chunks" I think of incomplete segments of the larger,
in-progress encoded capture. The point at which the larger encoded data
buffer is sliced (to make a "chunk") might or might not be arbitrary; I
think that is something we can discuss. If it's arbitrary, then the
JavaScript processing the raw encoded "chunks" must understand the
format well enough to know when there isn't enough data available to
correctly process a chunk, or where to stop. This is similar to how the
HTML parser handles incoming bits from the wire before it determines a
page's encoding. If we decide that the chunks must be sliced at more
"appropriate" places, then the UAs must in turn implement this same
logic, given an understanding of the encoding in use. As an implementor,
I expect it would be much faster to slice the raw bits arbitrarily
(perhaps as soon as possible after encoding) and let the JavaScript code
deal with how to interpret them. In this case, the returned data should
probably be in a TypedArray of some form.
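Here is a rough sketch of the parsing burden that arbitrary slicing puts on the script: buffer incoming bytes and release only complete frames, holding back anything that is cut off mid-frame. It is written in Python, using a `bytearray` where the browser would use a TypedArray such as `Uint8Array`. The framing is a deliberately simplified, hypothetical one (a 2-byte big-endian length prefix followed by the payload). Real codecs and containers need format-specific parsing, which is exactly the codec-agnosticism problem raised later in this thread.

```python
import struct
from typing import List


class FrameReassembler:
    """Consume arbitrarily sliced chunks; emit only complete frames.

    Hypothetical framing: each frame is a 2-byte big-endian payload length
    followed by that many payload bytes. This shows only the buffering
    logic, not any real media format.
    """

    HEADER = struct.Struct(">H")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append a chunk and return every frame that is now complete."""
        self._buf.extend(chunk)
        frames: List[bytes] = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf, 0)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # frame is split across chunks; wait for more data
            frames.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]
        return frames

    @property
    def pending(self) -> int:
        """Bytes held back because they do not yet form a complete frame."""
        return len(self._buf)
```

Suppose the two frames `b"\x00\x03abc"` and `b"\x00\x02de"` arrive sliced as `b"\x00"`, `b"\x03ab"`, `b"c\x00\x02"`, `b"d"`, `b"e"`. The successive `feed` calls return `[]`, `[]`, `[b"abc"]`, `[]`, `[b"de"]`. The alternative of slicing only at "appropriate" places would move this same logic out of the page and into every UA.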



> -----Original Message-----
> From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
> Sent: Friday, July 13, 2012 6:16 AM
> To: Stefan Hakansson LK; public-media-capture@w3.org
> Subject: RE: updates to requirements document
> 
> Stefan,
> 
>   English is my native language and I don't know the difference
> between 'capture' and 'record' either.  The requirements doc used
> 'capture' so I kept it, and introduced 'record' because that's the
> term I normally use.
> If we can agree on a single term to use, I'll gladly update the spec.
> 
> 
> - Jim
> 
> -----Original Message-----
> From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
> Sent: Friday, July 13, 2012 9:06 AM
> To: public-media-capture@w3.org
> Subject: Re: updates to requirements document
> 
> Milan,
> 
> isn't your core proposal that we should have a requirement that allows
> recording of audio (and it would apply to video as well I guess) to
> files, i.e. some kind of continuous chunked recording?
> 
> I think that would make sense (and that was how the original,
> underspecified, recording function worked IIRC), and that it should
> be possible to use those chunks as a source in the MediaSource API
> proposal (even if my number one priority would be that those files
> could be used as a source for the audio/video elements).
> 
> I do not understand why we would add words about "encoded" and so on 
> though. We don't use that kind of language in any other req, why here?
> 
> Stefan
> 
> PS English is not my native language; I would be very glad if someone
> could explain the difference between "capture" and "record" for me. I
> must admit I do not know the difference. Ideally I would like one word
> meaning something like "using a mic/cam to start producing data" and
> another one for "storing that data to a file".
> 
> 
> On 07/11/2012 06:04 PM, Young, Milan wrote:
> > Sorry if I'm missing context, but is there a counter proposal, or are
> > you just warning us that this is a long haul?
> >
> > Thanks
> >
> > -----Original Message-----
> > From: Timothy B. Terriberry [mailto:tterriberry@mozilla.com]
> > Sent: Wednesday, July 11, 2012 8:50 AM
> > To: public-media-capture@w3.org
> > Subject: Re: updates to requirements document
> >
> > Randell Jesup wrote:
> >> And...  Defining the associated control information needed for
> >> decoding is a significant task, especially as it would need to be
> >> codec-agnostic.  (Which from the conversation I think you realize.)
> >> This also is an API that I believe we at Mozilla (or some of us)
> >> disagree with (though I'm not the person primarily following this;
> >> I think Robert O'Callahan and Tim Terriberry are).
> >
> > More than just codec-agnostic. It would have to be a) flexible
> > enough to support all the formats people care about (already
> > challenging by itself) while b) well-defined enough to be
> > re-implementable by every vendor in a compatible way. This leaves
> > you quite a fine needle to thread.
> >
> > I don't want people to underestimate how much work is involved here.
> >
> >
Received on Friday, 13 July 2012 19:41:11 UTC