Re: terminology (was: updates to requirements document) from Stefan Hakansson LK on 2012-07-16 (public-media-capture@w3.org from July 2012)

From: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>
Date: Mon, 16 Jul 2012 10:42:46 +0200
To: Jim Barnett <Jim.Barnett@genesyslab.com>
CC: Travis Leithead <travis.leithead@microsoft.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <5003D406.9040603@ericsson.com>

On 07/13/2012 09:40 PM, Jim Barnett wrote:
> I think that Milan has a different use case in mind than either Stefan
> or Travis is thinking of.  He wants to do speech recognition on the
> audio.  For that he needs to capture the audio in chunks in real time
> (can't wait till the user is done speaking).  He also wants to select a
> different, reliable, transport - not UDP. The app will be grabbing
> buffers of audio data and shipping them off to a remote recognition
> engine.  I'll let Milan explain in detail, but it is rather different
> from recording to a file.  (It's also very useful - Genesys and other
> contact center companies will be interested in using WebRTC for this,
> since it lets us build speech recognition into a web page.)
I think what Travis and I talked about can be used for this purpose. If
the audio can be recorded continuously but spit out in chunks, each
chunk could be sent to the speech recognition server using a reliable
transport (WebSocket or HTTP).
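
As a rough illustration of the chunking idea above (a sketch only, not an
API anyone has proposed): it assumes the recorder hands the app encoded
bytes as they are produced, buffers them, and passes fixed-size chunks to
a send callback. ChunkedRecorder, SendChunk, push and flush are
hypothetical names made up for this example.

```typescript
// Hypothetical callback standing in for a reliable transport
// (for example a wrapper around a WebSocket send or an HTTP POST).
type SendChunk = (chunk: Uint8Array) => void;

// Hypothetical helper, not part of any spec: buffers a continuous
// stream of encoded audio bytes and emits it in fixed-size chunks.
class ChunkedRecorder {
  private pending: number[] = [];
  private chunkSize: number;
  private send: SendChunk;

  constructor(chunkSize: number, send: SendChunk) {
    if (chunkSize <= 0) {
      throw new Error("chunkSize must be positive");
    }
    this.chunkSize = chunkSize;
    this.send = send;
  }

  // Called whenever the (assumed) capture source produces more bytes.
  push(data: Uint8Array): void {
    this.pending = this.pending.concat(Array.from(data));
    // Ship every complete chunk right away; keep the remainder buffered.
    while (this.pending.length >= this.chunkSize) {
      this.send(Uint8Array.from(this.pending.splice(0, this.chunkSize)));
    }
  }

  // Called when the user is done speaking; ships whatever is left.
  flush(): void {
    if (this.pending.length > 0) {
      this.send(Uint8Array.from(this.pending.splice(0)));
    }
  }
}
```

In a page, the callback passed to the constructor could simply forward
each chunk to an open socket, so the recognizer starts receiving audio
while the user is still speaking.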

Stefan

>
> - Jim
>
> -----Original Message-----
> From: Travis Leithead [mailto:travis.leithead@microsoft.com]
> Sent: Friday, July 13, 2012 2:59 PM
> To: Jim Barnett; Stefan Hakansson LK; public-media-capture@w3.org
> Subject: RE: terminology (was: updates to requirements document)
>
> Likewise, "record" and "capture" are synonyms to me too. In general, it
> seems like there are some other words we could use to be more precise,
> since we might be having misunderstandings based on terminology, which
> would be unfortunate.
>
> My understanding of the original proposal for recording (see
> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-3) was that you
> could call a record() API to start _encoding_ the camera/mic's raw data
> into some binary format. Here I think the words "capture" and "record"
> both seem to refer to this process. At some point in the future you
> could call getRecordedData() (see
> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-5) which would
> then asynchronously create a Blob object containing the encoded binary
> data in some known format (blob.type would indicate the MIME type of
> the encoding, whatever the UA decided to use -- there was no control or
> hint mechanism available via the API for encoded format selection). I
> believe the returned Blob was supposed to be a "complete" file, meaning
> that its encoding contained a definitive start and end point, and was
> *not* a binary slice of some larger file. In other words, the returned
> Blob could be played directly in the html audio or video tag, or saved
> to a file system for storage, or sent over XHR to a server.
>
> So, when you mentioned the word "chunks" below, were you referring to
> the idea of calling getRecordedData() multiple times (assuming that each
> subsequent call reset the start-point of the next recording--which is
> *not* how that API was actually specified)? Rather than "chunks"
> I think of these as completely separate "capture" sessions--they are
> complete captures from end-to-end.
>
> When I think of "chunks" I think of incomplete segments of the larger
> encoded in-progress capture. The point at which the larger encoded data
> buffer is sliced (to make a "chunk") might be arbitrary or not. I think
> that is something we can discuss. If it's arbitrary, then the JavaScript
> processing the raw encoded "chunks" must understand the format
> well enough to know when there's not enough data available to correctly
> process a chunk, or where to stop. This is similar to how the HTML
> parser handles incoming bits from the wire before it determines what a
> page's encoding is. If we decide that the chunks must be sliced at more
> "appropriate" places, then the UAs must in turn implement this same
> logic given an understanding of the encoding in use. As an implementor,
> it seems like it would be much faster to just dump raw bits out of a
> slice arbitrarily (perhaps as quickly as possible after encoding) and
> let the JavaScript code deal with how to interpret them. In this case,
> the returned data should probably be in a TypedArray of some form.
>
>
>
>> -----Original Message-----
>> From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
>> Sent: Friday, July 13, 2012 6:16 AM
>> To: Stefan Hakansson LK; public-media-capture@w3.org
>> Subject: RE: updates to requirements document
>>
>> Stefan,
>>
>> English is my native language and I don't know the difference
>> between 'capture' and 'record' either.  The requirements doc used
>> 'capture' so I kept it, and introduced 'record' because that's the
>> term I normally use. If we can agree on a single term to use, I'll
>> gladly update the spec.
>>
>>
>> - Jim
>>
>> -----Original Message-----
>> From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
>> Sent: Friday, July 13, 2012 9:06 AM
>> To: public-media-capture@w3.org
>> Subject: Re: updates to requirements document
>>
>> Milan,
>>
>> isn't your core proposal that we should have a requirement that allows
>> recording of audio (and it would apply to video as well I guess) to
>> files, i.e. some kind of continuous chunked recording?
>>
>> I think that would make sense (and that was how the original,
>> underspecified, recording function worked IIRC), and that those chunks
>> would be possible to use as source in the MediaSource API proposal
>> (even if my number one priority would be that those files would be
>> possible to use as a source to the audio/video elements).
>>
>> I do not understand why we would add words about "encoded" and so on
>> though. We don't use that kind of language in any other req, why here?
>>
>> Stefan
>>
>> PS English is not my native language, I would be very glad if someone
>> could explain the difference between "capture" and "record" for me - I
>> must admit I do not know the difference. Ideally I would like one word
>> meaning something like "using a mike/cam to start producing data" and
>> another one for "storing that data to a file".
>>
>>
>> On 07/11/2012 06:04 PM, Young, Milan wrote:
>>> Sorry if I'm missing context, but is there a counter proposal or are
>>> you just warning us that this is a long haul?
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Timothy B. Terriberry [mailto:tterriberry@mozilla.com]
>>> Sent: Wednesday, July 11, 2012 8:50 AM
>>> To: public-media-capture@w3.org
>>> Subject: Re: updates to requirements document
>>>
>>> Randell Jesup wrote:
>>>> And...  Defining the associated control information needed for
>>>> decoding is a significant task, especially as it would need to be
>>>> codec-agnostic.  (Which from the conversation I think you realize.)
>>>> This also is an API that I believe we at Mozilla (or some of us)
>>>> disagree with (though I'm not the person primarily following this;
>>>> I think Robert O'Callahan and Tim Terriberry are).
>>>
>>> More than just codec-agnostic. It would have to be a) flexible
>>> enough to support all the formats people care about (already
>>> challenging by itself) while b) well-defined enough to be
>>> re-implementable by every vendor in a compatible way. This leaves
>>> you quite a fine needle to thread.
>>>
>>> I don't want people to under-estimate how much work is involved
>>> here.
>>>
>>>
>>
>>
>>
>>
>
>
Received on Monday, 16 July 2012 08:43:13 UTC