RE: terminology (was: updates to requirements document) from Young, Milan on 2012-07-23 (public-media-capture@w3.org from July 2012)

From: Young, Milan <Milan.Young@nuance.com>
Date: Mon, 23 Jul 2012 14:12:17 +0000
To: Stefan Hakansson LK <stefan.lk.hakansson@ericsson.com>, Travis Leithead <travis.leithead@microsoft.com>
CC: Jim Barnett <Jim.Barnett@genesyslab.com>, "public-media-capture@w3.org" <public-media-capture@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A482AD0@SOM-EXCH04.nuance.com>
I'd like to keep this discussion active.  Are folks in agreement with what I've written below?  If not, is there a planned F2F where we could add this to an agenda?

Thanks
 

-----Original Message-----
From: Young, Milan [mailto:Milan.Young@nuance.com] 
Sent: Monday, July 16, 2012 8:40 AM
To: Stefan Hakansson LK; Travis Leithead
Cc: Jim Barnett; public-media-capture@w3.org
Subject: RE: terminology (was: updates to requirements document)

Perhaps we're dealing with different use cases, but for the translation scenario, requiring the application layer to poll the UA for complete audio snippets is not optimal.  Polling would tend both to produce snippets at irregular intervals and to add significant overhead to the encoding.

I suggest that it would be better to use something like the WebAudio API proposal: namely, an interface where the UA pushes blobs to the JS layer at a fixed interval.  The data in each blob would be complete from an encoding perspective, but directly playable only when prepended with all previous blobs in the session.
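
To make the "prepended with all previous blobs" point concrete, here is a minimal TypeScript sketch of what the application side might look like.  The class and method names (ChunkAccumulator, push, playablePrefix) are hypothetical and not taken from WebAudio or any other proposal, and plain byte arrays stand in for blobs.

```typescript
// Hypothetical sketch: the UA is assumed to hand the application one
// encoded chunk per interval; nothing here is an existing browser API.
class ChunkAccumulator {
  private chunks: Uint8Array[] = [];
  private totalBytes = 0;

  // Called once per interval with the next encoded chunk.
  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
    this.totalBytes += chunk.length;
  }

  // Concatenates every chunk received so far, in arrival order.
  // Per the proposal, only this full prefix is directly playable.
  playablePrefix(): Uint8Array {
    const out = new Uint8Array(this.totalBytes);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

An application that only streams to a server could forward each chunk as it arrives and never call playablePrefix(); one that wants local playback would have to keep the whole session around.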

I also suggest that the application layer should be given the ability to select from available codecs at the start of the capture session.  If this is too complicated then we should specify that the UA SHOULD prefer codecs optimized for voice since that would be the most common audio type originating from the desktop microphone.
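
As a rough illustration of that fallback rule, the selection could be sketched as below.  The function name, the preference list, and the codec labels are all assumptions made up for the example; they are not a proposed API or a codec registry.

```typescript
// Hypothetical preference order for voice-optimized codecs (labels are
// illustrative only).
const VOICE_PREFERRED: readonly string[] = ["speex", "opus"];

// Picks a codec for a capture session:
//   1. the application's request, if the UA offers it;
//   2. otherwise the first voice-optimized codec the UA offers;
//   3. otherwise whatever the UA lists first, or undefined if none.
function selectCodec(
  available: readonly string[],
  requested?: string
): string | undefined {
  if (requested !== undefined && available.indexOf(requested) !== -1) {
    return requested;
  }
  for (const codec of VOICE_PREFERRED) {
    if (available.indexOf(codec) !== -1) {
      return codec;
    }
  }
  return available.length > 0 ? available[0] : undefined;
}
```

Steps 2 and 3 alone correspond to the "UA SHOULD prefer voice codecs" fallback when the application is given no say.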

Thanks


-----Original Message-----
From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
Sent: Monday, July 16, 2012 12:24 AM
To: Travis Leithead
Cc: Jim Barnett; public-media-capture@w3.org
Subject: Re: terminology (was: updates to requirements document)

On 07/13/2012 08:58 PM, Travis Leithead wrote:
> Likewise, "record" and "capture" are synonyms to me too. In general, 
> it seems like there are some other words we could use to be more 
> precise, since we might be having misunderstandings based on 
> terminology, which would be unfortunate.

I would like that. I would like one word for "enabling the mike/cam to start producing samples". This would correspond to what "getUserMedia" does. And another for storing those samples to a file.

>
> My understanding of the original proposal for recording (see
> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-3) was that you 
> could call a record() API to start _encoding_ the camera/mic's raw 
> data into some binary format. Here I think the words "capture" and 
> "record" both seem to refer to this process. At some point in the 
> future you could call getRecordedData() (see
> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-5) which would 
> then asynchronously create a Blob object containing the encoded binary 
> data in some known format (blob.type would indicate the mime type for 
> the encoding whatever the UA decided to use -- there was no control or 
> hint mechanism available via the API for encoded format selection). I 
> believe the returned Blob was supposed to be a "complete" file, 
> meaning that its encoding contained a definitive start and end point, 
> and was *not* a binary slice of some larger file. In other words, the 
> returned Blob could be played directly in the html audio or video tag, 
> or saved to a file system for storage, or sent over XHR to a server.
>
> So, when you mentioned the word "chunks" below, were you referring to 
> the idea of calling getRecordedData() multiple times (assuming that 
> each subsequent call reset the start-point of the next 
> recording--which is actually *not* how that API was specified in 
> fact)? Rather than "chunks" I think of these as completely separate 
> "capture" sessions--they are complete captures from end-to-end.

I must admit I had not thought this through in detail. I had in mind something that would allow you to continuously record, but spit out the result in smaller segments ("chunks"). I had not thought about how the application should act to get that done.
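
Purely as an illustration of the producing side, slicing one continuous recording into such segments could look like the TypeScript sketch below.  The function name and the fixed byte-size cut are assumptions for the example; a real UA might cut by time or at format-aware boundaries instead.

```typescript
// Hypothetical sketch: cuts an encoded byte stream into consecutive
// chunks of at most chunkSize bytes, at arbitrary byte boundaries.
function sliceIntoChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let start = 0; start < data.length; start += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push(data.subarray(start, start + chunkSize));
  }
  return chunks;
}
```

Because the cuts ignore the encoding, a single chunk is not expected to be playable on its own; only the chunks concatenated in order reproduce the recording.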

>
> When I think of "chunks" I think of incomplete segments of the larger 
> encoded in-progress capture. The point at which the larger encoded 
> data buffer is sliced (to make a "chunk") might be arbitrary or not.
> I think that is something we can discuss. If it's arbitrary, then the 
> JavaScript processing the raw encoded "chunks" must understand the 
> format well enough to know when there's not enough data available to 
> correctly process a chunk, or where to stop. This is similar to how 
> the HTML parser handles incoming bits from the wire before it 
> determines what a page's encoding is. If we decide that the chunks 
> must be sliced at more "appropriate" places, then the UAs must in 
> turn implement this same logic given an understanding of the encoding 
> in use. As an implementor, it seems like it would be much faster to 
> just dump raw bits out of a slice arbitrarily (perhaps as quickly as 
> possible after encoding) and let the JavaScript code deal with how to 
> interpret them. In this case, the returned data should probably be in 
> a TypedArray of some form.
>
>
>
>> -----Original Message----- From: Jim Barnett 
>> [mailto:Jim.Barnett@genesyslab.com] Sent: Friday, July 13, 2012
>> 6:16 AM To: Stefan Hakansson LK; public-media-capture@w3.org
>> Subject: RE: updates to requirements document
>>
>> Stefan,
>>
>> English is my native language and I don't know the difference 
>> between 'capture' and 'record' either.  The requirements doc used 
>> 'capture' so I kept it, and introduced 'record' because that's the 
>> term I normally use. If we can agree on a single term to use, I'll 
>> gladly update the spec.
>>
>>
>> - Jim
>>
>> -----Original Message----- From: Stefan Hakansson LK 
>> [mailto:stefan.lk.hakansson@ericsson.com] Sent: Friday, July 13,
>> 2012 9:06 AM To: public-media-capture@w3.org Subject: Re: updates to 
>> requirements document
>>
>> Milan,
>>
>> isn't your core proposal that we should have a requirement that 
>> allows recording of audio (and it would apply to video as well I
>> guess) to files, i.e. some kind of continuous chunked recording?
>>
>> I think that would make sense (and that was how the original, 
>> underspecified, recording function worked IIRC), and that those 
>> chunks would be possible to use as source in the MediaSource API 
>> proposal (even if my number one priority would be that those files 
>> would be possible to use as a source to the audio/video elements).
>>
>> I do not understand why we would add words about "encoded" and so on 
>> though. We don't use that kind of language in any other req, why 
>> here?
>>
>> Stefan
>>
>> PS English is not my native language, I would be very glad if someone 
>> could explain the difference between "capture" and "record"
>> for me - I must admit I do not know the difference. Ideally I would 
>> like one word meaning something like "using a mike/cam to start 
>> producing data" and another one for "storing that data to a file".
>>
>>
>> On 07/11/2012 06:04 PM, Young, Milan wrote:
>>> Sorry if I'm missing context, but is there a counter proposal or are 
>>> you
>> just warning us that this is a long haul?
>>>
>>> Thanks
>>>
>>> -----Original Message----- From: Timothy B. Terriberry 
>>> [mailto:tterriberry@mozilla.com] Sent: Wednesday, July 11, 2012
>>> 8:50 AM To: public-media-capture@w3.org Subject: Re: updates to 
>>> requirements document
>>>
>>> Randell Jesup wrote:
>>>> And...  Defining the associated control information needed for 
>>>> decoding is a significant task, especially as it would need to be 
>>>> codec-agnostic.  (Which from the conversation I think you
>>>> realize.) This also is an API that I believe we at Mozilla (or some 
>>>> of us) disagree with (though I'm not the person primarily following 
>>>> this; I think Robert O'Callahan and Tim Terriberry are).
>>>
>>> More than just codec-agnostic. It would have to be a) flexible 
>>> enough to support all the formats people care about (already 
>>> challenging by itself) while b) well-defined enough to be 
>>> re-implementable by every
>> vendor in a compatible way. This leaves you quite a fine needle to 
>> thread.
>>>
>>> I don't want people to under-estimate how much work is involved 
>>> here.
>>>
>>>
>>
>>
>>
>>
>
>
Received on Monday, 23 July 2012 14:12:51 UTC