Re: terminology (was: updates to requirements document)

On 07/25/2012 06:21 PM, Young, Milan wrote:
> The use case for translation is already established in the document.
> If we can't agree on a method for addressing that use case over
> email, then we should simply add a general requirement as a
> placeholder.  I suggest:
>
> "The UA must expose capabilities for transmitting audio suitable for
> live speech recognition."
>
> Objections?

More of a question right now: what happened to the API that was proposed 
for this purpose (if I understand correctly)? It is part of the final 
report of the Speech XG 
(http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/), 
and I think I heard rumors of a variant of it being proposed for 
implementation.

Br,
Stefan

>
>
> -----Original Message-----
> From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
> Sent: Monday, July 23, 2012 7:15 AM
> To: Young, Milan; Stefan Hakansson LK; Travis Leithead
> Cc: public-media-capture@w3.org
> Subject: RE: terminology (was: updates to requirements document)
>
> I think that the speech recognition use case is important, so I agree
> with Milan.  If we can't agree on something via email, we should
> add this as a topic for the next F2F.
>
> - Jim
>
> By the way, are there any other comments on the requirements
> doc?  I'm ready to make more changes.
>
> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, July 23, 2012 10:12 AM
> To: Stefan Hakansson LK; Travis Leithead
> Cc: Jim Barnett; public-media-capture@w3.org
> Subject: RE: terminology (was: updates to requirements document)
>
> I'd like to keep this discussion active.  Are folks in agreement with
> what I've written below?  If not, is there a planned F2F where we
> could add this to an agenda?
>
> Thanks
>
>
> -----Original Message-----
> From: Young, Milan [mailto:Milan.Young@nuance.com]
> Sent: Monday, July 16, 2012 8:40 AM
> To: Stefan Hakansson LK; Travis Leithead
> Cc: Jim Barnett; public-media-capture@w3.org
> Subject: RE: terminology (was: updates to requirements document)
>
> Perhaps we're dealing with different use cases, but for the
> translation scenario, requiring the application layer to poll the UA
> for complete audio snippets is not optimal.  It would tend both to
> produce irregular intervals and to add significant overhead to the
> encoding.
>
> I suggest that it would be better to use something like the WebAudio
> API proposal: an interface where the UA pushes blobs to the
> JS layer at a fixed interval.  The data in the blobs would be complete
> from an encoding perspective, but only directly playable when
> prepended with all previous blobs in the session.
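A rough sketch of how the JS side of such a push model could look. The class and method names here are purely illustrative, not a proposed API; the point is only that each pushed chunk continues one encoded stream, so a chunk is playable only when prepended with everything received before it:

```javascript
// Hypothetical sketch, not an agreed API: a plain accumulator for the
// chunks a UA would push at a fixed interval.

class ChunkAccumulator {
  constructor() {
    this.chunks = [];
  }

  // Would be called by the (hypothetical) UA push callback.
  push(chunk) {
    this.chunks.push(chunk);
  }

  // Concatenate everything received so far into one directly
  // playable buffer.
  playableSoFar() {
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    return out;
  }
}
```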
>
> I also suggest that the application layer should be given the ability
> to select from available codecs at the start of the capture session.
> If this is too complicated then we should specify that the UA SHOULD
> prefer codecs optimized for voice since that would be the most common
> audio type originating from the desktop microphone.
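For illustration, the codec-selection step could be as simple as the following, where the codec identifiers and the existence of a negotiation hook are assumptions made up for the example:

```javascript
// Illustrative only: 'audio/speex' here is an assumed stand-in for
// "a codec optimized for voice", not a mandated default.
const VOICE_OPTIMIZED_DEFAULT = 'audio/speex';

// The application passes an ordered preference list; if nothing
// matches, the UA falls back (SHOULD-level) to a voice-optimized
// codec, then to whatever it has.
function selectCodec(preferred, available) {
  for (const codec of preferred) {
    if (available.includes(codec)) return codec;
  }
  return available.includes(VOICE_OPTIMIZED_DEFAULT)
    ? VOICE_OPTIMIZED_DEFAULT
    : available[0];
}
```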
>
> Thanks
>
>
> -----Original Message-----
> From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
> Sent: Monday, July 16, 2012 12:24 AM
> To: Travis Leithead
> Cc: Jim Barnett; public-media-capture@w3.org
> Subject: Re: terminology (was: updates to requirements document)
>
> On 07/13/2012 08:58 PM, Travis Leithead wrote:
>> Likewise, "record" and "capture" are synonyms to me too. In
>> general, it seems like there are some other words we could use to
>> be more precise, since we might be having misunderstandings based
>> on terminology, which would be unfortunate.
>
> I would like that. I would like one word for "enabling the mike/cam
> to start producing samples". This would correspond to what
> "getUserMedia" does. And another for storing those samples to a
> file.
>
>>
>> My understanding of the original proposal for recording (see
>> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-3) was that
>> you could call a record() API to start _encoding_ the camera/mic's
>> raw data into some binary format. Here I think the words "capture"
>> and "record" both seem to refer to this process. At some point in
>> the future you could call getRecordedData() (see
>> http://www.w3.org/TR/2011/WD-webrtc-20111027/#methods-5) which
>> would then asynchronously create a Blob object containing the
>> encoded binary
>> data in some known format (blob.type would indicate the mime type
>> for the encoding whatever the UA decided to use -- there was no
>> control or
>> hint mechanism available via the API for encoded format selection).
>> I believe the returned Blob was supposed to be a "complete" file,
>> meaning that its encoding contained a definitive start and end
>> point,
>> and was *not* a binary slice of some larger file. In other words,
>> the returned Blob could be played directly in the html audio or
>> video tag,
>> or saved to a file system for storage, or sent over XHR to a
>> server.
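A toy mock of those semantics might help pin down the terminology. This is not runnable against any real UA, and the synchronous callback is a simplification (the draft made getRecordedData() asynchronous); the point is that each result is a complete recording with its own start and end, not a slice of a larger file:

```javascript
// Toy mock of the old draft's record()/getRecordedData() semantics.
// Names and behavior are simplified for illustration.

class MockStreamRecorder {
  constructor(mimeType) {
    this.type = mimeType; // plays the role of blob.type
    this.buffer = null;
  }

  // Start encoding from this point.
  record() {
    this.buffer = [];
  }

  // Hand back everything encoded since record() as one complete
  // "file". (The real API did this asynchronously.)
  getRecordedData(callback) {
    callback({ type: this.type, data: this.buffer.slice() });
  }

  // Stand-in for the UA encoding incoming media frames.
  _encodeFrame(frame) {
    this.buffer.push(frame);
  }
}
```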
>>
>> So, when you mentioned the word "chunks" below, were you referring
>> to the idea of calling getRecordedData() multiple times (assuming
>> that each subsequent call reset the start point of the next
>> recording--which is actually *not* how that API was specified)?
>> Rather than "chunks", I think of these as completely
>> separate "capture" sessions--they are complete captures from
>> end-to-end.
>
> I must admit I had not thought this through in detail. I had in mind
> something that would allow you to record continuously, but spit out
> the result in smaller segments ("chunks"). I had not thought about
> how the application should act to get that done.
>
>>
>> When I think of "chunks" I think of incomplete segments of the
>> larger encoded in-progress capture. The point at which the larger
>> encoded data buffer is sliced (to make a "chunk") might be
>> arbitrary or not. I think that is something we can discuss. If it's
>> arbitrary, then the JavaScript processing the raw encoded "chunks"
>> must understand the format well enough to know when there's not
>> enough data available to correctly process a chunk, or where to
>> stop. This is similar to how the HTML parser handles incoming bits
>> from the wire before it determines what a page's encoding is. If we
>> decide that the chunks must be sliced at more "appropriate" places,
>> then the UAs must in turn implement this same logic given an
>> understanding of the encoding in use. As an implementor, it seems
>> like it would be much faster to just dump raw bits out of a slice
>> arbitrarily (perhaps as quickly as possible after encoding) and let
>> the JavaScript code deal with how to interpret them. In this case,
>> the returned data should probably be in a TypedArray of some
>> form.
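What the JS side would have to do with arbitrarily sliced encoded bytes might look like the following sketch. A fixed frame size stands in for real format knowledge here (a real parser would inspect the codec's own framing); the essential behavior is buffering input until a complete unit is available:

```javascript
// Sketch: re-frame arbitrarily sliced input bytes. frameSize is a
// simplifying assumption standing in for codec-specific framing.
function makeFramer(frameSize) {
  let pending = [];
  return function feed(slice) {
    pending = pending.concat(Array.from(slice));
    const frames = [];
    // Emit only complete frames; the remainder stays buffered until
    // the next slice arrives.
    while (pending.length >= frameSize) {
      frames.push(new Uint8Array(pending.slice(0, frameSize)));
      pending = pending.slice(frameSize);
    }
    return frames;
  };
}
```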
>>
>>
>>
>>> -----Original Message-----
>>> From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
>>> Sent: Friday, July 13, 2012 6:16 AM
>>> To: Stefan Hakansson LK; public-media-capture@w3.org
>>> Subject: RE: updates to requirements document
>>>
>>> Stefan,
>>>
>>> English is my native language and I don't know the difference
>>> between 'capture' and 'record' either.  The requirements doc
>>> used 'capture', so I kept it, and introduced 'record' because
>>> that's the term I normally use.  If we can agree on a single term
>>> to use, I'll gladly update the spec.
>>>
>>>
>>> - Jim
>>>
>>> -----Original Message-----
>>> From: Stefan Hakansson LK [mailto:stefan.lk.hakansson@ericsson.com]
>>> Sent: Friday, July 13, 2012 9:06 AM
>>> To: public-media-capture@w3.org
>>> Subject: Re: updates to requirements document
>>>
>>> Milan,
>>>
>>> isn't your core proposal that we should have a requirement that
>>> allows recording of audio (and I guess it would apply to video as
>>> well) to files, i.e. some kind of continuous chunked
>>> recording?
>>>
>>> I think that would make sense (and IIRC that was how the original,
>>> underspecified, recording function worked), and it would be
>>> possible to use those chunks as a source in the MediaSource API
>>> proposal (even if my number one priority would be that those
>>> files could be used as a source for the audio/video
>>> elements).
>>>
>>> I do not understand why we would add words about "encoded" and so
>>> on though. We don't use that kind of language in any other req,
>>> why here?
>>>
>>> Stefan
>>>
>>> PS English is not my native language; I would be very glad if
>>> someone could explain the difference between "capture" and
>>> "record" for me - I must admit I do not know the difference.
>>> Ideally I would like one word meaning something like "using a
>>> mike/cam to start producing data" and another one for "storing
>>> that data to a file".
>>>
>>>
>>> On 07/11/2012 06:04 PM, Young, Milan wrote:
>>>> Sorry if I'm missing context, but is there a counter proposal,
>>>> or are you just warning us that this is a long haul?
>>>>
>>>> Thanks
>>>>
>>>> -----Original Message-----
>>>> From: Timothy B. Terriberry [mailto:tterriberry@mozilla.com]
>>>> Sent: Wednesday, July 11, 2012 8:50 AM
>>>> To: public-media-capture@w3.org
>>>> Subject: Re: updates to requirements document
>>>>
>>>> Randell Jesup wrote:
>>>>> And...  Defining the associated control information needed
>>>>> for decoding is a significant task, especially as it would
>>>>> need to be codec-agnostic.  (Which from the conversation I
>>>>> think you realize.)  This also is an API that I believe we at
>>>>> Mozilla (or some of us) disagree with (though I'm not the
>>>>> person primarily following this; I think Robert O'Callahan
>>>>> and Tim Terriberry are).
>>>>
>>>> More than just codec-agnostic. It would have to be a) flexible
>>>> enough to support all the formats people care about (already
>>>> challenging by itself) while b) well-defined enough to be
>>>> re-implementable by every vendor in a compatible way. This
>>>> leaves you quite a fine needle to thread.
>>>>
>>>> I don't want people to under-estimate how much work is
>>>> involved here.
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>

Received on Wednesday, 25 July 2012 16:31:52 UTC