- From: Mandyam, Giridhar <mandyam@quicinc.com>
- Date: Thu, 29 Nov 2012 20:00:52 +0000
- To: Jim Barnett <Jim.Barnett@genesyslab.com>, Harald Alvestrand <harald@alvestrand.no>, "public-media-capture@w3.org" <public-media-capture@w3.org>
I am sorry - I don't believe a recording API should be used to enable real-time processing. I certainly do not think it should be used for any audio stream processing for ASR. This is what WebAudio is for, and we should work with the Audio WG if their current specification is unsuitable for what you believe is required for speech recognition. But we have a call next week - maybe we can discuss this further during that time.

-----Original Message-----
From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
Sent: Thursday, November 29, 2012 11:53 AM
To: Mandyam, Giridhar; Harald Alvestrand; public-media-capture@w3.org
Subject: RE: revised recording proposal

I'm not implying anything about typed arrays - only that when we first decided to use Blobs for recording, their immutability was cited as an advantage. The first recording proposal, which included incremental access to buffers of data, was defined as a set of methods on MediaStream (or MediaStreamTrack). The consensus was to move it to a separate class. I don't consider the current proposal to be a "work-around" - it's the best available solution to a problem that's well within this group's scope. Even if we had a TCP-based PeerConnection for ASR purposes, we would still need to provide incremental access to buffers of data for real-time media processing applications.

- Jim

-----Original Message-----
From: Mandyam, Giridhar [mailto:mandyam@quicinc.com]
Sent: Thursday, November 29, 2012 2:36 PM
To: Jim Barnett; Harald Alvestrand; public-media-capture@w3.org
Subject: RE: revised recording proposal

> The reason for using Blobs is that they are immutable objects and can be stored in memory or written to disk, which gives the UA a fair amount of flexibility.

I don't think you are implying that typed arrays are mutable - right? Did you mean to state that it is a requirement that the object that is being returned is immutable?

> The issue of a TCP-based PeerConnection has been raised and was firmly rejected. The argument was that too many other specs that we are relying on assume UDP and we cannot undertake to rewrite all of them. From the point of view of speech recognition, having a TCP-based PeerConnection would be simpler and easier than having to write code to shove buffers of data into a socket, but it does not seem to be practical.

Access to the raw bytes could conceivably be achieved by an appropriate method defined on the MediaStream itself - why not go this route? In general, I don't think it should be the responsibility of this Task Force to come up with workarounds because other Working Groups (WebRTC in this case) have been unable to satisfy important streaming use cases. You are defining a recording API. It should be designed and verified with respect to recording. It should not have additional functionality added to meet an unrelated use case.

-----Original Message-----
From: Jim Barnett [mailto:Jim.Barnett@genesyslab.com]
Sent: Thursday, November 29, 2012 8:24 AM
To: Mandyam, Giridhar; Harald Alvestrand; public-media-capture@w3.org
Subject: RE: revised recording proposal

The reason for using Blobs is that they are immutable objects and can be stored in memory or written to disk, which gives the UA a fair amount of flexibility. In general, the Blob will be gc'd once all references to it are gone. That should be once the dataavailable event passes out of scope. However, if there are arguments for ArrayBuffers, I'd be glad to hear them. There hasn't been much discussion of this on the list. (If it matters, in the speech recognition case, the buffers are likely to be about 200ms in size, though of course we can't guarantee that apps won't ask for other sizes.)

The issue of a TCP-based PeerConnection has been raised and was firmly rejected. The argument was that too many other specs that we are relying on assume UDP and we cannot undertake to rewrite all of them. From the point of view of speech recognition, having a TCP-based PeerConnection would be simpler and easier than having to write code to shove buffers of data into a socket, but it does not seem to be practical.

- Jim
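For concreteness, a minimal sketch of how a page could recover raw bytes from a timesliced Blob delivered on dataavailable, using FileReader. This is only an illustration: the `recorder` object and the `processChunk` consumer are hypothetical names, and the recorder shape is assumed to roughly match the proposal rather than any settled API.

    // Assumed MediaRecorder-like recorder from the proposal under discussion;
    // `recorder` and `processChunk` are hypothetical names for illustration.
    recorder.ondataavailable = (event: { data: Blob }) => {
      const reader = new FileReader();
      reader.onload = () => {
        // reader.result is an ArrayBuffer copy of the Blob's bytes,
        // which the app can hand to a decoder or an ASR front end.
        processChunk(reader.result as ArrayBuffer);
      };
      reader.readAsArrayBuffer(event.data); // event.data is the timesliced Blob
    };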
-----Original Message-----
From: Mandyam, Giridhar [mailto:mandyam@quicinc.com]
Sent: Thursday, November 29, 2012 11:14 AM
To: Harald Alvestrand; public-media-capture@w3.org
Subject: RE: revised recording proposal

> It's not unlikely that some implementations that return blob-at-a-time will be considerably faster than some implementations that return file-at-a-time (the variant where file-at-a-time writes/swaps to disk, while blob-at-a-time stays within memory the whole time).

If RAM is not an issue, you may be right. But when you are dealing with a limited heap space and the blob is not GC'ed upon consumption, then what? If we have to comply with a requirement to provide timesliced data, I still think we are not exploring all options. I've already posed the question to the spec author and the mailing list about why we are using blobs to return timesliced data versus using ArrayBuffers, and I have not received a response. My understanding is that existing GCs handle the two data types very differently (if I go into more details I may have to start discussing proprietary implementations).

> There are some proposed applications (speech recognition, recording to remote media on a limited-memory device) where file-at-a-time would be thoroughly inappropriate.

As I have mentioned in another email, I don't believe those applications are currently outlined in http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html. My suggestion was to move support for these scenarios into a second version of the spec. Also, if the idea is to enable reliable streaming of captured data, I don't think overloading a recording API with additional functionality is the way to do it. A TCP-based PeerConnection media streaming solution may help in this regard, but that is not a topic for this mailing list.

I will reiterate that I like the direction in which this specification is moving. However, I think there are enough fundamental issues with the spec as written that I feel it is not appropriate at this time to bring it to an FPWD stage. I would like to see at least one more version before going that route.

-Giri
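For the reliable-streaming use case discussed above, one plausible route is to forward each timesliced Blob over a WebSocket, which is already TCP-based. This is a sketch only: the recorder object, its start(timeslice) signature, and the ASR endpoint URL are all assumptions, not anything agreed in this thread.

    // Push incremental recorded Blobs to a speech recognizer over a WebSocket.
    // `recorder` is an assumed MediaRecorder-like object; the endpoint URL is hypothetical.
    const socket = new WebSocket("wss://asr.example.com/stream");

    recorder.ondataavailable = (event: { data: Blob }) => {
      if (socket.readyState === WebSocket.OPEN) {
        socket.send(event.data); // WebSocket.send() accepts a Blob directly
      }
    };

    socket.onopen = () => {
      recorder.start(200); // ask for roughly 200 ms slices, per Jim's estimate
    };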
-----Original Message-----
From: Harald Alvestrand [mailto:harald@alvestrand.no]
Sent: Thursday, November 29, 2012 3:26 AM
To: public-media-capture@w3.org
Subject: Re: revised recording proposal

On 11/29/2012 08:53 AM, Stefan Hakansson LK wrote:
> On 11/28/2012 08:28 PM, Mandyam, Giridhar wrote:
>> There is a difference between returning a blob at timeSlice intervals
>> until stopRecording is called, versus returning a File object only
>> once. I don't want to get into a discussion on how different GCs
>> are treating blobs today, but to say that there is no performance
>> difference between the two is premature.
>
> That is an aspect I did not consider! I - naïvely - was only thinking
> about the API layer.
>
> If I understand you correctly, the issue is more related to returning
> several instances of (time-sliced) recorded data during an ongoing
> recording vs. only one set of data after the recording has ended than
> to the actual handling of Blobs vs. Files.

It's not unlikely that some implementations that return blob-at-a-time will be considerably faster than some implementations that return file-at-a-time (the variant where file-at-a-time writes/swaps to disk, while blob-at-a-time stays within memory the whole time). I don't think we're likely to see consistency in which way the advantage goes. There are some proposed applications (speech recognition, recording to remote media on a limited-memory device) where file-at-a-time would be thoroughly inappropriate.
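For the real-time-processing route pointed to at the top of this thread (hand the captured stream to Web Audio rather than the recording API), a minimal sketch using the ScriptProcessorNode interface as it existed around this time. Here `stream` is assumed to be a MediaStream from getUserMedia, and `pushToRecognizer` is a hypothetical, app-defined consumer.

    // Tap a captured MediaStream for raw PCM via Web Audio, as an alternative
    // to timesliced recording for real-time ASR. `stream` and `pushToRecognizer`
    // are assumptions, not part of the proposal text.
    const audioCtx = new AudioContext();
    const source = audioCtx.createMediaStreamSource(stream);
    const processor = audioCtx.createScriptProcessor(4096, 1, 1); // ~93 ms per buffer at 44.1 kHz

    processor.onaudioprocess = (e) => {
      // Float32Array of samples for the current buffer, ready for an ASR front end.
      pushToRecognizer(e.inputBuffer.getChannelData(0));
    };

    source.connect(processor);
    processor.connect(audioCtx.destination); // ScriptProcessorNode needs a sink to keep firing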
Received on Thursday, 29 November 2012 20:01:21 UTC