Re: Adding Web Audio API Spec to W3C Repository

Al, Doug,

Unfortunately, I will be on vacation and unable to join the call on the 20th, so I would like to propose an item for the agenda (I am assuming that the RTC/Audio overlap will be discussed).

The agenda item is as follows:

- Does the use case of many overlapping, transient sources of audio (such as musical notes or sound effects) have an impact on the potential convergence of RTC and Audio? Note that there could easily be > 50 sources playing at a time, with very short durations, and sources need to be very frequently attached and detached from the mix in a sample-accurate manner. This seems like an outlier case for an RTC session, but it is a normal case for music or gaming applications. (A sketch of this pattern follows below.)
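
For concreteness, here is a minimal sketch of that pattern against an AudioContext-style graph API (the names below are my assumptions about the eventual API shape, not settled spec vocabulary):

    // Sketch: schedule many short, overlapping sources sample-accurately.
    // Assumes an AudioContext-style graph API; names are illustrative.
    const ctx = new AudioContext();

    function playNote(buffer: AudioBuffer, when: number): void {
      // Each note is a throwaway source node: created, started at a
      // sample-accurate time, and discarded once it finishes playing.
      const src = ctx.createBufferSource();
      src.buffer = buffer;
      src.connect(ctx.destination);
      src.start(when); // 'when' is in seconds on the context's clock
    }

    declare const noteBuffers: AudioBuffer[]; // assumed preloaded
    // 50+ overlapping notes, each only tens of milliseconds long.
    const t0 = ctx.currentTime + 0.1;
    for (let i = 0; i < 64; i++) {
      playNote(noteBuffers[i % noteBuffers.length], t0 + i * 0.01);
    }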

I would also appreciate it if we could start to publish minutes of these calls, which I suspect would be generally very useful to this group and to the larger community interested in what is happening.

Thanks for your consideration.

... .  .    .       Joe

Joe Berkovitz
President
Noteflight LLC
84 Hamilton St, Cambridge, MA 02139
phone: +1 978 314 6271
www.noteflight.com


On Jun 17, 2011, at 6:10 AM, Francois Daoust wrote:

> Hi Doug,
> 
> Thanks for forwarding the thread. This discussion is particularly relevant for the Web Real-Time Communications WG, since it is chartered to work on "Media Stream Functions" and "Audio Stream Functions" unless that work is done by some other group. I do not have technical input for the discussion at this point, but I see a lot of overlap. Obviously, it would be very good to see convergence on the same solution. I'll point the Web RTC group to this discussion.
> 
> The Web RTC group has just issued a call for API contributions [1] to gather candidate APIs for the different functionalities it needs to deliver. The Stream API defined by the WHATWG is a possible candidate.
> 
> Initial use cases suggested for Web Real-Time Communications involve the need to capture and process audio streams in various ways, including automatic gain control, echo cancellation, and mute functions, and the need to mix different audio sources, possibly with spatial effects. Similar functionality is needed for video streams.
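> 
> To make the overlap concrete, a per-source chain in a graph-style audio API might look like the sketch below (the node names are assumptions about an eventual API, not agreed vocabulary):
> 
>     // Sketch: per-source gain (mute/AGC hook) plus spatialization.
>     const ctx = new AudioContext();
>     declare const micSource: AudioNode; // assumed: a captured audio source
> 
>     const gain = ctx.createGain();      // gain.gain.value = 0 gives a mute
>     const panner = ctx.createPanner();  // spatial placement in the mix
>     panner.setPosition(1, 0, 0);
> 
>     micSource.connect(gain);
>     gain.connect(panner);
>     panner.connect(ctx.destination);    // the shared output mix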
> 
> Francois.
> 
> [1] http://lists.w3.org/Archives/Public/public-webrtc/2011Jun/0021.html
> 
> 
> On 06/10/2011 06:27 PM, Doug Schepers wrote:
>> Hi, ROC-
>> 
>> (apologies in advance for the long, rambling email)
>> 
>> Robert O'Callahan wrote (on 6/9/11 5:46 PM):
>>> We (Mozilla) definitely plan to put forward a new spec that builds on
>>> the HTML Streams proposal. I would like to make more progress on the
>>> implementation before we do that, but if you think otherwise, we can go
>>> forward.
>> 
>> I'm torn. On the one hand, I want to "get it right", which means implementation experience. On the other, if we are to have a reasoned counterpoint to Chris Rogers' fairly mature spec and implementation, we need to have a starting point for technical comparisons and conversations.
>> 
>> I think I would prefer to see some spec text, even knowing that it might change dramatically during implementation; that said, I acknowledge that this is extra work, though it may be a useful step for you as implementers to solidify your scope.
>> 
>> 
>>> I believe the concerns I raised about synchronization and the
>>> relationship with the Streams proposal that I raised in the earlier
>>> thread are still valid, but that thread was a tennis match between me
>>> and Chris and I'd like to hear from W3C people and other parties
>>> (especially HTML and Streams people) how they feel about those concerns.
>> 
>> I read the conversation [1] with interest, but didn't feel qualified to respond on more than a superficial level. I agree with you that having compatible integration of these various use cases is a strong goal, but I wasn't convinced that they needed to be merged per se; I was more convinced by the argument that there should be hooks between them, but that they should be developed as stand-alone APIs, both to better fit their own requirements and audiences, and to allow them to be developed and extended at their own paces (not least for speedy progress towards a first Recommendation that is widely implemented, so we can move on to v2 with more author experience).
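>> 
>> As a sketch of the kind of hook I have in mind (purely illustrative; createMediaStreamSource is an assumed name for the bridge, not something either spec defines today):
>> 
>>     // Sketch: a stream API and an audio API developed separately,
>>     // joined by a single bridge node rather than a merged spec.
>>     declare const remoteStream: MediaStream; // assumed: from an RTC session
>>     const ctx = new AudioContext();
>>     const bridge = ctx.createMediaStreamSource(remoteStream); // the hook
>>     const gain = ctx.createGain();
>>     bridge.connect(gain);
>>     gain.connect(ctx.destination); // from here on, the audio API's domain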
>> 
>> For broader review and discussion, I've CCed Francois Daoust (staff contact for the Real-Time Communications WG), Dom Hazael-Massieux (staff contact for the Device APIs WG), Mike Smith (staff contact for the HTML WG), and Philippe Le Hégaret (Interaction Domain Lead); I've also CCed Dan Burnett, co-chair of the Voice Browser WG (which does VoiceXML) and chair of the HTML Speech Incubator Group. I've also CCed Paul Bakaus of Zynga and Tobie Langel of Facebook, who have an interest in audio for HTML5 games and user interfaces.
>> 
>> I'd like to solicit their opinions, and to suggest that they cast about for people in their own groups or circles who could chime in on a more technical level about audio and media streams, and about the relationship between the RTC and audio-manipulation use cases.
>> 
>> 
>>> As a veteran of decade-long efforts to resolve conflicts between specs
>>> that never should have happened in the first place (SVG, CSS and HTML,
>>> I'm looking at you), I think it's worth taking time to make sure we
>>> don't have another Conway's law failure.
>> 
>> I am with you there. As you know, I have always advocated for closer integration of these technologies, from the failed effort in the Compound Documents WG to the more successful FX Task Force. I don't think that this is the problem here... there are plenty of people talking to one another, and genuinely open minds, but there is also a reasonable technical case for a degree of separation.
>> 
>> 
>>> The immediate demand for audio
>>> API will have to be (and is being) satisfied by libraries that abstract
>>> over browser differences, and that will remain true for quite some time
>>> no matter what the WG does.
>> 
>> I can read this two ways (I don't know which way you meant it... maybe some third way?):
>> 
>> 1) "Script libraries will help build audience-appropriate abstraction layers that make whatever the Audio WG does better fit the needs of that particular audience"; or
>> 
>> 2) "We are going to do our own audio API our way, and ignore what is being done by the Audio WG or the other implementers."
>> 
>> I agree with the motivations behind #1, but am concerned about the sentiments behind #2. Having a single API that enjoys consensus among different implementers across platforms and devices is strong motivation for other implementers to get on board, and makes it easier for them to justify their investment of energy, because the benefits to developers and users are profound. Having divergent and competing APIs might seem like an evolutionarily sound "survival of the fittest" approach, but I'm not convinced that it will produce the timely best-of-breed hybrid that could be achieved by simply bringing in the right stakeholders and learning from their experience... it also seems like a Conway's Law failure at an inter-organizational level.
>> 
>> 
>> Regarding the script library emphasis, I know that Corban Brook's audionode.js and Grant Galitz's XAudioJS are good (if incomplete) emulation layers, but do we have benchmarks that show how well they scale to multiple audio instances (such as those used in games)? I can't shake the feeling that native code will continue to outperform emulation via script, and I'd be more comfortable if we had some hard data.
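>> 
>> For anyone willing to gather such data, a crude micro-benchmark along these lines would be a start (a sketch only; the mixing loop approximates what an emulation layer must do per callback, not any library's actual code):
>> 
>>     // Sketch: time a pure-script mix of N sources into one output block,
>>     // roughly the inner loop a JS emulation layer runs per audio callback.
>>     const SOURCES = 50;
>>     const BLOCK = 1024; // samples; ~23 ms of audio at 44.1 kHz
>>     const srcs: Float32Array[] = [];
>>     for (let i = 0; i < SOURCES; i++) {
>>       const a = new Float32Array(BLOCK);
>>       for (let j = 0; j < BLOCK; j++) a[j] = Math.random() * 2 - 1;
>>       srcs.push(a);
>>     }
>>     const out = new Float32Array(BLOCK);
>>     const t0 = performance.now();
>>     for (const s of srcs) {
>>       for (let j = 0; j < BLOCK; j++) out[j] += s[j];
>>     }
>>     const elapsed = performance.now() - t0;
>>     // The mix must finish well inside the ~23 ms block budget to
>>     // avoid audible dropouts; anything close to it will not scale.
>>     console.log(`mixed ${SOURCES} sources in ${elapsed.toFixed(3)} ms`);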
>> 
>> 
>> [1] http://lists.w3.org/Archives/Public/public-audio/2011AprJun/0004.html
>> 
>> Regards-
>> -Doug Schepers
>> W3C Staff Contact, SVG, WebApps, Web Events, and Audio WGs
>> 
> 
