Re: Adding Web Audio API Spec to W3C Repository from Francois Daoust on 2011-06-17 (public-audio@w3.org from April to June 2011)

From: Francois Daoust <fd@w3.org>
Date: Fri, 17 Jun 2011 12:10:11 +0200
To: Doug Schepers <schepers@w3.org>
CC: robert@ocallahan.org, public-audio@w3.org, Philippe Le Hegaret <plh@w3.org>, "Michael(tm) Smith" <mike@w3.org>, Dan Burnett <dburnett@voxeo.com>, Dominique Hazael-Massieux <dom@w3.org>, Paul Bakaus <pbakaus@zynga.com>, Tobie Langel <tobie@fb.com>
Message-ID: <4DFB2803.8010904@w3.org>

Hi Doug,

Thanks for forwarding the thread. This discussion is particularly relevant for the Web Real-Time Communications WG since it is to work on "Media Stream Functions" and "Audio Stream Functions" unless done by some other group. I do not have technical input for the discussion at this point but I see a lot of overlap. Obviously, it would be very good to see convergence on the same solution. I'll point the Web RTC group to this discussion.

The Web RTC group has just issued a call for API contributions [1] to gather candidate APIs for the different functionalities it needs to deliver. The Stream API defined in WHATWG is a possible candidate.

Initial use cases suggested for Web Real-Time Communications involve the need to capture and process audio streams in various ways, including automatic gain control, echo cancellation, mute functions and the need to mix different audio sources, possibly with spatial effects. Similar functionality is needed for video streams.

Francois.

[1] http://lists.w3.org/Archives/Public/public-webrtc/2011Jun/0021.html


On 06/10/2011 06:27 PM, Doug Schepers wrote:
> Hi, ROC-
>
> (apologies in advance for the long, rambling email)
>
> Robert O'Callahan wrote (on 6/9/11 5:46 PM):
>> We (Mozilla) definitely plan to put forward a new spec that builds on
>> the HTML Streams proposal. I would like to make more progress on the
>> implementation before we do that, but if you think otherwise, we can go
>> forward.
>
> I'm torn. On the one hand, I want to "get it right", which means implementation experience. On the other, if we are to have a reasoned counterpoint to Chris Rogers' fairly mature spec and implementation, we need to have a starting point for technical comparisons and conversations.
>
> I think I would prefer to see some spec text, even knowing that it might change dramatically during implementation; that said, I acknowledge that that is extra work, though it may be a useful step to you as implementers to solidify your scope.
>
>
>> I believe the concerns I raised in the earlier thread about
>> synchronization and the relationship with the Streams proposal
>> are still valid, but that thread was a tennis match between me
>> and Chris and I'd like to hear from W3C people and other parties
>> (especially HTML and Streams people) how they feel about those concerns.
>
> I read the conversation [1] with interest, but didn't feel qualified to respond on more than a superficial level. I agree with you that having compatible integration of these various use cases is a strong goal, but I wasn't convinced that they needed to be merged per se; I was more convinced by the argument that there should be hooks between them, but that they should be developed as stand-alone APIs, both to better fit their own requirements and audiences, and to allow them to be developed and extended at their own paces (not least for speedy progress towards a first Recommendation that is widely implemented, so we can move on to v2 with more author experience).
>
> For broader review and discussion, I've CCed Francois Daoust (staff contact for the Real-Time Communications WG), Dom Hazael-Massieux (staff contact for the Device APIs WG), Mike Smith (staff contact for the HTML WG), and Philippe Le Hégaret (Interaction Domain Lead); I've also CCed Dan Burnett, co-chair of the Voice Browser WG (which does VoiceXML) and chair of the HTML Speech Incubator Group. I've also CCed Paul Bakaus of Zynga and Tobie Langel of Facebook, who have an interest in audio for HTML5 games and user interfaces.
>
> I'd like to solicit their opinions, and to suggest that they cast about for people in their own groups or circles who could chime in on a more technical level about the audio and media streams, and the relationship between the RTC and audio manipulation use cases.
>
>
>> As a veteran of decade-long efforts to resolve conflicts between specs
>> that never should have happened in the first place (SVG, CSS and HTML,
>> I'm looking at you), I think it's worth taking time to make sure we
>> don't have another Conway's law failure.
>
> I am with you there. As you know, I have always advocated for closer integration of these technologies, from the failed effort in the Compound Documents WG to the more successful FX Task Force. I don't think that this is the problem here... there are plenty of people talking to one another, and genuine open minds, but there is also a reasonable technical case for a degree of separation.
>
>
>> The immediate demand for an audio
>> API will have to be (and is being) satisfied by libraries that abstract
>> over browser differences, and that will remain true for quite some time
>> no matter what the WG does.
>
> I can read this two ways (I don't know which way you meant it... maybe some third way?):
>
> 1) "Script libraries will help build audience-appropriate abstraction layers that make whatever the Audio WG does better fit the needs of that particular audience"; or
>
> 2) "We are going to do our own audio API our way, and ignore what is being done by the Audio WG or the other implementers."
>
> I agree with the motivations behind #1, but am concerned about the sentiments behind #2. Having a single API that enjoys consensus by different implementers across platforms and devices is strong motivation for other implementers to get on board, and makes it easier for them to justify their investment of energy, because the benefits to developers and users are profound. Having divergent and competing APIs might seem like an evolutionarily sound "survival of the fittest" approach, but I'm not convinced that it will produce a timely best-of-breed hybrid that could be achieved by simply bringing in the right stakeholders and learning from their experience... it also seems like a Conway's Law failure at an inter-organizational level.
>
>
> Regarding the script library emphasis, I know that Corban Brook's audionode.js and Grant Galitz's XAudioJS are good (if incomplete) emulation layers, but do we have benchmarks that show how well they scale to multiple audio instances (such as used in games)? I can't shake the feeling that native code will continue to outperform emulation via script, and I'd be more comfortable if we had some hard data.
>
>
> [1] http://lists.w3.org/Archives/Public/public-audio/2011AprJun/0004.html
>
> Regards-
> -Doug Schepers
> W3C Staff Contact, SVG, WebApps, Web Events, and Audio WGs
>
Received on Friday, 17 June 2011 10:10:42 UTC