Re: TAG feedback on Web Audio from Srikumar Karaikudi Subramanian on 2013-08-07 (public-audio@w3.org from July to September 2013)

From: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Date: Wed, 7 Aug 2013 06:15:59 +0530
To: Chris Wilson <cwilso@google.com>
Cc: Marcus Geelnard <mage@opera.com>, Alex Russell <slightlyoff@google.com>, Noah Mendelsohn <nrm@arcanedomain.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "robert@ocallahan.org" <robert@ocallahan.org>, "public-audio@w3.org" <public-audio@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <B8E5D6AD-3518-445A-BB26-5D18193786A8@gmail.com>
On 6 Aug, 2013, at 10:28 PM, Chris Wilson <cwilso@google.com> wrote:

> I still (obviously) disagree that a model that relies on passing copies around will have the same memory/speed/latency performance.  It's possible the actual copying can be minimized, at a cost in usability to the developer, but even with that I don't think the cost will ever be truly zero; so this is a tradeoff.  It may be that the group chooses that tradeoff - but traditional audio APIs have not.  Of course, most of them don't have the limited execution scope of JS, either.  (Note that I should have said "glitch-free, reasonably-low-latency audio with good performance and API usability." :)

It is easy to be passionate about "performance at any cost" or "no data races at any cost", but it is useful to look at some known cases.

SuperCollider (SC) is an example of a synthesis system with a client-server architecture in which the client gets audio work done only by passing messages to the server over a network. The "local" server runs in its own process, while the "internal" server is "more tightly coupled with the language" and runs in the same process as the client. Despite this proximity of the internal server, SC's creator writes "There is generally no benefit in using the internal server." in the "Local vs internal" section at http://doc.sccode.org/Classes/Server.html . Given the following that SC has in the latency-sensitive computer music community, and given that SC has been operating this way since the time when desktop computers had the same power that mobile devices have today, this appears to be evidence against the school of thought that declares the performance overhead of such separation to be unacceptable.

As for passing data copies between independent processes, I've had some personal experience with decoding video in realtime in one process and passing the frames to another process by copy, just to give the decoders enough working memory on a 32-bit OS (Win XP). On such systems, this worked just as well as realtime decoding in the same process, except that the 2GB-per-process barrier was no longer hit. In this case, each decoder was transferring about 35MB of data every second, and usually the system was running two such decoder processes at a time. If you take a 5.1 duplex audio stream at 48kHz, the data rate of one such stream is only 2.2MB/s (using 32-bit samples), which pales in comparison. This suggests that having such an "audio server" send a stream over to the JS process, having the JS process modify it in a script processor node, and sending the results back by copy is perhaps not as big a performance hit as it is imagined to be? I'm not sure, but I think running a realtime WebRTC video chat rendered in WebGL might already be doing something like this?
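
As a rough check of the figures above, the arithmetic can be sketched as follows. The assumptions are mine: 6 channels for 5.1, 4 bytes per 32-bit sample, "duplex" read as one input plus one output stream, and MB read as 2^20 bytes. The function name is only illustrative.

```python
# Back-of-the-envelope data rates; all parameters below are assumptions
# spelled out in the text above, not measurements.

def stream_rate_bytes_per_sec(channels, sample_rate_hz, bytes_per_sample, directions):
    """Raw PCM data rate for an uncompressed audio stream."""
    return channels * sample_rate_hz * bytes_per_sample * directions

MIB = 1024 * 1024

# 5.1 audio = 6 channels, 48 kHz, 32-bit samples, duplex = 2 directions.
audio_bytes = stream_rate_bytes_per_sec(6, 48_000, 4, 2)
audio_mib = audio_bytes / MIB

# Per-decoder video rate quoted in the text (about 35MB every second).
video_mib = 35.0
ratio = video_mib / audio_mib

print(f"audio: {audio_bytes} bytes/s = {audio_mib:.2f} MiB/s")
print(f"one video decoder moves roughly {ratio:.0f}x as much data")
```

Under these assumptions one decoder process was copying well over ten times the data of such an audio stream.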

Despite the above, I believe Chris Rogers did consider such an architecture during the early stages of the API. It would be great if he could weigh in with the evidence he used to decide against this, and in favour of sharing memory between JS and the audio process.

The advantage of thinking in terms of such client-server communication is that the protocol can be made explicit for the spec. Note that I'm not suggesting that the spec force such a client-server implementation, but that the design process involve thinking as though the implementation were done that way.
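
To make the idea of an explicit protocol a little more concrete, here is a minimal sketch of what "thinking in messages" could look like. It is not the Web Audio API, nor SuperCollider's protocol: the message types `ProcessRequest` and `ProcessReply`, the `script_processor` function and the two queues are all hypothetical names, and both sides run in one thread purely to keep the example deterministic.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Tuple


@dataclass(frozen=True)
class ProcessRequest:
    """Hypothetical message: audio server -> script (client) side."""
    block_id: int
    samples: Tuple[float, ...]  # immutable copy of the server's block


@dataclass(frozen=True)
class ProcessReply:
    """Hypothetical message: script (client) side -> audio server."""
    block_id: int
    samples: Tuple[float, ...]


def script_processor(req: ProcessRequest, gain: float) -> ProcessReply:
    # Stand-in for a script processor callback: it only ever sees the
    # copy carried by the message, never the server's own buffer.
    return ProcessReply(req.block_id, tuple(s * gain for s in req.samples))


def run_demo():
    to_client: Queue = Queue()
    to_server: Queue = Queue()

    server_buffer = [0.5, -0.25, 1.0, 0.0]  # owned by the server only
    to_client.put(ProcessRequest(0, tuple(server_buffer)))  # send a copy

    # The server keeps mutating its buffer; the message is unaffected.
    server_buffer[0] = 9.0

    req = to_client.get()
    to_server.put(script_processor(req, gain=0.5))
    reply = to_server.get()
    return req, reply


req, reply = run_demo()
print(req.samples, reply.samples)
```

Because every interaction is a message carrying its own data, the observable behaviour can be written down without reference to shared mutable memory, which is the property being argued for here.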

Best,
-Kumar

> 
> Interestingly enough, if we'd not allowed JS in nodes (rather used a more restricted audio processing language), I'm not sure we'd have this problem to this scale.
> 
> 
> On Tue, Aug 6, 2013 at 2:43 AM, Marcus Geelnard <mage@opera.com> wrote:
> 2013-08-06 04:20, Chris Wilson skrev:
>> See, I read two different things from this.  Marcus, I heard you say the comment about not being describable in JS was in specific reference to threading and memory sharing not being part of the current model; from Alex, I heard "open your mind to what JS *could* be, and then describe everything in one model" - in part, the nodes' behaviors themselves should be described in JS, albeit of course not necessarily run in JS.
>> 
>> The latter (as I'd previously expressed to Alex) is fine with me; I'd be fine describing how delay nodes work in JS.  However, high-quality glitch-free audio requires a number of things that aren't available to JS today - notably, a separate thread with restricted execution capabilities (i.e. not arbitrary JavaScript) and no garbage collection that can have elevated priority, and access to the physical hardware, of course.  It's not at all impossible to imagine how to describe Web Audio if you have those capabilities; and the memory sharing we're discussing under the rubric of race conditions is really whether that somewhat different execution environment supports those or not.
> 
> I have always been a strong supporter of being able to describe and implement as much as possible of the API in JS, and I agree that there are certainly some important aspects missing in the JS execution environment that make a practical implementation in JS impossible (including access to audio HW and low-level control over thread execution etc). You should, however, still be able to implement the behavior of the API in JS (as an exercise - imagine implementing an OfflineAudioContext in JS - I don't see why it shouldn't be possible).
> 
> I might have misinterpreted Alex's point (I thought he was referring to the shared data in the API interfaces, but of course there may be more things to it). However, from the perspective of making the Web Audio API describable in JS terms, I still think that the issue of exposing shared data interfaces to JS is a key deficiency in the current design, mostly because:
> 
> * It makes it impossible to express operations such as AudioBuffer.getChannelData() in JS terms.
> * Since the interfaces are exposed to JS, the only way to make them JS-friendly would be to change/extend the JS memory model (which is a much bigger task than to change the Web Audio API model).
> * It's not in any way necessary for achieving the goal "glitch-free, reasonably-low-latency audio".
> 
> In fact, I believe it should be fully possible to implement the Web Audio API without using shared mutable data internally (provided that we drop the shared data model from the interfaces). Correct me if I'm wrong, but I'm pretty sure you could solve most things by passing read-only references to data buffers between threads (which would be semantically equivalent to passing copies around) and still have the same memory/speed/latency performance. That would make it a moot point whether or not shared data must be part of a potential fictional execution environment or not.
> 
> /Marcus
> 
> 
> 
>> 
>> 
>> On Mon, Aug 5, 2013 at 1:49 PM, Alex Russell <slightlyoff@google.com> wrote:
>> On Sun, Aug 4, 2013 at 6:46 PM, Chris Wilson <cwilso@google.com> wrote:
>> (Sorry, on vacation, and beach > Web Audio discussions. :)
>> 
>> Alex, as we discussed a few weeks ago - glitch-free, reasonably-low-latency audio is something that I just don't believe JS - as it is today - can do.
>> 
>> In the TAG feedback document you might have detected two lines of attack on this argument:
>> 
>> 1. It's insufficient to say "JavaScript can't do this". The entire goal of some of the suggestions in the document related to the execution model are all about creating space in the model to change the constraints under which JS (or something else) runs such that JS (or ASMJS or NaCL in a worker, etc.) could do what's needed without breaking the invariants of either the platform or the conceptual semantic model that Web Audio presents. So that's simply an argument against an assertion that hasn't been presented. It fails on that ground alone, however...
>> 2. Nothing in my argument against breaking the platform invariants requires that you actually RUN the built-in node types in JS. To some great degree, I'm advocating for JS-as-spec-language when I discuss de-sugaring. Not an implementation strategy (except perhaps in naive cases for newly-bootstrapping impls).
>> 
>> The net is that discussing main-thread-JS-as-we-know-it-today and not the framework for execution/specification is arguing against an implementation-level straw-man.
>> 
>> Regards
>> 
>> On Thu, Aug 1, 2013 at 5:42 PM, Alex Russell <slightlyoff@google.com> wrote:
>> Let me be clearer, then: the issue is that it introduces effects to JS that can't be described in terms of JS. It violates the run-to-completion model of the language without appealing to the turn-based concurrency model we use everywhere else in the platform. 
>> 
>> 
>> On Wed, Jul 31, 2013 at 10:43 AM, Chris Wilson <cwilso@google.com> wrote:
>> In addition, I'd ask that you be more explicit than calling this problem "data races", because there's clearly some explicit effect you're trying to prevent.  Any asynchronously-programmable or event-driven system can enable developers to introduce race conditions.
>> 
>> 
>> On Mon, Jul 29, 2013 at 10:09 PM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
>> 
>> 
>> On 7/29/2013 7:05 PM, Anne van Kesteren wrote:
>> On Sat, Jul 27, 2013 at 9:22 AM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
>> >Again, I have no informed opinions on the specific merits, just suggesting a
>> >useful role the TAG might play to clarify for the many members of the
>> >community who are less expert on this than you are. Thank you.
>> 
>> I'm not sure we call out data races anywhere, it's something we just don't do.
>> 
>> Well, my recollection may be faulty, but I think that one of the reasons the TAG took the trouble to formalize things like the architecture document was the belief that it's easier to ask skeptics to stick to rules that have been written down, and especially those that have garnered formal consensus through something like the Recommendation track.
>> 
>> Whether it's worth taking a guideline on data races all the way to Rec I'm not sure, but it seems that it would be worth setting it down formally, perhaps in a TAG Finding/blog post/Recommendation or whatever will get the right level of discussion, consensus building, and eventually attention.
>> 
>> Certainly, of the many things that have come up recently relating to APIs, this one seems deeply architectural and very much within the TAG's remit.
>> 
>> Noah
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Marcus Geelnard
> Technical Lead, Mobile Infrastructure
> Opera Software
>
Received on Wednesday, 7 August 2013 00:46:32 UTC