
Re: TAG feedback on Web Audio

From: K. Gadd <kg@luminance.org>
Date: Tue, 6 Aug 2013 18:56:42 -0700
Message-ID: <CAPJwq3WQW4mX7UAHhxABvz5Kc5NNUY+LYY1C=_ut3Zo4sDwvEA@mail.gmail.com>
To: Srikumar Karaikudi Subramanian <srikumarks@gmail.com>
Cc: Chris Wilson <cwilso@google.com>, Marcus Geelnard <mage@opera.com>, Alex Russell <slightlyoff@google.com>, Noah Mendelsohn <nrm@arcanedomain.com>, Anne van Kesteren <annevk@annevk.nl>, Olivier Thereaux <Olivier.Thereaux@bbc.co.uk>, "robert@ocallahan.org" <robert@ocallahan.org>, "public-audio@w3.org" <public-audio@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
For reference, if I open a 720p Flash player video (looks like 30 Hz?) in
Chrome right now, I can clearly observe (in procexp, etc.) the renderer
process communicating with the plugin and GPU processes - usually in excess
of 3.5 MB/s, excluding the actual network traffic to stream the video. Given
Flash's continued prevalence for streaming realtime audio/video and for use
cases like video chat and game rendering, whatever works for the Pepper
Flash plugin should probably work okay for Web Audio (though it certainly
isn't necessarily *needed*).

It's also possible in many scenarios to replace a copy with an ownership
transfer, though of course that isn't easy to introduce into the existing
API. The neutering model for typed arrays is effectively this: it allows
safe 'sharing' of a chunk of data between threads/processes without any of
the risks present in non-neutered cases, because neutering a buffer
when you transfer it to/from a worker effectively does the same thing as
acquiring a mutex that protects the buffer, without the need to explicitly
guard every access in the VM (obviously, that kind of guard would not be
acceptable, because it would slow down all uses of the buffer). It sounds
like difficulties with actually implementing neutering are the reason it
isn't being used here, and that's understandable - but it's important
to avoid the falsehood that 'passing copies around' is the only
alternative being proposed here. It's not.

As before, I repeat the need for conclusive benchmarks/test cases that
actually demonstrate a serious performance penalty before that penalty is
used in support of an argument. Yes, copies are slower, but given their
significant presence in the Chrome architecture up to this point in other
subsystems, they are clearly worth the cost in some scenarios, whether for
isolation, responsiveness, stability, or better performance in multicore
scenarios. If the problem with copies is their cost on some particular
hardware architecture, and that hardware is important - whether it be a
Chromebook, an Android phone, or a low-spec netbook running Windows - it's
important to establish that and clearly communicate the goals so that
everyone else on the list can measure their proposals against them.

On Tue, Aug 6, 2013 at 5:45 PM, Srikumar Karaikudi Subramanian <
srikumarks@gmail.com> wrote:

> On 6 Aug, 2013, at 10:28 PM, Chris Wilson <cwilso@google.com> wrote:
>> I still (obviously) disagree that a model that relies on passing copies
>> around will have the same memory/speed/latency performance.  It's possible
>> the actual copying can be minimized, at a cost in usability to the
>> developer, but even with that I don't think the cost will ever be truly
>> zero; so this is a tradeoff.  It may be that the group chooses that
>> tradeoff - but traditional audio APIs have not.  Of course, most of them
>> don't have the limited execution scope of JS, either.  (Note that I should
>> have said "glitch-free, reasonably-low-latency audio with good performance
>> and API usability." :)
> It is easy to be passionate about "performance at any cost" or "no data
> races at any cost", but it is useful to look at some known cases.
> SuperCollider (SC) is an example of a synthesis system with a
> client-server architecture with the client only passing messages to the
> server via a network to get audio work done. The "local" server runs in its
> own process and the "internal" server is "more tightly coupled with the
> language" and runs in the same process as the client. Despite this
> proximity of the internal server, SC's creator writes "There is generally
> no benefit in using the internal server." in the "Local vs internal"
> section at http://doc.sccode.org/Classes/Server.html . Given the
> following that SC has in the latency-sensitive computer music community,
> and given that SC has been operating this way since the time desktop
> computers had the same power that mobile devices today have, this appears
> to be evidence against the school of thought that declares the performance
> overhead of such separation to be unacceptable.
> As for passing data copies between independent processes, I've had some
> personal experience with decoding video (realtime) in one process and
> passing the frames to another process by copy just to give the decoders
> enough working memory on a 32-bit OS (Win XP). On such systems, this worked
> just as well as realtime decoding in the same process, except now the
> 2GB-per-process barrier won't blow up. In this case, each decoder was
> transferring about 35MB of data every second and usually the system was
> running two such decoder processes at a time. If you take a 5.1 duplex
> audio stream at 48 kHz, the data rate of one such stream is only 2.2 MB/s
> (using 32-bit samples), which pales in comparison. This suggests that
> having such an "audio server" send over a stream to the JS process, have
> the JS process modify it in a script processor node and send the results
> back by copy, is perhaps not that big a performance hit as it is imagined
> to be? I'm not sure, but I think running a realtime WebRTC video chat
> rendered in WebGL might already be doing something like this?
> Despite the above, I believe Chris Rogers did consider such an
> architecture during the early stages of the API. It would be great if he
> can weigh in with the evidence he used to decide against this, and in
> favour of sharing memory between JS and the audio process.
> The advantage to thinking in terms of such client-server communication is
> that the protocol can be made explicit for the spec. Note that I'm not
> suggesting that the spec force such a client-server implementation, but
> that the design process involve thinking as though the implementation were
> done that way.
> Best,
> -Kumar
>> Interestingly enough, if we'd not allowed JS in nodes (rather used a more
>> restricted audio processing language), I'm not sure we'd have this problem
>> to this scale.
> On Tue, Aug 6, 2013 at 2:43 AM, Marcus Geelnard <mage@opera.com> wrote:
>> On 2013-08-06 04:20, Chris Wilson wrote:
>> See, I read two different things from this.  Marcus, I heard you say the
>> comment about not being describable in JS was in specific reference to
>> threading and memory sharing not being part of the current model; from
>> Alex, I heard "open your mind to what JS *could* be, and then describe
>> everything in one model" - in part, the nodes' behaviors themselves should
>> be described in JS, albeit of course not necessarily run in JS.
>>  The latter (as I'd previously expressed to Alex) is fine with me; I'd
>> be fine describing how delay nodes work in JS.  However, high-quality
>> glitch-free audio requires a number of things that aren't available to JS
>> today - notably, a separate thread with restricted execution capabilities
>> (i.e. not arbitrary JavaScript) and no garbage collection that can have
>> elevated priority, and access to the physical hardware, of course.  It's
>> not at all impossible to imagine how to describe Web Audio if you have
>> those capabilities; and the memory sharing we're discussing under the
>> rubric of race conditions is really whether that somewhat different
>> execution environment supports those or not.
>> I have always been a strong supporter of being able to describe and
>> implement as much as possible of the API in JS, and I agree that there are
>> certainly some important aspects missing in the JS execution environment
>> that make a practical implementation in JS impossible (including access to
>> audio HW and low-level control over thread execution etc). You should,
>> however, still be able to implement the *behavior* of the API in JS (as
>> an exercise - imagine implementing an OfflineAudioContext in JS - I don't
>> see why it shouldn't be possible).
>> I might have misinterpreted Alex's point (I thought he was referring to
>> the shared data in the API interfaces, but of course there may be more
>> things to it). However, from the perspective of making the Web Audio API
>> describable in JS terms, I still think that the issue of exposing shared
>> data interfaces to JS is a key deficiency in the current design, mostly
>> because:
>> * It makes it impossible to express operations such as
>> AudioBuffer.getChannelData() in JS terms.
>> * Since the interfaces are exposed to JS, the *only* way to make them
>> JS-friendly would be to change/extend the JS memory model (which is a much
>> bigger task than to change the Web Audio API model).
>> * It's not in any way necessary for achieving the goal "glitch-free,
>> reasonably-low-latency audio".
>> In fact, I believe it should be fully possible to implement the Web Audio
>> API without using shared *mutable* data internally (provided that we
>> drop the shared data model from the interfaces). Correct me if I'm wrong,
>> but I'm pretty sure you could solve most things by passing read-only
>> references to data buffers between threads (which would be semantically
>> equivalent to passing copies around) and still have the same
>> memory/speed/latency performance. That would make it a moot point whether
>> or not shared data must be part of a potential fictional execution
>> environment.
>> /Marcus
>> On Mon, Aug 5, 2013 at 1:49 PM, Alex Russell <slightlyoff@google.com> wrote:
>>>  On Sun, Aug 4, 2013 at 6:46 PM, Chris Wilson <cwilso@google.com> wrote:
>>>> (Sorry, on vacation, and beach > Web Audio discussions. :)
>>>>  Alex, as we discussed a few weeks ago - glitch-free,
>>>> reasonably-low-latency audio is something that I just don't believe JS -
>>>> *as it is today* - can do.
>>>  In the TAG feedback document you might have detected two lines of
>>> attack on this argument:
>>>    1. It's insufficient to say "JavaScript can't do this". The entire
>>>    goal of some of the suggestions in the document related to the execution
>>>    model are all about creating space in the model to change the constraints
   under which JS (or something else) runs such that JS (or asm.js or
   NaCl in a worker, etc.) *could* do what's needed without breaking
>>>    the invariants of either the platform or the conceptual semantic model that
>>>    Web Audio presents. So that's simply an argument against an assertion
>>>    *that hasn't been presented*. It fails on that ground alone,
>>>    however...
>>>    2. Nothing in my argument against breaking the platform invariants
>>>    requires that you actually RUN the built-in node types in JS. To some great
>>>    degree, I'm advocating for JS-as-spec-language when I discuss de-sugaring.
>>>    Not an implementation strategy (except perhaps in naive cases for
>>>    newly-bootstrapping impls).
>>> The net is that discussing main-thread-JS-as-we-know-it-today and not
>>> the framework for execution/specification is arguing against an
>>> implementation-level straw-man.
>>>  Regards
>>>    On Thu, Aug 1, 2013 at 5:42 PM, Alex Russell <slightlyoff@google.com> wrote:
>>>>> Let me be clearer, then: the issue is that it introduces effects to JS
>>>>> that can't be described *in terms of JS*. It violates the
>>>>> run-to-completion model of the language without appealing to the turn-based
>>>>> concurrency model we use everywhere else in the platform.
>>>>> On Wed, Jul 31, 2013 at 10:43 AM, Chris Wilson <cwilso@google.com> wrote:
>>>>>> In addition, I'd ask that you be more explicit than calling this
>>>>>> problem "data races", because there's clearly some explicit effect you're
>>>>>> trying to prevent.  Any asynchronously-programmable or event-driven system
>>>>>> can enable developers to introduce race conditions.
>>>>>> On Mon, Jul 29, 2013 at 10:09 PM, Noah Mendelsohn <
>>>>>> nrm@arcanedomain.com> wrote:
>>>>>>> On 7/29/2013 7:05 PM, Anne van Kesteren wrote:
>>>>>>>> On Sat, Jul 27, 2013 at 9:22 AM, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
>>>>>>>>> Again, I have no informed opinions on the specific merits, just
>>>>>>>>> suggesting a useful role the TAG might play to clarify for the many
>>>>>>>>> members of the community who are less expert on this than you are.
>>>>>>>>> Thank you.
>>>>>>>> I'm not sure we call out data races anywhere, it's something we
>>>>>>>> just don't do.
>>>>>>>  Well, my recollection may be faulty, but I think that one of the
>>>>>>> reasons the TAG took the trouble to formalize things like the architecture
>>>>>>> document was the belief that it's easier to ask skeptics to stick to rules
>>>>>>> that have been written down, and especially those that have garnered formal
>>>>>>> consensus through something like the Recommendation track.
>>>>>>> Whether it's worth taking a guideline on data races all the way to
>>>>>>> Rec I'm not sure, but it seems that it would be worth setting it down
>>>>>>> formally, perhaps in a TAG Finding/blog post/Recommendation or whatever
>>>>>>> will get the right level of discussion, consensus building, and eventually
>>>>>>> attention.
>>>>>>> Certainly, of the many things that have come up recently relating to
>>>>>>> APIs, this one seems deeply architectural and very much within the TAG's
>>>>>>> remit.
>>>>>>> Noah
>> --
>> Marcus Geelnard
>> Technical Lead, Mobile Infrastructure
>> Opera Software
Received on Wednesday, 7 August 2013 01:57:53 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:50:10 UTC