Re: Specificity in the Web Audio API spec from Chris Rogers on 2012-03-30 (public-audio@w3.org from January to March 2012)

From: Chris Rogers <crogers@google.com>
Date: Fri, 30 Mar 2012 11:02:29 -0700
To: Jussi Kalliokoski <jussi.kalliokoski@gmail.com>
Cc: "Wei, James" <james.wei@intel.com>, "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CA+EzO0=KBHaAiQoMgZqf=Ftye-XQCB4V_J2BMKKjPNZiG2NZqQ@mail.gmail.com>
On Fri, Mar 30, 2012 at 6:37 AM, Jussi Kalliokoski <jussi.kalliokoski@gmail.com> wrote:

> > I think you raise some interesting points.  What is the goal here?  Are
> you expecting that independent implementations will always produce
> *exactly* the same output for the same input?
>
> Yes, that would be quite ideal. Otherwise, if you need that precision (a
> DAW can hardly afford to sound different on different platforms, especially
> in such a crucial element as a delay node), you're going to have to exclude
> browsers or resort to a JavaScript implementation of a tool that's
> supposed to be predefined. That kind of defeats the purpose of having predefined
> nodes, I think. And defining these algorithms well in the spec would
> push browser vendors to fix their implementations instead of
> marking bugs as WontFix because they follow a spec that isn't defined well
> enough.
>
>
> > I don't think the spec is intended to guarantee bit-exact output
> across all vendors.  I could be wrong, though; Chris will have the
> definitive answer.
>
> Yes, I'd be interested to hear what he thinks. We've had prior discussions
> about this, and it seemed to me that we had mostly reached consensus that it's
> best if all implementations produce the same results. Pipe dream? Yes, very
> much, but I think we should do our best to help browser vendors make
> consistent implementations so that end developers don't have to worry
> about the inconsistencies. :)
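To make the delay-node point concrete: when a delay time falls between sample boundaries, an implementation has to pick a read strategy, and two reasonable choices give different output for the same input. The sketch below is illustrative only. `renderDelay` and its `"nearest"`/`"linear"` modes are hypothetical names, and the sketch does not claim to describe what the spec or any browser's DelayNode actually does.

```javascript
// Hypothetical helper: delay `input` by a possibly fractional number of samples.
// "nearest" rounds the read position to the closest sample; "linear"
// interpolates between the two neighboring samples.
// Samples before the start of the input are treated as silence.
function renderDelay(input, delaySamples, mode) {
  const out = new Float64Array(input.length);
  // Read one sample, returning 0 for out-of-range indices.
  const at = (k) => (k >= 0 && k < input.length ? input[k] : 0);
  for (let n = 0; n < input.length; n++) {
    const pos = n - delaySamples; // fractional read position in the past
    if (mode === "nearest") {
      out[n] = at(Math.floor(pos + 0.5));
    } else if (mode === "linear") {
      const k = Math.floor(pos);
      const frac = pos - k;
      out[n] = at(k) + frac * (at(k + 1) - at(k));
    } else {
      throw new Error(`unknown mode: ${mode}`);
    }
  }
  return out;
}

// A unit impulse delayed by 2.25 samples.
const impulse = new Float64Array(8);
impulse[0] = 1;
const nearestOut = renderDelay(impulse, 2.25, "nearest");
const linearOut = renderDelay(impulse, 2.25, "linear");
console.log(Array.from(nearestOut)); // values: 0, 0, 1, 0, 0, 0, 0, 0
console.log(Array.from(linearOut)); // values: 0, 0, 0.75, 0.25, 0, 0, 0, 0
```

Both outputs are defensible readings of "delay by 2.25 samples". If the spec leaves the choice open, two conforming nodes can disagree sample by sample, which is the scenario Jussi is worried about.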


Jussi, I recall a discussion we had in a teleconference where we
clearly decided that the output would *not* have to be bit-exact.  In the
graphics world there are various implementations of the Canvas2D API
(drawing lines, circles, and much more) which are very far from bit-exact
and differ in details such as anti-aliasing algorithms.  Not only do the
results differ from browser to browser, they also differ between
platforms (Windows vs. Mac OS X vs. Linux, for example).  The appearance of
fonts also differs greatly between browsers, and although
graphic designers can be quite demanding about very subtle aesthetic
details, we've come to accept that these differences exist.  Color-profile
handling when drawing images can also differ.

The Web Audio algorithms can be defined fairly precisely, to the same level
of detail that we see in the HTML5 Canvas2D specification.  For many of the
algorithms, such as ConvolutionNode and BiquadFilterNode, even the exact
math is clearly specified.  Of course there are differences in
floating-point precision between different CPUs, etc., but these
differences tend to be far smaller than anything we would consider worrisome for audio
processing purposes.  Let me give you an idea of real-world differences among
the FFT implementations we have in WebKit to illustrate how we test (and
accept the small differences):

We check convolution with a rigorous test:
http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/convolution-mono-mono.html
http://svn.webkit.org/repository/webkit/trunk/LayoutTests/webaudio/resources/convolution-testing.js

And although we have four active FFT implementations in use among the
various ports, which differ slightly in their output, we still share the
same test for all of them and verify that the results are of very high quality.
Here is the relevant line from convolution-testing.js:

    // allowedDeviationDecibels was determined experimentally.  It
    // is the threshold of the relative error at the maximum
    // difference between the true triangular pulse and the
    // rendered pulse.
    var allowedDeviationDecibels = -133.5;

So the errors are 133.5 dB down in the noise: very precise!
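This sketch shows why bit-exact output is a poor test target even when the math is fully specified. It runs the same biquad difference equation through two algebraically equivalent structures, direct form I and transposed direct form II, and compares the results against a decibel tolerance instead of testing for strict equality. It rests on three assumptions. The low-pass coefficients come from the widely published RBJ "Audio EQ Cookbook" formulas. The deviation is measured relative to the reference signal's peak, which is one reading of the comment above and not necessarily what convolution-testing.js does. The helper names are hypothetical. For scale, -133.5 dB is an amplitude ratio of 10^(-133.5/20), roughly 2.1 × 10^-7, close to single-precision float resolution (2^-23 ≈ 1.2 × 10^-7).

```javascript
// Low-pass biquad coefficients (RBJ Audio EQ Cookbook form), normalized by a0.
function lowpassCoefficients(sampleRate, f0, Q) {
  const w0 = (2 * Math.PI * f0) / sampleRate;
  const cosW0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * Q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosW0) / 2 / a0,
    b1: (1 - cosW0) / a0,
    b2: (1 - cosW0) / 2 / a0,
    a1: (-2 * cosW0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]
function directForm1(x, c) {
  const y = new Float64Array(x.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  for (let n = 0; n < x.length; n++) {
    const out = c.b0 * x[n] + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    x2 = x1; x1 = x[n];
    y2 = y1; y1 = out;
    y[n] = out;
  }
  return y;
}

// Transposed direct form II: same transfer function, different rounding order.
function transposedDirectForm2(x, c) {
  const y = new Float64Array(x.length);
  let s1 = 0, s2 = 0;
  for (let n = 0; n < x.length; n++) {
    const out = c.b0 * x[n] + s1;
    s1 = c.b1 * x[n] - c.a1 * out + s2;
    s2 = c.b2 * x[n] - c.a2 * out;
    y[n] = out;
  }
  return y;
}

// Largest absolute difference, in dB relative to the reference peak.
// Returns -Infinity when the two signals are identical.
function maxDeviationDecibels(reference, rendered) {
  let peak = 0, maxDiff = 0;
  for (let i = 0; i < reference.length; i++) {
    peak = Math.max(peak, Math.abs(reference[i]));
    maxDiff = Math.max(maxDiff, Math.abs(reference[i] - rendered[i]));
  }
  return 20 * Math.log10(maxDiff / peak);
}

const coeffs = lowpassCoefficients(44100, 1000, Math.SQRT1_2);
const input = new Float64Array(4096);
for (let n = 0; n < input.length; n++) {
  input[n] = Math.sin(0.05 * n) + 0.5 * Math.sin(0.9 * n);
}
const dfI = directForm1(input, coeffs);
const tdfII = transposedDirectForm2(input, coeffs);
const deviation = maxDeviationDecibels(dfI, tdfII);
const allowedDeviationDecibels = -133.5;
console.log(`max deviation: ${deviation.toFixed(1)} dB`);
console.log(deviation <= allowedDeviationDecibels ? "PASS" : "FAIL");
```

In double precision the two structures typically agree far below the threshold. Real implementations often run in 32-bit floats, where the gap is larger, and that gap is why the test uses a tolerance instead of equality.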


>
> > For your resampling issue, I think that would be a quality-of-implementation
> issue.  A good implementation will do a good job and a bad
> implementation will do a not-so-good job. This allows different vendors to
> "compete".  (That's my viewpoint, coming from the cellular industry, where
> many things are vaguely specified and you have to work hard to figure out
> how to make it work.  Perhaps audio is different.)
>

I agree.  For resampling algorithms, there are trade-offs between
performance and audio quality.
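The trade-off shows up in how many input samples each method reads per output sample. The three interpolators below are ordered by cost: zero-order hold reads one sample, linear interpolation reads two, and a Hann-windowed sinc reads 2 × halfWidth. This is a hypothetical sketch, not any browser's resampler. The function names and the default `halfWidth = 8` are assumptions. As written, the sinc version is only suitable for upsampling, because downsampling would also need a lower cutoff to avoid aliasing.

```javascript
// Each interpolator reads `input` at fractional position `pos` (in input samples).

// Zero-order hold: repeat the most recent sample (cheapest, most imaging).
function zeroOrderHold(input, pos) {
  return input[Math.min(Math.floor(pos), input.length - 1)];
}

// Linear interpolation between the two neighboring samples.
function linearInterp(input, pos) {
  const k = Math.floor(pos);
  if (k >= input.length - 1) return input[input.length - 1];
  const frac = pos - k;
  return input[k] + frac * (input[k + 1] - input[k]);
}

// Hann-windowed sinc over 2 * halfWidth taps (costliest, cleanest for upsampling).
function windowedSinc(input, pos, halfWidth = 8) {
  const center = Math.floor(pos);
  let sum = 0;
  for (let k = center - halfWidth + 1; k <= center + halfWidth; k++) {
    if (k < 0 || k >= input.length) continue; // out-of-range samples treated as 0
    const t = pos - k;
    const sinc = t === 0 ? 1 : Math.sin(Math.PI * t) / (Math.PI * t);
    const w = 0.5 * (1 + Math.cos((Math.PI * t) / halfWidth)); // Hann window
    sum += input[k] * sinc * w;
  }
  return sum;
}

// Resample by `ratio` = outputRate / inputRate using the given interpolator.
function resample(input, ratio, interpolate) {
  const out = new Float64Array(Math.floor(input.length * ratio));
  for (let n = 0; n < out.length; n++) out[n] = interpolate(input, n / ratio);
  return out;
}

const src = [0, 1, 0, -1];
const zoh = resample(src, 2, zeroOrderHold);
const lin = resample(src, 2, linearInterp);
const snc = resample(src, 2, windowedSinc);
console.log(Array.from(zoh)); // values: 0, 0, 1, 1, 0, 0, -1, -1
console.log(Array.from(lin)); // values: 0, 0.5, 1, 0.5, 0, -0.5, -1, -1
console.log(Array.from(snc));
```

All three agree at the original sample positions (up to rounding for the sinc) but differ in between, and those in-between differences are what listeners hear as the quality gap.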


>
> Perhaps implementations can differ in how well they do it, but if we are
> going to allow that, then it might be a good idea to make the
> implementation expose some information about what it does (whether it uses ZOH,
> linear interpolation, or a sinc filter, and if so, with which parameters)
> to help the developer react to the situation with different filter
> settings, etc. Might be catering for a very small audience that cares,
> though.
>

That might be possible.  I think that WebGL offers some of these features.

Chris
Received on Friday, 30 March 2012 18:02:58 UTC