Re: Integer PCM sample formats to Web Audio API? from Chris Wilson on 2014-01-17 (public-audio@w3.org from January to March 2014)

From: Chris Wilson <cwilso@google.com>
Date: Fri, 17 Jan 2014 08:56:23 -0800
To: Katelyn Gadd <kg@luminance.org>
Cc: "public-audio@w3.org" <public-audio@w3.org>
Message-ID: <CAJK2wqWrYE2NMfkfedDjiQ5aWL=M=C-9k5nSPQ7cyP65RM3uHA@mail.gmail.com>
On Thu, Jan 16, 2014 at 3:41 PM, K. Gadd <kg@luminance.org> wrote:

> All I have to say in response to this is that in my entire history on the
> w3-audio list, you and others continue to say that things are justified
> without ever giving specifics.
>


>  Even here, you say you are designing for a real platform, but fail to
> name the platform, list its constraints, or give examples of how your
> design is tuned for those constraints.
>

Web platform.  JavaScript runtime (with the attendant experience around
data types).  Scalable across desktop workstations (with plentiful RAM and
CPU) and constrained devices like smartphones.  I'm sorry, I thought the
platform was implied; the WG's use cases and requirements document (
http://www.w3.org/TR/webaudio-usecases/) goes into a lot more detail of the
particular audio scenarios that are most interesting.

> As far as the question about whether things have been benchmarked, we have
> real world test cases that are memory constrained, which is why Jukka asked
> for int16 samples in the first place. We have disclosed the scenarios we
> care about and they are available in the open; you have not disclosed your
> scenarios, so we are unable to evaluate them.
>

I didn't dismiss your scenarios; I said the costs should be
evaluated and weighed (e.g. the cost of converting formats on demand). It's
clear (to me) that changing that strategy would likely cause SOME effects
on the runtime; we should understand what those costs are before just
checking in a change.
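
To put rough numbers on the memory side of that trade-off, here is a back-of-envelope sketch in TypeScript. The function name is hypothetical and not part of any API. It counts only raw sample storage for a fully decoded buffer: 4 bytes per sample for float32 (what AudioBuffer holds today) versus 2 bytes for int16. It says nothing about the CPU cost of converting on demand, which is the part that still needs measuring.

```typescript
// Hypothetical helper: raw sample storage for a fully decoded buffer.
// Ignores per-object overhead; counts only the sample data itself.
function decodedBufferBytes(
  seconds: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number
): number {
  const frames = Math.round(seconds * sampleRate);
  return frames * channels * bytesPerSample;
}

// One minute of 44.1 kHz stereo:
const float32Bytes = decodedBufferBytes(60, 44100, 2, 4); // 21168000
const int16Bytes = decodedBufferBytes(60, 44100, 2, 2); // 10584000
```

The halving is exact for the sample data; whether it outweighs a per-sample conversion on every render quantum on constrained devices is the open question.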

> Many of the design decisions being questioned on the list are decisions you
> or others made in the past.
>

I would say very few if any of the design decisions being questioned on the
list are decisions I personally made in the past.  (Perhaps you are
confusing me with Chris Rogers, who certainly did make many of those design
decisions.)

> I'm certain that you made them in good faith based on the information you
> had at the time; I would hope that nobody is questioning that. The problem
> is that very often we do not have the information you used to guide your
> decision, nor do we have the context that led you to optimize for one thing
> at the expense of another. It is critical for you to clearly communicate
> this information if you want us to understand why things like the
> audiobuffer sample format are so important to you and why you seem
> unwilling to evaluate other options.
>

I wasn't aware that the audiobuffer sample format IS so important to me.  I
care about the complexity of the API that's exposed; I *would* certainly
care about the quality (not an issue with int/float conversion) and CPU and
memory impact of any change (again, need data here); I don't have a
particular axe to grind other than that, and I feel like I've repeatedly
said that.
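
The quality point can be checked mechanically. A float32 has a 24-bit significand, so any signed integer sample of 24 bits or fewer, divided by a power of two, is represented exactly. The TypeScript sketch below (hypothetical function name, assuming the common x / 2^(bits-1) scaling) round-trips every value of a given bit depth through a Float32Array and reports whether any sample changed.

```typescript
// Hypothetical check: does every signed integer sample of the given bit
// depth survive int -> float32 -> int unchanged?
// Assumes the common scaling of dividing by 2^(bits - 1).
function roundTripIsLossless(bits: number): boolean {
  const scale = Math.pow(2, bits - 1);
  const slot = new Float32Array(1); // storing here rounds to float32
  for (let s = -scale; s < scale; s++) {
    slot[0] = s / scale;
    const back = Math.round(slot[0] * scale);
    if (back !== s) {
      return false; // this sample did not survive the round trip
    }
  }
  return true;
}

// roundTripIsLossless(8) and roundTripIsLossless(16) both return true.
```

The 24-bit case also passes but walks about 16.7 million samples, so it takes noticeably longer.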

> Also, if you simply think it is too much trouble to rearchitect mixer
> internals or overhaul the spec, that's fine too: be up-front about it
> instead of beating around the bush.
>

As indeed I would.  In case my opposition is not apparent - I believe it
would be a very, very bad idea to rearchitect the audio system to be a
fixed-point PIPELINE.  That has many implications in usage patterns that I
think break far too many of the tenets of the Web Audio scenarios to be
rational.  I have no particular qualms about the spec needing overhaul - in
fact, if you'll look at my personal past history here, I've repeatedly said
that I think parts of the spec need serious work.  (E.g. the
DynamicsProcessor issue I just raised.)

To the issue at hand: I am not now, nor do I feel I have ever been,
fundamentally opposed to enabling or even requiring implementations to use
native bit depths and integer formats for internal buffer storage; I *DO*
want to feel like I have a better handle on the impact of doing that before
changing the spec, particularly to require such a thing.  I'm *somewhat*
opposed to revealing those data types outside the internals (e.g. having
getChannelData return int8 and int16 types, or whatever shape that would
take) because I don't see a significant benefit in that, and I feel it
complexifies the API in a non-Web-y way; I would not characterize this as
"my mind is made up," but I have stronger opposition to that.
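
As a rough illustration of "integer storage internally, float at the API and pipeline boundary", here is a TypeScript sketch. The class and method names are hypothetical and are not part of the Web Audio API; the x / 32768 scaling and clamping to the int16 range are assumptions. Callers only ever see Float32Array data, and the per-sample conversion in readFloat is the CPU cost that would have to be weighed against the memory saved.

```typescript
// Hypothetical buffer that keeps 16-bit integer samples internally but
// exposes only float32 data, converted on demand.
class Int16BackedBuffer {
  private readonly channels: Int16Array[] = [];

  constructor(numberOfChannels: number, readonly length: number) {
    for (let c = 0; c < numberOfChannels; c++) {
      this.channels.push(new Int16Array(length));
    }
  }

  // Quantize float samples into int16 storage, clamping out-of-range
  // values (assignment to an Int16Array would otherwise wrap around).
  writeFloat(channel: number, samples: Float32Array): void {
    const dst = this.channels[channel];
    const n = Math.min(samples.length, dst.length);
    for (let i = 0; i < n; i++) {
      const scaled = Math.round(samples[i] * 32768);
      dst[i] = Math.max(-32768, Math.min(32767, scaled));
    }
  }

  // Convert one block back to float32 when the pipeline pulls it; frames
  // past the end of the buffer are filled with silence.
  readFloat(channel: number, offset: number, out: Float32Array): void {
    const src = this.channels[channel];
    for (let i = 0; i < out.length; i++) {
      const j = offset + i;
      out[i] = j < src.length ? src[j] / 32768 : 0;
    }
  }
}
```

A render loop would call readFloat(channel, offset, block) with a 128-frame Float32Array once per render quantum, so nothing outside the class ever sees an integer type.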


> You have clearly voiced opposition to this idea in the past (for example
> in the race condition discussions),
>

To overhauling the spec?  No.  No I have not.  I have stated opinion, and
my opinion is frequently cautious, but I've tried to find the middle ground
and data-based decision making paths, and have in fact frequently found
myself in the role of devil's advocate.  I do NOT believe that I have
steadfastly opposed change in any discussion.

My opinion on the race condition issue remains the same; making these
changes will cause negative performance implications in some scenarios.
Those implications aren't likely to be huge, and are not in the most
common scenarios - clever design suggestions from ROC and Jer, among
others, helped make the solution much less negatively impactful - and
regardless of those implications, the web platform requires better thread
protection than the previous design, so it is worth the cost.  I'd go so
far as to say I'm personally responsible for moving Google to agreeing with
those changes.


> and I can respect the desire to avoid throwing out too much existing work,
> especially when it means the new software will have to go through a new
> process of QA and review by developers. But if that's the reason why new
> features and bug fixes are being dismissed, it should be clear instead of
> masked behind vague concerns.
>

As indeed it would be.


> OK, if Float32 everything is critical for performance on certain
> platforms, what platforms are they? Why are they important? (Are they the
> future of PCs? Are they widely used? Are they key to the business
> objectives of a stakeholder like Apple or Google?) What are the other
> constraints of that platform that we should also be optimizing for?
>

You keep using words like "critical for performance" that I did not use.
I said we should understand the impact.  Do you believe there is literally
zero impact from storing buffers and needing to convert each sample when it
is pushed into the pipeline?  Or do you just believe we should handwave,
presume it's less than the positive impact of halving memory usage, and
change?  Do you think we should be enabling int8 as well?  Int24? Do you
think we need to expose these types in the API, or simply require that the
implementation isn't exploding the sizes of objects under the hood?  How,
precisely, and how flexible does this system need to be (as in, precisely
what

Incidentally, please don't use shorthand like "Float32 everything", because
it implies changing the pipeline as well as storing buffers in something
other than float32, and I can't tell if that's what you're referring to or
not. Float32 was chosen for the pipeline because its dynamic range is
substantial enough to encompass the range of human hearing (math redacted),
while still maintaining a floating-point exponent that makes clipping far,
far less of a problem (and means we can use the same pipeline for control
signals that are not typically in the -1 to +1 range, without fear of
clipping).
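
A small sketch of the headroom point, in TypeScript with hypothetical function names: two samples at 0.75 are summed and then attenuated by half. With float32 intermediates the 1.5 peak passes through and the result is 0.75; with int16 intermediates (assuming x / 32768 scaling and saturating arithmetic at each stage) the sum clips at full scale first, so the result comes out at 0.5 and the waveform is distorted.

```typescript
// Float32 path: the intermediate sum may exceed 1.0 without clipping.
function mixThenGainFloat32(a: number, b: number, gain: number): number {
  const v = new Float32Array(1);
  v[0] = a + b;
  v[0] = v[0] * gain;
  return v[0];
}

// Hypothetical int16 path: round and saturate to the int16 range.
function clampInt16(x: number): number {
  return Math.max(-32768, Math.min(32767, Math.round(x)));
}

// Each stage saturates, so the sum is clipped before the gain is applied.
function mixThenGainInt16(a: number, b: number, gain: number): number {
  const sum = clampInt16(clampInt16(a * 32768) + clampInt16(b * 32768));
  return clampInt16(sum * gain) / 32768;
}

// mixThenGainFloat32(0.75, 0.75, 0.5) -> 0.75
// mixThenGainInt16(0.75, 0.75, 0.5)   -> 0.5
```

The same float path also carries a control value such as 440 (a frequency in Hz) unchanged, which a normalized int16 path has no way to represent.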

-C
Received on Friday, 17 January 2014 16:56:52 UTC