Re: Integer PCM sample formats to Web Audio API?

I appreciate your desire to discuss this off-list, but I don't feel that I
have any specific objection to you personally.

All I have to say in response to this is that in my entire history on the
w3-audio list, you and others have repeatedly said that things are justified
without ever giving specifics. Even here, you say you are designing for a
real platform, but fail to name the platform, list its constraints, or give
examples of how your design is tuned for those constraints. I don't
understand how you expect any improved level of discourse if you continue
to refuse to give detail like this. I am certain that I have made this
mistake in my posts as well, so I would appreciate it if you did me the
same favor of pointing it out when I do it.

As far as the question about whether things have been benchmarked, we have
real world test cases that are memory constrained, which is why Jukka asked
for int16 samples in the first place. We have disclosed the scenarios we
care about and they are available in the open; you have not disclosed your
scenarios, so we are unable to evaluate them.
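To make the memory side of this concrete, here is a rough sketch of the
arithmetic. The clip length, sample rate and channel count are assumed example
values, and pcmBytes is a hypothetical helper, not part of any API:

```typescript
// Hypothetical helper: bytes needed to hold the decoded PCM for one clip.
function pcmBytes(
  seconds: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number
): number {
  const frames = Math.round(seconds * sampleRate);
  return frames * channels * bytesPerSample;
}

// Assumed example clip: one minute of 44.1 kHz stereo.
const clipSeconds = 60;
const clipSampleRate = 44100;
const clipChannels = 2;

const asFloat32 = pcmBytes(
  clipSeconds, clipSampleRate, clipChannels, Float32Array.BYTES_PER_ELEMENT);
const asInt16 = pcmBytes(
  clipSeconds, clipSampleRate, clipChannels, Int16Array.BYTES_PER_ELEMENT);

console.log(`float32: ${asFloat32} bytes`); // 21168000
console.log(`int16:   ${asInt16} bytes`);   // 10584000
console.log(`ratio:   ${asFloat32 / asInt16}`); // 2
```

This only counts the sample data itself; whatever per-buffer overhead an
implementation adds is not modeled here.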

Many of the design decisions being questioned on the list are decisions you
or others made in the past. I'm certain that you made them in good faith
based on the information you had at the time; I would hope that nobody is
questioning that. The problem is that very often we do not have the
information you used to guide your decision, nor do we have the context
that led you to optimize for one thing at the expense of another. It is
critical for you to clearly communicate this information if you want us to
understand why things like the audiobuffer sample format are so important
to you and why you seem unwilling to evaluate other options.

Also, if you simply think it is too much trouble to rearchitect mixer
internals or overhaul the spec, that's fine too: be up-front about it
instead of beating around the bush. You have clearly voiced opposition to
this idea in the past (for example in the race condition discussions), and
I can respect the desire to avoid throwing out too much existing work,
especially when it means the new software will have to go through a new
process of QA and review by developers. But if that's the reason why new
features and bug fixes are being dismissed, that should be stated clearly
instead of being masked behind vague concerns.

And for a couple of more focused points of discussion:

OK, if Float32 everything is critical for performance on certain platforms,
what platforms are they? Why are they important? (Are they the future of
PCs? Are they widely used? Are they key to the business objectives of a
stakeholder like Apple or Google?) What are the other constraints of that
platform that we should also be optimizing for?

What assets do those platform(s) have that make memory use issues not a
problem? Are there alternative approaches that should be used to control
the size of AudioBuffers other than the int16 buffer support that is being
requested? If so, what are they, and do they work on all the platforms they
care about? If there aren't good alternatives, and int16 is unacceptable,
what is an acceptable alternative?
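For reference, the conversion under discussion looks roughly like the sketch
below. This is only an illustration, not how any browser's mixer actually
works: int16ToFloat32 is a hypothetical name, and dividing by 32768 is one
common scaling convention that I am assuming here. It says nothing about
performance either way; that still needs to be measured.

```typescript
// Hypothetical helper: expand signed 16-bit PCM into float32 samples.
// Assumes the divide-by-32768 convention, which maps the int16 range
// [-32768, 32767] onto [-1.0, 1.0).
function int16ToFloat32(
  src: Int16Array,
  dst: Float32Array = new Float32Array(src.length)
): Float32Array {
  for (let i = 0; i < src.length; i++) {
    dst[i] = src[i] / 32768;
  }
  return dst;
}

// Usage: samples stay stored as int16 and are expanded only when needed.
const stored = new Int16Array([-32768, 0, 16384, 32767]);
const expanded = int16ToFloat32(stored);
console.log(Array.from(expanded));
```

Passing a preallocated dst lets a caller reuse one scratch buffer per
processing block instead of allocating a full float32 copy of the asset.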

On Thu, Jan 16, 2014 at 2:07 PM, Chris Wilson <cwilso@google.com> wrote:

> After a lot of consideration, I'm sending this just to you.  Feel free to
> respond, forward to the list, or ignore, as you wish.
>
> On Tue, Jan 14, 2014 at 8:01 PM, K. Gadd <kg@luminance.org> wrote:
>
>>>>> Having such an option in the API gives the implementation an
>>>>> opportunity to save memory when memory is scarce, but it's not necessarily
>>>>> forced to do so.
>>>>>
>>>>
>>>> The whole point is to force the implementation to save memory. An
>>>> application that runs out of memory 80% of the time is not appreciably
>>>> better than one that does so 100% of the time - end users will consider
>>>> both unusable.
>>>>
>>>
>>> Given all the other factors that may change memory usage in the web
>>> platform, I'm not sure why this one feature will solve that problem.  Or
>>> even come close.  Again, I'm not saying I see no reason to look closely at
>>> this; I'm just saying that I don't think this is as big a slam dunk as you
>>> appear to, and I think there are notable situations when it is better to
>>> NOT store that data in int16, and there will be ...
>>>
>>
>> What situations are these? I find it hard to imagine a scenario where
>> software playback is going to benefit tremendously from using 2x the memory
>> to store sample data. Certainly there are huge advantages to *mixing* in
>> floating-point; are you arguing that making the mixer slightly faster
>> merits using double the memory (and thus, double the memory bandwidth, if
>> not more - memory bandwidth being especially precious on mobile platforms)?
>> Must the floating-point version of said buffer be the de-facto storage
>> format even though it is merely a minor mixing efficiency optimization?
>>
>> I am dubious about the tremendous cost implied by converting from int16
>> to float32 in the mixer, also. It's a trivial, common operation, and
>> depending on architecture I would expect it could pay for itself in reduced
>> memory bandwidth usage and more efficient use of L1/L2/L3 caches. Have you
>> benchmarked this? Do you have test cases that demonstrate a tremendous
>> performance win by using float32 for everything versus int16 or int8?
>>
>
> 1.)  I never said "tremendous" referring to such a cost; I expressed
> caution about exposing those switches to developers, rather than attempting
> to answer the question in the engine, and either letting it optimize or
> changing the behavior.  2.) You'll also note a distinct lack of me saying
> "the way it's done is the way it should be done, no discussion, go away"
> although that seems to be how you are responding to me.  3.) No, I have
> *NOT* benchmarked this (have you?) - that would be precisely the sort of
> information I'd like to see to understand what the right thing to do is
> here.  You make statements that imply you are convinced that on-the-fly
> conversion of the data buffer is going to be at worst a wash, and
> potentially better overall due to reduced memory bandwidth usage and more
> efficient caching.  I am not calling you a liar, nor do I fail to
> understand what you are talking about; however, I haven't seen actual
> comparisons of those approaches, so I'd like to actually understand that
> cost before I just agree that we should make what is clearly an impactful
> change.
>
> Throughout this conversation, I feel that all I've said is "we should
> understand this cost better" and "we shouldn't expose switches unless
> there's a need for that switch to be exposed, rather than just doing the
> right thing."  By attacking my caution and making statements like "Must
> the floating-point version of said buffer be the de-facto storage format
> even though it is merely a minor mixing efficiency optimization?", you're
> not winning me over - because no, I didn't say it must be the de facto
> storage format yet, and I'm not convinced by data that it is only a minor
> mixing efficiency optimization (and by the way, the degree to which it's an
> optimization would clearly change with usage pattern).
>
>>>> On this whole subject it is important to realize that when talking about
>>>> developers porting games and multimedia software from other native
>>>> platforms, it is usually not wise to assume they are idiots that will shoot
>>>> themselves in the foot.
>>>>
>>>
>>> That was not the intent, and I was certainly not making that assumption.
>>>  However, those aren't the only developers that would have this API
>>> available - and I would venture some of them would choose to make this
>>> decision without understanding how it may affect them on other devices or
>>> browsers, now and in the future.  Mostly because that's pretty much
>>> impossible to know.
>>>
>>
>> Sacrificing current-day usability in favor of some hypothetical future
>> platform is not a wise decision when we are dealing with existing software.
>>
>
> It is not a hypothetical future platform; it is a platform, different from
> the one you are building on today, and it does not have (for the most part)
> existing software.  Please stop trying to beat into me that we should just
> expose the same kinds of switches and knobs that have always been there.  I
> AM NOT saying that we should be ignoring your feedback; we should, in my
> opinion, be considering this guidance in designing the API.  However, the
> group's charter and use cases are not centered around making it easy to
> port from any existing sound platform.
>
>
>>>> Yes, developers make mistakes, and they ship broken software that
>>>> relies on bugs in browser implementations - I can understand the reluctance
>>>> to give developers more ways to make mistakes.
>>>>
>>>
>>> It's not "reluctance to give developers more ways to make mistakes" at
>>> all.  It's "caution in exposing low-level platform implementation details
>>> unless you are absolutely, positively certain it can be made a net win
>>> overall."  Every low-level implementation detail that's exposed makes it
>>> that much harder for the web platform to scale across devices, and puts
>>> more onus on the developer to own that scalability; that begs for caution.
>>>
>>
>> The source format (bitness & sample rate) of audio is not 'a low level
>> platform implementation detail' any more than the pixel format of a source
>> image is a low level platform implementation detail. The file formats audio
>> is loaded from and rendered to contain this information; authors select it
>> explicitly given particular tradeoffs (i.e. doing some recording at high
>> sample rates then mixing down to lower sample rates). You cannot simply
>> hide it behind a wall and pretend it doesn't exist. We're not talking about
>> abstractions here like those in 3D rendering, where the exact mechanics of
>> fragment rendering and vertex layout are left up to the vendor (as long as
>> they satisfy the requirements in the spec); we are literally talking about
>> foundational details here. As I mentioned before, such abstraction would
>> not be tolerated for textures in rendering (though you could certainly
>> offer it as an 'opt-in' way to somehow save on memory and texture
>> bandwidth).
>>
>
> And yet you're speaking of JavaScript, a language in which (as you alluded
> to earlier) the standard "number" type encompasses integers and floats
> and largely hides that detail from the developer.  The Web platform
> in general attempts to only scalably expose levels of detail such as this.
>
>>>> Again, I would point out that making a change that would allow
>>>> developers to force the integer storage of buffers would have negative side
>>>> effects, and all I'm cautioning is those should be carefully examined and
>>>> weighed.  I would postulate a set of developers would say "well of course,
>>>> my data is 16-bit 22kHz, of course I want to force the data to be stored
>>>> that way to save memory!" without considering that by doing so, they are
>>>> going to be burning battery life (aka CPU time).  That's not always the
>>>> right tradeoff.
>>>>
>>>
>> I'm not advocating that everything must be done the same way. I'm
>> advocating for having an actual solution for this problem instead of
>> continuing to wave your hands using (at least in my history following this
>> list) wholly unstated hypothetical future use cases as justification. You
>> don't have to rearchitect the whole Web Audio pipeline or introduce a
>> sweeping set of new features, just provide a real-world solution for
>> controlling the (already extreme) memory usage of AudioBuffers.
>>
>
> I'll thank you not to accuse me of "continuing to wave my hands using
> wholly unstated hypothetical future use cases as justification,"  and in
> general not treating me like an idiot.  Thanks.
>
> You'll note that I have repeatedly made statements like "should be
> examined" and "I'm not saying I see no reason to look closely at this",
> as for all I know (without any data), perhaps the right thing to do is to
> completely remove the statement that says the format should be converted,
> and all buffers should be stored internally in their source format.  Or, as
> I've already suggested, leave it to the implementation to make that
> decision.  At any rate, please stop hammering on me as if I am a handwaving
> idiot.
>
> As Marcus mentions, the int16 -> float32 conversion is going to be LESS
> problematic than the resampling to a different playbackRate; and that
> resampling can expand your data even more than the int-float conversion
> (6-12x wouldn't be out of the question), while also potentially having more
> implications in CPU usage and quality, so I expect a lively conversation
> there.
>
> -C
>
>

Received on Thursday, 16 January 2014 23:42:58 UTC