- From: Craig Hanson <chanson@izotope.com>
- Date: Thu, 21 Jun 2012 13:08:01 -0400
- To: "'Alistair MacDonald'" <al@signedon.com>, <public-audio@w3.org>
- Message-ID: <00fd01cd4fd0$6a705130$3f50f390$@izotope.com>
Thanks so much for the introduction, Alistair! If anyone in this group has questions and/or comments regarding this document, please let me know. We'd be happy to discuss this further with anyone.

Best Regards,

Craig Hanson
Lead Software Engineer - Technology Licensing
iZotope, Inc.
617-577-7799 x410 | chanson@izotope.com

From: Alistair MacDonald [mailto:al@signedon.com]
Sent: Wednesday, June 20, 2012 10:09 AM
To: public-audio@w3.org
Cc: Craig Hanson
Subject: iZotope's Web Audio Considerations

A couple of months ago I sat down with iZotope, an audio company in Boston specializing in DSP for games and music/audio production. They license code to EA, Harmonix, Microsoft (Xbox), etc. iZotope had some great feedback, as well as some questions and suggestions which may be worth adding to the bug tracker where they have not been covered already.

I wanted to introduce Lead Software Engineer Craig Hanson to the mailing list discussions, as I think his input will be valuable to the group. Craig, thanks again for taking the time to review the work so far in detail.

I have attached the PDF from Craig to this email, but I am also posting the content of the PDF here so that we have it in the mailing list records.

iZotope Web Audio Considerations

I. Overview

The current web audio specification is suitable for implementing small, lightweight signal processing algorithms. The use case of this API is a very important consideration when it comes to its architecture. If the intent is to provide a simple interface for implementing inexpensive, lightweight signal processing routines, the existing architecture seems sufficient. If the intent is to build a robust environment that can handle the adoption of more intensive DSP, we are of the opinion that the underlying technology will impose limitations that render the implementation of intensive real-time DSP impossible.

The two main focal points to bear in mind when constructing a robust real-time DSP architecture are:

1. Non-pre-emptible processing - The DSP must be allowed to process continually without interruption, or dropouts will occur.

2. Worst-case analysis - Average CPU consumption metrics are not important in a real-time signal processing system. Only the worst-case metrics should be used to evaluate whether or not a DSP system can reliably run.

We will elaborate further on these two topics in section III of this document. Prior to this, in section II, we discuss some general modifications and things to keep in mind for the API itself. Section IV presents an example architecture that we believe would allow for a robust real-time DSP environment. Section V presents matters of intellectual property protection that would allow high-quality DSP to be developed for this platform.

II. API Considerations

As far as the API itself is concerned, the majority of the implementation seems sound. The AudioNode and AudioParam interfaces generally make a lot of sense. There are, however, a few additions and considerations for the API we'd like to point out.

1. Addition of a query and notification system for latency changes. This should provide the DSP developer and the environment with a way to remain in sync as the parameters of a DSP algorithm change. As there are several different forms of latency, we should clarify that the type of latency we are discussing here is the latency from input to output in a given DSP algorithm.
The amount of latency is the number of invalid initial samples that should be removed from the beginning of the DSP output. As an example: a given DSP algorithm may report a latency of 1024 samples. If the audio system's buffer size is 512 samples, the first 2 buffers (1024 samples) to come out of the DSP algorithm should be thrown away. The first valid output sample will be the first sample of the 3rd buffer. Additionally, the end of the audio stream should be extended to obtain the remaining samples from the DSP. In this example, that can be done by streaming an extra 2 buffers of zeros as input to the DSP; the DSP will output the final 1024 valid samples in those 2 buffers. This behavior is considered the 'system delay' or 'latency' of a DSP algorithm. Latency reporting is perhaps best done by way of high-precision floating point seconds, similar to the implementation used in Mac OS X Audio Units. For more details, see 'kAudioUnitProperty_Latency' in: https://developer.apple.com/library/mac/#documentation/MusicAudio/Conceptual/AudioUnitProgrammingGuide/TheAudioUnit/TheAudioUnit.html A sketch of the trimming described here follows point 3 below.

2. CPU monitoring should be done on a 'worst case scenario' basis. The CPU load of signal processing algorithms is not usually constant. Ideally, DSP should be developed with an eye toward keeping the CPU load even across buffers. In practice, this can be difficult for some algorithms, and there may be CPU spikes during processing. Poorly designed signal processing algorithms may have extremely large CPU spikes that prevent real-time processing. To ensure that CPU load will never go above 100% and the audio system will remain glitch-free, CPU levels should be measured on a worst-case basis. To illustrate this, take an example algorithm that requires some audio analysis to take place at a fixed interval (say, every 8192 samples). Assume the audio buffer size is 1024 samples and we are trying to process in real time. Extra processing is going to take place inside the algorithm every 8 buffers, and depending on the intensity of this processing, the load may well be over 100% while processing every 8th buffer. As a consequence, you will see a CPU spike every 8 buffers and an audio dropout every 8 buffers. If the analysis portion of this algorithm is very heavy on CPU load and the other processing is very light, you could see an average CPU load that looks as if it should work in real time. But don't be fooled: worst-case CPU is what matters for real-time DSP. A sketch of this kind of worst-case measurement also appears below.

3. Many DSP algorithms have boolean and integer parameters in addition to floating point parameters, but we can only see floating point representations of parameters in the current API draft. This can work perfectly well, but you may want to think about how to standardize it further. Perhaps a solution similar to the VST interface would be satisfactory: there, Steinberg forces parameters to have a range between 0.0f and 1.0f (inclusive), and other types are handled by typecast. You can find more details on the VST interface itself here: http://www.gersic.com/vstsdk/. A sketch of that mapping follows below as well.
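To make the buffer trimming from point 1 concrete, here is a minimal C++ sketch. The function name and constants are ours, purely for illustration; nothing here comes from the Web Audio draft.

    // Minimal sketch of latency compensation: discard the first
    // kLatencySamples of DSP output (hypothetical names and constants).
    #include <algorithm>
    #include <cstddef>

    static const std::size_t kLatencySamples = 1024; // reported by the DSP
    static std::size_t samplesToDiscard = kLatencySamples;

    // Called once per output buffer; shifts the valid tail to the front
    // and returns how many valid samples remain in 'out'.
    std::size_t trimLatency(float* out, std::size_t numSamples)
    {
        std::size_t discard = std::min(samplesToDiscard, numSamples);
        if (discard == 0)
            return numSamples;                        // latency already consumed
        samplesToDiscard -= discard;
        std::copy(out + discard, out + numSamples, out); // shift valid tail
        return numSamples - discard;
    }

With 512-sample buffers, the first two calls return 0 valid samples, so the first valid sample is the first sample of the 3rd buffer; feeding two extra buffers of zeros at the end of the stream flushes the final 1024 valid samples, exactly as in the example above.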
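As a rough illustration of measuring on a worst-case basis (point 2), here is a small C++ sketch; the struct and its members are hypothetical, not part of any proposed API.

    // Tracks the worst-case fraction of the real-time budget used per buffer.
    #include <chrono>

    struct CpuMeter
    {
        double worstLoad = 0.0; // worst case so far; > 1.0 means a dropout

        // budgetSeconds = bufferSamples / sampleRate, e.g. 1024 / 44100.
        template <typename ProcessFn>
        void measure(ProcessFn processOneBuffer, double budgetSeconds)
        {
            auto start = std::chrono::steady_clock::now();
            processOneBuffer(); // run the DSP for one buffer
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            double load = elapsed.count() / budgetSeconds;
            if (load > worstLoad)
                worstLoad = load; // keep the worst case, never the average
        }
    };

In the 8192-sample analysis example above, the average load might sit comfortably under 1.0 while worstLoad exceeds 1.0 on every 8th buffer; it is worstLoad that decides whether the system can run glitch-free.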
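And for point 3, a minimal sketch of the VST-style convention, where every parameter travels as a float in [0.0, 1.0] and booleans/integers are recovered by mapping. The helper names are our own, not from the VST SDK or the Web Audio draft.

    // Recover boolean and integer values from normalized [0, 1] floats.
    #include <cmath>

    // Boolean: treat the upper half of the range as 'true'.
    bool toBool(float normalized)
    {
        return normalized >= 0.5f;
    }

    // Integer in a known range [minVal, maxVal]: scale and round.
    int toInt(float normalized, int minVal, int maxVal)
    {
        float scaled = static_cast<float>(minVal)
                     + normalized * static_cast<float>(maxVal - minVal);
        return static_cast<int>(std::lround(scaled));
    }

For example, a four-position mode switch stored as 0.0, 1/3, 2/3, 1.0 maps back to 0..3 with toInt(x, 0, 3).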
III. The DSP Environment

True real-time DSP implemented in a web browser could change the way musicians, gamers and media producers work in the future. In order to make the API scalable and future-proof, a highly performant environment is necessary. We feel that Javascript itself cannot provide such an environment.

There are a few key issues we have noted in our discussions on the topic, and the common thread running through all of them is the assurance that sufficient processing power is available at all times. Predictable CPU availability is absolutely critical to the effectiveness of real-time digital signal processing. As stated in the overview, non-pre-emptible processing and worst-case analysis are the main issues to overcome. To that end, there are some key considerations:

1. A dedicated DSP processing thread is required to ensure the audio has the highest priority possible during processing. The thread should never be interrupted by non-DSP code.

2. The general overhead of Javascript will limit the scalability and the ability of developers to implement intensive, high-performance DSP.

3. Garbage-collected languages are not conducive to real-time audio applications, as there is no guarantee of when the collector will run. This makes it impossible to predict what the actual CPU load of the system will be and to determine whether or not a signal processing routine will be able to operate in real time.

There are many effects with mass-market appeal that we believe would not be possible under the current architecture. The T-Pain Effect is a specific example of one such effect.

The general consensus is that audio going through a Javascript layer will likely end up with problems similar to the Android audio system. As the need for more performance becomes apparent, a push toward huge audio buffers will be inevitable. That reliance on large audio buffers will make latency a real problem for anyone who needs a low-latency (5-10 ms) solution. Low-latency audio has been impossible for developers to achieve on Android without bypassing the audio architecture completely.

There is also no way to take advantage of machine-optimized functions. Large speed-ups in processing could be achieved if optimized math functions could be used. The most viable solution for this is an LLVM-based just-in-time solution for the browser, something similar to the behavior in Google's Native Client SDK: http://code.google.com/p/nativeclient/

IV. An LLVM-based proposal

An LLVM-based implementation would compile, link and run on the fly in the browser. This would allow developers to create independent byte-code that would be deployed to a user's machine through the browser and would run in a sandboxed area. Here is the process and how this approach is advantageous (a sketch follows this list):

1. Developers write DSP code in C, C++, or related languages that are not garbage collected. This has two distinct advantages: (a) far less overhead than Javascript, so there is no doubt the code will run much faster; and (b) no garbage collection, which makes writing real-time code a possibility.

2. LLVM would turn the C/C++ code written by the developer into bytecode (intermediate form). This intermediate form is distributed by the developer for use in the application.

3. When deployed, LLVM compiles the intermediate-form code into optimized code for the target platform. This can take advantage of optimizations made by the LLVM compiler and will produce code that runs significantly faster than anything that could run within the Javascript layer.
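As a concrete, minimal illustration of this workflow: the kernel below is a stand-in for real DSP, and the build command in the comments is ordinary Clang/LLVM usage; how a browser would actually load, JIT-compile and sandbox the bitcode is left open by this proposal.

    // gain.cpp - a trivial DSP kernel a developer might ship as bitcode.
    //
    // Developer side:  clang++ -O2 -emit-llvm -c gain.cpp -o gain.bc
    // Browser side:    JIT-compile gain.bc into optimized native code for
    //                  the user's CPU and run it in a sandboxed area.
    //
    // extern "C" keeps the symbol name stable for lookup after compilation.
    extern "C" void processGain(float* buffer, unsigned numSamples, float gain)
    {
        // No allocation, no locks, no garbage collector: this function is
        // safe to call from a dedicated, non-pre-emptible audio thread.
        for (unsigned i = 0; i < numSamples; ++i)
            buffer[i] *= gain;
    }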
V. Intellectual Property Protection

For browser-based audio solutions, there should be some concern over intellectual property. As this code tends to be downloaded directly to a user's computer through the web browser, how will companies ensure protection of their IP? If there is no good defense for intellectual property, it will severely limit the potential for high-quality DSP to be produced for the platform. We should ensure that closed-source audio effects can be made available with no concerns over intellectual property.

I put together some links below for anyone who is interested in learning more about iZotope's offerings in the music/gaming industries.

Visual waveform editing with RX 2
http://www.youtube.com/watch?v=bzfEk-NoS3A&feature=youtu.be&t=53s

3D frequency analysis
http://www.youtube.com/watch?v=2hFpil2x8T0&feature=youtu.be&t=24s

iZotope's game audio solutions
http://www.izotope.com/tech/games/nativesolutions.asp

--
Alistair MacDonald
SignedOn, Inc - W3C Audio WG
Boston, MA, (707) 701-3730
al@signedon.com - http://signedon.com
Attachments
- image/png attachment: image001.png
Received on Sunday, 24 June 2012 20:23:47 UTC