css3-speech, UA sound mixing (was Re: TPAC F2F and Spec Proposals) from Daniel Weck on 2011-10-18 (public-xg-htmlspeech@w3.org from October 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Tue, 18 Oct 2011 21:12:39 +0100
To: www-style list <www-style@w3.org>, public-audio@w3.org, public-xg-htmlspeech@w3.org
Cc: Chris Rogers <crogers@google.com>, "robert@ocallahan.org" <rocallahan@gmail.com>, Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com>, Alistair MacDonald <al@signedon.com>
Message-Id: <1309EC88-6FC3-4A64-BB25-AEBF4688EC6B@gmail.com>

On 18 Oct 2011, at 19:52, Alistair MacDonald wrote:
> I think we need a more complete Browser Audio Framework, that can be broken down into the following components:
> 
> 1) A browser UI and architecture for controlling audio -- at a tab and device level -- it would not be a pressing matter to standardize this functionality, and it could be done independently by each browser vendor.
> 2) A "Web Audio Data API" with high-resolution timing, 3D spatialization of sources, with standardized effects and algorithms for music and games that accepts inputs from other APIs.
> 3) A common "Sound Mixer API" for the window that allows for panning, mixing, muting, creating JavaScript Sinks and Worker-Threads. RTC, Web Audio Data and HTML Media elements would play back through the Sound Mixer API.
> 
> I have created a diagram to visualize this concept here:
> http://f1lt3r.com/w3caudio/Browser%20Audio%20Routing.jpg
> 
> With this in mind I think the most pressing concern right now is a Sound Mixer API. Then a Web Audio Data API. And finally (who knows how far out this would be) an overhaul of the browser's internal audio architecture, adding UI features to the UA.

(added CSS Working Group + HTML Speech Incubator Group to this email thread)

Thank you for initiating this discussion (the overview diagram is helpful, by the way). However, I should point out that the CSS Speech Module takes part in the web-browser audio ecosystem as well:

http://www.w3.org/TR/css3-speech

This "aural" presentation layer consists of audio output generated primarily from the underlying speech synthesizer (TTS engine), but also from the browser's regular sound interface (optional audio cues before and/or after spoken words).

Note about volume levels: the user-agent stylesheet specifies the default settings, content authors can alter speech and cue sound levels as they wish, and user stylesheets can override authored intent (as per the traditional CSS "cascade" mechanism and "!important" rules).
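
To make the precedence concrete, here is a rough sketch in Python of how a single volume setting could be resolved across the three stylesheet origins. It assumes the CSS 2.1 origin ordering and ignores specificity and source order; the Declaration class, the resolve() helper and the sample values are illustrative names of my own, not part of any specification or browser API.

```python
from dataclasses import dataclass

# Assumed ordering (CSS 2.1 cascade, lowest to highest precedence).
# Specificity and source order are deliberately left out of this sketch.
PRECEDENCE = {
    ("user-agent", False): 0,
    ("user", False): 1,
    ("author", False): 2,
    ("author", True): 3,
    ("user", True): 4,
}


@dataclass(frozen=True)
class Declaration:
    """One declaration of a single property (hypothetical helper type)."""
    origin: str              # "user-agent", "author" or "user"
    value: str               # e.g. a voice-volume keyword
    important: bool = False  # True when flagged "!important"


def resolve(declarations):
    """Return the winning value among declarations of one property."""
    winner = max(declarations, key=lambda d: PRECEDENCE[(d.origin, d.important)])
    return winner.value


# Illustrative values only: a default, an authored override, a user override.
declarations = [
    Declaration("user-agent", "medium"),
    Declaration("author", "x-loud"),
    Declaration("user", "soft", important=True),
]
print(resolve(declarations))  # soft
```

Without the "!important" flag on the user declaration, the authored value would win instead, which is why user stylesheets rely on that flag to override authored intent.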

Note about audio spatialization: a future version of the CSS Speech Module will support 3D aural positioning (the current Level 3 of the specification supports only stereo panning).

The mixing architecture proposed by Alistair would ultimately benefit accessibility, because it would provide end-users with fine-grained control mechanisms over the (potentially concurrent) streams of aural information, all from a unified and coherent interface. I look forward to hearing more about this.

Kind regards, Daniel

Received on Tuesday, 18 October 2011 20:13:13 UTC