Core Use Cases - v1.0

AudioXG colleagues -- 

Per today's teleconference, here are 7 suggested classes of core use cases, in a proposed priority order with the most important ones listed first.  This is not an official WG document, just my suggestions.  Hope this is helpful; I'm sure there will be a lot of revisions/alterations before we're done.

I expect a diversity of opinion on the relative priorities.  This ranking is based on my experience as a music and sound artist for games, and secondarily on my experience with web and mobile audio engines.  I focused first on the developer needs likely to be truly widespread, not on the most advanced possibilities.  Developers are likely to want an extremely easy-to-code way to add these common and essential things.  Audio and music content are likely to continue to be created by sound designers and musicians using their existing tools, so we should be guided by that.

	-- Chris G.

--------------------------------

Core Use Cases
v1.0 - June 28, 2010
Chris Grigg

This is intended to cover the practical aesthetic needs for music, sound effects, and dialog in the most prevalent types of web pages, sites, and applications: for example, news, search, messaging, social media, commerce, video sites, music sites, blogs, and games.  Experimental and advanced audio use cases may well have more and/or different requirements.

In order from basic to advanced:

	Class 1. Usability/Accessibility Speech
	Class 2. UI Sounds
	Class 3. Basic Media Functionality
	Class 4. Interactive Audio Functionality
	Class 5. Audio Production Basics
	Class 6. Audio Effects II: Mastering
	Class 7. Audio Effects III: Spatial Simulation

Each class is detailed below.

Definitions used below: 
	-- "sound" means recorded audio, synthetic audio, or synthetic music content (sequence + instruments)
	-- "spoken" mean speech content either as recorded audio or synthetic speech

-------------------------------------------------------

Class 1. Usability/Accessibility Speech

These use cases address basic usability of web pages generally.  I mention them since the XG Charter includes speech synthesis; however, they may already be addressed in part by other W3C specs (for example CSS3 Speech, SSML, the "Voice Browser" WG, the Web Content Accessibility Guidelines, etc.).

- Upon user click, mouseover, etc. (or as a preferences behavior):
	-- Trigger spoken version of a web page's (or textual element's) text contents (for visually impaired users)
	-- Trigger spoken help for a web page (or entry form)

- On error:
	-- Trigger spoken error message


Support for multiple alternate versions in different natural languages should be considered. 
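
As a concrete sketch of the first bullet: the script below speaks an element's text content when it is clicked.  It assumes a browser-provided speech synthesis facility; the speechSynthesis object, SpeechSynthesisUtterance name, and the 'help' element id are illustrative assumptions, not anything specified here.

// Speak an element's text content when the user clicks it.
// speechSynthesis / SpeechSynthesisUtterance are assumed browser facilities.
function speakOnClick(el: HTMLElement, lang: string = 'en-US'): void {
  el.addEventListener('click', () => {
    const utterance = new SpeechSynthesisUtterance(el.textContent || '');
    utterance.lang = lang;              // choose among alternate natural languages
    window.speechSynthesis.cancel();    // stop any speech already in progress
    window.speechSynthesis.speak(utterance);
  });
}

// Example: speak the page's help text in French.
speakOnClick(document.getElementById('help')!, 'fr-FR');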

-------------------------------------------------------

Class 2. UI Sounds

These use cases bring to web apps the basic UI aural feedback (AKA 'sonification') typical of native apps and games.  They may already be addressed in part by the HTML5 event handling model.

Trigger one or more sounds when:
- User clicks (hovers, etc.) any given visual element within a web page
- User presses Tab key to move to the next visual element within a web page
- User presses a key while in a text entry field
- A visual element within a web page changes its own state (open, resize, move, transition, close, etc.)
- A window changes state (open, resize, move, transition, close, etc.) 
- (Etc.)
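
To make this class concrete, here is a minimal sketch (TypeScript-style script) of triggering a short UI sound on click using the HTML5 <audio> element.  The 'click.wav' file and 'send-button' id are placeholder assumptions.

// Play a short UI sound whenever a given element is clicked.
function attachClickSound(el: HTMLElement, soundUrl: string): void {
  const sound = new Audio(soundUrl);    // HTML5 <audio> element
  el.addEventListener('click', () => {
    // Cloning lets rapid clicks overlap instead of cutting each other off.
    const instance = sound.cloneNode(true) as HTMLAudioElement;
    void instance.play();
  });
}

attachClickSound(document.getElementById('send-button')!, 'click.wav');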

-------------------------------------------------------

Class 3. Basic Media Functionality

These use cases bring simple audio-for-visual-media behaviors to web applications.  They may already be addressed in part by the HTML5 event handling model.

- Automatically trigger one or more sounds:
	-- In sync with an animated visual element (animated GIF, SVG, Timed Text, etc.)
	-- As continuous background soundtrack when opening a web page, window, or site

- Connect user events in visual elements (click etc.) to: 
	-- sound element transport controls
	    (play/pause/stop/rewind/locate-to/etc.)
	-- music synthesis events
	    (note-on/note-off/control-change/program-change/bank-load/etc.)

- Upon user click, mouseover, etc. (or as a preferences behavior):
	-- Trigger speech containing additional informational content not present in page text (or visual element)
	    (consider multiple alternate versions in different natural languages)
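
A minimal sketch of the transport-control bullet above, wiring element clicks to an <audio> element's play/pause/locate-to controls.  The element ids and 'theme.ogg' URL are placeholder assumptions; music synthesis events would need API support beyond what HTML5 provides today.

// Connect click events on visual elements to <audio> transport controls.
const track = new Audio('theme.ogg');
track.loop = true;    // continuous background soundtrack

document.getElementById('play')!
  .addEventListener('click', () => { void track.play(); });
document.getElementById('pause')!
  .addEventListener('click', () => track.pause());
document.getElementById('rewind')!
  .addEventListener('click', () => { track.currentTime = 0; });   // locate-to start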

-------------------------------------------------------

Class 4. Interactive Audio Functionality

These use cases support common user expectations of game audio, but can also improve the user experience for traditional web pages, sites, and apps.  Interactive audio can be defined as (i) sound that changes based upon the current game/app state, and (ii) sound with enough built-in variation to reduce listener fatigue that would otherwise occur over the long timespans typical of many games.

- Branching sounds (one-shot -- selection among multiple alternate versions)
- Branching sounds (continuous -- selection among multiple alternate next segments)
- Parametric controls (mapped to audio control parameters like gain, pan/position, pitch, processing, etc.)

Note: This functionality may require either defining new media type(s), or perhaps a change to the <audio> element semantics.  In interactive audio, a sound is not the same as a single playable media file; typically a sound (or 'cue') is some kind of bag of pointers to multiple playable audio files, plus some selection logic and/or parameter mapping logic.
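
To illustrate the note above, a one-shot branching 'cue' might look something like the following sketch.  The Cue shape and playCue() function are invented for illustration only, not a proposed API.

// A 'cue': a bag of pointers to alternate playable files plus selection logic.
interface Cue {
  variations: string[];    // URLs of interchangeable recordings
  gain: number;            // parametric control, 0..1
}

function playCue(cue: Cue): void {
  // Branching one-shot: pick one of the alternate versions at random
  // to reduce listener fatigue over repeated triggering.
  const url = cue.variations[Math.floor(Math.random() * cue.variations.length)];
  const sound = new Audio(url);
  sound.volume = cue.gain;
  void sound.play();
}

playCue({ variations: ['impact_a.wav', 'impact_b.wav', 'impact_c.wav'], gain: 0.8 });

Continuous branching (choosing the next segment) and parameter mapping would need more machinery, but the same idea applies: the script-visible object is the cue, not any single media file.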

-------------------------------------------------------

Class 5. Audio Production Basics

For sounds that are generated (or in some cases combined) in real time, these use cases support common listener expectations of well-produced music and sound.

- Mixing:
	-- By default, multiple sources + effects combine to one output
	-- By default, audio sources' effects sends combine to designated effects
	-- <video> elements with audio output are treated as audio sources
	-- (maybe submixes, but this is more advanced == less important)

- Audio Effects I: 
	-- Reverb (studio production types, not physical space simulations)
	-- Equalization
	-- (maybe Delays, Chorus, etc. but this is more advanced == less important)

These effects may be usefully applied on a per-source basis, on an effects send basis, on a submix output, or on the master mix output.
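
As a routing sketch only: the graph below mixes two sources to one master output with a shared reverb send.  It assumes a node-graph style audio API; the AudioContext, GainNode, ConvolverNode, and createMediaElementSource names are illustrative assumptions about such an API, not something this document specifies.

// Two media-element sources -> master mix, plus a shared reverb effects send.
const ctx = new AudioContext();
const master = ctx.createGain();        // master mix output
master.connect(ctx.destination);

const reverb = ctx.createConvolver();   // studio-style reverb as a send effect
// (an impulse response would be loaded into reverb.buffer; omitted here)
reverb.connect(master);

function addSource(el: HTMLMediaElement, sendLevel: number): void {
  const source = ctx.createMediaElementSource(el);   // <audio> or <video> element
  const send = ctx.createGain();
  send.gain.value = sendLevel;
  source.connect(master);    // dry path into the master mix
  source.connect(send);
  send.connect(reverb);      // effects send
}

addSource(document.querySelector<HTMLAudioElement>('#music')!, 0.3);
addSource(document.querySelector<HTMLVideoElement>('#clip')!, 0.1);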

Note: In many cases recorded music, sound effects, and speech will (or can be made to) incorporate their own production effects, and therefore will not need API support.

Note: We could stop after Class 5 and still support most game genres.

-------------------------------------------------------

Class 6. Audio Effects II: Mastering 

For sounds that are generated (or in some cases combined) in real time, these use cases support a higher level of listener expectation for well-produced music and sound, and may also increase intelligibility generally.

- Dynamics (compression, limiting)
- Aural enhancement (exciters, etc.)
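
A mastering sketch under the same assumption of a node-graph style API: a compressor on the master output, with a high ratio approximating limiting.  DynamicsCompressorNode and its parameters are assumptions for illustration, not proposed names.

// Mastering: dynamics compression/limiting on the master output.
const ctx = new AudioContext();
const compressor = ctx.createDynamicsCompressor();
compressor.threshold.value = -18;    // dB
compressor.ratio.value = 12;         // high ratio approximates a limiter
compressor.attack.value = 0.003;     // seconds
compressor.release.value = 0.25;     // seconds
compressor.connect(ctx.destination);
// All sources and submixes would connect to `compressor` rather than
// directly to the destination.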

Mastering functionality is more advanced == less important than the above classes.

Note: In many cases recorded music, sound effects, and speech will (or can be made to) incorporate their own mastering effects, and therefore will not need API support.

-------------------------------------------------------

Class 7. Audio Effects III: Spatial Simulation

For those users listening in stereo, 3D spatialization causes a sound source (or submix) to appear to come from a particular position in 3D space (direction & distance).  This functionality can be useful in some game types where 3D position changes in real time, and in teleconferencing where spatializing each speaker to a different static position can help in discriminating who is speaking.  Environmental reverb provides clues as to the size and character of the enclosing space (room/forest/cavern, etc.), supporting a more immersive gaming experience.

- 3D spatialization
- Environment simulation reverb
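
A spatialization sketch, again assuming a node-graph style API; PannerNode and its position parameters are assumptions for illustration, and '#footsteps' is a placeholder element id.

// Place a sound source at a 3D position relative to the listener.
const ctx = new AudioContext();
const panner = ctx.createPanner();
panner.panningModel = 'HRTF';    // binaural rendering for stereo listeners
panner.positionX.value = 2;      // two meters to the listener's right
panner.positionY.value = 0;
panner.positionZ.value = -1;     // slightly in front

const el = document.querySelector<HTMLAudioElement>('#footsteps')!;
ctx.createMediaElementSource(el).connect(panner);
panner.connect(ctx.destination);
// Environmental reverb would add a ConvolverNode loaded with an impulse
// response characteristic of the simulated space.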

Spatial simulation is more advanced == less important than the above classes.

-------------------------------------------------------

..end..
