Re: FW: Audio labels (was RE: comment: FOAF Depiction and Symbolic Labelling)

From: Dan Brickley <danbri@w3.org>
Date: Tue, 17 May 2005 11:45:42 -0400
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
Cc: public-esw-thes@w3.org, mf@w3.org
Message-ID: <20050517154542.GM30561@homer.w3.org>

* Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk> [2005-05-17 14:47+0100]
> > Other extensions could also be interesting. While we could debate which
> > things go in core vocab and which in other namespaces, it might be more
> > fun to set that aside for now (while noting that SKOS is only a Working
> > Draft at this stage, and could change), and explore possibilities
> > for such extensions. Was there something specific you had in mind? Audio
> > I think could be very interesting, particularly for SKOS concept that is
> > close to the electronic dictionary space, eg. lexical databases such
> > as Wordnet (although SWBPD WG isn't using SKOS for Wordnet currently).
> > Where a concept is lexicalised, we could point to sound clips, or
> > Speech Synth markup (eg. see
> > http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/) ...could have
> > interesting application to accessibility, voice/mobile and perhaps
> > language learning apps...
> I like the idea of 'audio labels' ...  Can anyone describe a relatively concrete use case?

Autogenerating voice-browser menus?

VoiceXML isn't HTML. HTML was designed for visual Web pages and lacks the
control over the user-application interaction that is needed for a
speech-based interface. With speech you can only hear one thing at a
time (kind of like looking at a newspaper with a times 10 magnifying
glass). VoiceXML has been carefully designed to give authors full
control over the spoken dialog between the user and the application. The
application and user take it in turns to speak: the application prompts
the user, and the user in turn responds.

VoiceXML documents describe:

    * spoken prompts (synthetic speech)
    * output of audio files and streams
    * recognition of spoken words and phrases
    * recognition of touch tone (DTMF) key presses
    * recording of spoken input
    * control of dialog flow
    * telephony control (call transfer and hangup)

Annotation of SKOS concept descriptions with voice data (speech markup,
or audio files, ...) could allow content tagged with those concepts to
be made navigable through VoiceXML-based interactions. Example: a
collection of blog feeds, where the RSS was augmented with skos:subject 
tagging, and the different blogs drew on the same (or mapped) concept
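
To make the menu idea a bit more concrete, here is a rough Python sketch
of how a VoiceXML menu might be generated from a handful of concepts.
Everything in the data is made up for illustration: the example.org URLs,
the "audio" field (SKOS has no audio label property; that is the assumed
extension under discussion), and the idea that each concept has its own
VoiceXML page listing the tagged content. Only the VoiceXML 2.0 element
names (vxml, menu, prompt, audio, choice) come from the spec.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical concepts. "audio" stands in for an assumed audio-label
# extension; "next" is an assumed VoiceXML page for content tagged
# with that concept.
CONCEPTS = [
    {"label": "Semantic Web",
     "audio": "http://example.org/audio/semweb.ogg",
     "next": "http://example.org/vxml/semweb.vxml"},
    {"label": "Accessibility",
     "audio": None,
     "next": "http://example.org/vxml/accessibility.vxml"},
]


def build_menu(concepts):
    """Return a <vxml> element containing one <menu> for the concepts."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu")

    intro = ET.SubElement(menu, "prompt")
    intro.text = "Choose a category."

    # One prompt per concept: play the audio label if there is one,
    # otherwise let the voice browser synthesise the text label.
    for concept in concepts:
        prompt = ET.SubElement(menu, "prompt")
        if concept["audio"]:
            audio = ET.SubElement(prompt, "audio", {"src": concept["audio"]})
            # Text inside <audio> is the fallback if the clip cannot play.
            audio.text = concept["label"]
        else:
            prompt.text = concept["label"]

    # One choice per concept: saying the label jumps to its page.
    for concept in concepts:
        choice = ET.SubElement(menu, "choice", {"next": concept["next"]})
        choice.text = concept["label"]

    return vxml


if __name__ == "__main__":
    print(ET.tostring(build_menu(CONCEPTS), encoding="unicode"))
```

The text label stays in the output even when an audio label exists, both
as the fallback inside the audio element and as the phrase the user says
to pick that choice.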

[I'm working on some tools to enable blogs to pick up their SKOS 
categories from their neighbours (eg. when I go to add a category to
my blog, it reminds me what categories my friends and colleagues are 
using, and allows links to be expressed, sub-trees to be imported).]

So, why would one want to navigate blogs by having the computer read 
out labels for their categories? (and eg. also navigating by voice

 - maybe you're driving your car, and in a traffic jam
 - RSI or other accessibility reasons for not using mouse/keyboard
 - you're walking around wearing some fancy bluetooth headset,
   looking all Flash Gordon modern, and want to read what people are 
   writing about you...
 - maybe you're navigating some content collection via your TV, with 
   menus, and prefer audio to reading of (even large) fonts 
   on the TV screen.
 - maybe you're navigating a content collection in audio labels made 
   available in your native spoken language, even if the content is 
   in a language you're less proficient in.
 - maybe you can't read the textual labels (in the language they're 
   available in; or in any language).
 - maybe you're navigating a collection of Creative Commons-licensed 
   Ogg/MP3 'talking book' files on your iPod-like-thing, and someone has
   written a study guide that lets you jump around the texts based on
   SKOS-indexed themes that have been collaboratively indexed against
   the collection. Ok handwaving a bit here, but I think that could be 

Whether the final end document is read in classic Web browser, 
or also via text to speech (Max was looking at this...) is a separable
choice I think. Being able to navigate around the content database 
using audio labels doesn't require you to digest the content in audio 
form too. 



ps. is anyone on this list set up to run student projects? maybe one in this
area could be interesting...?
Received on Tuesday, 17 May 2005 22:32:25 UTC
