Specifying pronunciation - some approaches from Charles McCathieNevile on 2003-11-24 (w3c-wai-gl@w3.org from October to December 2003)

From: Charles McCathieNevile <charles@w3.org>
Date: Mon, 24 Nov 2003 06:27:41 -0500 (EST)
To: WAI GL <w3c-wai-gl@w3.org>
Message-ID: <Pine.LNX.4.55.0311210207360.10886@homer.w3.org>

During the face-to-face meeting I said I would describe some approaches to
clarifying the pronunciation of content. I suggested three possibilities, and
here I try to explain them a little more. These are not complete examples, and
each one requires some work before it might be useful.

1. Use Ruby.

As Nakane-san pointed out at the meeting, you cannot simply use the Ruby
element to specify pronunciation. But it can help if you provide appropriate
presentation clues.

For example:

  ...Steve <ruby class="sayAs"><rb>Waugh</rb><rt>War</rt></ruby> ...

combined with a stylesheet for audio presentation - for example

@media aural {
  ruby.sayAs rb { speak: none }
  }

@media screen {
  ruby.sayAs rt { display: none }
  }

and an alternative stylesheet for people who are using screen readers that
only understand visual styling:

  ruby.sayAs rb { display:none }

This will work moderately well in modern browsers. There are few
implementations of Aural CSS (not none, as commonly believed, and a
recently-released commercial product is one of them).

Common screen readers do not implement Aural CSS, and attempt to present
content according to the visual presentation (where pronunciation isn't
usually important). So the user of such a screen reader will need to know
that they should switch to the alternative stylesheet. There are still issues
to iron out where the user needs the spelling rather than the pronunciation,
since that again means swapping stylesheets...
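
As a sketch of how the switch could be offered, the two stylesheets might be
linked as below. The file names and titles are made up for illustration.
Giving the default stylesheet a title makes it a preferred rather than a
persistent stylesheet, so selecting the alternate one replaces it and the
rule hiding rt no longer applies.

  <link rel="stylesheet" type="text/css"
        title="Default" href="sayas-default.css" />
  <link rel="alternate stylesheet" type="text/css"
        title="Show pronunciation" href="sayas-visual.css" />

Whether the user can actually pick the second one depends on the browser
offering a way to choose among alternate stylesheets.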

2. Use SSML.

The Speech Synthesis Markup Language is a W3C Specification designed for
Voice applications. It includes markup explicitly for pronunciation. Using
mixed-namespace XML, this markup could be included in HTML. The XHTML+Voice
member submission at http://www.w3.org/TR/xhtml+voice/ shows one approach to
doing this (also including a lot of other elements).

(The Staff Comment on the submission --
http://www-3.ibm.com/software/pervasive/multimodal/x%2Bv/11/spec.htm -- notes
that there were problems in using the original specification in a
royalty-free manner. An updated version is available linked from that
comment, which also suggests the use of Aural CSS).
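
A rough sketch of the mixed-namespace idea, reusing the Waugh example. The
sub and phoneme elements and the namespace are from SSML. The ssml prefix is
arbitrary, the IPA string is my approximation, and I am assuming a user agent
that processes SSML elements inside XHTML, which is not necessarily how the
XHTML+Voice profile does it.

  <p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
    ... Steve <ssml:sub alias="War">Waugh</ssml:sub> ...
  </p>

or, with a phonetic string instead of a respelling:

  <p xmlns:ssml="http://www.w3.org/2001/10/synthesis">
    ... Steve <ssml:phoneme alphabet="ipa" ph="wɔː">Waugh</ssml:phoneme> ...
  </p>

In both cases the written form stays in the content, so a browser that
ignores the unknown elements still shows "Waugh".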

3. Annotea

The Annotea work allows user-defined, machine-readable annotations to be made
on any part of a document. Because it uses XPointer, it can annotate a word,
or even a part of one, as well as a paragraph.

Annotations are potentially unstable across editing, so this approach should
be used only after considering the implications.
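
For illustration, an annotation attaching a pronunciation to one word might
look roughly like this. The a: properties are from the Annotea annotation
schema. The example.org addresses, the "intro" id, and the idea of pointing
the body at a pronunciation resource are assumptions of mine, not an agreed
convention.

  <r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:a="http://www.w3.org/2000/10/annotation-ns#">
    <a:Annotation>
      <a:annotates r:resource="http://example.org/cricket.html"/>
      <a:context>http://example.org/cricket.html#xpointer(string-range(id("intro"),"Waugh"))</a:context>
      <a:body r:resource="http://example.org/say/waugh.ssml"/>
    </a:Annotation>
  </r:RDF>

The context selects the string "Waugh" inside the element whose id is
"intro", and the body names the resource that says how to pronounce it.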

4. As an extra thought, linking a glossary that contains pronunciation to a
particular document is something that would follow naturally from linking one
for clarity of words. One would expect it to build on SSML, and be re-usable
with annotations...
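
HTML already has a link type for glossaries, so the hook in the document
could be as simple as the following. The file name and title are made up, and
what format the glossary itself would use is exactly the open question.

  <link rel="glossary" type="text/html"
        title="Pronunciation glossary" href="pronunciation.html" />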

So those are some ideas. Is it worth following them up?

cheers

Chaals

Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
 Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France

Received on Monday, 24 November 2003 12:12:33 UTC