RE: diacritic marks from Reuven Nisser on 2004-02-06 (w3c-wai-gl@w3.org from January to March 2004)

From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
Date: Fri, 06 Feb 2004 09:32:15 +0200
To: Charles McCathieNevile <charles@w3.org>
Cc: W3C Accessibility Guidelines <w3c-wai-gl@w3.org>
Message-id: <EOEHIKCGOKGNIEEKJHEKCEDPDNAA.rnisser@ofek-liyladenu.org.il>

Hello Charles,
>> For a large number of common kanji, like in Hebrew, there is enough to build
>> effective lookup tables (the glossary approach) so they can be pronounced
>> correctly by a text to speech engine. But for a large number of other common
>> kanji, and more particularly for less common ones, this isn't feasible.
>> Something that makes it possible to provide clear interpretation is therefore
>> important. One technique is to use clear characters, as used to write simple
>> documents pitched at a broad audience. It is preferable in a way that they
>> need not be always visible - think of the differences between closed and open
>> captioning on television.

One question, though, regarding the paragraph quoted above. When you look at a
single Japanese word, is there always one and only one way to pronounce it?
If so, then the problem in Japanese is "only" a lookup table that maps each
word to its phonetic representation.
In Hebrew and in Arabic, if you look at a single word, there are on average
2.3 ways to pronounce it. There are even words with 13 possible pronunciations.
Each pronunciation has a different meaning. To eliminate some of the
possibilities you need to analyze the sentence grammatically. To eliminate
more you need to get to the semantics of the text.
For example, S-F-R could be SEFER (book) or SAPAR (barber).
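
To make the point concrete, here is a minimal sketch in Python of why a per-word lookup table is not enough on its own. The table layout and the function names are my own assumptions for illustration, not any real text-to-speech API; the only data in it is the S-F-R example above.

```python
# Hypothetical lookup table: unpointed consonant skeleton -> possible readings.
# Only the S-F-R entry comes from the message; the structure is an assumption.
READINGS = {
    "S-F-R": [("SEFER", "book"), ("SAPAR", "barber")],
}


def candidate_readings(skeleton):
    """Return every (pronunciation, meaning) pair known for a skeleton."""
    return READINGS.get(skeleton, [])


def pronounce(skeleton):
    """Return the pronunciation only when the table gives exactly one reading."""
    candidates = candidate_readings(skeleton)
    if len(candidates) == 1:
        return candidates[0][0]
    # Unknown or ambiguous word: the table alone cannot decide, so
    # grammatical or semantic analysis of the sentence would be needed.
    return None


print(candidate_readings("S-F-R"))
print(pronounce("S-F-R"))
```

The lookup returns both readings for S-F-R, so `pronounce` gives up instead of guessing. If every word had exactly one reading, the table would be sufficient by itself.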
Regards,
Reuven Nisser
Ofek Liyladenu

Received on Friday, 6 February 2004 02:32:33 UTC