FW: diacritic marks from Reuven Nisser on 2004-02-05 (w3c-wai-gl@w3.org from January to March 2004)

From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
Date: Fri, 06 Feb 2004 01:08:38 +0200
To: W3C Accessibility Guidelines <w3c-wai-gl@w3.org>
Message-id: <EOEHIKCGOKGNIEEKJHEKCEDGDNAA.rnisser@ofek-liyladenu.org.il>
Hello Joe Clark,
I am the father of a blind child, and I was one of the developers of the
first Hebrew text-to-speech system.
The problem is that regular written Hebrew has no vowels in the words.
Taken in isolation, each word has on average 2.3 possible pronunciations.
When you look at a whole sentence and apply regular grammar rules, you can
get to 80% accuracy. If we ignore minor accuracy problems we can get to 95%,
but that is the limit.
To make Hebrew text-to-speech any better you need to start using artificial
intelligence, which means big money, and the result will need a long
processing time, which is not good for our goal.

So we need a shortcut around the problem: fix the remaining 5% by giving the
text-to-speech engine "hints" about the right pronunciation. In Hebrew we call
these hints diacritic marks. They are used in books and should be used on the
Internet as well. The only question that arises is how many words should
carry these marks, and which words. ISOC has referred this question to the
Hebrew Language Academy.

However, there are dyslexic people who are overloaded by too much information
in the text, and they should not see the diacritic marks. So there should
always be a way to hide them.
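
As an illustration only (this is a sketch of mine, not something proposed on
the list), hiding the marks can be done mechanically because Unicode stores
them as separate combining characters after each letter. The function name
strip_hebrew_marks is hypothetical, and I am assuming that "diacritic marks"
means the nonspacing marks in the Unicode Hebrew block (vowel points and
cantillation marks):

```python
import unicodedata

# Unicode Hebrew block. Assumption: the marks to hide are the nonspacing
# combining marks (category "Mn") inside this block.
HEBREW_START, HEBREW_END = 0x0590, 0x05FF


def strip_hebrew_marks(text: str) -> str:
    """Return text with Hebrew combining marks removed.

    Letters, punctuation, and combining marks from other scripts are kept.
    """
    return "".join(
        ch
        for ch in text
        if not (
            HEBREW_START <= ord(ch) <= HEBREW_END
            and unicodedata.category(ch) == "Mn"
        )
    )


# "shalom" written with vowel points, then the same word with them hidden.
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
plain = strip_hebrew_marks(pointed)
```

The pointed text stays the stored form, so a text-to-speech engine can still
read the hints while a reader who is distracted by them sees only the letters.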

Regards,
Reuven Nisser
Ofek Liyladenu

>> -----Original Message-----
>> From: w3c-wai-gl-request@w3.org
>> [mailto:w3c-wai-gl-request@w3.org]On Behalf Of Joe Clark
>> Sent: Wednesday, February 04, 2004 10:58 PM
>> To: WAI-GL
>> Subject: Re: diacritic marks
>>
>>
>>
>> > We are happy with the current wording and prioritization of the success
>> > criteria. :)
>>
>> Quite possibly that should be a warning sign.
>>
>> > Background
>> > Some languages use diacritic marks to give the pronunciation of a word.
>> > In some languages (like Hebrew and Arabic) most spellings, without
>> > diacritic marks, can be resolved to more than one word. Use of context
>> > enables the average reader to work out what word was intended.
>> >
>> > Natural language processing used in screen readers can often guess what
>> > word is intended without diacritic marks. However all screen readers
>> > will often make mistakes.
>>
>> Then fix the screen readers, Lisa. Perhaps you'd like to take on that
>> project rather than advancing the preposterous idea that Web authors be
>> forced to write kiddie Hebrew and Arabic rather than the true forms
>> naturally used by adults.
>>
>> > It is estimated (by ISOC -il - need to get references) that 3% of the
>> > population have a visually impaired memory which makes reading many
>> > words without diacritic marks extremely difficult. This segment of the
>> > population can use a screen reader to help them through the reading
>> > process. However when the screen reader guesses a word incorrectly, they
>> > will often be unable to correct the mistake themselves, as guessing
>> > different pronunciations of words based on an identical spelling is
>> > difficult to impossible for many dyslexics.
>> >
>> > It should also be remembered that screen readers are difficult to use
>> > and are expensive.
>>
>> But they may be the correct adaptive technology. Suddenly difficulty and
>> that perennial bugbear, expense, are insurmountable problems for this
>> group but not, say, for the totally blind or people with severe dyslexia?
>>
>> > Vision impaired people using screen readers
>>
>> Oh. So we're including them after all.
>>
>> > are also affected by missing
>> > diacritic marks.  All screen readers will  make mistakes, and will
>> > pronounce the wrong word. This will occur more often then an incorrect
>> > word pronunciation makes grammatical sense.
>>
>> That *sentence* doesn't.
>>
>> > The user then has to guess
>> > the meaning of a sentence by guessing different pronunciations of
>> > words based on an identical spelling. This extra processing time on the
>> > user's part means that they cannot speed up the screen reader, and often
>> > have to reread passages.
>>
>> This continues to be an argument for better screen readers and will hold
>> true for any language containing homographs, including English.
>>
>> > Finally I want to personally thank everyone who helped contribute to and
>> > resolve this difficult issue.
>>
>> It isn't anywhere near resolved. The proposal is unsound if not asinine,
>> as it ignores the reality of written Hebrew and Arabic. It will be laughed
>> at and ignored by authors if passed by Working Group members, who tend to
>> simply take orders from you without engaging their rational faculties.
>> Plus they don't know anything about writing systems, Richard Ishida
>> excepted.
>>
>> --
>>
>>   Joe Clark  |  joeclark@joeclark.org
>>   Author, _Building Accessible Websites_
>>   <http://joeclark.org/access/> | <http://joeclark.org/book/>
>>
>>
Received on Thursday, 5 February 2004 18:14:52 UTC