- From: Al Gilman <asgilman@iamdigex.net>
- Date: Wed, 28 Aug 2002 08:39:18 -0400
- To: <w3c-wai-gl@w3.org>
- Cc: er@uts.cc.utexas.edu
[I have dropped PF from the thread. We can point PF and AU at the GL archives. - Al]

At 01:50 AM 2002-08-28, Lisa Seeman wrote:
>An interesting perspective from Esther Raizen, controversial but interesting

[a - romanize, write the alternate text in romanization; b - don't try to get the right vowels for classical spelling, just capture something that is phonetically sufficient. Don't try to distinguish homophone variants.]

Well, she is right that if the purpose is to drive text-to-speech, we don't have to distinguish between cases that are pronounced alike.

If we combine the two things she suggested, that approach is already, IMHO, in the concept set in the form of providing phonetic equivalents in the 'sampa' roman-alphabet variant on IPA. It is just a standards proposal to the world-wide TTS industry as to whether to parse sampa or IPA text for phonetics, and we may not get them to agree. Since they already do separate pronunciation libraries for each natural language, including both IPA and sampa as just two more language models is not necessarily a big deal. The models would be very simple, because these 'scripts', as the internationalization people (Unicode) call them, are phonetic to begin with. The library structure for distributing the variant models is already there in the software distribution pipeline.

And it is not hard to get the pronunciation from an author, even if "vocalization" in the sense of inserting the literarily correct vowel characters is a scarce skill. The basic tool here is the alignment tool that aligns a voice recording with its transcript. Get the author to type the text in standard, consonants-only spelling. Then get them to read it to the recorder. Then the alignment tool matches up the sound record with the text record. This gives you not only, for voice browsing, the sound samples to go with prompts for the dialog, but also a validated sound/text relationship, and it is easy at this point to extract an IPA or sampa transcription of how each word was produced. The validation check is that the system reads the text back from the by-exception phoneticized version, and the author edits by hand the words that get spoken wrong in the machine reconstruction of the utterance.

To get only the ambiguous words marked up with phonetics, it is enough to take the words in the text, compare them with the dictionary to see which are ambiguous as to their pronunciation, and mark up the ambiguous ones with their phonetics extracted from the speech sample.

There is a second approach for teaching the language and other applications where correct vocalized spelling is required. This is to construct a sentence-level semantic model of the utterance, so as to have contextual cues aiding the discrimination of when a given consonant-spelled word is a noun vs. when it is a verb, and, when it is, which sense as broken out in the WordNet sense nodes. If we can get it bound to a unique WordNet node, I expect that the correct vowelization will be a matter of record, will be unique for that node, and we can do it all with reference data in the dictionary. This runs like the tool above, except that the semantic-model building is in the form of a binding to WordNet rather than to the dictionary, with an added layer of sentence structure as an aid to reduce the number of ambiguities that have to be taken back to the author for interactive resolution. We are building an interpretation of the consonants-only text as sentence structures populated with WordNet nodes.
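A rough sketch of that by-exception markup step, assuming the alignment tool has already delivered a per-word SAMPA transcription of the author's recording. The toy dictionary entries, the romanized placeholder spellings, and the `<phoneme>` wrapper (loosely modeled on the SSML phoneme element) are illustrative assumptions, not an existing API:

```python
# Sketch: mark up only pronunciation-ambiguous words with phonetics.
# Assumes the aligner has already produced (word, SAMPA-as-spoken) pairs;
# the data and the markup vocabulary are illustrative.

from typing import Dict, List, Set, Tuple

# Toy pronunciation dictionary: consonants-only spelling -> possible SAMPA
# readings.  Real entries would come from a Hebrew lexicon; "gnb" stands in
# for the consonantal spelling of ganav (thief) / ganav (he stole).
PRON_DICT: Dict[str, Set[str]] = {
    "gnb": {"ganav", "gannav"},   # ambiguous: verb vs. noun reading
    "shlwm": {"shalom"},          # unambiguous
}

def is_ambiguous(word: str) -> bool:
    """A word needs phonetic markup if the dictionary lists more than one reading."""
    return len(PRON_DICT.get(word, set())) > 1

def markup_by_exception(aligned: List[Tuple[str, str]]) -> str:
    """Emit text where only ambiguous words carry the phonetics heard in the recording.

    `aligned` is the aligner's output: (consonants-only word, SAMPA as spoken).
    """
    out = []
    for word, sampa in aligned:
        if is_ambiguous(word):
            out.append(f'<phoneme alphabet="sampa" ph="{sampa}">{word}</phoneme>')
        else:
            out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    # Pretend the aligner heard the verb reading of "gnb" in the recording.
    print(markup_by_exception([("shlwm", "shalom"), ("gnb", "ganav")]))
```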
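And a similar sketch of the second approach: if the reference data gives one vocalized form per (consonants-only spelling, sense) pair, recovering the vowels reduces to a lookup once the word is bound to a sense node. The sense identifiers, the lookup table, and the crude context rule below are invented stand-ins for a real WordNet binding and sentence model:

```python
# Sketch: recover vocalized spelling from a sense binding.
# Sense ids, reference table, and the context heuristic are illustrative
# stand-ins for a WordNet binding plus a vocalization lexicon; romanized
# placeholders are used instead of Hebrew characters.

from typing import Dict, Optional, Tuple

# (consonants-only spelling, sense id) -> fully vocalized spelling.
VOCALIZED: Dict[Tuple[str, str], str] = {
    ("gnb", "thief.n.01"): "gannav",   # noun: thief
    ("gnb", "steal.v.01"): "ganav",    # verb: he stole
}

def bind_sense(word: str, context: str) -> Optional[str]:
    """Toy disambiguator: pick a sense id from crude contextual cues.

    A real system would build a sentence-level model and take unresolved
    cases back to the author interactively.
    """
    if word != "gnb":
        return None
    tokens = set(context.split())
    return "steal.v.01" if tokens & {"he", "she", "they"} else "thief.n.01"

def vocalize(word: str, context: str) -> str:
    """Return the vocalized form for the bound sense, or the word unchanged."""
    sense = bind_sense(word, context)
    if sense is None:
        return word
    return VOCALIZED.get((word, sense), word)

if __name__ == "__main__":
    print(vocalize("gnb", "the gnb ran away"))   # -> gannav (noun reading)
    print(vocalize("gnb", "he gnb the money"))   # -> ganav  (verb reading)
```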
The interactive authoring techniques question is how to quickly present to the author, on the fly, what their choices are as to WordNet nodes. In this case it is cheap to present vowelized spellings, but as Professor Raizen says, the author probably doesn't know from vowel spelling anyway. This is where the picture thesaurus could be used. Present the likeliest options as to WordNet bindings in a little fly-out display attached to the ambiguous word in the text, with a thumbnail image and a brief phrasal identification of each variant, in some form where it is easy for the author to indicate which choice they meant, or to ask for more options or more information on the options. This is the sort of interaction that would get the by-exception vocalized text captured in less than ten times the time it takes to type it.

Al

At 01:50 AM 2002-08-28, Lisa Seeman wrote:
>An interesting perspective from Esther Raizen, controversial but interesting
>
>Her second idea could be modified to require not full vowels but just
>enough to differentiate separate meanings; this is supported by the
>current wording of the guideline.
>Yes, the academics will be somewhat upset.
>
>If we do consider modifying the ancient and historical Hebrew vowel system,
>then may I recommend that we add a header setting which allows most
>renderings to not show the vowels encoded. Otherwise we may find ourselves
>in an area of religious sensitivities.
>Such a header setting (through PF) would also make people more comfortable
>about making mistakes in their vowels.
>
>Forwarded with permission
>
>All the best,
>
>Lisa Seeman
>UnBounded Access
>Widen the World Web
>http://www.UBaccess.com
>
>-----Original Message-----
>From: esther raizen [mailto:er@uts.cc.utexas.edu]
>Sent: Tuesday, August 27, 2002 6:05 AM
>To: Lisa Seeman
>Subject: RE: Hebrew screen readers
>
>Lisa,
>This is what I expected, more or less. Development for commercial
>purposes will always require a high return on the investment, which
>in itself is very high. An affordable solution will probably have to
>come from academia, and even there you will rarely find the
>willingness to absorb the cost and create a tool that is available to
>the public at no cost or at a nominal cost. I hope that the Technion
>takes it on in some way.
>
>As for vowels: there are two reasons why vocalization is difficult.
>One is technical, that is, how to physically insert vowels into a
>text (for which, as you write, a number of very reasonable solutions
>exist; I have been doing it for years, and although I still view it
>as a daunting task, it is not impossible at all). The second, which
>is more significant, is the simple fact that very few people know how
>to vocalize Hebrew. This is normally done by a professional, which
>again involves high costs. People will not vocalize their texts
>because they will not know how to do that, and even if they can do it
>technically, most people will not want to appear ignorant by entering
>sequences of wrong vowels. This is enough to kill the whole
>endeavor, I am afraid.
>
>Unless we advocate one of two things:
>
>Writing Hebrew in Latin characters (people will jump through the
>roof, but this is not such a terrible idea if one uses it for special
>purposes).
>If, as I wrote to you in an earlier message, this is
>somehow done as hidden text behind the Hebrew presentation, and the
>screen reader reads the hidden rather than the visible text, the only
>work involved on the part of the author would be to write the passage
>twice, once in Hebrew and once in Latin characters. This is much
>easier than vocalizing.
>
>Deciding on a reduced and, therefore, non-grammatical vocalization
>system, where you have one symbol for each vowel irrespective of its
>historical origin. Thus, the noun ganav, thief, written
>gimel-patach-nun-dagesh-kamatz-bet, and the verb ganav, he stole,
>written gimel-kamatz-nun-patach-bet, will both be written
>gimel-a-nun-a-bet. Again, people will jump through the roof, but
>that is the reality of all script reforms, and as long as one views
>it as a solution to a specific problem, which does not have to apply
>to the language in general, there is a chance that with good reasoning
>it will be accepted. Accessibility is the best reasoning I can think
>of.
>
>Feel free to forward this message to people -- I will be glad to
>communicate with others on the matter.
>
>Best,
>Esther
>--
>Esther Raizen, Ph.D.
>Assistant Professor
>Coordinator of Hebrew language program, Dept. of Middle Eastern
>Languages and Cultures
>Graduate Advisor, Center for Middle Eastern Studies
>The University of Texas at Austin
>Austin, TX 78712
>Phone: 512-475-6654
>e-mail: er@uts.cc.utexas.edu
>web page: http://www.lamc.utexas.edu/hebrew
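Professor Raizen's reduced system amounts to a simple character mapping: one symbol per vowel quality, irrespective of the sign's historical origin. A minimal sketch, using an illustrative (and linguistically simplified) grouping of the niqqud marks, reproduces her ganav example:

```python
# Sketch: collapse historically distinct niqqud signs to one symbol per
# vowel quality, as in the ganav example above.  The grouping below is a
# simplification for illustration; dagesh and sheva are simply dropped.

REDUCED = {
    "\u05B7": "a",  # patach
    "\u05B8": "a",  # kamatz
    "\u05B6": "e",  # segol
    "\u05B5": "e",  # tsere
    "\u05B4": "i",  # hiriq
    "\u05B9": "o",  # holam
    "\u05BB": "u",  # kubutz
    "\u05B0": "",   # sheva (dropped here)
    "\u05BC": "",   # dagesh (dropped here)
}

def reduce_vowels(text: str) -> str:
    """Replace each niqqud mark with its reduced symbol, leaving letters alone."""
    return "".join(REDUCED.get(ch, ch) for ch in text)

# Noun ganav (thief): gimel-patach-nun-dagesh-kamatz-bet
noun = "\u05D2\u05B7\u05E0\u05BC\u05B8\u05D1"
# Verb ganav (he stole): gimel-kamatz-nun-patach-bet
verb = "\u05D2\u05B8\u05E0\u05B7\u05D1"

# Both spellings collapse to the same gimel-a-nun-a-bet form.
assert reduce_vowels(noun) == reduce_vowels(verb)
```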
Received on Wednesday, 28 August 2002 08:39:27 UTC