English words in hebrew text RE: Report for ISOC IL FTF

From: Charles McCathieNevile <charles@w3.org>
Date: Tue, 23 Dec 2003 10:53:03 -0500 (EST)
To: Richard Ishida <ishida@w3.org>
Cc: 'lisa seeman' <seeman@netvision.net.il>, 'WAI-GL' <w3c-wai-gl@w3.org>, "Martin J. Durst" <duerst@w3.org>
Message-ID: <Pine.LNX.4.55.0312231046280.9533@homer.w3.org>

In other words, the ISOC group in Israel are possibly doing one of two things:

1. Asserting that there are a lot of English words which are recognised as
Hebrew, although requiring a different pronunciation. Fortunately, they are
easy to spot because they use a different alphabet, and the tools recognise

This would seem to be a fine claim to make - although I would like to
see some connection to what the vocabulary they are using is...

2. Assuming that words included in Hebrew text that are written in Latin
script are automatically English. I trust this isn't their assertion.
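
To make the difference between the two readings concrete, here is a minimal
sketch, assuming Python and its standard unicodedata module. The function
names and the sample sentence are hypothetical illustrations, not anything
the ISOC group has described. It uses the first word of each character's
Unicode name as a rough proxy for its script. A tool can find the
Latin-script words in Hebrew text this way, but the result says nothing
about which language those words are in.

```python
import unicodedata


def script_of(word):
    """Classify a word as 'HEBREW', 'LATIN' or 'OTHER'.

    Assumption: the first word of a letter's Unicode name (for example
    'HEBREW LETTER ALEF' or 'LATIN SMALL LETTER A') is a good enough
    proxy for its script. Non-letters (digits, points, punctuation)
    are ignored.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0])
    if scripts == {"HEBREW"}:
        return "HEBREW"
    if scripts == {"LATIN"}:
        return "LATIN"
    return "OTHER"


def latin_script_words(text):
    """Return the whitespace-separated words written entirely in Latin script."""
    return [w for w in text.split() if script_of(w) == "LATIN"]


# Hypothetical sample: Hebrew text (written with escapes) that embeds a
# French newspaper title and an English loanword.
SAMPLE = (
    "\u05d0\u05e0\u05d9 \u05e7\u05d5\u05e8\u05d0 Le Monde "
    "\u05d5\u05d2\u05dd email"
)

latin_words = latin_script_words(SAMPLE)
print(latin_words)  # -> ['Le', 'Monde', 'email']
```

The script check picks out "Le", "Monde" and "email" alike, although only
the last is English. Deciding the language of each word would need explicit
language information or a vocabulary list, which this sketch does not
attempt.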



On Mon, 22 Dec 2003, Richard Ishida wrote:

>Hi Lisa,
>> From: w3c-wai-gl-request@w3.org
>> [mailto:w3c-wai-gl-request@w3.org] On Behalf Of lisa seeman
>> Sent: 22 December 2003 05:55
>> passages or fragments of text occurring within the content
>> that are written in a language other than the primary natural
>> language of the content as a whole, are identifiable, either
>> through the character encoding used or through direct
>> including specification of the language of the passage or
>> fragment. [X]
>Character encoding information helps you know the script, which may be
>useful for font selection or some other rendering considerations, but
>doesn't help you with selecting the right voice for pronunciation of the
>text.  For example, ASCII text could just as easily be Indonesian or
>Malaysian as English.  Text using 'Latin1' characters could represent a
>very wide range of languages. So 'either through the character encoding
>used' would be inappropriate, unfortunately.
>To help me better understand the issue, could you briefly characterise
>for me the type of content that causes the problem?  Is it English? How
>much of it is there (as a very rough average)?  Is much of it acronyms?
>proper names? technical words? etc.
>Exploring solutions: can one assume that Israeli text to speech systems
>can deal pretty well with the embedded non-Hebrew stuff?  Does that
>apply to the tts systems dealing with other languages?  If Hebrew
>systems deal with English ok, maybe you'd only have to label stuff that
>was, say, Indonesian or Malay??

Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
 Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France
Received on Tuesday, 23 December 2003 10:53:03 UTC
