RE: Report for ISOC IL FTF from lisa seeman on 2003-12-23 (w3c-wai-gl@w3.org from October to December 2003)

From: lisa seeman <seeman@netvision.net.il>
Date: Tue, 23 Dec 2003 09:16:12 +0200
To: "'Richard Ishida'" <ishida@w3.org>, "'WAI-GL'" <w3c-wai-gl@w3.org>
Cc: "'Martin J. Durst'" <duerst@w3.org>
Message-ID: <01ac01c3c924$a8b362c0$ad00000a@patirsrv.patir.com>

The basic problem is a merger of languages. There are not so many Hebrew
speakers, and so some words end up being taken from English. A lot
of English words have become part of everyday Hebrew and are often even
written in English, mid-sentence.
To test that this was not just on laid-back sites, I went to a "well written"
newspaper:
http://www.haaretz.co.il/
At least in today's edition there were two or three words in English,
mainly brand names and the phrase "On line".

The airline ElAl also had an English word on their home page,
mid-sentence. Again, they had used the English word as branding. Their logo
is part English, part Hebrew. www.Elal.co.il

A more typical site is http://www.nana.co.il/ or
http://www.esc.co.il/escweb/homepage.asp
There are quite a few English words scattered across the page: a lot of
brand stuff like "MAC", but English also occurs for no apparent reason,
like the link to "what's new" (and yes, you can translate that easily
enough).

All the best
Lisa Seeman
 
Visit us at the UB Access website
UB Access - Moving internet accessibility
 


-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On
Behalf Of Richard Ishida
Sent: Monday, December 22, 2003 1:51 PM
To: 'lisa seeman'; 'WAI-GL'
Cc: Martin J. Durst; Richard Ishida
Subject: RE: Report for ISOC IL FTF



Hi Lisa,

> From: w3c-wai-gl-request@w3.org
> [mailto:w3c-wai-gl-request@w3.org] On Behalf Of lisa seeman
> Sent: 22 December 2003 05:55

<snip>

> passages or fragments of text occurring within the content
> that are written in a language other than the primary natural 
> language of the content as a whole, are identifiable, either 
> through the character encoding used or through direct 
> including specification of the language of the passage or 
> fragment. [X] 

Character encoding information helps you know the script, which may be
useful for font selection or some other rendering considerations, but
doesn't help you with selecting the right voice for pronunciation of the
text.  For example, ASCII text could just as easily be Indonesian or
Malay as English.  Text using 'Latin1' characters could represent a
very wide range of languages. So 'either through the character encoding
used' would be inappropriate, unfortunately.
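
A minimal sketch of this point, in Python. The helper name
fits_encoding and the sample sentences are illustrative assumptions,
not something from this thread. It only checks which encodings can
represent a string, which is all the encoding can tell you:

```python
# Illustrative samples: same script (Latin), four different languages.
samples = {
    "en": "I eat rice",
    "id": "Saya makan nasi",   # Indonesian
    "fr": "un café",           # French
    "de": "die Straße",        # German
}

def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    for lang, text in samples.items():
        # The encoding narrows down the script, never the language.
        print(lang, fits_encoding(text, "ascii"), fits_encoding(text, "latin-1"))
```

The English and Indonesian samples both fit ASCII, and all four fit
Latin-1, so a voice for pronunciation cannot be chosen from the
encoding alone.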

To help me better understand the issue, could you briefly characterise
for me the type of content that causes the problem?  Is it English? How
much of it is there (as a very rough average)?  Is much of it acronyms?
proper names? technical words? etc.

Exploring solutions: can one assume that Israeli text-to-speech systems
can deal pretty well with the embedded non-Hebrew stuff?  Does that
apply to the TTS systems dealing with other languages?  If Hebrew
systems deal with English OK, maybe you'd only have to label stuff that
was, say, Indonesian or Malay?
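
To make the labelling idea concrete, here is a rough Python sketch. It
is only one hypothetical approach: the function name label_latin_runs
and the regular expression are assumptions for illustration. It finds
runs of Latin letters and wraps them in a span carrying a lang
attribute. The script test only locates candidate fragments; the
language code still has to be supplied by the author, because the
script does not say which language a fragment is in.

```python
import re
from html import escape

# Runs of basic Latin letters, allowing single internal spaces,
# apostrophes or hyphens (so "what's new" is one run). Hypothetical
# pattern; real content would need a broader definition.
LATIN_RUN = re.compile(r"[A-Za-z]+(?:[ '\-][A-Za-z]+)*")

def label_latin_runs(text: str, lang: str) -> str:
    """Wrap each Latin-script run in <span lang="...">, escaping the text."""
    out = []
    pos = 0
    for match in LATIN_RUN.finditer(text):
        # Text before the run is passed through unlabelled.
        out.append(escape(text[pos:match.start()], quote=False))
        out.append(f'<span lang="{lang}">{escape(match.group(), quote=False)}</span>')
        pos = match.end()
    out.append(escape(text[pos:], quote=False))
    return "".join(out)

if __name__ == "__main__":
    # Hebrew word followed by an embedded brand name.
    print(label_latin_runs("\u05e9\u05dc\u05d5\u05dd MAC", "en"))
```

Passing "id" or "ms" instead of "en" would label the same fragments as
Indonesian or Malay; nothing in the text itself can make that choice.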

RI

Received on Tuesday, 23 December 2003 02:16:48 UTC