
getting off topic Re: WCAG and some linguistics problems

From: Charles McCathieNevile <charles@sidar.org>
Date: Sat, 18 Oct 2003 02:50:17 +0200
To: "Roberto Scano - IWA/HWG" <rscano@iwa-italy.org>
Message-Id: <AFCD16C1-0105-11D8-B85D-000A958826AA@sidar.org>

In one sense you are right - the only solution that seems to have much 
value is using proper standard character sets.

So the rest of the strategies are about how to convert content (and 
software) to use and declare proper recognised character sets.

A longish ramble on how this comes about (feel free to ignore - I don't 
disagree with Roberto's conclusion).

But the reasons why people don't just use iso8859-2 (or utf-8 or any of 
several other schemes that are recognised widely) are generally to do 
with the software they are using - until recently internationalisation 
has been dreadful in most software, and there is still a lot of new 
software around that is pretty bad.
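
As a rough illustration of why the choice (and declaration) of character 
set matters, here is a minimal Python sketch. It assumes the two 
Hungarian words from the quoted message are "túró" and "tűrő"; the 
helper name can_encode is made up for this example.

```python
# Which encodings can represent the two Hungarian example words?
# Assumption: the quoted ASCII spellings stand for "túró" and "tűrő".
WORDS = ["túró", "tűrő"]
ENCODINGS = ["ascii", "iso-8859-1", "iso-8859-2", "utf-8"]


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# An encoding is usable only if it can represent both words.
support = {enc: all(can_encode(w, enc) for w in WORDS) for enc in ENCODINGS}

for enc, ok in support.items():
    print(f"{enc:12} {'ok' if ok else 'cannot represent all letters'}")

# The same bytes read under the wrong declared character set:
raw = "tűrő".encode("iso-8859-2")
misread = raw.decode("iso-8859-1")
print(misread)
```

The last two lines show what happens when iso-8859-2 bytes are read as 
iso-8859-1: the long-umlaut vowels come out as different accented 
letters, which is why declaring the character set matters as much as 
using it.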

For the web people have also had the option of writing character entity 
references such as &ouml; for o-umlaut (although this is pretty 
nightmarish for Chinese, Japanese or Korean characters, and even 
Vietnamese, which is Latin letters plus a couple of old European 
letters and a lot of accents, was pretty difficult, with people 
standardising in lots of different ways). The same thing happens to a 
lot of small languages - Hungarian is large compared to the Yolngu 
Matha group of languages, which probably has about 10 000 speakers 
across its 31 languages, some of them viable and some of them close to 
extinct.
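
The character-reference option can be sketched in Python as below. This 
is only an illustration, using the standard html module and the 
"xmlcharrefreplace" error handler; the word "tűrő" is assumed to be the 
Hungarian example from the quoted message. As far as I know, the HTML 4 
named entities covered o-umlaut but not the long-umlaut vowels, so 
those need numeric references.

```python
import html

# A named entity for a Latin-1 letter decodes to the character itself.
named = html.unescape("&ouml;")

# Characters outside ASCII can be written as numeric character
# references, which keeps the page itself pure ASCII.
numeric = "tűrő".encode("ascii", "xmlcharrefreplace").decode("ascii")

# A browser (or html.unescape) turns the references back into letters.
round_trip = html.unescape(numeric)

print(named)
print(numeric)
print(round_trip)
```

This works for a handful of accented letters, but every non-ASCII 
character becomes six or more bytes of markup, which is the nightmare 
for Chinese, Japanese or Korean text.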

When computers first came along people made a non-standard font to deal 
with the characters they have (mostly various underlined or accented 
versions of Latin characters, plus the "tail-n", which is a phonetic 
alphabet character for "ng" that is present in a character set designed 
for Greenlandic, if I recall correctly). This was when the Yolngu 
speakers with a computer knew each other's telephone numbers. (Not that 
there were a lot of telephones in the area.)

Many of the texts that exist are culturally sensitive, and are not made 
available to people without permission. Not surprisingly, I believe 
nobody has put together a Unicode character set for Yolngu Matha and 
set up keyboards so people can do things the standard way - after all 
we are talking about a small number of computers in a handful of 
schools as being the entire market (actually it is larger than that, 
but it really isn't many people, and it isn't a rich area of 
Australia). They have a non-standard font and so far it seems to work 
well, without anyone having to find one of the rare i18n specialists to 
fix things for them. (When I was there I didn't know quite how to do 
the fix - now it would be relatively easy, but I'm working on the other 
side of the world in several ways. Until then nobody had even 
understood there could be a problem... :( )
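
For what it's worth, the kind of fix meant here is usually a one-off 
translation from the font's private code positions to standard Unicode 
characters. The Python sketch below is purely hypothetical: the real 
font's layout is not given in this message, so the positions on the 
left ("@", "#", "$") are invented, and the Unicode characters on the 
right are my assumption about the tail-n and two of the underlined 
letters.

```python
# HYPOTHETICAL mapping: the left-hand code positions are invented for
# illustration; the real non-standard font's layout is not known here.
LEGACY_TO_UNICODE = {
    "@": "\u014B",  # assumed slot for tail-n (LATIN SMALL LETTER ENG)
    "#": "\u1E0F",  # assumed slot for underlined d (d with line below)
    "$": "\u1E49",  # assumed slot for underlined n (n with line below)
}

# str.maketrans builds a translation table from single-character keys.
TABLE = str.maketrans(LEGACY_TO_UNICODE)


def to_unicode(legacy_text):
    """Replace legacy font code positions with real Unicode characters."""
    return legacy_text.translate(TABLE)


converted = to_unicode("Yol@u")
print(converted)
```

Once text is stored this way, any font that covers those Unicode 
characters will display it, and the declared character set (for 
example utf-8) tells other software what the bytes mean.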

The Hungarian (and Vietnamese) markets are somewhat larger and more 
advanced. But the problem is the same - the various different practices 
have become entrenched in a way where it isn't quite seen as economical 
to change everything over, yet. Things just work a little bit less well 
than they did, although most of the time the old ways work pretty well. 
(Some countries still use inches as a unit of measurement, and don't 
see that there is any reason not to...)

Of course the Yolngu case could be readily fixed by changing the 
characters they use - most of the right characters exist already in 
Arabic... but funnily enough the Yolngu, the Hungarians, and the 
Americans are pretty resistant to changing their characters. (Not so 
the Chinese, who revised their writing in the last half-century, nor 
the Vietnamese, who used to have both a Chinese and a Latin-based 
writing system but are in the process of dropping the Chinese 
script...) Eventually something will change - but it is hard to know 
in advance what. Especially since in this case the result could easily 
be that the languages just disappear, like Kamilrooy did. And Cornish 
did a few generations ago.

DVMSPIROSPERO

chaals

On Thursday, Oct 16, 2003, at 19:02 Europe/Zurich, Roberto Scano - 
IWA/HWG wrote:

>
> ----- Original Message -----
> From: "Patrizia Bertini" <patrizia@patriziabertini.it>
> To: <w3c-wai-gl@w3.org>
> Sent: Thursday, October 16, 2003 6:32 PM
> Subject: WCAG and some linguistics problems
>
> You should know that in Hungarian there are some peculiar graphical
> elements (phonemes) which are not rendered in the usual ASCII / ISO
> 8859 used for the Web. So many Hungarian pages are written changing
> these letters (which especially are an o and a u with long umlaut - in
> Hungarian there are two kinds of umlaut, and the meaning can get very
> different). There is a pretty easy example:
> tu'ro' - cottage cheese, written with long plain vowels
> and t"uro" - someone who is supporting something - written with long
> umlaut vowels
>
>
> Roberto:
> Why not use the right ISO character set, as listed on the W3C web
> site[1] (iso-8859-2)?
>
>
> Roberto Scano
> ---
> [1] http://www.w3.org/International/O-charset-lang.html
>
>
--
Charles McCathieNevile                          Fundación Sidar
charles@sidar.org                                http://www.sidar.org


Received on Friday, 17 October 2003 20:55:56 GMT
