RE: Test files review from Richard Ishida on 2005-02-16 (public-i18n-core@w3.org from January to March 2005)

From: Richard Ishida <ishida@w3.org>
Date: Wed, 16 Feb 2005 12:02:17 -0000
To: "'Lisa Seeman'" <lisa@ubaccess.com>, "'Michael Cooper'" <michaelc@watchfire.com>, <w3c-wai-gl@w3.org>
Cc: <public-i18n-core@w3.org>
Message-Id: <20050216120216.CE9BE4EFD5@homer.w3.org>

I think i18n ought to be involved in discussions about this type of thing.
We should also produce some findable document that summarises the
conclusions, since the same comments seem to be made over and over. I think
that it is a harder question than it looks, and may yield some thorny
problems. 

Here are some quick observations on this mail:

> From: w3c-wai-gl-request@w3.org 
> [mailto:w3c-wai-gl-request@w3.org] On Behalf Of Lisa Seeman
> Sent: 16 February 2005 11:25
> To: Michael Cooper; w3c-wai-gl@w3.org
> Subject: Re: Test files review

extract: 
> Test :  Words not in the document's primary language must be 
> identified. [http://www.w3.org/WAI/GL/WCAG20/tests/test110.html]
> 
> Side question -why do we not ask to translate it too -that 
> would be very useful 

> Now let us say we have two languages in 
> a page with a different alphabet - say we are using Hebrew 
> (surprise surprise) As the quote is then clearly in a 
> different range, and Hebrew and Yiddish are the only 
> languages using this alphabet, and as the switch from Yiddish 
> to Hebrew is unlikely to be either easy to determine, or an 
> accessibility issue, why do we need this -In other words, the 
> meaning is still programmatically identifiable
>  
> However the same is not true for a Hebrew without 
> vocalization, or, in some cases, a Hebrew with English inside it.
> However, could you just identify a Lang for a Unicode range, 
> and then all will be well. 
>  
> Side note: from a programmatic perspective, could the je ne 
> sais quoi not be easily identified as French without the Lang tag?


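This approach doesn't scale at all. With scripts such as Cyrillic, Arabic,
Latin, Devanagari, etc., many (often very different) languages can be
represented using the same script, so the script alone does not identify
the language. That holds for Latin text embedded in another script just as
much as for Latin text embedded in other Latin text.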
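
As a rough illustration of the scaling problem, here is a minimal Python
sketch. It assumes the script can be guessed from the first word of a
character's Unicode name (a heuristic, since the standard library has no
direct script lookup), and the language lists are illustrative and far
from exhaustive.

```python
import unicodedata

# Illustrative, non-exhaustive mapping from a script keyword to a few of
# the languages commonly written in that script (assumption for the demo).
SCRIPT_LANGUAGES = {
    "HEBREW": ["he", "yi"],
    "CYRILLIC": ["ru", "uk", "bg", "sr", "kk", "mn"],
    "ARABIC": ["ar", "fa", "ur", "ps"],
    "DEVANAGARI": ["hi", "mr", "ne", "sa"],
    "LATIN": ["en", "fr", "de", "tr", "vi"],
}


def script_of(char):
    """Guess the script from the first word of the Unicode character name."""
    name = unicodedata.name(char, "")
    first_word = name.split(" ")[0] if name else ""
    return first_word if first_word in SCRIPT_LANGUAGES else None


def candidate_languages(text):
    """Return every language that the scripts found in `text` could imply."""
    scripts = {script_of(c) for c in text if c.isalpha()}
    scripts.discard(None)
    languages = []
    for script in sorted(scripts):
        languages.extend(SCRIPT_LANGUAGES[script])
    return languages


# A Cyrillic word yields many candidates, not one language.
print(candidate_languages("привет"))
```

Even with this tiny table, a single Cyrillic word maps to six candidate
languages, which is why a script-to-language rule cannot replace explicit
tagging in general.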

On the other hand, I wonder whether it is necessary to always tag language.
Here are some possible scenarios:

1. some embedded romaji text in Japanese: some words may be pronounced in a
Japanese way, or pronounced letter-by-letter, and so may not need a tag. On
the other hand, a phrase or word that wouldn't be recognised in a Japanese
context probably should be tagged. This concept may be extendable to other
languages/scripts.

2. words or phrases such as 'je ne sais quoi' - whether this is considered
French or not may depend on whether it appears in English dictionaries (the
standard dictionaries or those of, say, a specific voice browser).

Obviously, it is difficult to automatically test for such situations.
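
The dictionary dependence in scenario 2 can be sketched as follows. This is
only a toy, in Python: both lexicons and their entries are hypothetical,
and the function name is made up for the example. It shows how the same
phrase may or may not need a language tag depending on which dictionary is
consulted.

```python
# Hypothetical toy lexicons; the entries are invented for illustration and
# do not reflect any real dictionary or voice browser.
GENERAL_ENGLISH_LEXICON = {"je ne sais quoi", "café", "rendezvous"}
VOICE_BROWSER_LEXICON = {"café", "rendezvous"}


def needs_lang_tag(phrase, host_lexicon):
    """Return True when the phrase is absent from the host-language lexicon."""
    return phrase.casefold() not in host_lexicon


phrase = "je ne sais quoi"
print(needs_lang_tag(phrase, GENERAL_ENGLISH_LEXICON))
print(needs_lang_tag(phrase, VOICE_BROWSER_LEXICON))
```

The two calls disagree for the same phrase, which is one reason an
automated test has no single right answer here.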

Received on Wednesday, 16 February 2005 12:02:21 UTC