[Techniques] Draft General Technique for GL 3.1 L2 SC3 from Michele Diodati on 2004-12-28 (w3c-wai-gl@w3.org from October to December 2004)

From: Michele Diodati <michele.diodati@gmail.com>
Date: Wed, 29 Dec 2004 00:09:07 +0100
To: w3c-wai-gl@w3.org
Message-ID: <2e1e87c041228150972f12821@mail.gmail.com>
On Mon, 27 Dec 2004 17:55:44 -0600, John M Slatin
<john_slatin@austin.utexas.edu> wrote:
>  
> Many documents contain words, phrases, or longer passages that are 
> in a different language than the language of the document as a whole. 
> The language of each "foreign" word, phrase, or longer passage must 
> be identified so that user agents, including assistive technology, can 
> present the text appropriately. 

I think this requirement is largely impractical, for several reasons.

1. Web developers are not linguists.

2. Web developers are very often not the authors of the text in the
pages they publish; they simply assemble content received from
different sources.

3. Web developers may have no idea which natural language a run of
text, or even a single word, belongs to.

4. Natural languages are very complex phenomena: they are by no means
reducible to a set of mathematical equations in which you can always
say with certainty "this is only English", "this is only Italian",
"this is only Hebrew", etc.

5. Even when you can specify the natural language of a run of text
unmistakably, doing so could make things worse for accessibility.

Here are some examples, taken from the Italian language, of that
complexity and of the ambiguous consequences of Guideline 3.1 L2 SC.

a. The note at the bottom of Guideline 3.1 L2 SC3 currently says:
"This does not include use of foreign words in text where such usage
is a standard extension of the language". Many words listed in
Italian dictionaries are widely used foreign words. "File", taken
from English, is one of them. According to the above note, Italian web
developers should not mark the word "file" as English, because it is a
standard extension of the Italian language (if I understand what "a
standard extension" means). However, "file" (=document) is a homograph
of "file" (=rows, lines). The latter is pronounced according to the
classic Italian phonetic rules, while the former is pronounced
according to rules that try to imitate the English pronunciation of
"file". How can I get this word pronounced correctly by a speech
synthesizer, given that it is "a standard extension" of the Italian
language (a speech synthesizer will pronounce "file" according to
traditional Italian phonetic rules)?

b. Every natural language has phonetic rules of its own. Many words
and phrases written in English (or in French) can be understood by
Italian listeners _only if_ they are pronounced according to phonetic
rules that differ from traditional Italian phonetic rules, but that
are often also _very different_ from English phonetic rules. The
result is in effect a third language, different from both Italian and
English. If we mark these words and short sentences as foreign text,
i.e. as English, a compliant speech synthesizer will pronounce them in
a way that an Italian listener will very likely not understand. I
think only assistive technologies can improve accessibility in such
situations. The requirement in Guideline 3.1 L2 SC3 could be a remedy
worse than the disease. Another important issue: the vast majority of
Italian authors and web developers are totally unaware of the
difference between the actual foreign pronunciation of many foreign
words used in Italian and the adapted pronunciation of the same words
used by native Italian speakers. When they use foreign words in their
web pages (which happens very often in technical writing), they are
really thinking of the adapted, Italian pronunciation of those words.

c. Some proper nouns are adapted transcriptions into Latin characters
from foreign alphabets. These nouns become meaningful for an Italian
listener only if they are pronounced according to phonetic rules
different from the typical Italian ones. For example, "Sharon" and
"Shimon Peres", names of very famous Israeli politicians, contain the
letter group "sh", which is not used in Italian spelling. How should a
web developer mark those names? They are clearly not Italian words.
But they are not Hebrew words either. They are rather adaptations of
foreign words in Latin characters: a strange mixture, for which it
isn't clear whether we need a rule to improve their accessibility. My
opinion is that we don't need such a rule. Assistive technologies have
to incorporate, for each natural language, ever larger dictionaries of
adapted foreign pronunciations: this is the only way in which
isolated foreign words can be made understandable.
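
To make the idea of such dictionaries concrete, here is a minimal
sketch of how an assistive technology might decide which pronunciation
rules to apply to a word, without any markup from the author. It is
only an illustration under stated assumptions: the names ADAPTED,
HOMOGRAPHS and pronunciation_source are hypothetical, the word lists
are tiny placeholders taken from the examples above, and no real
speech synthesizer is claimed to work this way.

```python
from typing import Dict, Set

# Hypothetical data: words that a synthesizer for a given document
# language should read with an adapted pronunciation (neither purely
# native nor fully foreign).
ADAPTED: Dict[str, Set[str]] = {"it": {"sharon", "shimon"}}

# Hypothetical data: homographs whose pronunciation depends on meaning,
# so spelling alone cannot decide (the "file" case in point a).
HOMOGRAPHS: Dict[str, Set[str]] = {"it": {"file"}}


def pronunciation_source(word: str, doc_lang: str) -> str:
    """Return a label for the rule set a synthesizer might use for `word`."""
    key = word.lower()
    if key in HOMOGRAPHS.get(doc_lang, set()):
        # Spelling is ambiguous: the surrounding context must decide.
        return "needs-context"
    if key in ADAPTED.get(doc_lang, set()):
        # Known adapted foreign word: look it up in the lexicon.
        return "adapted-lexicon"
    # Everything else falls back to the document language's own rules.
    return "native-rules"


if __name__ == "__main__":
    for w in ("Sharon", "file", "casa"):
        print(w, "->", pronunciation_source(w, "it"))
```

Note that the homograph case is exactly where a plain word list stops
being enough: neither a language tag nor a spelling-keyed dictionary
can tell the two meanings of "file" apart.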

In conclusion, I think the Level 2 Success Criteria for Guideline 3.1
should be removed, at least as far as pronunciation is concerned. It
is unrealistic to expect Web developers to manage such an ambiguous
and complex matter. It would be much more useful and pragmatic if the
task of handling changes in the natural language of the content were
delegated entirely to assistive technologies.

Hoping this can help.

Michele Diodati
----------------------------------
http://www.diodati.org
----------------------------------
Received on Tuesday, 28 December 2004 23:09:39 UTC