RE: [Techniques] Draft General Technique for GL 3.1 L2 SC3 from Gregg Vanderheiden on 2004-12-29 (w3c-wai-gl@w3.org from October to December 2004)

From: Gregg Vanderheiden <gv@trace.wisc.edu>
Date: Wed, 29 Dec 2004 16:22:09 -0600
To: <michele@diodati.org>, <w3c-wai-gl@w3.org>
Message-ID: <auto-000200375017@spamarrest.com>
Hi Michele,

1)  I agree there are pronunciation problems with AT. If we can find good
practical methods for addressing this, we will. It currently isn't required
at Level 2. If you have ideas on how to do this in a practical way, we would
love to hear them.

2)  Marking passages. The author should know the language of any passages
they insert. The comment about webmasters who post content written by others
having to mark it up is a good point. We are looking at the issue of
aggregation in general. There are also, I believe, tools that automatically
recognize languages.



Your other points

1. In many cases, pronunciation is _fundamental_ for a correct understanding
of the content (and therefore for its accessibility).

GV Agree.  As above.

2. The present distinction in L2 SC3 between words that are and are not
included in dictionaries does not offer a valid solution for many situations
arising from the intrinsic ambiguity and complexity of natural languages.

GV  Not sure I follow. The rule in #1 is that a dictionary be attached for
all words in the content, and that definitions be created for custom words
that are not in any dictionary.

3. I am not suggesting modifying the SC so that every foreign word, whether
or not it is included in dictionaries, should be marked in the code with the
appropriate lang code. Rather, I am suggesting that _no passage or phrase_
written in a foreign language should be marked in the code. When possible,
authors should resolve potential ambiguities in the content by changing the
wording or by linking some words to glossaries and dictionaries.

GV  OK. This is one that we have been thinking about a lot. No resolution
yet.

4. The identification of the natural language of each block of text in a web
page should be delegated to user agents.

GV  For larger blocks of text I think this may be workable. Short phrases
could be harder. I would like to see this handled by user agents. If they
can do it without markup, then the language of the words would be
"programmatically determined" without needing markup. Do you know of tools
that can do this? Especially ones that are publicly available.

5. The Working Group should make every effort to obtain feedback on what
is now in L2 SC3 for Guideline 3.1 from speakers of languages very
different from English.

GV  We have members in the working group from many different countries for
just that reason, and we are always looking for more input. This is a
difficult area, and the many different languages make it even more so.



 
Gregg

 -- ------------------------------ 
Gregg C Vanderheiden Ph.D. 
Professor - Ind. Engr. & BioMed Engr.
Director - Trace R & D Center 
University of Wisconsin-Madison 


-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf
Of Michele Diodati
Sent: Wednesday, December 29, 2004 11:10 AM
To: w3c-wai-gl@w3.org
Subject: Re: [Techniques] Draft General Technique for GL 3.1 L2 SC3


Hi Gregg,

Thank you for your answer, but it still seems to me that something is wrong
with the L2 SC requirements for GL 3.1. You wrote:

> (...) The origin of a word is not relevant.
> 
> If you read the SC it says that foreign words that are normal in text 
> (in the dictionary) are not subject to the rule.  (...)
> 
> It was only meant to refer to phrases made up of words that are not in 
> an unabridged dictionary. Specifically the dictionary linked to in L1 SCs.

Such a constraint implicitly assumes that no accessibility issues arise from
the pronunciation of foreign words that dictionaries include as accepted
loanwords (for example, the English word "file" in Italian dictionaries).
On the contrary, I think this is a relevant accessibility issue: if a
listener can't determine the meaning of a word because their speech
synthesizer cannot pronounce it comprehensibly, the accessibility of the
content is reduced. I think it is totally irrelevant whether the
mispronounced word is or is not included in an unabridged dictionary of the
main language used in the document.

And if we assume that assistive technologies are solely responsible for the
correct pronunciation of all the words included in endorsed dictionaries,
why should we explicitly mark a passage written in a foreign language at
all? An assistive technology intelligent enough to pronounce comprehensibly
all the foreign words included in a given dictionary should, all the more
so, be able to recognize the natural language of every block in a web page
and automatically switch to the appropriate speech engine.

> No linguistic knowledge is required.

If a web developer has to mark every passage written in a foreign language
with the appropriate lang code, it seems to me that he or she has to know at
least which language every single piece of text in a web page is written
in. If a web developer isn't the author of the content but only its
"aggregator", and if the content is very long, assigning every phrase or
passage to the appropriate natural language can be a serious problem.

> The comments below also seem to indicate that there is an assumption 
> of absolute ability to pronounce properly. This is not the case. In 
> fact you can't even get people in the US to agree on how to pronounce
English words.
> The goal is to have an idea of possible pronunciations.  And to know 
> when a phrase is a foreign quote or passage.

Probably everyone in the US can understand English phrases spoken with a
British pronunciation, and vice versa. The same is not true of English
words and phrases interspersed in a document written mostly in Italian. In
such a case, paradoxical as it may seem, an adapted pronunciation of the
English passages yields a better understanding of the content than a
perfect English pronunciation does.

In brief, I want to underline the following points:

1. In many cases, pronunciation is _fundamental_ for a correct understanding
of the content (and therefore for its accessibility).

2. The present distinction in L2 SC3 between words that are and are not
included in dictionaries does not offer a valid solution for many situations
arising from the intrinsic ambiguity and complexity of natural languages.

3. I am not suggesting modifying the SC so that every foreign word, whether
or not it is included in dictionaries, should be marked in the code with the
appropriate lang code. Rather, I am suggesting that _no passage or phrase_
written in a foreign language should be marked in the code. When possible,
authors should resolve potential ambiguities in the content by changing the
wording or by linking some words to glossaries and dictionaries.

4. The identification of the natural language of each block of text in a web
page should be delegated to user agents.

5. The Working Group should make every effort to obtain feedback on what
is now in L2 SC3 for Guideline 3.1 from speakers of languages very
different from English.

Best regards,
Michele Diodati
--
http://www.diodati.org
Received on Wednesday, 29 December 2004 22:22:17 UTC