RE: Checkpoint 3.1 - Identifying language from Michael Cooper on 2003-09-05 (w3c-wai-gl@w3.org from July to September 2003)

From: Michael Cooper <michaelc@watchfire.com>
Date: Fri, 5 Sep 2003 10:37:02 -0400
To: w3c-wai-gl@w3.org
Message-ID: <D9ABD8212AFB094C855045AD80FB40DD017E0B56@1WFMAIL>
First to respond to the question about why identifying the primary language
of the document has less emphasis than identifying changes: I've wondered
about that myself a lot. On a Thursday call sometime in the past couple
months the explanation was given that, if your screenreader doesn't pick up
the primary language it will just start reading gibberish, and you'll know
something is wrong and do something about it, like leave the page or change
your language settings; but if most of the page is reading correctly and
suddenly you encounter a gibberish phrase, you know you're missing content
on a page that was otherwise accessible to you. However, on a Wednesday call
a about a month ago, Richard Ishida of the Internationalization group said
he considered identifying the primary language equally important. He was
going to send a review note about this so it will be put on the agenda for
discussion at some point.

As far as the prevalence of non-Dutch (e.g., English) words in Dutch, like
"website", here is an opinion which may or may not be considered accurate by
people with linguistic backgrounds, but it's something to throw out there. I
would propose that the incorporation of words like "website" into Dutch is
simply an example of the borrowing of words between languages that happens
all the time. I've studied French, German, and Latin and been amazed by how
many words from those languages are found in English - sometimes retaining
their original meaning, sometimes with a related but different meaning. Yet
we consider them English words now, because they've been part of the
language long enough that English speakers recognize them. The only real
difference with your example is recency - words like "website" were imported
during the lifetime of current speakers, so they may view them as foreign
words - but the meaning of them is understood all the same. I would say that
for all practical purposes those words can be considered part of the Dutch
language, since in a few generations people will no longer see them as
non-Dutch anyway. Under this logic, such words would be exempt from the
requirement to be marked up specially, since they are not 'foreign' words in
the sense that you have to have an understanding of the foreign language to
understand the phrase.

The question of pronunciation still exists and may nullify my argument. I
don't know how "website" is pronounced in Dutch - is it pronounced the way
it is in English, or is it pronounced the way a screenreader might applying
Dutch pronunciation rules? If the screenreader pronounces it one way but the
common pronunciation is different, there might still be a problem. Yet I
would say first that there are plenty of words screenreaders mispronounce,
because of homonyms, regional accents, etc., and people deal with it.
Second, if by my logic above the word is effectively part of the Dutch
language, the screenreader ideally should come already designed with the
ability to pronounce it correctly, if only via an exceptions dictionary.
This is similar to the argument that only words that are not in an
unabridged dictionary for the language fall under this rule. The dictionary
authors may not have caught up to common practice, but it's common practice
that is important for people's ability to access the content. Of course that
makes testability more difficult...

Michael

> -----Original Message-----
> From: Y.P. Hoitink [mailto:y.p.hoitink@heritas.nl]
> Sent: Thursday, September 04, 2003 6:22 PM
> To: w3c-wai-gl@w3.org
> Subject: Checkpoint 3.1 - Identifying language
> 
> 
> 
> Hello everyone,
> 
> I have been going through the internal working draft of the WCAG 2.0
> guidelines <http://www.w3.org/WAI/GL/WCAG20/> and came across core
> checkpoint 3.1 (Language of content can be programmatically 
> determined).
> 
> The success criteria for checkpoint 3.1 reads:
> "passages or fragments of text occurring within the content 
> that are written
> in a language other than the primary natural language of the 
> content as a
> whole, are identified, including specification of the language of the
> passage or fragment. "
> 
> Meeting this success criteria is a heavy burden for people writing in
> languages which use a lot of foreign words. For example, my 
> native language
> Dutch uses a lot of English words and phrases , like 
> "website", "software
> engineering", "content management", "on the job training" 
> etc. On a typical
> Dutch page, especially on a technical or management subject, several
> percents of the words can easily be foreign. To meet this 
> success criteria,
> all of these would have to be labelled with their original language.
> 
> This isn't a problem in English so much, since there aren't 
> so many foreign
> words. I guess this is why we hear the "je ne sais quoi" 
> example all the
> time :-) But we're writing these guidelines for the entire 
> internet, not
> just the English speaking community.
> 
> I understand that labelling foreign phrases with their language would
> increase accessibility. If English words are spoken out loud 
> with Dutch
> pronounciation by speech synthesizers, they are hard to 
> understand because
> the words sound quite different. 
> 
> But it is quite a lot to ask to label everything for people writing in
> languages which use a lot of foreign words. Also, even though they are
> harder to understand, the phrases can still be heard so it's 
> not _totally_
> inaccessible. I don't think this issue should be on the same level of
> requirements as, for example, the need for text equivalents 
> for non-text
> content.
> 
> What I'm trying to say is this: Shouldn't this be a best 
> practice instead of
> a required success criteria?
> 
> On the other hand, I was surprised to find that the 
> identification of the
> _main_ language of the document is only a best practice, and 
> not a success
> criteria. So we are requiring to label every foreign phrase, 
> but we don't
> even know what the main language is? Couldn't this lead to 
> the unwanted
> situation that 99% of the document is read with the wrong 
> pronounciation,
> except for the few foreign words whose language _was_ identified? 
> 
> I have been trying to find past discussions about this in the 
> mail archives
> but haven't seen these issues. If I missed them, I would 
> appreciate it if
> someone could send me the links.
> 
> Yvette Hoitink
> 
>
Received on Friday, 5 September 2003 10:42:41 UTC