Re: Checkpoint 3.1 - Identifying language (authoring burden) from Al Gilman on 2003-09-05 (w3c-wai-gl@w3.org from July to September 2003)

From: Al Gilman <asgilman@iamdigex.net>
Date: Fri, 05 Sep 2003 09:56:01 -0400
To: <w3c-wai-gl@w3.org>
Message-Id: <5.1.0.14.2.20030905093148.02044bf0@pop.iamdigex.net>

At 06:19 PM 2003-09-04, Y.P. Hoitink wrote:
>Meeting this success criterion is a heavy burden for people writing in
>languages which use a lot of foreign words.  To meet this success criterion,
>all [instances] of these [foreign words] would have to be labelled with
>their original language.

An approach taken in the Voice Browser specifications to reduce this burden
is to allow a pronunciation lexicon in a separate resource.  Each
normally-mispronounced term can be listed once in the lexicon with its
correct pronunciation.  The formats containing text of prompts and "phrases
to catch when the user speaks them" allow reference to an external lexicon,
although there is not yet a W3C format for that lexicon.
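
To make the idea concrete, here is a minimal sketch of how a processor might
apply such a lexicon.  Everything in it is an assumption for illustration:
the LEXICON structure, the "say_as" respellings and the annotate() helper
are hypothetical and are not taken from the Voice Browser specifications or
any W3C format.

```python
import re

# Hypothetical lexicon: each term is listed once, with the language it
# comes from and a rough respelling. Structure and entries are invented
# for illustration only.
LEXICON = {
    "website": {"lang": "en", "say_as": "WEB-site"},
    "content management": {"lang": "en", "say_as": "CON-tent MAN-idge-ment"},
}


def annotate(text, lexicon):
    """Split text into (fragment, entry) pairs.

    entry is the lexicon record for fragments that match a listed term,
    and None for everything else.
    """
    if not lexicon:
        return [(text, None)]
    # Try longer terms first so multi-word phrases win over shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            parts.append((text[pos:match.start()], None))
        parts.append((match.group(0), lexicon[match.group(0).lower()]))
        pos = match.end()
    if pos < len(text):
        parts.append((text[pos:], None))
    return parts
```

The author writes plain Dutch text with no inline markup; a call such as
annotate("Onze website gebruikt content management.", LEXICON) picks out
the listed terms so that a speech engine could switch pronunciation for
just those fragments.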

PF is working with the Voice Browser group to help move the development of a
lexicon format specification along.

The point Yvette is making has also been raised emphatically in our group 
by others.

If these English words are not in the Dutch dictionary, however, people
needing help understanding the terms are at risk of being left in the lurch.
Ideally the language information could be in a glossary or other lexicon as
well, avoiding the need to add tags to every use of these terms.

We could link to an external lexicon with the html:link element IF we can
get the assistive technology and voice browser users of speech technology to
agree that the lexicon should be applied, and on some basic processing rules
for how it is applied.
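
As a rough sketch of the discovery step only, the code below scans a page
for link elements that point at a lexicon.  The rel value "lexicon" and the
file name in the usage note are assumptions made for this example; no such
link type has been agreed, which is exactly the kind of processing rule that
would have to be settled first.

```python
from html.parser import HTMLParser


class LexiconLinkFinder(HTMLParser):
    """Collect href values from <link> elements whose rel includes 'lexicon'.

    The rel value 'lexicon' is a hypothetical convention, not a
    registered or agreed link type.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated link types.
        rels = (attributes.get("rel") or "").lower().split()
        if "lexicon" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_lexicon_links(html_text):
    """Return the lexicon URLs referenced by a page, in document order."""
    finder = LexiconLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs
```

For a page whose head contains <link rel="lexicon" href="terms.xml">,
find_lexicon_links() returns ["terms.xml"]; what a user agent then does
with that resource is the part that still needs agreement.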

Al

  W3C "Voice Browser" Activity
  http://www.w3.org/Voice/


The ability to refer to a lexicon is already avail
At 06:19 PM 2003-09-04, Y.P. Hoitink wrote:

>Hello everyone,
>
>I have been going through the internal working draft of the WCAG 2.0
>guidelines <http://www.w3.org/WAI/GL/WCAG20/> and came across core
>checkpoint 3.1 (Language of content can be programmatically determined).
>
>The success criterion for checkpoint 3.1 reads:
>"passages or fragments of text occurring within the content that are written
>in a language other than the primary natural language of the content as a
>whole, are identified, including specification of the language of the
>passage or fragment."
>
>Meeting this success criterion is a heavy burden for people writing in
>languages which use a lot of foreign words. For example, my native language
>Dutch uses a lot of English words and phrases, like "website", "software
>engineering", "content management", "on the job training" etc. On a typical
>Dutch page, especially on a technical or management subject, several
>percent of the words can easily be foreign. To meet this success criterion,
>all of these would have to be labelled with their original language.
>
>This isn't a problem in English so much, since there aren't so many foreign
>words. I guess this is why we hear the "je ne sais quoi" example all the
>time :-) But we're writing these guidelines for the entire internet, not
>just the English speaking community.
>
>I understand that labelling foreign phrases with their language would
>increase accessibility. If English words are spoken out loud with Dutch
>pronunciation by speech synthesizers, they are hard to understand because
>the words sound quite different.
>
>But it is quite a lot to ask to label everything for people writing in
>languages which use a lot of foreign words. Also, even though they are
>harder to understand, the phrases can still be heard so it's not _totally_
>inaccessible. I don't think this issue should be on the same level of
>requirements as, for example, the need for text equivalents for non-text
>content.
>
>What I'm trying to say is this: Shouldn't this be a best practice instead of
>a required success criterion?
>
>On the other hand, I was surprised to find that the identification of the
>_main_ language of the document is only a best practice, and not a success
>criterion. So we are requiring authors to label every foreign phrase, but we
>don't even know what the main language is? Couldn't this lead to the unwanted
>situation that 99% of the document is read with the wrong pronunciation,
>except for the few foreign words whose language _was_ identified?
>
>I have been trying to find past discussions about this in the mail archives
>but haven't seen these issues. If I missed them, I would appreciate it if
>someone could send me the links.
>
>Yvette Hoitink
Received on Friday, 5 September 2003 09:56:11 UTC