Re: 3.1: Action item re foreign passages

From: Matt May <mcmay@w3.org>
Date: Tue, 07 Jun 2005 13:56:42 -0700
Message-ID: <42A60A0A.2050302@w3.org>
To: John M Slatin <john_slatin@austin.utexas.edu>
Cc: w3c-wai-gl@w3.org

John Slatin said:

>Of course, if it's a big transnational company that has all sorts of
>neural nets, etc., analyzing incoming mail, etc., it might well have the
>resources to do some automatic language recognition and then generate
>the appropriate markup.

I don't see evidence of that being feasible, and I don't think it's 
reasonable to expect those who host blogs, etc., to determine the 
language of whatever content comes in.

There are also issues with deeply bilingual cultures. For example, look 
at this blog, Vu d'ici/Seen From Here, based in Montreal:


The body of the message is equal parts English and French, and most of 
her comments are the same way. It changes from sentence to sentence. I 
find this to be common in Web sites in Montreal -- in fact, it's often 
the same in spoken language, with French, English, and "franglais" all 
mashed up in a single conversation.

Okay, the author could flip back and forth with <span 
lang="en|fr|fr-CA"> to mark up the main entry. But commenters would not, 
and often _could_ not, do the same thing. That's not a reflection of the 
site's overall accessibility: it's a reflection of the complexity of the 
problem. And who's to say that what's found here isn't just a step 
toward a creole? There's no single language code for the content of this 
blog entry. And there's no reliable mechanism with which a computer can 
discern -- or even ask -- what language is being used at any given moment.
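
To make the author-side case concrete, here is a minimal sketch of what 
that markup step amounts to. It assumes the author already knows the 
language of every segment, which is exactly what a commenter or a 
hosting service usually does not know. The function name 
mark_up_segments and the (lang, text) input format are hypothetical, 
made up for illustration.

```python
from html import escape


def mark_up_segments(segments, page_lang="en"):
    """Wrap each (lang, text) segment in <span lang="..."> when its
    language differs from the page's default language.

    Assumption: the caller supplies the language of each segment.
    Nothing here detects the language automatically.
    """
    parts = []
    for lang, text in segments:
        safe = escape(text)
        if lang == page_lang:
            # Same language as the page: no extra markup needed.
            parts.append(safe)
        else:
            parts.append(f'<span lang="{escape(lang)}">{safe}</span>')
    return " ".join(parts)


entry = [("en", "Hello there."), ("fr", "Bonjour.")]
print(mark_up_segments(entry))
```

The sketch only works because the language tags are handed in. Take 
away that input and there is nothing left to generate the markup from.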

Another complicating factor is Trackback, a system that notifies authors 
when someone has commented about their post on another blog. The 
notification is usually added to the comments of the originating blog. 
I've had times 
where Korean and Singaporean bloggers have left trackbacks (in their 
native languages) on my blog. As the author, I have no control over 
whether a trackback appears on my blog at the time it is posted, and 
often no idea how to mark it up, much less what it says. In practice, the 
presence of Trackback and comments is a guarantee that anyone can cause 
me to fail WCAG 2 at any time. I'm really nervous even implying that 
multilingual community sites are de facto inaccessible. To say that is 
to replace one cultural barrier with another.

Let's return to first principles. This is an accessibility issue because 
ATs need a signal to change speech engines when the languages change. 
This is especially important when engines try to parse scripts they have 
no clue about, and end up producing line noise. Fortunately, non-Latin 
content that's encoded in Unicode (e.g., Asian, Indic, Cyrillic and 
Arabic scripts) at least signals to ATs that they probably can't handle 
it. In those cases, the burden should be on the AT. But across Latin 
scripts, this could still be a bigger problem than we can solve.
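
As a rough illustration of the script signal, the sketch below uses the 
first word of each character's Unicode name as a stand-in for its 
script. That is an approximation chosen for brevity, not how any 
particular AT works, and the function name scripts_in is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Return a set of rough script labels for the letters in text.

    Assumption: the first word of a character's Unicode name (LATIN,
    CYRILLIC, HANGUL, ...) is a good-enough proxy for its script.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


print(scripts_in("Bonjour, hello"))  # French and English both come out as LATIN
print(scripts_in("Привет"))          # Cyrillic is distinguishable
print(scripts_in("안녕"))             # so is Hangul
```

A non-Latin script at least tells the AT that the content is outside 
what a Latin-script engine can read. English and French mixed in one 
sentence give it no such hint, which is the part we may not be able to 
solve.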

I propose leaving comments or other contributions from users of the site 
out of the scope of WCAG 2. Those are authored units that are out of 
control of the content producer, and unlike ad units, can't really be 
affected by site policy declarations.

In fact, I think it would be useful to refer to ATAG 2.0 for the 
accessibility of comment and community features, rather than building 
WCAG 2.0 around those contingencies. I believe ATAG 2 has the subject 
covered adequately, and it's where the subject belongs.

Received on Tuesday, 7 June 2005 20:56:47 UTC
