Re: Unicode and accessibility from Charles McCathieNevile on 2004-04-04 (w3c-wai-ig@w3.org from April to June 2004)

From: Charles McCathieNevile <charles@sidar.org>
Date: Mon, 5 Apr 2004 00:26:27 +1000
To: "Jesper Tverskov" <jesper.tverskov@mail.tele.dk>
Cc: <w3c-wai-ig@w3.org>
Message-Id: <0EF1BF2B-8644-11D8-A043-000A958826AA@sidar.org>

Hi Jesper,

On 4 Apr 2004, at 21:25, Jesper Tverskov wrote:

> Let us take Google as example. It returns search results in many 
> different languages on the same page, and the result page uses 
> Unicode.
>  
> At the moment change of natural language is not included in the 
> mark-up. Since the user can choose to get results in a particular 
> language only it would probably be possible for Google to indicate 
> change of natural language automatically even when many languages are 
> used in the same page and the page is generated from many different 
> language sources.

Google tries to identify the language of the content. Perhaps it combines 
guessing algorithms with reading any language declarations, but since 
details of that kind tend to be Google's core trade secrets, we are just 
guessing ourselves.

> It is probably less realistic to expect smaller or ordinary websites 
> and web services to be able to include mark-up for change of natural 
> language when documents are generated on the run from many language 
> sources including interaction with users, like commentary and debate, 
> etc.

Having written assorted multilingual documents, I don't think it is a 
big imposition to mark actual changes of language. One interesting 
benefit of doing so is that the information can be re-used when new 
content is generated by compiling existing work. The Google example you 
give is interesting because, if the information is included by authors, 
aggregators of that kind can pick it up and present it rather than 
implementing a guessing algorithm.
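To make the aggregator point concrete, here is a minimal sketch of reading declared language instead of guessing it. It assumes authors mark changes with the HTML lang attribute and that the aggregator receives plain HTML; the names LangSpanCollector and declared_languages are hypothetical, and the parser is Python's standard html.parser. Each run of text is tagged with the nearest enclosing lang value, falling back to a default when nothing is declared.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LangSpanCollector(HTMLParser):
    """Collect (language, text) pairs using the nearest declared lang attribute."""

    def __init__(self, default_lang=None):
        super().__init__()
        self.stack = [default_lang]   # language in effect at each open element
        self.spans = []               # list of (lang, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        # An explicit lang overrides the inherited one; otherwise inherit it.
        self.stack.append(attrs.get("lang", self.stack[-1]))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.spans.append((self.stack[-1], text))


def declared_languages(html, default_lang=None):
    """Return the author-declared language for each text run in `html`."""
    parser = LangSpanCollector(default_lang)
    parser.feed(html)
    parser.close()
    return parser.spans


if __name__ == "__main__":
    page = ('<p lang="en">Search results: <span lang="fr">Bonjour le monde</span>'
            ' and <span lang="de">Guten Tag</span>.</p>')
    for lang, text in declared_languages(page):
        print(lang, text)
```

No guessing happens here: where the author declared nothing, the sketch simply reports the default, which is exactly the information a guessing algorithm would then have to supply.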

> Now consider a modern word processor like MS Word. Even if 10 
> different languages are used in 10 paragraphs on the same page, the 
> spell checker has no problem identifying the change of 
> natural language and applying the right dictionary for each paragraph. 
> No indication of change of natural language is needed by the author.

With all due respect to MS Word, it is a monstrously large program. It 
also includes a very functional database system, a graphics package, 
spreadsheet functionality, interpreters for programming languages, a 
large collection of conversion tools, and so on.

It is important, I think, to have tools that use guessing algorithms 
as one way of determining what language is being used. But as was 
discussed in the Authoring Tool Accessibility Guidelines work some years 
ago, the best use of this is in authoring setups, where the tool inserts 
the language information after checking it with the author.

> Maybe it is more realistic in many situations to leave indication of 
> change in natural language to user agents than to expect web page 
> authors to do the job. Web page authors should probably still indicate 
> change of natural language in web content made by themselves, but it 
> is probably much more convenient and realistic to leave this task 
> to user agents for many types of generated content. Why not leave the 
> job of indicating change of natural language to a handful of user 
> agents and save millions of web page authors a lot of work?

I think it would save only a little work, at the expense of a great 
deal of potentially useful information. If language information is 
included in the Web itself (rather than guessed at by user agents), then 
bit by bit it should become more useful, both as reference material that 
guessing algorithms can compare against and as material that can be 
used to train guessing systems.

This kind of approach is the idea behind the Semantic Web. RDF's 
capability for anyone to provide information about another resource can 
help here, as it does for the work that Lisa Seeman does, by allowing 
third parties to provide the necessary information (either manually 
or because they bought a better tool than the author and can automate 
more of the work).

> The above is just one example of problems or challenges for 
> accessibility arising from or made more common by the use of Unicode. 
> I would like to hear of other cases, and if it is more realistic in 
> many situations to leave detection of change in natural language to 
> user agents.

I'm not sure what other issues arise from using Unicode.

cheers

Chaals

--
Charles McCathieNevile                          Fundación Sidar
charles@sidar.org                                http://www.sidar.org
Received on Sunday, 4 April 2004 10:29:22 UTC