- From: Charles McCathieNevile <charles@sidar.org>
- Date: Mon, 5 Apr 2004 00:26:27 +1000
- To: "Jesper Tverskov" <jesper.tverskov@mail.tele.dk>
- Cc: <w3c-wai-ig@w3.org>
Hi Jesper,

On 4 Apr 2004, at 21:25, Jesper Tverskov wrote:

> Let us take Google as an example. It returns search results in many
> different languages on the same page, and the result page uses
> Unicode.
>
> At the moment, change of natural language is not included in the
> mark-up. Since the user can choose to get results in a particular
> language only, it would probably be possible for Google to indicate
> change of natural language automatically, even when many languages are
> used on the same page and the page is generated from many different
> language sources.

Google tries to pick the language of content. Perhaps it mixes guessing
algorithms with reading any declarations, but since those kinds of
details tend to be Google's basic trade secrets, we are ourselves just
guessing.

> It is probably less realistic to expect smaller or ordinary websites
> and web services to be able to include mark-up for change of natural
> language when documents are generated on the run from many language
> sources, including interaction with users, like commentary and debate,
> etc.

Having written assorted multilingual documents, I don't think it is a
big imposition to mark actual changes. One of the interesting things
about doing so is that it helps to re-use that information when
generating new content by compiling existing work. The Google example
you give is interesting because if the information is included by
authors, then aggregators along those lines can get it and present it,
rather than implementing a guessing algorithm.

> Now consider a modern word processor like MS Word. Even if 10
> different languages are used in 10 paragraphs on the same page, the
> spell checker has no problem identifying the change of natural
> language and applying the right dictionary to each paragraph. No
> indication of change of natural language is needed from the author.

With all due respect to MS Word, it is a monstrously large program.
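(As an aside: the kind of per-paragraph guessing a spell checker does can be sketched very crudely by counting common function words per language. The tiny word lists below are purely illustrative — this is not how Word or Google actually do it, just a minimal stopword-counting sketch.)

```python
# Naive per-paragraph language guessing by stopword counting.
# The word lists are illustrative toys, not real lexicons.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that"},
    "da": {"og", "det", "er", "til", "ikke", "jeg", "en"},
    "fr": {"le", "la", "et", "les", "des", "est", "une"},
}

def guess_language(paragraph: str) -> str:
    """Return the language whose stopwords occur most often."""
    words = paragraph.lower().split()
    scores = {lang: sum(w in stop for w in words)
              for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)

print(guess_language("the cat is in the garden"))  # en
print(guess_language("det er ikke til at vide"))   # da
```

Even a toy like this shows why guessing degrades on short runs of text — which is exactly where author-supplied mark-up wins.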
It also includes a very functional databasing system, a graphics
package, spreadsheet functionality, interpreters for programming
languages, a large collection of conversion tools, and so on.

It is important, I think, to have tools that run guessing algorithms as
one way of determining what language is being used. But as was
discussed in the Authoring Tool Guidelines work some years ago, the
best use of these is in authoring setups, to insert the information
after checking it with the author.

> Maybe it is more realistic in many situations to leave indication of
> change in natural language to user agents than to expect web page
> authors to do the job. Web page authors should probably still indicate
> change of natural language in web content made by themselves, but it
> is probably much more convenient and realistic to leave this task to
> user agents for many types of generated content. Why not leave the job
> of indicating change of natural language to a handful of user agents
> and save millions of web page authors a lot of work?

I think it would only save a little work, and at the expense of a great
deal of potentially useful information. If the language information
does get included in the Web itself (rather than being guessed at by
user agents), then bit by bit it should become more useful, both as a
basis of comparison while guessing and as material that can be used to
train guessing systems.

This kind of approach is the idea behind the Semantic Web. RDF's
capability for anyone to provide information about another resource can
be helpful here, as it is in the work that Lisa Seeman does: it allows
third parties to provide the necessary information (either manually, or
because they bought a better tool than the author and can automate more
of the work).

> The above is just one example of problems or challenges for
> accessibility arising from, or made more common by, the use of
> Unicode.
> I would like to hear of other cases, and whether it is more realistic
> in many situations to leave detection of change in natural language to
> user agents.

I'm not sure what other issues arise from using Unicode.

cheers

Chaals

--
Charles McCathieNevile
Fundación Sidar
charles@sidar.org
http://www.sidar.org
Received on Sunday, 4 April 2004 10:29:22 UTC