From: Charles McCathieNevile <charles@w3.org>
Date: Wed, 13 Mar 2002 04:59:30 -0500 (EST)
To: Jukka Korpela <jukka.korpela@tieke.fi>
cc: <w3c-wai-ig@w3.org>
On Wed, 13 Mar 2002, Jukka Korpela wrote:

  Charles McCathieNevile wrote:
  > Simple examples include checking the vocabulary used against
  > relatively small vocabularies - -

  That's a good idea, but in general, this requires fairly complex
  technology, including morphological analysis. For English, the
  vocabularies could be made to contain all the inflected forms of
  words, or very simple morphology analysis (just recognizing suffixes
  like "-ed" and "-ing") could be applied. For most languages, it's
  much more difficult.

I have done some of this for Yolngu-matha - a group of Aboriginal
languages that uses pre-, in- and post-fixing and stem changes a lot,
and where so few people can spell the languages that everything needed
to be spell-checked first. Programmatically it is not horrendously
difficult - the main problem is just that there is a lot of data to
handle. In the case I mention I had a dictionary of only about 10 000
words, which had variations across 31 cognate languages, about a dozen
different verb groups, and an alphabet that I have never met before or
since. The code was ugly (I learned some Java by doing this
project...), but it mostly worked, and I am sorry to say that I don't
know what happened to it.

It also requires access to good grammatical information about the
language, which luckily I had. But for most living languages and a
large number of more-or-less dead ones, this information exists
(Kennedy's Revised Latin Primer, something I treasured for many years
of schooling, is an example of this information, and there are many
good dictionaries. Several of them should be collected for doing
simple vocabulary testing, to build a thesaurus.)

  In principle, it would be possible to analyze the grammatical
  structures as well and to estimate readability according to the
  complexity of the grammar. This would be more advanced than just
  counting lengths of words and sentences. In fact, short but
  grammatically "loaded" expressions, with complex morphology and
  rarely used syntactic forms, can be much harder than a lengthy
  explanation in everyday language.

Very true. Again, I think this problem is often tractable with
brute-force techniques - having a lot of data for doing comparisons -
something that the Web can be really helpful for. Especially the
Semantic Web.

  There are some grammar checking and even grammar correction functions
  in some text processing systems. At present, one would need to use
  them separately, e.g. to open a copy of an HTML document in a
  sufficiently new version of MS Word and see what checks and analyses
  can be performed on the texts. ("A copy", since we don't want to have
  MS Word actually _change_ our HTML documents!)

Right. But having those things already means the first prototypes are
only a few months of programming time away. Anyone have a programmer
in search of a project?

  > And for the real answer:
  >
  > Inclusion Europe - http://www.inclusion-europe.org - do test
  > their content for ease of reading

  Am I missing something? That site sounds very interesting, and they
  even have "Welcome" in different languages (some of those words even
  correctly written) that are links to general overviews of their
  mission, but where's the information for Web authors about how to
  make their pages more accessible and how to test accessibility?
  Sorry if I sound sarcastic (I easily do), but I'm really disappointed.
They have some information about how to write clearly - Graham sent a
link to the WCAG list in his email at
http://lists.w3.org/Archives/Public/w3c-wai-gl/2002JanMar/0464 but,
like most sites, that one is a work in progress.

cheers

Chaals

--
Charles McCathieNevile  http://www.w3.org/People/Charles  phone: +61 409 134 136
W3C Web Accessibility Initiative  http://www.w3.org/WAI  fax: +33 4 92 38 78 22
Location: 21 Mitchell street FOOTSCRAY Vic 3011, Australia
(or W3C INRIA, Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France)
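The two simple checks discussed in this message, matching each word
against a small controlled vocabulary after stripping common English
suffixes such as "-ed" and "-ing", and estimating readability from
average word and sentence length, are easy to prototype. Below is a
minimal sketch in Java (the language mentioned above); the word list,
suffix list and class name are hypothetical placeholders for
illustration, not the Yolngu-matha project code or any real controlled
vocabulary.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SimpleTextChecks {

    // Tiny placeholder vocabulary; a real checker would load a controlled
    // word list (e.g. a plain-language vocabulary) from a file.
    private static final Set<String> VOCABULARY = new HashSet<>(Arrays.asList(
            "check", "word", "read", "page", "simple", "test", "use", "write",
            "the", "we", "is", "here"));

    // Naive stand-in for morphological analysis: strip one common suffix.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Words whose stem is not in the controlled vocabulary.
    static List<String> unknownWords(String text) {
        List<String> unknown = new ArrayList<>();
        for (String token : text.split("[^A-Za-z']+")) {
            if (!token.isEmpty() && !VOCABULARY.contains(stem(token))) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    // Crude readability figures: average word length (in letters) and
    // average sentence length (in words).
    static double[] lengthStats(String text) {
        String[] words = text.trim().split("\\s+");
        String[] sentences = text.split("[.!?]+");
        int letters = 0;
        for (String w : words) {
            letters += w.replaceAll("[^A-Za-z]", "").length();
        }
        double avgWordLength = (double) letters / Math.max(1, words.length);
        double avgSentenceLength = (double) words.length / Math.max(1, sentences.length);
        return new double[] {avgWordLength, avgSentenceLength};
    }

    public static void main(String[] args) {
        String sample = "We checked the pages. Writing simple words is tested here.";
        System.out.println("Not in vocabulary: " + unknownWords(sample));
        double[] stats = lengthStats(sample);
        System.out.printf("Average word length %.1f letters, "
                + "average sentence length %.1f words%n", stats[0], stats[1]);
    }
}

Even this tiny example shows the limits of plain suffix stripping
("pages" reduces to "pag" rather than "page"), which is exactly where
the proper morphological and grammatical information discussed above
comes in; swapping in a real plain-language word list and a genuine
stemmer or morphological analyser would be the obvious next step.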
Received on Wednesday, 13 March 2002 04:59:37 UTC