- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 25 Apr 2008 09:12:11 +0300
- To: "WebAIM Discussion List" <webaim-forum@list.webaim.org>, <gawds_discuss@yahoogroups.com>, <w3c-wai-ig@w3.org>
John Foliot - Stanford Online Accessibility Program wrote:

> As far as I know, current screen reading technology only supports a
> limited number of languages.

Rather limited, I'm afraid. Moreover, support for language switching on the basis of language markup (lang or xml:lang attributes) is much more limited.

In practical terms, using language markup at the top level (<html> or <body> element) is a good move: it takes a very small effort, and it helps some people. (But then it should be _correct_. It often isn't, so e.g. Google does not use the information.)

Using language markup at other markup levels, e.g. for individual paragraphs or even words, is rather pointless, sad to say. There isn't much support worth mentioning. (I use it, but mostly as a matter of principle, or habit, and not very consistently. Many W3C pages, including pages that declare that it should be used, don't use it. Most web pages don't even try, so what motivation is there for software developers to support it?)

That's the big picture. In details, there's a lot that could be said, especially about the problems, but this doesn't seem to be an interesting topic to most people. However, mostly for "academic" interest, I'll comment on your specific issues:

> I am in the process of reviewing a number of web documents that
> feature, in part, a fair bit of "old Latin" (circa 13th century -
> it's a cool academic project).

I took "old" Latin as referring to pre-classic Latin... Anyway, there's no useful standardized way to distinguish between different forms of Latin in language codes. You could use country codes, e.g. "la-GB" to refer to Latin as used in the United Kingdom, but this would be anachronistic for 13th century language and also useless.

> At any rate, W3C guidance states
> "Clearly identify changes in the natural language of a document's
> text and any text equivalents (e.g., captions)."

I'm afraid nobody, including the W3C, takes that seriously.
It's just too much trouble with little if any tangible benefit. It's based on theoretical ideas (largely poorly analyzed ones) about the _possible_ usefulness of language markup, rather than on actual experience.

> *AND* the ISO code
> for Latin is either "LA" (ISO 639-1) or "LAT" (ISO 639-2) so clearly
> this *CAN* be done.

The technically correct language code for use in markup is "la", with lowercase as the recommended spelling. HTML and XML specifications refer to specifications that mandate the use of two-letter codes for languages that have one.

> As well, wikipedia suggests that "Screen readers without Unicode
> support will read a character outside Latin-1 as a question mark,

Character support is a different issue and should not depend on language markup, and mostly doesn't. Generally, in special software like screen readers or specialized browsers, we should expect character support to be more restricted than in common modern browsers. Even Latin-1 isn't as safe as in "normal" browsing.

For example, what would a screen reader do upon encountering a special character like "¶"? Would it recognize it as having a special meaning (paragraph separator) and make a pause? Hardly. It probably spells it out. This might mean saying "pilcrow sign", perhaps independently of the language being used (since character names aren't widely localized - most characters don't even _have_ a name in most languages), which might be complete gibberish even to people who understand normal English.

> The question is, is there any real advantage gained by adding this
> information (lang="lat") to the content?

Very little, if any. But if used, it should be lang="la".

> I am at a loss to explain any real value
> in doing it to the client as at the end of the day I cannot myself
> find a "real justification" that would improve the accessibility of
> the document.
The best explanation that I could use (if someone offered to pay me for adding such markup and I needed to soup up "internal" and "moral" motivation) is the following (and it's lame, so this tells a lot):

If a user opens your HTML page in a word processor like Microsoft Word, it will use the language markup, and this can be relevant when spelling checks are "on", i.e. words classified as misspelled are highlighted. Declaring Latin words as Latin prevents the program from applying English spelling rules to them. (The copy of Word I just tested seems to be Latin-ignorant. That is, it recognizes the words as being in Latin but does not flag anything as misspelled and does not even hyphenate Latin words. But even this is probably better than treating them as English or some other language.)

On some browsers, like Firefox, the user can right-click on a word and get information about its language. Sometimes it is useful to know that a word is Latin. (But what are the odds that a user knows about such functionality?)

Style sheets, either page or user style sheets, could be used to style words in a particular language differently from others, using a selector like [lang="la"] or :lang(la). However, this does not work e.g. on IE 6, which does not recognize such selectors.

Moreover, some day some browsers or other software could make real use of the markup.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
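
To make the mechanics above concrete, here is a minimal sketch in Python of how software might read language markup: a declaration on <html> is inherited by everything inside it, an inner lang attribute overrides it, and a three-letter code such as "lat" is normalized to the two-letter "la". It uses only the standard library's html.parser. The names normalize_lang, LangCollector and text_runs_by_language, and the tiny THREE_TO_TWO table, are made up for this illustration; treating xml:lang as taking precedence over lang is likewise an assumption of the sketch. This is not how any particular screen reader or browser works.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny table: three-letter ISO 639-2 codes mapped
# to the two-letter ISO 639-1 codes that markup should use when one exists.
THREE_TO_TWO = {"lat": "la", "eng": "en", "fin": "fi"}

# Elements that never have an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


def normalize_lang(code):
    """Lowercase the primary subtag and prefer its two-letter form."""
    primary, sep, rest = code.strip().partition("-")
    primary = primary.lower()
    primary = THREE_TO_TWO.get(primary, primary)
    return primary + sep + rest


class LangCollector(HTMLParser):
    """Collect (effective language, text) pairs, inheriting from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective language) for each open element
        self.runs = []   # (language or None, text) in document order

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption of this sketch: xml:lang wins over lang if both appear.
        declared = attrs.get("xml:lang") or attrs.get("lang")
        lang = normalize_lang(declared) if declared else self.current_lang()
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating omitted end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.runs.append((self.current_lang(), text))


def text_runs_by_language(html):
    """Return the text runs of an HTML string with their effective language."""
    parser = LangCollector()
    parser.feed(html)
    parser.close()
    return parser.runs


if __name__ == "__main__":
    sample = ('<html lang="en"><body><p>The charter begins '
              '<span lang="LAT">In nomine Domini</span> and continues.'
              '</p></body></html>')
    for lang, text in text_runs_by_language(sample):
        print(lang, "|", text)
```

A speech synthesizer or spelling checker would need something along these lines before it could switch languages at all; whether it then has a Latin voice or dictionary to switch to is a separate matter.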
Received on Friday, 25 April 2008 06:12:45 UTC