- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Thu, 30 Jan 2003 02:45:56 +0200 (EET)
- To: w3c-wai-ig@w3.org
On Wed, 29 Jan 2003, Marjolein Katsma wrote:

> My aim is to mark up languages correctly

That's a noble goal, and I personally try to use language markup within reasonable limits, but I think it needs to be said that current support in browsers and other software is _very_ limited. That is, you shouldn't expect much practical gain now or in the near future. The only _accessibility_ impact that I know of is that IBM Home Page Reader recognizes lang attributes and can switch its reading mode accordingly - for a few languages. It's very nice when it works, but it really applies to a rather small set of browsing situations.

> Most of the site will be bilingual (with mostly separate English and
> Dutch pages), but other languages pop up a lot in the text, too, since
> it's a travel journal.

As a rule, bilingual or multilingual _pages_ should be avoided. _Sites_ should have each page in one language. There are several reasons for this. For one thing, it is mildly disturbing to anyone who knows just one of the two languages to read a bilingual page, even if it has the same information in both languages (and if it does, why not make it two pages). And anything that is mildly disturbing to a "normal" person could be seriously disturbing to some people.

Of course, there are some good reasons to use several languages on one page. I'm just trying to point out that some of the reasons are not that good.

> In most cases I'm
> not marking up place names and person names unless I'm sure it's a
> direct transliteration of the actual name in the local/native language.

I tend to agree, mainly on practical grounds. There are theoretical problems too, which I won't go into, but marking up every name would often mean a lot of practical work. And often we just don't know the language. I wouldn't know what language markup to use for your name, for example!

> But what language is an English transliteration of a Russian
> transliteration (or version!) of an Uzbek name?

At the theoretical level, transliteration is between writing systems, not languages. Thus, the Russian name for Moscow is still in Russian when transliterated into the Latin alphabet, Moskva. But when a name is adapted into another language, so that the pronunciation and/or spelling is clearly changed, it would be adequate to consider this as a language change. Thus, I would use <span lang="en">Moscow</span>, <span lang="ru">Moskva</span>, <span lang="fi">Moskova</span>, etc.

On the practical side, browsers and other software will understand such issues even less than lang markup in general. There's actually no method in markup for indicating the _writing system_ (such as Latin vs. Cyrillic), though it would often be easy to guess from the language and the repertoire of characters used.

> My basic rule here is: when in doubt, don't.

Mine too - I have formulated it as "when in doubt, leave it out".

> For one class of names I do have a real problem though: how does one
> mark up scientific names for plants, birds, animals, etc? It's certainly
> a kind of language (though not _really_ Latin - so although there is an
> ISO language code for Latin (la) I cannot use that, I think).

I think lang="la" is the only feasible option if you wish to use language markup here. The scientific names are all in Latinized form. Even though a large part of the vocabulary is of Greek origin, it is adapted to Latin grammar, at least to some minimal extent. The pronunciation should obey Latin rules in principle.

One might ask _which_ Latin. The Latin language exists in several variants, especially as regards pronunciation (so that "Citellus" could be pronounced with an initial "s" sound, or an initial "k" sound, or an initial "ch" sound, or maybe an initial "ts" sound). But there are no subcodes registered for Latin. Moreover, even if there were, it would probably be better for accessibility reasons to use just "la".

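
To make the markup discussed so far concrete, here is a minimal sketch. The sentences around the names are invented for illustration, and class="sci" is simply the class name from the quoted question; only the lang values and the names themselves come from the discussion above.

```html
<!-- Illustrative fragment only: the sentence text is made up. -->
<p lang="en">
  The city is called Moscow in English,
  <span lang="ru">Moskva</span> in transliterated Russian and
  <span lang="fi">Moskova</span> in Finnish.
</p>
<p lang="en">
  We saw a Yellow Souslik
  (<span class="sci" lang="la">Citellus fulvus</span>).
</p>
```

The enclosing element declares the main language once, so only the names that differ from it need their own lang attribute.
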
Assuming that some browser starts pronouncing Latin, it should probably do that in a manner that corresponds to the user's idea of Latin pronunciation rather than the author's.

> For now, I've chosen to mark up as in this example: <span
> class="sci">Citellus fulvus</span> (that's a Yellow Souslik, in case
> you're interested), with my stylesheet taking care of properly
> italicizing such names (as required by the rules for scientific names).

I agree with Nick's suggestion to use <i> rather than <span> here. After all, we would very much like to have the names italicized; this should not depend on style sheets. We would like to have structured markup like <taxon>, but we don't have it. Using <i> is the best shot. But I wouldn't call it really _semantic_. (And I'd use class="taxon", but that's fairly irrelevant - it's just a name, except that it might evolve into some kind of convention that might be marginally useful.)

> Another problem can be hreflang for the target language of a page I'm
> linking to: what to use if *that* is a multilingual page?

The theoretical answer is hreflang="mul". The "mul" code is the ISO 639-2 code for 'multiple languages': "The language code mul (for multiple languages) should be applied when several languages are used and it is not practical to specify all the appropriate language codes." And by HTML definition, we _cannot_ specify more than one language code in an hreflang attribute. (This might be an oversight. The HTTP header Content-Language, which is what hreflang logically corresponds to, allows for a list of language codes.)

In practical terms, there's hardly any browser that uses hreflang in any way. Well, it could be used in an attribute selector in CSS, but that's not very relevant here.

> Things like alt text and table summaries on a multilingual page can be
> fun, too. When there are only two languages on a page, you could just
> use both - but what if there are many more?

If you ask me, they should use the language of the content of the element. Of course, a table might contain several languages. But its "user interface", such as headings, is probably in one language only.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 29 January 2003 19:45:58 UTC