Re: Markup of scientific (biological) names, linking to multilingual pages, etc.

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 30 Jan 2003 02:45:56 +0200 (EET)
To: w3c-wai-ig@w3.org
Message-ID: <Pine.GSO.4.50.0301300206230.4191-100000@korppi.cs.tut.fi>

On Wed, 29 Jan 2003, Marjolein Katsma wrote:

> My aim is to mark up languages correctly

That's a noble goal, and I personally try to use language markup within
reasonable limits, but I think it needs to be said that current browser or
other support is _very_ limited. That is, you shouldn't expect much
practical gain now or in the near future. The only _accessibility_ impact
that I know of is that IBM Home Page Reader recognizes lang attributes and
can switch its reading mode accordingly - for a few languages. It's very
nice when it works, but it really applies to a rather small set of
browsing situations.

> Most of the site will be bilingual (with mostly separate English and
> Dutch pages), but other languages pop up a lot in the text, too, since
> it's a travel journal.

As a rule, bilingual or multilingual _pages_ should be avoided. _Sites_
should have each page in one language. There are several reasons for this.
For one thing, it is mildly disturbing to anyone who knows just one of the
two languages to read a bilingual page, even if it has the same
information in both languages (and if it does, why not make it two pages).
And anything that is mildly disturbing to a "normal" person could be
seriously disturbing to some people. - Of course, there are some good
reasons to use several languages on one page. I'm just trying to point out
that some of the reasons are not that good.

> In most cases I'm
> not marking up place names and person names unless I'm sure it's a
> direct transliteration of the actual name in the local/native language.

I tend to agree, mainly on practical grounds. There are theoretical
problems too, which I won't go into, but marking up every name would
often mean a lot of practical work. And often we just don't know the
language. I wouldn't know what language markup to use for your name, for
example!

> But what language is an English transliteration of a Russian
> transliteration (or version!) of an Uzbek name?

At the theoretical level, transliteration is between writing systems, not
languages. Thus, the Russian name for Moscow is still in Russian when
transliterated into the Latin alphabet, Moskva. But when a name is adapted
into another language, so that the pronunciation and/or spelling is
clearly changed, it would be adequate to consider this as a language
change. Thus, I would use <span lang="en">Moscow</span>, <span
lang="ru">Moskva</span>, <span lang="fi">Moskova</span>, etc. On the
practical side, browsers and other software will understand such issues
even less than lang markup in general. There's actually no method in
markup for indicating the _writing system_ (such as Latin vs. Cyrillic),
though apparently it would often be easy to guess from the language and
the repertoire of characters used.
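
In running text this might look as follows. It is only a sketch,
assuming the surrounding text is English, so that only the embedded
names need their own markup; the sentence itself is made up for
illustration.

```html
<!-- Sketch: the paragraph is English; the adapted names carry their own lang -->
<p lang="en">The city we call Moscow is <span lang="ru">Moskva</span>
in Russian and <span lang="fi">Moskova</span> in Finnish.</p>
```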

> My basic rule here is: when in doubt, don't.

Mine too - I have formulated it as "when in doubt, leave it out".

> For one class of names I do have a real problem though: how does one
> mark up scientific names for plants, birds, animals, etc? It's certainly
> a kind of language (though not _really_ Latin - so although there is an
> ISO language code for Latin (la) I cannot use that, I think).

I think lang="la" is the only feasible option if you wish to use language
markup here.

The scientific names are all in Latinized form. Even though a large
part of the vocabulary is of Greek origin, it's adapted to Latin
grammar at least to a minimal degree. The pronunciation should obey Latin
rules in principle. One might ask _which_ Latin. The Latin language
exists in several variants, especially as regards pronunciation
(so that "Citellus" could be pronounced with an initial "s" sound, or an
initial "k" sound, or an initial "ch" sound, or maybe an initial "ts"
sound). But there are no subcodes registered for Latin. Moreover, even if
there were, it would probably be better for accessibility reasons to use
just "la". Assuming that some browser starts pronouncing Latin, it should
probably do that in a manner that corresponds to the user's idea of Latin
pronunciation rather than the author's.

> For now, I've chosen to mark up as in this example: <span
> class="sci">Citellus fulvus</span> (that's a Yellow Souslik, in case
> you're interested), with my stylesheet taking care of properly
> italicizing such names (as required by the rules for scientific names).

I agree with Nick's suggestion to use <i> rather than <span> here. After
all, we very much want the names italicized; this should not
depend on style sheets. We would like to have structured markup like
<taxon>, but we don't have it. Using <i> is the best shot. But I wouldn't call
it really _semantic_.

(And I'd use class="taxon", but that's fairly irrelevant - it's just a
name, except that it might evolve into some kind of convention that might
be marginally useful.)
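
Putting these pieces together, the markup could be sketched like this.
The class name "taxon" is just an arbitrary name, not an established
convention, and the style rule is optional, since <i> is normally
rendered in italics anyway.

```html
<!-- Sketch only: class="taxon" is an illustrative name, not a standard -->
<style type="text/css">
  /* Redundant with the default rendering of <i>; shown only as a hook */
  i.taxon { font-style: italic; }
</style>
<p>Yellow Souslik (<i class="taxon" lang="la">Citellus fulvus</i>)</p>
```

With style sheets off, the name is still italicized, which is the point
of using <i> instead of <span>.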

> Another problem can be hreflang for the target language of a page I'm
> linking to: what to use if *that* is a multilingual page?

The theoretical answer is hreflang="mul". The "mul" code is the ISO 639-2
code for 'multiple languages': "The language code mul (for multiple
languages) should be applied when several languages are used and it is not
practical to specify all the appropriate language codes." And by HTML
definition, we _cannot_ specify more than one language code in an hreflang
attribute. (This might be an oversight. The HTTP header Content-Language,
which is what hreflang logically corresponds to, allows for a list of
language codes.)

In practical terms, there's hardly any browser that uses hreflang in any
way. Well, it could be used in an attribute selector in CSS, but that's
not very relevant here.
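
For what it's worth, a link with hreflang="mul" and a CSS attribute
selector keyed to it might look like the following sketch. The URL and
link text are placeholders, and the generated "(multilingual)" note is
just one conceivable use, which depends on the browser supporting
attribute selectors and generated content.

```html
<style type="text/css">
  /* Attribute selector: matches links whose hreflang is exactly "mul" */
  a[hreflang="mul"]:after { content: " (multilingual)"; }
</style>
<p><a href="http://www.example.org/journal.html" hreflang="mul">Travel
journal</a></p>
```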

> Things like alt text and table summaries on a multilingual page can be
> fun, too. When there are only two languages on a page, you could just
> use both - but what if there are many more?

If you ask me, they should use the language of the content of the element.
Of course, a table might contain several languages. But its "user
interface", like the headings, is probably in one language only.
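
As a sketch of that idea, assume a mostly English page with an image
and a table that belong to Dutch content. The file name and the texts
are invented for illustration. A lang attribute on an element applies
to its attribute values as well, so it covers alt and summary too.

```html
<!-- Sketch: alt and summary are written in the element's own language -->
<img src="markt.jpg" alt="De markt van Samarkand" lang="nl">
<table lang="nl" summary="Reisschema per dag">
  <tr><th>Dag</th><th>Plaats</th></tr>
  <!-- A single cell in another language overrides the table's lang -->
  <tr><td>1</td><td lang="uz">Samarqand</td></tr>
</table>
```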

Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 29 January 2003 19:45:58 GMT
