
Re: CSS2.1 :lang

From: Alexander Savenkov <w3@hotbox.ru>
Date: Sat, 18 Oct 2003 18:37:20 +0400
Message-ID: <1915660603.20031018183720@hotbox.ru>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: www-style@w3.org

Hello,

2003-10-17T19:05:02Z Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:

> On Fri, 17 Oct 2003, Alexander Savenkov wrote:

>> > I know the arguments. Yet, actual use of lang and xml:lang attributes is
>> > very limited, and partly _wrong_. Try using lang="ru" for transliterated
>> > Russian text and view the page on IE and you'll probably see what I mean.
>>
>> I can't see what you mean. Tell me what happens.

> See http://www.cs.tut.fi/~jkorpela/kielimerkkaus/4.html#trans
> It's in Finnish, but the text with green background should show what I
> mean, on IE for example. The browser changes font _for Latin letters_ when
> I use lang="ru" containing transliterated Russian.

Hm, I couldn't reproduce this on my machine. I can, however, guess that
IE picks the glyphs from another font (the one used for Russian) as it
prepares to render characters that are not present in the current
font. It switches to a font that has Cyrillic but gets fooled by the
inappropriate use of the 'lang' attribute (Dostojevski is not a
Russian word).

Btw, it seems that your page has some charset problem: accented
characters are not visible inside the <code> element.

> So instead of helping in styling, the lang attribute creates a problem -
> which is perhaps easily solvable in CSS for those browsing situations
> where CSS is enabled, but still.
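Since the thread is about CSS2.1 :lang, here is a minimal Python sketch of how a :lang() selector decides whether an element matches, going by the CSS 2.1 wording: the element's language value must equal the selector's argument exactly, or begin with it immediately followed by "-", compared case-insensitively. The function name lang_matches is my own, and it is a simplified illustration (it uses str.lower() rather than strict ASCII-only case folding), not any browser's code. It also shows why lang="ru" on transliterated text is caught by a :lang(ru) rule just like real Cyrillic text.

```python
def lang_matches(element_lang: str, selector_lang: str) -> bool:
    """Return True if an element whose language value is element_lang
    would match the selector :lang(selector_lang).

    Simplified from the CSS 2.1 wording: match if the value equals the
    argument, or begins with the argument immediately followed by "-".
    The comparison is case-insensitive.
    """
    if not element_lang:
        # Empty or unknown language: there is nothing to match against.
        return False
    elem = element_lang.lower()
    sel = selector_lang.lower()
    return elem == sel or elem.startswith(sel + "-")


if __name__ == "__main__":
    for value in ["ru", "ru-RU", "rus", "en", ""]:
        print(f"lang={value!r:8} matches :lang(ru)? {lang_matches(value, 'ru')}")
```

Note that "rus" does not match :lang(ru): the prefix must be followed by a hyphen, not by any character.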

>> > (It is a fundamental flaw in language markup that there is no way to
>> > indicate the writing system. But language does not change when the letters
>> > are transliterated, does it?)
>>
>> It does.

> So what lang attribute should I use for transliterated Russian?

I think the answer is not to use transliterated Russian at all.
Transliteration was invented when there were systems that allowed
Latin characters only. The Web is not such a system anymore.

Dostojevski is clearly not a transliteration but a translation.

>> Russians don't normally transliterate letters

> I know, but others transliterate Russian. And the dual problem exists when
> Russians write foreign words in Cyrillic letters.

Could you come up with a couple of examples if you think there is a
problem?

>> and it's hard
>> to read transliteration even though a standard exists.

> _Several_ standards exist. That is one of the problems.

For Russian, there is one and only one GOST standard that should be
used. It's a state standard and, IIRC, is accepted by ISO.

> There is no way to
> indicate which transliteration has been used (which is a problem separate
> from indicating the script - the same Latin script can be used in many
> different ways for transliterations).

Again, don't use the transliteration if you can use the native script.
For some names, e.g., Kim Jong-Il, the transliterated version belongs
to a different language than the original. So, if you have an HTML
document in English, you shouldn't mark up the transliterated name.

<html lang="en">
...
<p>Kim Jong-Il
...

or

<html lang="ru">
...
<p>Kim Chen Ir (transliterated here but normally in Cyrillic)
...

but

<html lang="en">
...
<p>Kim Jong-Il (<span lang="ko">(Korean characters)</span>)
...
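To make the examples above concrete, here is a small sketch using Python's standard html.parser that records which language each run of text inherits from the nearest lang attribute. Assumptions: the input has explicit end tags (HTML's implied end tags, such as an unclosed <p>, are not modelled), and the class LangResolver and function resolve are hypothetical helpers written only for illustration.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class LangResolver(HTMLParser):
    """Record each text run together with the language it inherits."""

    def __init__(self):
        super().__init__()
        self.stack = [""]   # "" = no language declared yet
        self.runs = []      # list of (language, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        # A lang attribute overrides the inherited value; otherwise inherit.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        # Naive: assumes well-nested, explicitly closed elements.
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def resolve(html: str):
    parser = LangResolver()
    parser.feed(html)
    parser.close()
    return parser.runs


doc = '<html lang="en"><p>Kim Jong-Il (<span lang="ko">(Korean characters)</span>)</p></html>'
for lang, text in resolve(doc):
    print(lang, "|", text)
```

The unmarked name simply inherits "en" from the document, while only the span carrying the original-script name is tagged "ko".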

>> Russian is currently
>> written in Cyrillic script only; changing it to Arabic or Latin would
>> change the language.

> Several peoples have changed the script of their language, even rather
> recently. They have not changed their language. But my point was about
> transliteration (or transcription).

Agreed. However, the language in those cases was affected by the
change of script.

>> That's what xml:lang="" is for. Mark up your CSS examples with
>>
>> <code xml:lang="">tr { vertical-align: top; }</code>
>>
>> (because CSS is not a human language)
>>
>> and a smart spell-checker will skip the block.
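A rough sketch of that spell-checker behaviour, assuming an XML document parsed with Python's xml.etree.ElementTree (which reports xml:lang under the http://www.w3.org/XML/1998/namespace URI). Language is inherited from the nearest ancestor carrying xml:lang, and xml:lang="" marks text as having no language, so it is skipped. The helper names are hypothetical, and languages are compared by exact match for simplicity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def texts_by_language(elem, inherited=None, out=None):
    """Return a list of (language, text) pairs. Language None means no
    xml:lang was ever declared; "" means explicitly 'no language'."""
    if out is None:
        out = []
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        out.append((lang, elem.text.strip()))
    for child in elem:
        texts_by_language(child, lang, out)
        # Tail text belongs to the parent, so it keeps the parent's language.
        if child.tail and child.tail.strip():
            out.append((lang, child.tail.strip()))
    return out


def words_to_spell_check(xml_source: str, language: str):
    """Hypothetical spell-checker front end: collect words only from text
    whose effective language is exactly `language`."""
    root = ET.fromstring(xml_source)
    words = []
    for lang, text in texts_by_language(root):
        if lang == language:
            words.extend(text.split())
    return words


sample = """<p xml:lang="en">Align cells with
<code xml:lang="">tr { vertical-align: top; }</code> in CSS.</p>"""
print(words_to_spell_check(sample, "en"))
```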

> And how does a speech browser _read_ it?

No idea. How do _you_ read it? I, for instance, would say something
like "set the 'vertical-align' property on the 'tr' element to 'top'".
Should an aural engine do the same? Is it possible? Perhaps a
'speak-punctuation: code' declaration, which makes punctuation be
spoken literally, is the best option here.
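For what 'speak-punctuation: code' asks of an aural renderer (CSS2 defines 'code' as speaking punctuation such as semicolons and braces literally, and 'none' as rendering it as pauses instead), here is a toy Python sketch. The punctuation vocabulary and the speak() function are hypothetical; a real speech engine would use its own.

```python
# Hypothetical names for punctuation; a real speech engine would use
# its own vocabulary and voice settings.
PUNCTUATION_NAMES = {
    "{": "left brace",
    "}": "right brace",
    ":": "colon",
    ";": "semicolon",
    ",": "comma",
    ".": "period",
    "(": "left parenthesis",
    ")": "right parenthesis",
    '"': "quote",
}


def speak(text: str, punctuation: str = "code") -> list[str]:
    """Turn text into a list of words to be spoken.

    punctuation="code": each punctuation mark is spoken by name.
    punctuation="none": punctuation is dropped (a real engine would
    render it as pauses instead).
    """
    words = []
    token = ""
    for ch in text:
        if ch in PUNCTUATION_NAMES:
            if token:
                words.append(token)
                token = ""
            if punctuation == "code":
                words.append(PUNCTUATION_NAMES[ch])
        elif ch.isspace():
            if token:
                words.append(token)
                token = ""
        else:
            token += ch
    if token:
        words.append(token)
    return words


print(" ".join(speak("tr { vertical-align: top; }")))
```

Hyphens are deliberately left out of the table, so a property name such as vertical-align is read as one word.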

>> > - what do you do with words that contain parts from different
>> >   languages?
>>
>> Mark them up accordingly. If I had to write the word "web-page" in
>> Russian I would type (transliterated):

> That's an easy case. What about declension suffixes, which may induce
> changes in the word stem?

I'm not a linguistic expert. Please expand or provide examples.

Alex.
-- 
  Alexander "Croll" Savenkov                  http://www.thecroll.com/
  w3@hotbox.ru                                     http://croll.da.ru/
Received on Saturday, 18 October 2003 10:48:51 GMT
