Problems with :lang (was Re: CSS2.1 :lang)

Jukka,

FWIW, we can agree there are numerous problems with language identifiers.

The writing-system issue is being worked on, and there is a proposal to add
script identifiers to the language identifiers, so that you would have
language-script-country. (The script is a 4-letter code, which distinguishes
it from the other two.)
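
A rough sketch of how such an identifier could be split apart, assuming only
the language-script-country layout described above. The function name and the
sample tags are hypothetical illustrations, not taken from the proposal:

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a tag of the assumed form language[-script][-country].

    Assumption: the script is the only 4-letter subtag and the country
    is a 2-letter subtag; anything else is ignored by this sketch.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and script is None and region is None:
            # Four letters: treat as the script, e.g. "Latn" or "Cyrl".
            script = part.title()
        elif len(part) == 2 and part.isalpha() and region is None:
            # Two letters after the language: treat as the country.
            region = part.upper()
    return LanguageTag(language, script, region)


# Transliterated Russian could then be told apart from Cyrillic Russian.
print(parse_language_tag("ru-Latn-RU"))
print(parse_language_tag("ru-RU"))
```

The length of each subtag is what makes the split unambiguous, which is the
point of choosing a 4-letter code for the script.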

Some of the issues you mention, those having to do with correct practice for
marking up text with language identifiers, need to be worked on and written
up (in my opinion).

I think it would be good if you would pursue discussion of these issues with
the i18n group. Perhaps a translation of your document would help get the
discussion going, although I think this mail would make a good starting point
if you want to participate in the i18n mailing list.

tex


"Jukka K. Korpela" wrote:
> > JKK> the default value of the lang attribute is
> > JKK> unknown. This is really mystical, but it seems to postulate that there
> > JKK> _is_ a default value.
> >
> > One which was not possible to put in the serialisation, so yes
> > previously rather mystical. In particular, once it was set on some
> > element, it could not be unset on any children. That's what xml:lang=""
> > does.
> 
> Why would it need to be unset? You can use either an appropriate language
> code, or one of the indicators "und" and "mul". The argumentation in the
> XML 1.0 "errata" is very obscure - it looks like they decided on "" and
> then tried to explain why it was needed. If there was a need for yet
> another special code, it should have been formulated and proposed in the
> appropriate process. But there wasn't; "und" is perhaps not optimally
> clearly defined in ISO 639-2, but it's there for uses just like this.
> 
> > JKK> In practical terms, :lang is pointless until support for language markup
> > JKK> in browsers becomes worth mentioning.
> >
> > I don't follow your point, unless you think that xml:lang is solely something
> > to do with styling.
> 
> I was referring to :lang selectors in CSS. Sorry for not being clear
> enough here.
> 
> > It's not; it's also of use for searching, spell
> > checking, speech synthesis, and so forth.
> 
> I know the arguments. Yet, actual use of lang and xml:lang attributes is
> very limited, and partly _wrong_. Try using lang="ru" for transliterated
> Russian text and view the page on IE and you probably see what I mean.
> (It is a fundamental flaw in language markup that there is no way to
> indicate the writing system. But language does not change when the letters
> are transliterated, does it?)
> 
> > JKK>  Since the whole point in CSS 2.1
> > JKK> is to define a practical subset of CSS 2.0, I don't see why :lang is kept
> > JKK> there at all.
> >
> > Possibly because, at least in theory, CSS2.1 is not restricted to
> > buggy HTML browsers that have not changed much over the last 4 years.
> > Instead, it's all CSS implementations.
> 
> Really? So what is the point of CSS 2.1 then? Why have so many CSS 2.0
> features been removed from it?
> 
> > JKK> Besides, the actual meaning of language markup is still obscure.
> > JKK> The whole thing is vaguely defined, little used, and little
> > JKK> supported,
> >
> > I invite you to back up those claims.
> 
> OK, see http://www.cs.tut.fi/~jkorpela/kielimerkkaus/
> It's in Finnish, so it might not be optimally accessible to you.
> Just to summarize a few points:
> - the writing system problem I mentioned above
> - the conflicts between the various meanings and purposes of language
>   markup; example: if a document (in a language other than English)
>   discusses CSS and mentions, say, the property name vertical-align,
>   should it be marked up as being in English (thereby making suitable
>   pronunciation possible, but confusing spelling and grammar checkers,
>   since it does not really obey normal English rules)
> - how do you deal with words and expressions that are commonly
>   used in other languages - is "fiancé", when used in English text,
>   a French word? what about "status quo"?
>   (such problems don't exist when language codes are used e.g.
>   for bibliographic purposes; but as you get down to individual
>   words and even morphemes, marking up _all_ language changes as
>   WCAG 1.0 requires, it's a huge conceptual problem, in addition
>   to being quite some work in practice)
> - what do you do with words that contain parts from different
>   languages?
> - how do you declare the language of data in attributes (e.g.
>   title="..." attributes), as required by WCAG 1.0?
> - by W3C example, names are not marked up as being in their
>   respective languages; what might justify this, in the light
>   of reasons presented for language markup in general?
> 
> --
> Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

Received on Friday, 17 October 2003 05:41:40 UTC