Re: CSS2.1 :lang

From: Chris Lilley <chris@w3.org>
Date: Fri, 17 Oct 2003 15:54:09 +0200
Message-ID: <115226064.20031017155409@w3.org>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: Bert Bos <bert@w3.org>, Tex Texin <tex@i18nguy.com>, www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>

On Friday, October 17, 2003, 10:48:41 AM, Jukka wrote:

JKK> On Thu, 16 Oct 2003, Chris Lilley wrote:

>> JKK> Anyway, what the XML specification says about the xml:lang attribute is
>> JKK> that "The values of the attribute are language identifiers as defined by
>> JKK> [IETF RFC 1766], Tags for the Identification of Languages, or its
>> JKK> successor on the IETF Standards Track."
>>
>> Please also look at the XML 1.0 errata, and the XML 1.1 specification.

JKK> Good grief. I thought that it was unique to CSS specifications to make
JKK> changes in an "Errata", but the XML 1.0 "Errata" is apparently similar.
JKK> We have been given a _specification_ that is officially approved by the
JKK> W3C, containing a reference to an Errata, which says:
JKK> "This document records all known errors in - -"
JKK> but actually contains substantial _changes_ to the content of the
JKK> specification. It is left to readers to distinguish between typo fixes,
JKK> wording clarifications, and material changes.

JKK> So people who naively think they are reading the official specification
JKK> will be misled. The specification may change at any moment, just by a
JKK> change to the "Errata", with no announcement before or after. And we don't
JKK> even have a copy of the specification as changed by the "Errata".

That last statement is false. Please see
http://www.w3.org/TR/1998/REC-xml-19980210
(XML 1.0, first edition)

http://www.w3.org/TR/2000/REC-xml-20001006
(XML 1.0 second edition) which also links to
http://www.w3.org/TR/2000/REC-xml-20001006-review.html
(review version with color coded changes)

and then there is the always-current
http://www.w3.org/TR/REC-xml

which points to the latest version, including any third or subsequent
edition. It's normal practice to reprint periodically, incorporating
clarifications and errata.


JKK> And there is no XML 1.1 specification.

There is a 1.1 specification. However, it doesn't supersede 1.0.

JKK> (There is a candidate dated
JKK> 15 October 2002; it says: "It is inappropriate to cite this document as
JKK> other than 'work in progress.'")

>> JKK> I see no way an empty string
>> JKK> could be interpreted as an accepted value for the attribute.
>>
>> I do, but then I am reading later specs than you seem to be.

JKK> I was reading the document that is announced by the W3C as a
JKK> specification.

In which you will find the text

The errata list for this second edition is available at
http://www.w3.org/XML/xml-V10-2e-errata.



>> JKK> By the HTML 4.* specification,
>>
>> (who cares!) it's being phased out in favour of the one that the rest
>> of xml uses.

JKK> I do care. HTML 4 is the only specification for the semantics of HTML
JKK> elements and attributes;

Yes,

JKK> XHTML 1.0 is just what it says (though the hype says otherwise): a
JKK> reformulation in XML or, rather, a reformulation of the _syntax_
JKK> of HTML 4.

Reformulation of the syntax *into XML*. Moving from an HTML-specific
lang attribute to an XML-generic xml:lang is part of that tightening
up of the syntax, possibly even to the point where it gets implemented
in HTML user agents.


JKK> Why would it need to be unset?

Because otherwise it would erroneously apply to child elements.
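To illustrate the inheritance point, here is a minimal Python sketch using the standard library's ElementTree. It assumes, as in the XML 1.0 errata discussed here, that an element's language is the nearest xml:lang on itself or an ancestor, and that an empty value means "no language information". The helper name `effective_langs` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(root):
    """Return (path, language) pairs; language None means 'unknown'.

    Hypothetical helper: each element inherits the nearest xml:lang value,
    and xml:lang="" unsets it so the outer value does not leak into children.
    """
    pairs = []

    def walk(elem, inherited, path):
        value = elem.get(XML_LANG)
        if value is not None:
            # An explicit value overrides the inherited one; "" resets to None.
            inherited = value or None
        here = f"{path}/{elem.tag}"
        pairs.append((here, inherited))
        for child in elem:
            walk(child, inherited, here)

    walk(root, None, "")
    return pairs


doc = ET.fromstring(
    '<doc xml:lang="fi">'
    "<p>Suomea</p>"
    '<code xml:lang=""><name>vertical-align</name></code>'
    "</doc>"
)
for path, lang in effective_langs(doc):
    print(path, lang)
```

This prints `fi` for /doc and /doc/p and `None` for the code subtree. Without the empty value, `vertical-align` would inherit `fi` from the document element.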

JKK> You can use either an appropriate language
JKK> code,

Clearly.

JKK> or one of the indicators "und" and "mul".

No, you should not do that. See RFC 3066
http://www.ietf.org/rfc/rfc3066.txt

   5. You SHOULD NOT use the UND (Undetermined) code unless the
   protocol in use forces you to give a value for the language tag,
   even if the language is unknown. Omitting the tag is preferred.

   6. You SHOULD NOT use the MUL (Multiple) tag if the protocol allows
   you to use multiple languages, as is the case for the Content-
   Language: header.
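As a rough sketch of what RFC 3066 permits syntactically (a primary subtag of 1 to 8 letters, followed by optional hyphen-separated subtags of 1 to 8 letters or digits), here is a Python check that also flags the `und` and `mul` codes that the rules quoted above advise against. It only checks syntax, not registration with ISO 639 or IANA; `check_tag` is a hypothetical helper.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Paraphrases of rules 5 and 6 quoted above.
DISCOURAGED = {
    "und": "omit the tag instead, unless the protocol forces a value",
    "mul": "list the individual languages if the protocol allows it",
}


def check_tag(tag):
    """Return a short verdict for a language tag under RFC 3066 syntax."""
    if not TAG_RE.fullmatch(tag):
        return "invalid syntax"
    primary = tag.split("-", 1)[0].lower()  # tags are case-insensitive
    if primary in DISCOURAGED:
        return "discouraged: " + DISCOURAGED[primary]
    return "ok"


for t in ["en", "fr-CA", "i-klingon", "und", "MUL", "", "toolongtag"]:
    print(repr(t), "->", check_tag(t))
```

Note that the empty string fails the syntax check; an empty xml:lang value gets its meaning from XML, not from RFC 3066.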



JKK> The argumentation in the XML 1.0 "errata" is very obscure - it
JKK> looks like they decided on "" and then tried to explain why it
JKK> was needed.

No, they followed RFC 3066 and corrected XML, which previously 'forced
you to give a value' (by inheritance, once set).

JKK> If there was a need for yet
JKK> another special code, it should have been formulated and proposed in the
JKK> appropriate process. But there wasn't; "und" is perhaps not optimally
JKK> clearly defined in ISO 639-2, but it's there for uses just like this.

Actually, RFC 3066 specifically discourages uses like this with a
SHOULD NOT, which seems pretty clear to me.

>> JKK> In practical terms, :lang is pointless until support for
>> JKK> language markup in browsers becomes worth mentioning.
>>
>> I don't follow your point, unless you think that xml:lang is solely
>> something to do with styling.

JKK> I was referring to :lang selectors in CSS. Sorry for not being clear
JKK> enough here.

Aha. Okay, I misunderstood what you were referring to.

>> It's not; it's also of use for searching, spell
>> checking, speech synthesis, and so forth.

JKK> I know the arguments.

(But I made those arguments thinking you were referring to the lang or
xml:lang attributes, not the :lang selector, so they don't apply.)
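Because the :lang selector only sees the language the markup gives each element, it can do nothing where that markup is missing. A minimal Python sketch of the matching rule, assuming the CSS 2.1 definition in which :lang(C) matches when C equals the element's language or is a hyphen-separated prefix of it (as with the |= operator), and assuming ASCII case-insensitive comparison. `lang_matches` is a hypothetical name.

```python
def lang_matches(element_lang, c):
    """Sketch of how a CSS :lang(C) selector tests an element's language.

    element_lang is the language the document gives the element (for example
    via an inherited xml:lang or lang attribute), or None/"" if unknown.
    C matches if it equals that value or is a prefix followed by "-",
    like the |= attribute operator; comparison here ignores case.
    """
    if not element_lang:
        # No language information: :lang() cannot match.
        return False
    lang, c = element_lang.lower(), c.lower()
    return lang == c or lang.startswith(c + "-")


for lang, c in [("fr-CA", "fr"), ("fr", "fr-CA"), ("fra", "fr"), (None, "fi")]:
    print(lang, c, lang_matches(lang, c))
```

So :lang(fr) matches an element in fr-CA, but :lang(fr-CA) does not match plain fr, and an element with no language information matches nothing.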

JKK> Yet, actual use of lang and xml:lang attributes is
JKK> very limited, and partly _wrong_. Try using lang="ru" for transliterated
JKK> Russian text and view the page on IE and you probably see what I mean.

Do you have a sample handy? I don't have any transliterated text at
hand to test this.

JKK> (It is a fundamental flaw in language markup that there is no way
JKK> to indicate the writing system. But language does not change when
JKK> the letters are transliterated, does it?)

I agree that specifying script and specifying language are orthogonal.

>> JKK>  Since the whole point in CSS 2.1
>> JKK> is to define a practical subset of CSS 2.0, I don't see why :lang is kept
>> JKK> there at all.
>>
>> Possibly because, at least in theory, CSS2.1 is not restricted to
>> buggy HTML browsers that have not changed much over the last 4 years.
>> Instead, it covers all CSS implementations.

JKK> Really? So what is the point of CSS 2.1 then? Why have so many
JKK> CSS 2.0 features been removed from it?

Because (thankfully) buggy crappy HTML browsers are not the only
implementation experience we have. There are also a few much less
buggy and actively maintained (x)html browsers that implement CSS, and
there are implementations of CSS for XML languages other than XHTML
(for example, XForms and SVG).

I agree that the extent of the surgery is a little worrying and in
some cases seems to have given little note to non-HTML uses. That
probably reflects the interests and priorities of those actively
working on it.

>> JKK> Besides, the actual meaning of language markup is still obscure.
>> JKK> The whole thing is vaguely defined, little used, and little
>> JKK> supported,
>>
>> I invite you to back up those claims.

JKK> OK, see http://www.cs.tut.fi/~jkorpela/kielimerkkaus/
JKK> It's in Finnish, so it might not be optimally accessible to you.

However, it has an English summary as the final link, which was
helpful to me, as was your summary below.


JKK> Just to summarize a few points:
JKK> - the writing system problem I mentioned above
JKK> - the conflicts between the various meanings and purposes of language
JKK>   markup; example: if a document (in a language other than English)
JKK>   discusses CSS and mentions, say, the property name vertical-align,
JKK>   should it be marked up as being in English (thereby making suitable
JKK>   pronunciation possible, but confusing spelling and grammar checkers,
JKK>   since it does not really obey normal English rules)

Good point, there is a growing body of 'technical English' that obeys
its own rules and is partially incorporated into other languages,
sometimes with respellings (e.g. in French, (e)mail becomes mél to
conserve the sound while altering the spelling; other languages keep
the spelling but pronounce it as a word in their own language). How
best to mark that up is a problem. It doesn't strike me as enough of a
problem to not use language markup at all.

JKK> - how do you deal with words and expressions that are commonly
JKK>   used in other languages - is "fiancé", when used in English text,
JKK>   a French word? what about "status quo"

If it's being used as an English word, it should be marked as English.
It's language, not pronunciation and not etymology.

JKK>   (such problems don't exist when language codes are used e.g. as
JKK>   for bibliographic purposes; but as you get down to individual
JKK>   words and even morphemes, marking up _all_ language changes as
JKK>   WCAG 1.0 requires, it's a huge conceptual problem, in addition
JKK>   to being quite some work in practice)

I agree, and I am surprised that WCAG 1.0 requires markup at
individual morpheme level.

JKK> - what do you do with words that contain parts from different
JKK>   languages?
JKK> - how do you declare the language of data in attributes (e.g.
JKK>   title="..." attributes), as required by WCAG 1.0?

Another illustration of why human-presentable text in attributes is
wrong. It should be corrected by moving title to an element (and not,
I hasten to add, by some bogus attribute-grouping hack like
'titlelang')

JKK> - by W3C example, names are not marked up as being in their
JKK>   respective languages. What might justify this, in the light
JKK>   of the reasons presented for language markup in general?

Could you give some examples where names are not marked up in the
correct language? Some might be omissions, and some might be the
"fiancé" use case, where the word is French by etymology but English by
usage (and increasingly by pronunciation as well).

Incidentally I agree with your summary "The author recommends that at
word level, markup be used to indicate language changes in
unproblematic cases only; "if in doubt, leave it out"."

Thanks by the way for pointing me at your essay, which I skimmed to
try and get the gist of what you are saying. Have you considered
submitting a paper to the Internationalisation and Unicode conference
on this topic?

-- 
 Chris                            mailto:chris@w3.org
Received on Friday, 17 October 2003 09:54:12 GMT
