- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 17 Oct 2003 11:48:41 +0300 (EEST)
- To: Chris Lilley <chris@w3.org>
- Cc: Bert Bos <bert@w3.org>, Tex Texin <tex@i18nguy.com>, www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
On Thu, 16 Oct 2003, Chris Lilley wrote:

> JKK> Anyway, what the XML specification says about the xml:lang attribute is
> JKK> that "The values of the attribute are language identifiers as defined by
> JKK> [IETF RFC 1766], Tags for the Identification of Languages, or its
> JKK> successor on the IETF Standards Track."
>
> Please also look at the XML 1.0 errata, and the XML 1.1 specification.

Good grief. I thought it was unique to CSS specifications to make changes in an "Errata", but the XML 1.0 "Errata" is apparently similar. We have been given a _specification_ that is officially approved by the W3C and contains a reference to an Errata document, which says "This document records all known errors in - -" but actually contains substantial _changes_ to the content of the specification. It is left to readers to distinguish between typo fixes, wording clarifications, and material changes. So people who naively think they are reading the official specification will be misled. The specification may change at any moment, just by a change to the "Errata", with no announcement before or after. And we don't even have a copy of the specification as changed by the "Errata". Yet the specification claims: "It is a stable document and may be used as reference material or cited as a normative reference from another document."

And there is no XML 1.1 specification. (There is a candidate dated 15 October 2002; it says: "It is inappropriate to cite this document as other than 'work in progress.'")

> JKK> I see no way that an empty string
> JKK> could be interpreted as an accepted value for the attribute.
>
> I do, but then I am reading later specs than you seem to be.

I was reading the document that the W3C announces as a specification.

> JKK> By the HTML 4.* specification,
>
> (who cares!) it's being phased out in favour of the one that the rest
> of xml uses.

I do care.
HTML 4 is the only specification for the semantics of HTML elements and attributes; XHTML 1.0 is just what it says it is (though the hype says otherwise): a reformulation in XML or, rather, a reformulation of the _syntax_ of HTML 4.

> JKK> the default value of the lang attribute is
> JKK> unknown. This is really mystical, but it seems to postulate that there
> JKK> _is_ a default value.
>
> One which was not possible to put in the serialisation, so yes
> previously rather mystical. In particular, once it was set on some
> element, it could not be unset on any children. That's what xml:lang=""
> does.

Why would it need to be unset? You can use either an appropriate language code or one of the indicators "und" and "mul". The argumentation in the XML 1.0 "Errata" is very obscure; it looks as if they decided on "" first and then tried to explain why it was needed. If there was a need for yet another special code, it should have been formulated and proposed through the appropriate process. But there wasn't; "und" is perhaps not optimally clearly defined in ISO 639-2, but it's there for uses just like this.

> JKK> In practical terms, :lang is pointless until support for language markup
> JKK> in browsers becomes worth mentioning.
>
> I don't follow your point, unless you think that xml:lang is solely something
> to do with styling.

I was referring to :lang selectors in CSS. Sorry for not being clear enough here.

> It's not; it's also of use for searching, spell
> checking, speech synthesis, and so forth.

I know the arguments. Yet actual use of lang and xml:lang attributes is very limited, and partly _wrong_. Try using lang="ru" for transliterated Russian text and view the page in IE, and you will probably see what I mean. (It is a fundamental flaw in language markup that there is no way to indicate the writing system. But the language does not change when the letters are transliterated, does it?)
> JKK> Since the whole point in CSS 2.1
> JKK> is to define a practical subset of CSS 2.0, I don't see why :lang is kept
> JKK> there at all.
>
> Possibly because, at least in theory, CSS2.1 is not restricted to
> buggy HTML browsers that have not changed much over the last 4 years.
> Instead, it's all CSS implementations.

Really? So what is the point of CSS 2.1 then? Why have so many CSS 2.0 features been removed from it?

> JKK> Besides, the actual meaning of language markup is still obscure.
> JKK> The whole thing is vaguely defined, little used, and little
> JKK> supported,
>
> I invite you to back up those claims.

OK, see http://www.cs.tut.fi/~jkorpela/kielimerkkaus/
It's in Finnish, so it might not be optimally accessible to you. Just to summarize a few points:

- the writing system problem I mentioned above
- the conflicts between the various meanings and purposes of language markup; example: if a document (in a language other than English) discusses CSS and mentions, say, the property name vertical-align, should the name be marked up as being in English? (That would make suitable pronunciation possible but confuse spelling and grammar checkers, since the name does not really obey normal English rules.)
- how do you deal with words and expressions that are commonly used in other languages? Is "fiancé", when used in English text, a French word? What about "status quo"? (Such problems don't exist when language codes are used e.g. for bibliographic purposes; but when you get down to individual words and even morphemes, marking up _all_ language changes as WCAG 1.0 requires is a huge conceptual problem, in addition to being quite some work in practice.)
- what do you do with words that contain parts from different languages?
- how do you declare the language of data in attributes (e.g. title="..." attributes), as required by WCAG 1.0?
- by W3C example, names are not marked up as being in their respective languages; what might justify this, in light of the reasons presented for language markup in general?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Friday, 17 October 2003 04:52:30 UTC