- From: Peter B. West <pbwest@powerup.com.au>
- Date: Thu, 25 Jul 2002 11:00:50 +1000
- To: www-xsl-fo@w3.org
Paul et al.,

In one strand of the FOP redesign, I load language, country and script codes from external XML into hashes. All these do is validate the codes, although that has the virtue of distinguishing between unsupported and invalid codes. I used 2-character codes for three reasons: they were preferred to 3-character codes in the spec, I did not anticipate a big demand for languages which have only a 3-character code, and it is a simple matter to extend the external XML to include all or only the necessary 3-character codes.

Section 2.3 of RFC 3066 mentions that all additions must have both a 2-character and a 3-character code, but that existing 3-character-only language codes will never acquire a 2-character code. Nice. Full coverage will therefore always require at least some 3-character codes.

Section 2.5 of RFC 3066, "Language ranges", doesn't seem to give any help in deciding what to do with a bare language tag, e.g. "fr" or "en". (I'm assuming here that "fr", for instance, qualifies as a prefix according to section 2.5.) Ken Holman's posting on the difficulties of French speakers in Canada bears out this comment in 2.5:

  "NOTE: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix. The prefix rule simply allows the use of prefix tags if this is the case."

This note begs the question though. What does it mean to say that "a user understands a language with a certain tag" when that tag is "fr"? Or have I misunderstood this section completely?

Incidentally, there seems to be a slight problem with the definition of xml:lang. 7.29.24 "xml:lang" has:

  "Values have the following meanings: <string> A language and/or country specifier in conformance with [RFC3066]."

Shouldn't that read "A language or a language-country specifier ...", removing the implication that one can have a bare country specifier?

And for dessert, this delicious note from section 2.3:

  "3. When a language has no ISO 639-1 2-character code, and the ISO 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code differ, you MUST use the Terminology code. NOTE: At present, all languages for which there is a difference have 2-character codes, and the displeasure of developers about the existence of 2 code sets has been adequately communicated to ISO."

Peter
--
Peter B. West  pbwest@powerup.com.au  http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"
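
For illustration only, here is a minimal Python sketch of the two ideas discussed in the message: validating codes loaded from external XML, and the prefix rule of RFC 3066 section 2.5. It is not the FOP code. The XML layout, the names `load_codes`, `check_language` and `range_matches`, and the reading of "invalid" as "not shaped like a 2- or 3-letter code" versus "unsupported" as "well-formed but absent from the loaded table" are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from enum import Enum

# Hypothetical external XML. Element and attribute names are assumptions,
# and only a handful of codes are listed.
CODES_XML = """
<codes>
  <language code="en"/>
  <language code="fr"/>
  <country code="CA"/>
  <country code="FR"/>
</codes>
"""


class Status(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"  # well-formed, but not in the loaded table
    INVALID = "invalid"          # not shaped like a language code at all


def load_codes(xml_text):
    """Load language and country codes from XML into two sets (hashes)."""
    root = ET.fromstring(xml_text)
    languages = {e.get("code").lower() for e in root.iter("language")}
    countries = {e.get("code").upper() for e in root.iter("country")}
    return languages, countries


def check_language(code, languages):
    """Classify a bare language code as supported, unsupported or invalid."""
    # Assumption: a candidate code is 2 or 3 ASCII letters.
    if not (code.isascii() and code.isalpha() and len(code) in (2, 3)):
        return Status.INVALID
    if code.lower() in languages:
        return Status.SUPPORTED
    return Status.UNSUPPORTED


def range_matches(language_range, tag):
    """RFC 3066 section 2.5: a range matches a tag if it equals the tag,
    or equals a prefix of the tag that is immediately followed by '-'.
    The special range '*' matches any tag. Comparison ignores case."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


LANGUAGES, COUNTRIES = load_codes(CODES_XML)
```

With this table, `check_language("fr", LANGUAGES)` is supported, `check_language("de", LANGUAGES)` is unsupported, and `check_language("f1", LANGUAGES)` is invalid. The range "fr" matches "fr-CA" but not "fry", which is the mechanical side of the prefix rule; it says nothing about whether a reader of "fr" actually understands "fr-CA", which is the question the message raises.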
Received on Wednesday, 24 July 2002 21:01:15 UTC