- From: Peter B. West <pbwest@powerup.com.au>
- Date: Thu, 25 Jul 2002 11:00:50 +1000
- To: www-xsl-fo@w3.org
Paul et al.,
In one strand of the FOP redesign, I load language, country and script
codes from external XML files into hashes. All these hashes do is
validate the codes, although that does have the virtue of distinguishing
between unsupported and invalid codes.
I used 2-character codes because they were preferred to 3-character
codes in the spec, because I did not anticipate that there would be a
big demand for languages which have only a 3-character code, and because
it is a simple matter to extend the external XML to include all or only
the necessary 3-character codes. Section 2.3 of RFC 3066 mentions that
all additions must have both a 2-ch code and a 3-ch code, but that
existing 3-ch-only language codes will never acquire a 2-ch code. Nice.
Full coverage will therefore always require at least some 3-ch codes.
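A minimal sketch of the kind of lookup described above, in Python rather than FOP's Java. The XML layout, the `supported` attribute and the function names are all hypothetical, not FOP's actual files or code. The assumption is that the external file lists every known code and flags the supported ones, so a code missing from the table is invalid and a listed but unflagged one is merely unsupported. The single 3-character entry (haw, Hawaiian, which has no 2-character code) shows that adding 3-character codes is only a matter of extending the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical external file: every known code is listed, and the
# (made-up) "supported" attribute marks the ones the formatter handles.
LANGUAGE_XML = """
<codes type="language">
  <code value="en" name="English" supported="yes"/>
  <code value="fr" name="French" supported="yes"/>
  <code value="cy" name="Welsh" supported="no"/>
  <code value="haw" name="Hawaiian" supported="no"/>
</codes>
"""


def load_codes(xml_text):
    """Build a hash mapping each lower-cased code to its supported flag."""
    root = ET.fromstring(xml_text.strip())
    return {
        el.get("value").lower(): el.get("supported") == "yes"
        for el in root.iter("code")
    }


def classify(code, table):
    """Return 'supported', 'unsupported' or 'invalid' for a code."""
    key = code.lower()
    if key not in table:
        return "invalid"  # not in the external list at all
    return "supported" if table[key] else "unsupported"


LANGUAGES = load_codes(LANGUAGE_XML)
print(classify("FR", LANGUAGES))   # supported
print(classify("haw", LANGUAGES))  # unsupported
print(classify("xx", LANGUAGES))   # invalid
```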
Section 2.5 of RFC 3066, "Language ranges", doesn't seem to give any
help in deciding what to do with a bare language tag, e.g. "fr" or
"en". (I'm assuming here that "fr", for instance, qualifies as a prefix
according to section 2.5.) Ken Holman's posting on the difficulties of
French speakers in Canada bears out this comment in 2.5:
"NOTE: This use of a prefix matching rule does not imply that language
tags are assigned to languages in such a way that it is always true
that if a user understands a language with a certain tag, then this
user will also understand all languages with tags for which this tag
is a prefix. The prefix rule simply allows the use of prefix tags if
this is the case."
This note raises the question, though: what does it mean to say that
"a user understands a language with a certain tag" when that tag is
"fr"? Or have I misunderstood this section completely?
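The prefix rule itself is purely mechanical, which is rather the point: it says which tags a range matches, not what a reader of "fr" actually understands. A sketch of the rule as I read section 2.5; the function name is mine, and the special "*" range is left out.

```python
def range_matches(language_range, tag):
    """RFC 3066 section 2.5 prefix rule, compared case-insensitively.

    The range matches if it equals the tag, or equals a prefix of the
    tag that is immediately followed by "-".
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


print(range_matches("fr", "fr-CA"))  # True
print(range_matches("fr", "FR"))     # True
print(range_matches("fr", "fro"))    # False: "fro" (Old French) is a different subtag
print(range_matches("fr-CA", "fr"))  # False: matching only runs from range to longer tag
```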
Incidentally, there seems to be a slight problem with the definition of
xml:lang. Section 7.29.24, "xml:lang", has:
"Values have the following meanings:
<string>
A language and/or country specifier in conformance with [RFC3066]."
Shouldn't that read: "A language or a language-country specifier ...",
removing the implication that one can have a bare country specifier?
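Under RFC 3066 a country code appears as a 2-letter second subtag (section 2.2), never as the primary subtag, whose syntax is given in section 2.1. A rough sketch of splitting an xml:lang value on that basis (names hypothetical) shows why a bare country specifier can't be expressed: "CA" on its own parses as the language ca, Catalan.

```python
import re

# RFC 3066 section 2.1: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def split_tag(value):
    """Split an xml:lang value into (language, country-or-None).

    Returns None if the value does not match the RFC 3066 tag syntax.
    """
    if not TAG_RE.fullmatch(value):
        return None
    subtags = value.split("-")
    language = subtags[0].lower()
    # Section 2.2: a 2-letter second subtag is an ISO 3166 country code.
    country = None
    if len(subtags) > 1 and re.fullmatch(r"[A-Za-z]{2}", subtags[1]):
        country = subtags[1].upper()
    return language, country


print(split_tag("fr-CA"))  # ('fr', 'CA')
print(split_tag("CA"))     # ('ca', None): read as a language (Catalan)
print(split_tag("fr_CA"))  # None: "_" is not allowed in RFC 3066 tags
```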
And for dessert, this delicious note from section 2.3:
"3. When a language has no ISO 639-1 2-character code, and the ISO
639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic)
code differ, you MUST use the Terminology code. NOTE: At present,
all languages for which there is a difference have 2-character
codes, and the displeasure of developers about the existence of 2
code sets has been adequately communicated to ISO."
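For what it's worth, the selection rules reduce to very little code. A sketch, assuming a hypothetical record holding the ISO 639-1, 639-2/T and 639-2/B codes for each language (the four entries are a small sample, not a complete table): use the 2-character code where there is one, otherwise the Terminology code, which covers both the identical-codes case and rule 3. The last line checks the note against the sample: no language there lacks a 2-character code while having differing T and B codes.

```python
from typing import NamedTuple, Optional


class LanguageCodes(NamedTuple):
    name: str
    iso639_1: Optional[str]  # 2-character code, if one exists
    iso639_2t: str           # ISO 639-2/T (Terminology)
    iso639_2b: str           # ISO 639-2/B (Bibliographic)


# Small illustrative sample, not a complete ISO 639 table.
SAMPLE = [
    LanguageCodes("German", "de", "deu", "ger"),
    LanguageCodes("French", "fr", "fra", "fre"),
    LanguageCodes("Welsh", "cy", "cym", "wel"),
    LanguageCodes("Hawaiian", None, "haw", "haw"),
]


def preferred_code(entry):
    """Pick the primary subtag for a language."""
    if entry.iso639_1:  # the 2-character code is preferred
        return entry.iso639_1
    # No 2-character code: T and B either agree, or rule 3 says use T.
    return entry.iso639_2t


def rule3_cases(entries):
    """Languages where rule 3 actually decides: no 2-char code and T != B."""
    return [e.name for e in entries
            if not e.iso639_1 and e.iso639_2t != e.iso639_2b]


for e in SAMPLE:
    print(e.name, preferred_code(e))
print(rule3_cases(SAMPLE))  # [] for this sample
```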
Peter
--
Peter B. West pbwest@powerup.com.au http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"
Received on Wednesday, 24 July 2002 21:01:15 UTC