Re: Question on implementation of the language property

Paul et al.,

In one strand of the FOP redesign, I load language, country and script 
codes from external XML into hashes.  All these do is validate the 
codes, although that has the virtue of distinguishing between 
unsupported and invalid codes.
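
A minimal sketch of that kind of validation, not the actual FOP redesign code. The class name, the XML layout (<codes><code>..</code></codes>), and the rule that "invalid" means "not exactly two ASCII letters" are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical validator; names and XML layout are illustrative only.
class LanguageCodes {
    enum Status { SUPPORTED, UNSUPPORTED, INVALID }

    private final Set<String> supported = new HashSet<>();

    // Build the hash from XML text of the assumed form
    // <codes><code>en</code><code>fr</code></codes>.
    static LanguageCodes fromXml(String xml) {
        LanguageCodes codes = new LanguageCodes();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("code");
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                codes.supported.add(text.toLowerCase(Locale.ROOT));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read code list", e);
        }
        return codes;
    }

    // INVALID: not shaped like a 2-character code at all.
    // UNSUPPORTED: well-formed, but absent from the loaded list.
    Status check(String code) {
        if (code == null || !code.matches("[A-Za-z]{2}")) {
            return Status.INVALID;
        }
        return supported.contains(code.toLowerCase(Locale.ROOT))
                ? Status.SUPPORTED
                : Status.UNSUPPORTED;
    }
}
```

Supporting 3-character codes under this sketch would mean widening the shape check and adding entries to the external XML.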

I used 2-character codes because they were preferred to 3-character 
codes in the spec, because I did not anticipate that there would be a 
big demand for languages which have only a 3-character code, and because 
it is a simple matter to extend the external XML to include all or only 
the necessary 3-character codes.  In section 2.3 of 3066 it is mentioned 
that all additions must have both a 2-ch code and a 3-ch code, but that 
existing 3-ch only language codes will never acquire a 2-ch code.  Nice.
Full coverage therefore will always require at least some 3-ch codes.

Section 2.5 of 3066, "Language ranges", doesn't seem to give any help in 
deciding what to do with a bare language tag, e.g. "fr" or "en".  (I'm 
assuming here that "fr", for instance, qualifies as a prefix according 
to section 2.5.)  Ken Holman's posting on the difficulties of French 
speakers in Canada bears out this comment in 2.5:

" NOTE: This use of a prefix matching rule does not imply that language
    tags are assigned to languages in such a way that it is always true
    that if a user understands a language with a certain tag, then this
    user will also understand all languages with tags for which this tag
    is a prefix.  The prefix rule simply allows the use of prefix tags if
    this is the case."

This note begs the question though.  What does it mean to say that "a 
user understands a language with a certain tag" when that tag is "fr"? 
Or have I misunderstood this section completely?
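
For concreteness, the matching rule in 2.5 can be sketched as below: a range matches a tag if the two are equal, or if the range is a prefix of the tag and the next character in the tag is "-"; the special range "*" matches any tag. The class and method names are hypothetical, and the case-insensitive comparison follows the general rule in 3066 that tags are not case sensitive. The sketch only decides whether a range selects a tag; it says nothing about what a bare "fr" means to the user.

```java
import java.util.Locale;

// Hypothetical helper illustrating the prefix rule of RFC 3066, section 2.5.
class LanguageRange {
    static boolean matches(String range, String tag) {
        if (range.equals("*")) {
            return true;                 // wildcard range matches every tag
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        if (t.equals(r)) {
            return true;                 // exact match
        }
        // Prefix match: the character following the prefix must be '-'.
        return t.startsWith(r + "-");
    }
}
```

Under this rule the range "fr" selects "fr", "fr-CA" and "fr-FR" alike, while the range "fr-CA" does not select a bare "fr".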

Incidentally, there seems to be a slight problem with the definition of 
xml:lang. 7.29.24 "xml:lang" has

"Values have the following meanings:
<string>
     A language and/or country specifier in conformance with [RFC3066]."

Shouldn't that read: "A language or a language-country specifier ...", 
removing the implication that one can have a bare country specifier?
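
The tag syntax in 3066 supports this reading: a tag is a primary subtag of 1 to 8 letters, optionally followed by further subtags of 1 to 8 letters or digits, so a country code can only ever appear after a language. A rough sketch of that syntax check follows; the class and method names are hypothetical, and it checks shape only, not whether the subtags are registered codes.

```java
import java.util.regex.Pattern;

// Hypothetical syntax check for RFC 3066 tags (shape only, no code lookup).
class Rfc3066Syntax {
    // Language-Tag   = Primary-subtag *( "-" Subtag )
    // Primary-subtag = 1*8ALPHA
    // Subtag         = 1*8(ALPHA / DIGIT)
    private static final Pattern TAG =
            Pattern.compile("[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*");

    static boolean isWellFormed(String tag) {
        return tag != null && TAG.matcher(tag).matches();
    }

    // The first subtag is always the language part of the tag.
    static String primarySubtag(String tag) {
        int dash = tag.indexOf('-');
        return dash < 0 ? tag : tag.substring(0, dash);
    }
}
```

A bare "CA" is well-formed under this syntax, but it occupies the primary subtag position and so would be read as a language code (Catalan in ISO 639-1), not as Canada.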


And for dessert, this delicious note from section 2.3.

"3. When a language has no ISO 639-1 2-character code, and the ISO
639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic)
code differ, you MUST use the Terminology code.  NOTE: At present,
all languages for which there is a difference have 2-character
codes, and the displeasure of developers about the existence of 2
code sets has been adequately communicated to ISO."


Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"

Received on Wednesday, 24 July 2002 21:01:15 UTC