- From: Jack Lindsey <tuquenukem@hotmail.com>
- Date: Fri, 19 Dec 2003 13:29:54 -0500
- To: ht@cogsci.ed.ac.uk
- Cc: xmlschema-dev@w3.org
>From: ht@cogsci.ed.ac.uk (Henry S. Thompson)
>Just what are you trying to rule out? The regulatory situation
>regarding language codes, as spelled out in RFC 3066 [1], is
>sufficiently complicated that the lexical space constraint given in
>the schema REC (as amended) for the xs:language type [2] is really the
>strictest it's practical to enforce. With IANA having registered
>e.g. cel-gaulish and de-AT-1901 as legal tags, there's really not much
>we can do here.
>
> > I love this, from "http://www.w3.org/2001/xml.xsd"
>
><snip/>
>
>The comment will be removed when the above-cited erratum is formally
>encorporated in the 2nd edition of the Schema REC.
>
>ht
>
>[1] http://www.ietf.org/rfc/rfc3066.txt
>[2] http://www.w3.org/2001/05/xmlschema-errata#e2-25

Understood.

We have just made a limited publication of an XML vocabulary for standardized data exchange within a government sector in an officially bilingual jurisdiction. Let me take this opportunity to thank the participants of this list for all the help they have both consciously and unwittingly given me over the last year. In particular, I would like to thank Henry and Jeni for their invaluable advice which is much appreciated and much implemented (I do not yet have permission to publish a link).

We have decreed that text generated by our current partners (as opposed to obtained from external sources) should use the values:

en-CA (Canadian English)
fr-CA (Canadian French)

Other potential values might in future include:

en-GB (British English)
en-US (American English)
es-MX (Spanish)
iu (Inuktitut - would require UTF-16)

This is for all the usual, anticipated page reader, translation software, character set rendering reasons.

But in addition, we make extensive use of coded information, specified either as terse, language-neutral values or as language-specific texts, for which we are going to provide "code table lookup" facilities. In ISO 11179-3 terminology, these are cross-references between permissible value instances of related value domains, e.g. from ISO 3166 Country Code (3-digit numeric) to Country Short Name in English or Country Short Name in French (although the majority of our codes are actually home-grown). For this purpose only en-CA and fr-CA (the official languages) are relevant, but we did not want to use multiple language identification techniques, especially since in practice these are probably the only languages that will show up in any context for the first few years.

So, since in the near future the format of xml:lang values will be validated, I imagine (thinking aloud - comments welcomed) I will allow page readers, translation software, etc. to make what they will of any values that show up. But in my table-lookup XSLT templates I will interpret "en", or anything beginning "en-", as English (case-insensitive), do the same for "fr" as French, and default to English for anything else.

Cheers
Jack
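
For anyone who wants to experiment with the lookup rule described above, here is a minimal sketch in Python rather than XSLT. It is not the actual template code: the function names, the table name, and the two sample table rows are hypothetical and exist only to illustrate the rule ("en" or "en-..." means English, "fr" or "fr-..." means French, case-insensitive, anything else defaults to English).

```python
def lookup_language(xml_lang):
    """Reduce an xml:lang value to the code-table language, 'en' or 'fr'.

    A value counts as English if it is exactly "en" or begins with "en-",
    compared case-insensitively; likewise "fr" for French. Every other
    value, including a missing or empty one, falls back to English.
    """
    tag = (xml_lang or "").strip().lower()
    for primary in ("en", "fr"):
        if tag == primary or tag.startswith(primary + "-"):
            return primary
    return "en"


# Hypothetical code table: ISO 3166 numeric country code -> short names.
# Only two illustrative rows; a real table would be loaded from elsewhere.
COUNTRY_SHORT_NAME = {
    "124": {"en": "Canada", "fr": "Canada"},
    "484": {"en": "Mexico", "fr": "Mexique"},
}


def country_short_name(numeric_code, xml_lang):
    """Look up a country short name in the language implied by xml_lang."""
    return COUNTRY_SHORT_NAME[numeric_code][lookup_language(xml_lang)]
```

Matching on the exact primary subtag, or on the subtag followed by a hyphen, keeps a value such as "eng" from being treated as a variant of "en"; under the rule as stated it simply takes the English default like any other unrecognised value.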
Received on Friday, 19 December 2003 13:37:53 UTC