- From: Tex Texin <tex@xencraft.com>
- Date: Mon, 20 Dec 2004 19:49:19 -0800
- To: Mark Davis <mark.davis@jtcsv.com>
- CC: Georg Schweizer <gschweizer@gmx.at>, www-international@w3.org, ietf-languages@alvestrand.no, John Cowan <jcowan@reutershealth.com>
Mark,

For quotations, I prefer Lewis Carroll's:

'When I use a word,' Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean -- neither more nor less.'

;-)

The distinctions you are drawing strike me as a bit like the difference between precision and accuracy: you can have one without the other. But that is an aside.

Now if the meaning were not ambiguous, I could take something written by a German, with blond hair, living and writing in Liechtenstein, and know that I should label it de-LI. But I am being told it is wrong to do that; I should use de-CH. And if I have an article written in Japanese, by a redhead (but dyed), in Japan, and I label it ja-JP, I am also wrong; it should be just ja. And when I ask how to determine that what I am told is right, I am not getting an answer I can apply, just some virtual hand waving. So, whereas some might consider these edge cases, I consider the lack of criteria tantamount to ambiguity.

However, just to be clear, in case my mail is being misinterpreted as an attack on 3066 or its successor(s), that is not at all my intent. RFC 3066 is a valuable tool, needed for consistency and interoperability. We can agree on 3066 as a naming convention. But there is also a need for guidelines for the semantics of language tags. There might even be a need for several different guidelines: one for linguists, one for software message catalogs, one for labeling web pages, one for use in web services, etc. We have noted in the thread that it might need to vary by application. If so, so be it. I am just looking for a beginning to identifying the criteria for some contexts. If my comments were taken as an objection to 3066bis, they were not intended, nor should they be taken, to be so.

Meanwhile, I await more suggestions for criteria.

On another related topic, for the next version of the table I am considering organizing it differently.
It strikes me that, for my needs and my intended audience, it is less interesting to list languages and note the regions where they are spoken than to list each region and note the languages used there. If I do that, I do not have to deal with meaningless identifiers and map them to the correct ones to use. So I might have:

region  languages
JP      ja
LI      de-CH (maybe others, I don't know.)
CH      de-CH, it, fr-FR, rm
US      en-US, es-US
CA      fr-CA, en-CA, iu

With this approach, I can suggest something like the most popular choices without ruling out the existence of other languages being used. The lack of de-LI makes a statement about de-LI vs de-CH without being as explicit about criteria, other than perhaps a combination of popular choices by major software vendors, official languages, and claims of encyclopedias and the like as to what is spoken where.

This approach is more helpful to folks like me who are looking to answer what they need to provide. If someone wants to know how many variants of German they need, they can scan the table for all listings of de and de-*, and even scan just the regions they support to determine their perhaps more exact needs. It's also easier for me to accept edits to the list from people suggesting that language xx-YY is used in region ZZ, without a lot of vetting effort.

Would that work for people?

tex

Mark Davis wrote:
>
> > However, RFC 3066's approach is generative. So de-AT is created by combining
> > codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> > In fact RFC 3066 only defines the production and not which of the produced
> > values are meaningful or what they mean except in the most general terms.
>
> You keep expressing this in a counterproductive way. The language tag
> 'de-AT' is reasonably defined: German as used in Austria.
>
> Suppose I have a protocol ID that distinguishes categories of people by
> combining hair-color with nationality. Then "Samoan, blond" is perfectly
> well defined. The fact that there are no existing examples does *NOT* mean
> that it is "ambiguous", "not meaningful", or "not well-defined". And let's
> suppose that all Danes were blond. Then "Dane, blond" would still be well
> defined. The fact that it happens to have the same current denotation as
> "Dane" does *NOT* mean that it is "ambiguous", "not meaningful", or "not
> well-defined".
>
> To paraphrase Inigo Montoya, "I do not think those words mean what you think
> they mean!"
>
> Now, there are in fact edge cases; when is someone dishwater blond vs pale
> brunette; what do you do with dual citizenship, etc. But it doesn't mean
> that the protocol is senseless. And once you have a criterion of usage, you
> can establish when two IDs have the same denotation or not, by doing the
> research to see whether there are in fact non-blond Danes.
>
> What you are really looking for is which language tags have the same
> denotation, under some criterion of usage. And that criterion might be "does
> someone need to provide different localizations (for non-speech enabled
> applications)". That is *very* different from saying that the language tags
> are "ambiguous", "not meaningful", or "not well-defined".
>
> Mark
>
> ----- Original Message -----
> From: "Tex Texin" <tex@xencraft.com>
> To: "Georg Schweizer" <gschweizer@gmx.at>
> Cc: <www-international@w3.org>; <ietf-languages@alvestrand.no>
> Sent: Monday, December 20, 2004 13:26
> Subject: Re: Language Identifier List Criteria
>
> > Well, I will leave it to others to debate the characterization of the standards
> > as political, if they choose to.
> > However, RFC 3066's approach is generative. So de-AT is created by combining
> > codes from each of ISO 639 and ISO 3166, and neither defines what this means.
> > In fact RFC 3066 only defines the production and not which of the produced
> > values are meaningful or what they mean except in the most general terms.
> > That's why we are having this discussion.
> >
> > Under RFC 3066 it is possible to create combinations of language and region
> > that have no useful value.
> > So this need not be determined politically.
> >
> > For myself, I am looking for guidance for software and web producers. Which
> > labels to use when tagging content? When is there enough of a difference that
> > bears paying for a translation?
> >
> > It is not clear to me it should be purely linguistic however.
> > Politics is perhaps one element of the criteria.
> > tex
> >
> >
> > Georg Schweizer wrote:
> > >
> > > > Some languages are spoken in many countries, and the language is not
> > > > distinctive in each country. I have started to accept suggestions as
> > > > to which language-region codes do not represent a distinct language
> > > > variation, and therefore are not recommended as tags, without good
> > > > reason.
> > > http://www.i18nguy.com/unicode/language-identifiers.html
> > >
> > > The criteria should be political rather than linguistic ones, as both
> > > the ISO 639 language tags and the ISO 3166 country codes are based on
> > > political agreement. Therefore I would not speak of "distinct language
> > > variations", but of distinct *official standards* (or at least distinct
> > > conventions). Variations can be found everywhere (even within one
> > > political region), whereas the same conventions can be followed by
> > > several countries.
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
> > Xen Master http://www.i18nGuy.com
> >
> > XenCraft http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
> >
> > _______________________________________________
> > Ietf-languages mailing list
> > Ietf-languages@alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages
>

--
-------------------------------------------------------------
Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
Xen Master http://www.i18nGuy.com
XenCraft http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Tuesday, 21 December 2004 03:49:33 UTC
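As a rough illustration of the region-first table proposed in the message, here is a minimal Python sketch. It assumes the five example rows are stored in a dictionary keyed by region code; the names `REGION_LANGUAGES` and `variants_of` are hypothetical and do not come from the message. The function performs the suggested scan for a language and its region-qualified forms (for example `de` and `de-*`), optionally limited to the regions a product supports. The rows are copied from the example only to show the mechanics and are not a recommendation.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical encoding of the example region-first table:
# region code -> language tags suggested for content aimed at that region.
REGION_LANGUAGES: Dict[str, List[str]] = {
    "JP": ["ja"],
    "LI": ["de-CH"],  # "maybe others, I don't know."
    "CH": ["de-CH", "it", "fr-FR", "rm"],
    "US": ["en-US", "es-US"],
    "CA": ["fr-CA", "en-CA", "iu"],
}


def variants_of(language: str, regions: Optional[Iterable[str]] = None) -> Set[str]:
    """Return every listed tag that is `language` itself or `language-*`.

    If `regions` is given, only those rows are scanned; regions missing
    from the table simply contribute nothing.
    """
    wanted = language.lower()
    prefix = wanted + "-"
    rows = REGION_LANGUAGES.keys() if regions is None else regions
    found: Set[str] = set()
    for region in rows:
        # Region codes are stored in upper case, so normalize the lookup key.
        for tag in REGION_LANGUAGES.get(region.upper(), []):
            # RFC 3066 tags are case-insensitive, so compare in lower case.
            lowered = tag.lower()
            if lowered == wanted or lowered.startswith(prefix):
                found.add(tag)
    return found


if __name__ == "__main__":
    print(sorted(variants_of("de")))          # ['de-CH']
    print(sorted(variants_of("fr")))          # ['fr-CA', 'fr-FR']
    print(sorted(variants_of("fr", ["CA"])))  # ['fr-CA']
```

Because the table is keyed by region, answering "which German variants do I need for the regions I ship in?" is a single pass over those rows, and the absence of `de-LI` from the data simply means it never appears in any result.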