Language Identifier List Criteria

From: Tex Texin <tex@xencraft.com>
Date: Mon, 20 Dec 2004 12:05:27 -0800
Message-ID: <41C73087.3C8D38FE@xencraft.com>
To: Mark Davis <mark.davis@jtcsv.com>,John Cowan <jcowan@reutershealth.com>, www-international@w3.org,ietf-languages@alvestrand.no, "Martin Dürst" <duerst@w3.org>

Mark, et al.

I agree 100%. So we need people to discuss the criteria now, not the format or
even the contents of the page.

I have all the elements in a database now, and can easily manipulate the
relationships and produce whatever format we want.

Through database tables I can relate any language identifier to other
identifiers, so I can map de-LI to de-CH, to de, or whatever. I can map the
language and region codes to their official names or to more colloquial names.
Listings or reports can print "de-AT" alone or "de-AT (German, Austria)", or
whatever.
(Since there were several suggestions of preferences.)
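
To make the idea concrete, here is a minimal sketch of that kind of lookup,
with plain dictionaries standing in for the database tables. All table and
function names are hypothetical, and the mapping entries are placeholders
taken from the examples above, not agreed results.

```python
# Hypothetical lookup tables standing in for the database tables.
LANGUAGE_NAMES = {"de": "German"}
REGION_NAMES = {"AT": "Austria", "CH": "Switzerland", "LI": "Liechtenstein"}

# Illustrative mappings only: each identifier points to the identifier it
# could be related to. Whether any mapping is justified depends on the
# criteria, which are still undecided.
MAPS_TO = {"de-LI": "de-CH", "de-CH": "de", "de-AT": "de"}


def describe(tag: str) -> str:
    """Return the tag followed by its language and region names."""
    language, _, region = tag.partition("-")
    names = [LANGUAGE_NAMES.get(language, language)]
    if region:
        names.append(REGION_NAMES.get(region, region))
    return f"{tag} ({', '.join(names)})"


def mapping_chain(tag: str) -> list[str]:
    """Follow MAPS_TO from a tag until no further mapping exists."""
    chain = [tag]
    while chain[-1] in MAPS_TO and MAPS_TO[chain[-1]] not in chain:
        chain.append(MAPS_TO[chain[-1]])
    return chain


print(describe("de-AT"))
print(mapping_chain("de-LI"))
```

A report can then print either the bare tag or the output of `describe`,
depending on the preferred format.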

But the important issue has not been addressed: the criteria.

The only items that seem reasonably clear are the elements that do not require
regional qualifications.
(Yes, it depends on criteria.)

For the others, we have had suggestions for a very few items that form a set
of identifiers that can be mapped to a particular language identifier, but not
much has been offered about why these mappings are correct or how they are
derived.

Suggestions?

My thoughts run to the concrete:

Does the government have laws or policies on writing, spelling, sorting, etc.
that make the language unique?
Are there dictionaries or spell checkers unique to the region?

If there aren't these types of "official" or confirming distinctions, I would
settle for observations that words or phrases can be listed with different
spellings, meanings or usages in the region. (Maybe 1000 examples?)

To establish that a region uses another region's language variation, perhaps a
list of distinguishing terms like this would demonstrate that they are used the
same way in both regions.

As I have noted a few times already on these lists, I am not a linguist. I
welcome other suggestions and advice.

But if there is no concrete evidence to point at, then I question how real the
difference is.

If I can't ask a translator for Liechtenstein how their version of some text
must change from the Swiss version, and get some examples easily, then I wonder
how important tagging the text differently is.

Another test might be whether a reader can easily determine the origin (its
region) of the text.

tex



Mark Davis wrote:
> The broader point I was trying to make is that I think the criterion for
> inclusion in the list is a crucial question. In some sense, one can *always*
> find some difference between xx-AA and xx-BB; the question is whether that
> difference is significant with respect to a given goal. If you don't have
> some reasonably clear idea of the criterion, (a) you don't know what
> qualifies for the list, and (b) nobody can use it. To avoid misleading people
> that happen upon the Language Identifier list page, it is best to make clear
> (a) that the list is not complete, (b) that the criterion is not final.
> 
> And one has to be very careful about saying that something like "de-LI is
> not recommended". If you have a reasonable criterion for the list, it will
> form a set of equivalence classes among all countries that have a
> significant population of speakers, say:
> 
> {de-AT}
> {de-CH, de-LI}
> {de-DE, de-BE, de-DK, de-LU}
> 
> If you present the information this way, then a user can see that instead of
> using de-LI, he could use de-CH (with respect to that criterion, of course).
> But if you just give the list
> 
> de-AT, de-CH, de-DE, (not recommended: de-BE, de-DK, de-LI, de-LU)
> 
> and say that the (...) stuff is not recommended, then no guidance whatsoever
> is given as to what to do with de-LI. The two most obvious answers to the
> neophyte are to choose "de" or "de-DE", both of which are wrong!
> 
> Mark
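
The equivalence-class presentation Mark describes can be sketched as a small
lookup. This is only an illustration under stated assumptions: the classes are
copied from his example, the function names are hypothetical, and treating the
first member of each class as the identifier to use instead is an assumption
made here, not something the list has agreed on.

```python
# Equivalence classes from the example above. Assumption: the first member
# of each class is the identifier a user would be pointed to.
EQUIVALENCE_CLASSES = [
    ["de-AT"],
    ["de-CH", "de-LI"],
    ["de-DE", "de-BE", "de-DK", "de-LU"],
]


def build_index(classes):
    """Map every identifier to the first member of its class."""
    index = {}
    for members in classes:
        representative = members[0]
        for tag in members:
            index[tag] = representative
    return index


def suggest(tag, index):
    """Return the identifier to use for a tag, or None if it is unlisted."""
    return index.get(tag)


INDEX = build_index(EQUIVALENCE_CLASSES)
print(suggest("de-LI", INDEX))
print(suggest("de-LU", INDEX))
```

With this shape, a lookup for de-LI yields de-CH rather than leaving the user
to guess "de" or "de-DE", and an identifier outside every class yields no
suggestion at all.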
Received on Monday, 20 December 2004 20:05:36 GMT
