- From: Peter Constable <petercon@microsoft.com>
- Date: Thu, 12 Apr 2007 22:26:12 -0700
- To: Mark Davis <mark.davis@icu-project.org>, John Cowan <cowan@ccil.org>
- CC: "www-international@w3.org" <www-international@w3.org>, LTRU Working Group <ltru@ietf.org>
- Message-ID: <DDB6DE6E9D27DD478AE6D1BBBB8357955D038B87D0@NA-EXMSG-C117.redmond.corp.microsoft>
There is a difference between 'no information provided' and 'information is provided: this is unknown'. I don't see what the issue is; I guess I must have missed the start of this thread.

Peter

________________________________
From: Mark Davis [mailto:mark.davis@icu-project.org]
Sent: Thursday, April 12, 2007 10:29 AM
To: John Cowan
Cc: www-international@w3.org; LTRU Working Group
Subject: Re: [Ltru] RE: For review: Tagging text with no language

Q1. I had missed the choice of "mis". I agree with that suggestion; we should incorporate it into 4646bis. The problem is ameliorated considerably once we add ISO 639-3, but it doesn't disappear completely, so "mis" remains a good choice for dealing with that situation.

Q2. The issue *does* remain, since we talk about "und" vs. the absence of a language tag, which "" represents.

Mark

On 4/12/07, John Cowan <cowan@ccil.org> wrote:

Mark Davis scripsit:

> The summary looks good. This discussion raises 2 items for the LTRU
> group.
>
> Q1. What tag should be used where it is definitely a language, but there
> is no code available yet? (This is an area where ISO 15924 is ahead
> of ISO 639 (and 3166), since it has Zzzz: Code for uncoded script.)

In principle, every natural-language item (text, audio, video) can be coded with some 639-2 code; if the language does not have a code of its own, it will belong to one of the 639-2 collections.

For example, the language Tarifit (639-3 code 'rif') does not have a 639-2 code, but it is a Berber language; consequently, an item in Tarifit may be validly tagged 'ber', which represents the collection of Berber languages. Similarly, the language Zumbun (639-3 code 'jmb') does not have a 639-2 code, nor does it belong to any of the smaller 639-2 collections, but it does belong to the Afro-Asiatic language family; consequently, an item in Zumbun may be validly tagged 'afa', which represents the collection of Afro-Asiatic languages.
If all else fails, as for the language isolate Burushaski (639-3 code 'bsk'), the 639-2 collection code 'mis', representing the collection of miscellaneous languages, may be applied. This is the ultimate fallback code, indicating that the language is known but nothing useful can be said about it using 639-2 codes.

All of this lore, which represents the practice of the Library of Congress (the ultimate source of 639-2), can of course go away when RFC 4646bis goes into effect. If it is necessary to be more specific before then, and if strict compliance with 4646 is required, then rif-x-tarifit, afa-x-jumbun, and mis-x-burushas may also be used.

> Q2. Clarify the wording around "und" vs "".

"" is not a well-formed language tag according to RFC 4646, so there is nothing to say about it there. It is defined by the XML Recommendation as an extension to the set of language tags, with the same significance as no language declaration at all.

-- 
John Cowan  cowan@ccil.org  http://www.ccil.org/~cowan
Dream projects long deferred usually bite the wax tadpole.  --James Lileks

-- 
Mark
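The fallback order described in the thread (the smallest 639-2 collection that contains the language, else 'mis'), together with Peter's distinction between "no information provided" and "information provided: unknown" ('und'), can be sketched in Python. This is a hypothetical illustration, not anyone's actual implementation: the table COLLECTIONS_BY_639_3 and the functions fallback_639_2_code and language_status are invented names, the table covers only the three languages mentioned above, and the case of a language that has its own 639-2 code is left out. The treatment of an absent xml:lang (inherited from an enclosing element) versus "" (explicitly no language information) follows the XML Recommendation.

```python
from typing import Optional

# Hypothetical sample data covering only the languages discussed in the thread.
# Each 639-3 code maps to the 639-2 collection codes it falls under, ordered
# from most specific to least specific. A real table would have to be built
# from the ISO 639-3 and ISO 639-2 code lists.
COLLECTIONS_BY_639_3 = {
    "rif": ["ber", "afa"],  # Tarifit: Berber languages, within Afro-Asiatic
    "jmb": ["afa"],         # Zumbun: no smaller collection, only Afro-Asiatic
    "bsk": [],              # Burushaski: language isolate, no collection fits
}


def fallback_639_2_code(code_639_3: str) -> str:
    """Pick a 639-2 code for a language that lacks its own 639-2 code.

    Uses the most specific collection containing the language and falls
    back to 'mis' (miscellaneous languages) when none applies. Codes absent
    from the sample table are also mapped to 'mis' (a simplification).
    """
    collections = COLLECTIONS_BY_639_3.get(code_639_3, [])
    return collections[0] if collections else "mis"


def language_status(xml_lang: Optional[str]) -> str:
    """Classify an xml:lang value; None means the attribute is absent."""
    if xml_lang is None:
        return "inherited"          # not declared here; enclosing element decides
    if xml_lang == "":
        return "no information"     # explicitly no language information
    primary = xml_lang.split("-")[0].lower()
    if primary == "und":
        return "undetermined"       # information provided: language unknown
    if primary == "mis":
        return "known but uncoded"  # a real language with no usable code
    return "tagged"


if __name__ == "__main__":
    for code in ("rif", "jmb", "bsk"):
        print(code, "->", fallback_639_2_code(code))
    for value in (None, "", "und", "mis-x-burushas", "afa-x-jumbun"):
        print(repr(value), "->", language_status(value))
```

The ordered list per language mirrors the "most specific collection first" reasoning in the thread: Tarifit is tagged 'ber' rather than 'afa' because the Berber collection is the narrower one that still contains it.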
Received on Friday, 13 April 2007 05:26:20 UTC