- From: Richard Ishida <ishida@w3.org>
- Date: Thu, 12 Apr 2007 14:55:49 +0100
- To: <www-international@w3.org>
So it seems the alternatives that John is suggesting are:

  Determined, and a language (for which a subtag exists): <subtag(s)>
  Determined, and not a language: zxx
  Determined, but not a language for which a subtag exists: ???
  Undetermined, and not sure whether it is a language or not: xml:lang="" (if available)
  Undetermined, but sure that it's a language: und

The implications of this for X/HTML are that there is no way to say that text is undetermined if you are not sure whether it's a language or not.

This is very different from Jon Hanna's proposal at
http://lists.w3.org/Archives/Public/www-international/2007JanMar/0178.html

Can we please discuss this? I'm particularly hoping for contributions from John, Jon, Mark, Martin and Addison (though he's on vacation at the moment).

For my part, having experienced, even when trying to write this email, how difficult it is to talk succinctly about the difference between something that is unidentified and may or may not be a language and something that is unidentified but known to be a language, I'm a little leery about accepting the evidence in the mail below, John. Can we be sure that the people who drafted that text were consciously making the distinction you mention rather than just being a little imprecise in wording?

I'm also a little worried about the wording in section 4.1 of RFC 4646[1] about und, which quite clearly says that you shouldn't use und unless the *protocol* demands it, or sometimes when matching tags. This doesn't make any distinction between specifying the language of a resource and turning off language declarations for a range of embedded text. This seems to suggest another way in which xml:lang='' and xml:lang="und" are not equivalent.

In my opinion, the text of RFC 4646 needs some work, either to relax the restrictions on using und in scenarios where undefined text occurs in a context that is defined, or to clarify the relationship of und to xml:lang=''.
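For anyone who finds it easier to compare the cases in code, the list of alternatives at the top of this message could be sketched roughly as follows. This is only an illustration of my reading of John's position, not anything taken from RFC 4646 or ISO 639; the function name `language_tag` and its parameters are made up for the example, and the open "???" case is deliberately left without an answer.

```python
from typing import Optional


def language_tag(is_language: Optional[bool],
                 determined: bool = False,
                 subtag: Optional[str] = None,
                 empty_allowed: bool = True) -> Optional[str]:
    """Return the tag value for one case in the list, or None if the list has no answer.

    is_language:   True (it is a language), False (it is not), None (not sure).
    determined:    whether the language itself has been identified.
    subtag:        the subtag for the identified language, if one exists.
    empty_allowed: whether the format allows an empty value such as xml:lang="".
    All of these names are hypothetical and exist only for this sketch.
    """
    if is_language is False:
        # Determined, and not a language.
        return "zxx"
    if is_language is None:
        # Not sure whether it is a language: only the empty value fits,
        # and only where the format makes it available.
        return "" if empty_allowed else None
    if not determined:
        # Sure that it is a language, but not which one.
        return "und"
    # Determined language: use its subtag, or None for the unresolved "???" case.
    return subtag if subtag else None


if __name__ == "__main__":
    print(repr(language_tag(True, determined=True, subtag="fr")))
    print(repr(language_tag(False)))
    print(repr(language_tag(True)))
    print(repr(language_tag(None)))
    print(repr(language_tag(None, empty_allowed=False)))
```

Returning None in two places makes the gaps visible: the "???" case has no proposed value, and a format without an empty value has no way to express "not sure whether it's a language".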
RI

[1] http://www.rfc-editor.org/rfc/rfc4646.txt

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of John Cowan
> Sent: 11 April 2007 21:24
> To: Mark Davis
> Cc: John Cowan; CE Whitehead; www-international@w3.org
> Subject: Re: For review: Tagging text with no language
>
>
> Mark Davis scripsit:
>
> > I believe that that is adding an interpretation to "und" which is not
> > borne out by either the source standards, nor in common usage.
>
> ISO 639-2 says merely "Undetermined", but this is placed in a
> column labeled "English name of language", so I think it's
> fair to read it as "Undetermined language". But ISO 639-3
> is, I think, definitive.
> http://www.sil.org/iso639-3/scope.asp#S says (in part):
>
>     The identifier [und] (undetermined) is provided for those
>     situations in which a language or languages must be indicated
>     but the *language* cannot be identified [emphasis added].
>
> By contrast, "zxx" is explained in the next sentence thus:
>
>     The identifier [zxx] (no linguistic content) may be applied in a
>     situation in which a language identifier is required by system
>     definition, but the item being described does not actually
>     contain linguistic content.
>
> In any case, the document I'm commenting on says that "zxx"
> is non-linguistic content, and that "und" and "" are
> synonymous and represent linguistic content. Whatever "und"
> may or may not mean, I think there's no doubt that "" can be
> applied to both linguistic and non-linguistic content.
>
> --
> You escaped them by the will-death     John Cowan
> and the Way of the Black Wheel.        cowan@ccil.org
> I could not.  --Great-Souled Sam       http://www.ccil.org/~cowan
>
Received on Thursday, 12 April 2007 13:55:12 UTC