RE: For review: Tagging text with no language

So it seems the alternatives that John is suggesting are:

Determined, and a language (for which a subtag exists): <subtag(s)>
Determined, and not a language: zxx
Determined, but not a language for which a subtag exists: ???

Undetermined, and not sure whether it is a language or not: xml:lang="" (if
available)
Undetermined, but sure that it's a language: und

The implications of this for X/HTML are that there is no way to say that
text is undetermined if you are not sure whether it's a language or not.
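
To make that concrete, here is a rough sketch of the list above as a lookup,
as I read it. It is illustrative only: the function name, its parameters and
the use of None for "no answer" are my own assumptions, not anything defined
by RFC 4646 or by John's mail. I also assume the "???" case means a known
language that has no subtag.

```python
from typing import Optional


def choose_lang_value(
    is_language: Optional[bool],
    identified: bool = False,
    subtag: Optional[str] = None,
    empty_allowed: bool = True,
) -> Optional[str]:
    """Map the five cases in the list above to a language value.

    All names are hypothetical. A return value of None means the list
    offers no answer for that case.
    """
    if is_language is None:
        # Not sure whether it is a language at all: only the empty value
        # fits, and only where the format permits it (e.g. xml:lang="").
        return "" if empty_allowed else None
    if not is_language:
        # Determined, and not a language.
        return "zxx"
    if not identified:
        # Sure that it's a language, but which one is undetermined.
        return "und"
    # Determined language: its subtag, or None for the "???" case
    # where no subtag exists.
    return subtag
```

With empty_allowed=False, which is how I am modelling a format that has no
empty value available, choose_lang_value(None, empty_allowed=False) returns
None: there is nothing left to say for that case.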

This is very different from Jon Hanna's proposal at
http://lists.w3.org/Archives/Public/www-international/2007JanMar/0178.html

Can we please discuss this.  I'm particularly hoping for contributions from
John, Jon, Mark, Martin and Addison (though he's on vacation at the moment).

For my part, having experienced, even when trying to write this email, how
difficult it is to talk succinctly about the difference between something
that is unidentified but known to be a language and something that is
unidentified and may or may not be a language, I'm a little leery
about accepting the evidence in the mail below, John.  Can we be sure that
the people who drafted that text were consciously making the distinction you
mention rather than just being a little imprecise in wording?

I'm also a little worried about the wording in section 4.1 of RFC 4646[1]
about und, which quite clearly says that you shouldn't use und unless the
*protocol* demands it, or sometimes when matching tags.  This doesn't make
any distinction between specifying the language of a resource and turning
off language declarations for a range of embedded text.  It seems that this
suggests another way in which xml:lang='' and xml:lang="und" are not
equivalent. In my opinion, the text of RFC 4646 needs some work, either to
relax the use of und in scenarios where undefined text occurs in a context
that is defined, or to clarify the relationship of und to xml:lang=''.

RI


[1] http://www.rfc-editor.org/rfc/rfc4646.txt

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 

> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of John Cowan
> Sent: 11 April 2007 21:24
> To: Mark Davis
> Cc: John Cowan; CE Whitehead; www-international@w3.org
> Subject: Re: For review: Tagging text with no language
> 
> 
> Mark Davis scripsit:
> 
> > I believe that that is adding an interpretation to "und" which is not
> > borne out by either the source standards, nor in common usage.
> 
> ISO 639-2 says merely "Undetermined", but this is placed in a 
> column labeled "English name of language", so I think it's 
> fair to read it as "Undetermined language".  But ISO 639-3 
> is, I think, definitive.
> http://www.sil.org/iso639-3/scope.asp#S says (in part):
> 
> 	The identifier [und] (undetermined) is provided for those
> 	situations in which a language or languages must be indicated
> 	but the *language* cannot be identified [emphasis added].
> 
> By contrast, "zxx" is explained in the next sentence thus:
> 
> 	The identifier [zxx] (no linguistic content) may be applied in a
> 	situation in which a language identifier is required by system
> 	definition, but the item being described does not actually
> 	contain linguistic content.
> 
> In any case, the document I'm commenting on says that "zxx" 
> is non-linguistic content, and that "und" and "" are 
> synonymous and represent linguistic content.  Whatever "und" 
> may or may not mean, I think there's no doubt that "" can be 
> applied to both linguistic and non-linguistic content.
> 
> -- 
> You escaped them by the will-death              John Cowan
> and the Way of the Black Wheel.                 cowan@ccil.org
> I could not.  --Great-Souled Sam                
> http://www.ccil.org/~cowan
> 

Received on Thursday, 12 April 2007 13:55:12 UTC