RE: For review: Tagging text with no language

From: Richard Ishida <ishida@w3.org>
Date: Thu, 12 Apr 2007 15:35:15 +0100
To: <www-international@w3.org>, "'LTRU Working Group'" <ltru@ietf.org>
Message-ID: <01a001c77d0f$c9c8ccf0$6601a8c0@rishida>

Ok, I just found some additional emails between John and Mark that had been
scooped into my LTRU folder by my mail client because [LTRU] was prepended
to the subject. They shift the baseline for discussion and agreement.  

What I now see as the summary of where we are is:

[a] Determined, and a language (for which a subtag exists): <subtag(s)>

[b] Determined, and not a language: zxx

[c] Determined, but not a language for which a subtag exists: ???

[d] Undetermined, and not sure whether it is a language or not: xml:lang=""
if available, otherwise und
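
As a rough illustration only, the summary above can be written as a small
decision function. The function name and its parameters are hypothetical
and come from no specification. Case [c] is still open in this thread, so
the sketch returns None for it rather than guessing a tag.

```python
from typing import Optional


def choose_language_tag(
    determined: bool,
    is_language: Optional[bool] = None,
    subtag: Optional[str] = None,
    empty_allowed: bool = True,
) -> Optional[str]:
    """Return a language tag value following cases [a] to [d] above.

    All parameter names are illustrative assumptions:
      determined     -- has the nature of the text been determined?
      is_language    -- if determined, is the text linguistic?
      subtag         -- the registered subtag(s), if one exists
      empty_allowed  -- can the format express xml:lang=""?
    """
    if determined:
        if is_language is False:
            return "zxx"   # [b] determined, and not a language
        if subtag:
            return subtag  # [a] determined, and a subtag exists
        return None        # [c] no subtag exists: unresolved ("???")
    # [d] undetermined: empty value if available, otherwise und
    return "" if empty_allowed else "und"


print(repr(choose_language_tag(True, True, "fr")))           # 'fr'
print(repr(choose_language_tag(True, False)))                # 'zxx'
print(repr(choose_language_tag(False)))                      # ''
print(repr(choose_language_tag(False, empty_allowed=False))) # 'und'
```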

I have revised the article at [1] to make it clearer that whether text is
linguistic or not is unimportant wrt use of '' and und.


I'm still troubled, however, by the passage in RFC 4646, so I'll
repeat those comments here and copy the LTRU folks:

I'm also a little worried about the wording in section 4.1 of RFC 4646[2]
about und, which quite clearly says that you shouldn't use und unless the
*protocol* demands it, or sometimes when matching tags.  This doesn't make
any distinction between specifying the language of a resource and turning
off language declarations for a range of embedded text.  It seems that this
suggests a way in which xml:lang='' and xml:lang="und" are not equivalent,
since there are no such restrictions on xml:lang="". In my opinion, the text
of RFC 4646 needs some work, both to relax the use of und in scenarios where
'undefined text' occurs in a context with a defined language, and to clarify
the relationship of und to xml:lang=''.
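
To make the embedded-text scenario concrete, here is a small sketch that
computes the xml:lang value in scope for each element of a made-up
fragment. The sample markup and the function name are assumptions for
illustration. The sketch only reports the inherited values and takes no
position on whether "" and "und" should be treated as equivalent.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: an English paragraph with embedded ranges.
DOC = (
    '<p xml:lang="en">The code <code xml:lang="">x9-ZQ7</code>, '
    'the word <span xml:lang="und">blorf</span>, '
    'and <em>plain text</em>.</p>'
)


def langs_in_scope(root):
    """Map each element's tag name to the xml:lang value in scope for it."""
    result = {}

    def walk(elem, inherited):
        # An explicit xml:lang (including "") overrides the inherited value.
        lang = elem.get(XML_LANG, inherited)
        result[elem.tag] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


langs = langs_in_scope(ET.fromstring(DOC))
for tag, lang in langs.items():
    print(tag, repr(lang))
```

In this fragment the em element inherits "en", while the code and span
elements each override it, one with the empty value and one with und.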

RI 

[1] http://www.w3.org/International/questions/qa-no-language#answer

[2] http://www.rfc-editor.org/rfc/rfc4646.txt

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 

> -----Original Message-----
> From: www-international-request@w3.org 
> [mailto:www-international-request@w3.org] On Behalf Of Richard Ishida
> Sent: 12 April 2007 14:56
> To: www-international@w3.org
> Subject: RE: For review: Tagging text with no language
> 
> 
> So it seems the alternatives that John is suggesting are:
> 
> Determined, and a language (for which a subtag exists): <subtag(s)>
> Determined, and not a language: zxx
> Determined, but not a language for which a subtag exists: ???
> 
> Undetermined, and not sure whether it is a language or not: xml:lang=""
> (if available)
> Undetermined, but sure that it's a language: und
> 
> The implications of this for X/HTML are that there is no way to say that
> text is undetermined if you are not sure whether it's a language or not.
> 
> This is very different from Jon Hanna's proposal at
> http://lists.w3.org/Archives/Public/www-international/2007JanMar/0178.html
> 
> Can we please discuss this.  I'm particularly hoping for contributions from
> John, Jon, Mark, Martin and Addison (though he's on vacation at the moment).
> 
> For my part, having experienced, even when trying to write this email, how
> difficult it is to succinctly talk about the difference between something
> that is unidentified and may or may not be a language, I'm a little leery
> about accepting the evidence in the mail below, John.  Can we be sure that
> the people who drafted that text were consciously making the distinction you
> mention rather than just being a little imprecise in wording?
> 
> I'm also a little worried about the wording in section 4.1 of RFC 4646[1]
> about und, which quite clearly says that you shouldn't use und unless the
> *protocol* demands it, or sometimes when matching tags.  This doesn't make
> any distinction between specifying the language of a resource and turning
> off language declarations for a range of embedded text.  It seems that this
> suggests another way in which xml:lang='' and xml:lang="und" are not
> equivalent. In my opinion, the text of RFC 4646 needs some work, either to
> relax the use of und in scenarios where undefined text occurs in a context
> that is defined, or to clarify the relationship of und to xml:lang=''.
> 
> RI
> 
> 
> [1] http://www.rfc-editor.org/rfc/rfc4646.txt
> 
> 
> > -----Original Message-----
> > From: www-international-request@w3.org 
> > [mailto:www-international-request@w3.org] On Behalf Of John Cowan
> > Sent: 11 April 2007 21:24
> > To: Mark Davis
> > Cc: John Cowan; CE Whitehead; www-international@w3.org
> > Subject: Re: For review: Tagging text with no language
> > 
> > 
> > Mark Davis scripsit:
> > 
> > > I believe that that is adding an interpretation to "und" which is not
> > > borne out by either the source standards, nor in common usage.
> > 
> > ISO 639-2 says merely "Undetermined", but this is placed in a 
> > column labeled "English name of language", so I think it's 
> > fair to read it as "Undetermined language".  But ISO 639-3 
> > is, I think, definitive.
> > http://www.sil.org/iso639-3/scope.asp#S says (in part):
> > 
> > 	The identifier [und] (undetermined) is provided for those
> > 	situations in which a language or languages must be indicated
> > 	but the *language* cannot be identified [emphasis added].
> > 
> > By contrast, "zxx" is explained in the next sentence thus:
> > 
> > 	The identifier [zxx] (no linguistic content) may be applied in a
> > 	situation in which a language identifier is required by system
> > 	definition, but the item being described does not actually
> > 	contain linguistic content.
> > 
> > In any case, the document I'm commenting on says that "zxx" 
> > is non-linguistic content, and that "und" and "" are 
> > synonymous and represent linguistic content.  Whatever "und" 
> > may or may not mean, I think there's no doubt that "" can be 
> > applied to both linguistic and non-linguistic content.
> > 
> > -- 
> > You escaped them by the will-death              John Cowan
> > and the Way of the Black Wheel.                 cowan@ccil.org
> > I could not.  --Great-Souled Sam                
> > http://www.ccil.org/~cowan
> > 
> 
> 
Received on Thursday, 12 April 2007 14:34:38 GMT
