Re: How do I say 'this is not in any language' in XHTML/HTML

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Sat, 24 Mar 2007 17:36:23 +0000
Message-ID: <46056197.8050808@emi.ac.ma>
To: Richard Ishida <ishida@w3.org>
Cc: 'Jon Hanna' <jon@hackcraft.net>, www-international@w3.org

Richard Ishida wrote:
> So I drafted an updated (largely rewritten) version at http://esw.w3.org/topic/geoNoLanguageTag
>
> Am I getting close to the answer now?
>   

Your question:
"How do I mark up HTML or XML content for language when I don't know the 
language, or the content is non-linguistic?"

I think "not applicable" is more accurate than "content non-linguistic". 
The latter is included in the former. For example, for a picture or a 
part number, 'language' is obviously not applicable since there is no 
linguistic content, but for program code it is more natural to say that 
'human language' is not applicable than that there is no linguistic content.
Later you say 'perhaps program code' when talking about 'some text that 
is not in any language'.

I agree with your classification
- lang = "und" when the language of text is actually unknown (It is in 
some language, but I can't say which it is) and
- lang = "zxx" (or "") when language is "not applicable" (It is a 
picture, a part number, program code, etc.)

But I treat lang="" as semantically equivalent to lang="zxx", i.e. meaning 
no possible value because language is not applicable.
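
To make the classification concrete, here is a minimal sketch in Python. 
The names LanguageStatus and lang_value are hypothetical, made up for 
illustration, and do not come from any specification. It always returns 
"zxx" for the "not applicable" case; under my reading lang="" could be 
used there as well, but others on this list read the empty value as 
"undetermined", so the sketch avoids it.

```python
from enum import Enum
from typing import Optional


class LanguageStatus(Enum):
    """Hypothetical labels for what an author knows about the content."""
    KNOWN = "known"                    # language identified, tag available
    UNKNOWN = "unknown"                # in some language, but not known which
    NOT_APPLICABLE = "not applicable"  # picture, part number, program code


def lang_value(status: LanguageStatus, tag: Optional[str] = None) -> str:
    """Return the value to put in lang / xml:lang for the given status."""
    if status is LanguageStatus.KNOWN:
        if not tag:
            raise ValueError("a known language needs a language tag")
        return tag
    if status is LanguageStatus.UNKNOWN:
        return "und"   # undetermined
    return "zxx"       # language not applicable


if __name__ == "__main__":
    for status, tag in [(LanguageStatus.KNOWN, "fr"),
                        (LanguageStatus.UNKNOWN, None),
                        (LanguageStatus.NOT_APPLICABLE, None)]:
        print(f'{status.value}: lang="{lang_value(status, tag)}"')
```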

Najib

> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>  
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>  
>  
>
>   
>> -----Original Message-----
>> From: Jon Hanna [mailto:jon@hackcraft.net] 
>> Sent: 22 March 2007 14:39
>> To: Richard Ishida
>> Cc: www-international@w3.org
>> Subject: Re: How do I say 'this is not in any language' in XHTML/HTML
>>
>> Richard Ishida wrote:
>>     
>>> I'm still not clear about the distinction between 
>>> xml:lang="" and xml:lang="und".  Any suggestions?
>>
>> If xml:lang is spec'd in a particular schema to allow an 
>> empty string then xml:lang="und" is a bug and xml:lang="" is not.
>>
>> If it is not spec'd to allow an empty string then 
>> xml:lang="und" is not a bug and xml:lang="" is!
>>
>> RFC 4646, like RFC 3066 before it, explicitly states that und 
>> SHOULD NOT be used unless a protocol forces one to state a 
>> language tag. xml:lang does not force any use, and is 
>> specified as allowing the empty string unless 
>> another specification (e.g. XHTML1.0) says otherwise.
>>
>> RFC 4646, again like RFC 3066 before it, states that the lack 
>> of a language code means Undetermined (just as und does in a 
>> protocol that doesn't allow an empty language code).
>>
>> I agree with those who consider XHTML1.0 not allowing an 
>> empty xml:lang attribute value as obsolete (or an error? Did 
>> the first edition of the XML1.0 spec prohibit empty xml:lang?).
>>
>> Both of these cover cases where the language is not known. If it is
>> *known* that content does not contain any linguistic data 
>> then xml:lang="zxx" should be used.
>>
>>     
-- 

Najib TOUNSI (mailto:tounsi @ w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 50 (P1711)  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30 
Received on Saturday, 24 March 2007 17:36:48 GMT