Re: How do I say ‘this is not in any language’ in XHTML/HTML

Hi Richard,

I agree with your choice and have a comment below.

Richard Ishida wrote:
> This is an attempt to summarise and move forward some ideas in a thread on www-international@w3.org by Christophe Strobbe, Martin Duerst, Bjoern Hoermann and Tex Texin.
> http://lists.w3.org/Archives/Public/www-international/2005JulSep/0163.html
>
>
>
> You should always use the lang and/or xml:lang attributes in HTML or XHTML to identify the human language of the content so that applications such as voice browsers, style sheets, and the like can process that text. (See Declaring Language in XHTML and HTML[1] for the details.)
>
> You can override that language setting for a part of the document that is in a different language, e.g. a French quotation in an English document, by using the same attribute(s) around the relevant bit of text.
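Here is a minimal Python sketch of the inheritance and override behaviour described above. It walks an XML tree and reports which language each run of text inherits. It assumes the document is parsed as XML with the standard library's ElementTree and gives xml:lang priority over lang when both are present. The function name `text_languages` is hypothetical, not taken from any existing tool.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_languages(elem, inherited=None):
    """Return (text, language) pairs in document order.

    Each element inherits the language of its parent unless it carries
    its own xml:lang or lang attribute (this sketch checks xml:lang first).
    """
    lang = elem.get(XML_LANG, elem.get("lang", inherited))
    pairs = []
    if elem.text and elem.text.strip():
        pairs.append((elem.text.strip(), lang))
    for child in elem:
        pairs.extend(text_languages(child, lang))
        # Text after a child element belongs to the parent, so it uses
        # the parent's language, not the child's.
        if child.tail and child.tail.strip():
            pairs.append((child.tail.strip(), lang))
    return pairs


doc = ET.fromstring(
    '<p xml:lang="en">He said <q xml:lang="fr">merci</q> and left.</p>'
)
for text, lang in text_languages(doc):
    print(lang, text)
```

An empty xml:lang="" comes back as the empty string here, so a caller can tell "explicitly no language information" apart from "nothing declared" (None).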
>
> Suppose you have some text that is not in any language, such as type samples, part numbers, perhaps program code. How would you indicate that it is in no language in particular?
>
> There are a number of possible approaches:
>
>    1. A few years ago we introduced into the XML spec the idea that xml:lang="" conveys that ‘there is no language information available’. (See 2.12 Language Identification[2])
>
>    2. An alternative is to use the value ‘und’, for ‘undetermined’.
>
>    3. In the IANA Subtag Registry[3] there is another tag, ‘zxx’, that means ‘No linguistic content’. Perhaps this is a better choice. It has my vote at the moment.
>
>
>
> [xml:lang=""]
> Is ‘no language information available’ suitable to express ‘this is not a language’? I feel it is not.
>
> If it were appropriate, there would be some other questions to answer. In HTML, an empty string value for the lang or xml:lang attribute produces a validation error.
>
> It seems to me that the validator should not produce an error for xml:lang="". It needs to be fixed.
>
> I’m not clear whether the HTML DTD supports an empty string value for lang. 

The XHTML DTD says:

<!ENTITY % LanguageCode "NMTOKEN">
    <!-- a language code, as per [RFC3066] -->

so an empty value is not legal.
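The reason is that XML 1.0 defines NMTOKEN as one or more name characters (Nmtoken ::= (NameChar)+), so the empty string can never match. A rough Python sketch of that check follows. It assumes an ASCII-only subset of NameChar (the real production also allows various non-ASCII ranges), and `is_nmtoken` is a hypothetical helper name. Passing it only means the value is a syntactically valid NMTOKEN, not a valid RFC 3066 tag.

```python
import re

# ASCII-only subset of the XML 1.0 NameChar production. The full
# production also allows many non-ASCII ranges, which are omitted here.
NAME_CHARS = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken(value: str) -> bool:
    """Nmtoken ::= (NameChar)+, i.e. at least one name character."""
    return NAME_CHARS.fullmatch(value) is not None


for value in ["en", "fr-CA", "und", "zxx", ""]:
    print(repr(value), is_nmtoken(value))
```

Both "und" and "zxx" pass, while "" fails, which matches the validation error for the empty value.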


> If so, then presumably the validator needs to be fixed. If not, then this is not a viable option, since you’d really want both lang and xml:lang to have the same values.
>   

<!ENTITY % i18n
 "lang        %LanguageCode; #IMPLIED
  xml:lang    %LanguageCode; #IMPLIED
  dir         (ltr|rtl)      #IMPLIED"
  >

It seems that lang and xml:lang have the same definition, so neither can
take an empty value.
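Since the two attributes share a definition, a simple consistency check is to flag elements where lang and xml:lang are both present but disagree. A minimal Python sketch, assuming ElementTree parsing and element names without the XHTML namespace; `mismatched_lang` is a hypothetical name, and the comparison is case-insensitive because language tags are.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mismatched_lang(root):
    """List (tag, lang, xml:lang) for elements whose two attributes disagree."""
    problems = []
    for elem in root.iter():  # iter() includes the root element itself
        lang = elem.get("lang")
        xml_lang = elem.get(XML_LANG)
        # Only compare when both attributes are present; language tags
        # are case-insensitive, so "en-GB" and "en-gb" count as equal.
        if lang is not None and xml_lang is not None and lang.lower() != xml_lang.lower():
            problems.append((elem.tag, lang, xml_lang))
    return problems


page = ET.fromstring(
    '<div lang="en" xml:lang="en">'
    '<span lang="zxx" xml:lang="zxx">X-42-B</span>'
    '<span lang="und" xml:lang="zxx">Q-7</span>'
    "</div>"
)
print(mismatched_lang(page))
```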

I did not check the HTML DTD.

Felix

> [und]
> Would the description ‘undetermined’ fit this case, given that it is not a language at all? Again, it doesn’t seem right to me, since ‘undetermined’ seems to suggest that it is a language of some sort, but we’re not sure which.
>
> [zxx]
> This seems to me to be the right choice. It would produce no validation issues. The only issue is perhaps that it’s not terribly memorable.
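If zxx is adopted, one way to apply it consistently is to set lang and xml:lang together whenever non-linguistic text is wrapped. The Python sketch below assumes markup is generated with ElementTree; `no_language` is a hypothetical helper and the part number is invented sample data.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def no_language(tag: str, text: str) -> ET.Element:
    """Wrap text that has no linguistic content, e.g. a part number.

    Sets lang and xml:lang to the same value, 'zxx', so the two
    attributes agree and both are valid NMTOKENs.
    """
    elem = ET.Element(tag, {"lang": "zxx", XML_LANG: "zxx"})
    elem.text = text  # ElementTree escapes markup characters on output
    return elem


para = ET.Element("p", {"lang": "en", XML_LANG: "en"})
para.text = "Order part "
code = no_language("code", "X-42-B")
code.tail = " today."
para.append(code)
print(ET.tostring(para, encoding="unicode"))
```

ElementTree serializes the XML-namespace attribute with the reserved xml: prefix, so the output carries both lang="zxx" and xml:lang="zxx" on the wrapped element.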
>
> Thoughts?
>
> RI
>
>
> [1] http://www.w3.org/International/tutorials/language-decl/
>
> [2] http://www.w3.org/TR/REC-xml/#sec-lang-tag
>
> [3] http://www.iana.org/assignments/language-subtag-registry
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>  
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>  

Received on Tuesday, 13 March 2007 19:28:06 UTC