RE: Specifying the language of content

Hi Norbert,

Many thanks for sending in these comments. See my responses below...


============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 

> -----Original Message-----
> From: www-i18n-comments-request@w3.org 
> [mailto:www-i18n-comments-request@w3.org] On Behalf Of 
> Norbert Lindenberg
> Sent: 03 August 2004 02:07
> To: www-i18n-comments@w3.org
> Cc: Norbert Lindenberg
> Subject: Specifying the language of content
> 
> 
> Dear Richard,
> 
> I came across the document "Authoring Techniques for XHTML & HTML
> Internationalization: Specifying the language of content 1.0" 
> at http://www.w3.org/TR/i18n-html-tech-lang/. While this is a 
> very useful document overall, I have a few comments:

(Note, btw, that a more up-to-date version is currently being edited at
http://www.w3.org/International/geo/html-tech/tech-lang.html )

> 
> 1) It would be good to provide some examples for how user 
> agents use the language information. There are two examples 
> mentioned in the abstract, but it seems to me that the most 
> common use of language information on the web currently is 
> for font selection for documents sent as UTF-8. Or what other 
> use do the user agents listed in section
> 1.4 make of language information?

The latest version says: "For a discussion of why it is important to declare
the language of a document see the article Why use the language
attribute?[1] and the beginning of the tutorial Using Language Information
in XHTML, HTML and CSS[2]."
[1] http://www.w3.org/International/questions/qa-lang-why
[2] http://www.w3.org/International/tutorials/tutorial-lang/#declaring
These provide additional information. 

> 
> 2) Item http://www.w3.org/TR/i18n-html-tech-lang/#ri20040429.094220724
> says "Do not use the meta tag to declare the language of a document." 
> The justification is that "tag is not widely recognized by 
> current user agents." While I agree that using the meta tag 
> alone is insufficient, I don't see any problem with using it 
> in addition to the lang attributes. 
> The meta tag makes the information available in the HTTP 
> header, and in some cases that's all a user agent gets to see 
> (e.g., when making an HTTP HEAD request). What's wrong with that?

This has been substantially elaborated since that version was written. See
http://www.w3.org/International/geo/html-tech/tech-lang.html#ri20040429.094220724
and the documents it points to.

Btw, my research found no browser that actually uses the meta tag
information, although I have an unconfirmed suspicion that search engines
and the like may. Only patchy use is made of the HTTP header
information.
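
To make the two in-document declarations concrete, here is a minimal Python
sketch that reports which of them a page carries: the lang (or xml:lang)
attribute on the html element, and the Content-Language meta tag. The class
name, the find_declarations helper and the sample markup are all made up for
illustration, and the sketch does not look at the HTTP header at all. It
shows where the information lives, not how any particular user agent
behaves.

```python
from html.parser import HTMLParser


class LangDeclarations(HTMLParser):
    """Collect the language declarations found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.html_lang = None  # lang / xml:lang on the <html> element
        self.meta_lang = None  # <meta http-equiv="Content-Language" ...>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html":
            self.html_lang = attrs.get("lang") or attrs.get("xml:lang")
        elif tag == "meta":
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.meta_lang = attrs.get("content")


def find_declarations(markup):
    """Return the declared languages as a dict (hypothetical helper)."""
    parser = LangDeclarations()
    parser.feed(markup)
    return {"html": parser.html_lang, "meta": parser.meta_lang}


# Made-up sample page carrying both kinds of declaration.
SAMPLE = """<html lang="fr" xml:lang="fr">
<head><meta http-equiv="Content-Language" content="fr, en"></head>
<body><p>Bonjour</p></body></html>"""

print(find_declarations(SAMPLE))
```

A page that uses the meta tag alone would come back with "html" set to None,
which is the case the technique warns against.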


> 
> 3) Item http://www.w3.org/TR/i18n-html-tech-lang/#ri20040429.113217290
> says "Use the codes zh-Hans and zh-Hant to refer to 
> Simplified and Traditional Chinese, respectively." These 
> attribute values don't seem to have the desired effect on 
> font selection. In my testing with several browsers, running 
> in English environments but with full CJK support installed, 
> I have not found a single browser that recognizes the script 
> codes. The behavior I see is:
>    - Internet Explorer 6.0 / Windows: ignore "zh-Hans" and "zh-Hant" 
> entirely (i.e., use Japanese font)
>    - Mozilla 1.5.1 Mac, Firefox 0.9 Mac: use simplified 
> Chinese font for both "zh-Hans" and "zh-Hant".
>    - Explorer 5.2.3 Mac: use traditional Chinese font for 
> both "zh-Hans" 
> and "zh-Hant".
>    - Safari 1.2.2 / Mac: lang attribute doesn't affect font selection.
> 
> I can see why zh-Hans and zh-Hant are better in theory, but 
> if they don't work, they shouldn't be recommended.

Yes, it's a bit of a chicken-and-egg situation. I agree that we should
probably soften the directive. (Note that there is a bug report in progress
to fix this for Mozilla.) 

I'm not sure how complicated we should make this. It seems to me that there
are situations where use of zh-Hans/Hant is not problematic at all, and
others (such as choice of fonts for UTF-8 text) where it currently doesn't
work, but one would hope that it soon will.

We do mention the need for caution, but could perhaps express it more
clearly. 

The problem is that if no-one ever uses the new approach, UA developers
won't feel motivated to support it.
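
To illustrate the kind of matching a user agent would need in order to pick
fonts from zh-Hans and zh-Hant, here is a minimal Python sketch. It looks up
the full tag first and then drops subtags from the right until something
matches. The font names, the FONTS table and the pick_font helper are
placeholders invented for this example; this is not how any of the browsers
mentioned above is known to work.

```python
# Hypothetical font table; the font names are placeholders, not real fonts.
FONTS = {
    "zh-Hans": "SimplifiedChineseFont",
    "zh-Hant": "TraditionalChineseFont",
    "ja": "JapaneseFont",
}
DEFAULT_FONT = "DefaultFont"


def pick_font(lang_tag):
    """Return a font for lang_tag, trying progressively shorter prefixes."""
    # Language tags are case-insensitive, so compare in lower case.
    known = {key.lower(): font for key, font in FONTS.items()}
    subtags = lang_tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in known:
            return known[candidate]
        subtags.pop()  # drop the last subtag and try again
    return DEFAULT_FONT


for tag in ("zh-Hans", "zh-Hant-TW", "ja-JP", "zh", "en"):
    print(tag, "->", pick_font(tag))
```

With this table a bare "zh" falls through to the default font, since the
sketch deliberately has no entry saying which script plain "zh" implies.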


RI


> 
> Best regards,
> Norbert
> 

Received on Wednesday, 4 August 2004 09:29:18 UTC