Re: HTML <head> article updated

From: Richard Ishida <ishida@w3.org>
Date: Wed, 10 Aug 2011 13:31:36 +0100
Message-ID: <4E427A28.7010205@w3.org>
To: Chris Mills <cmills@opera.com>
CC: public-evangelist@w3.org, www International <www-international@w3.org>

Hello Chris,

[cc www-international so that they know I have sent feedback, and in 
case others wish to comment]

Here's some feedback on http://www.w3.org/wiki/The_HTML_head_element

"The language codes may be two-letter codes, such as en for English, 
four-letter codes such as en-US for American English, or other, less 
common, codes. The two-letter codes are defined in ISO 639-1, although 
modern best practice dictates that you should use the IANA subtag 
registry for your language code definitions."

I think this paragraph needs a fair bit of attention.

[1] language codes => language tags  (for consistency and clarity: 
'codes' was used in the past to refer to ISO language codes or region 
codes, but something like en-US is two such codes, though only one 
language tag).  (By the way, en and US are both 'subtags'; be careful 
not to mix tags with subtags.)

[2] language subtags can be 2 or 3 letters, and region subtags can be 
2 letters or 3 digits, so the opening part of the paragraph is quite 

[3] I strongly urge you not to refer people to ISO 639; they should use 
the IANA registry to look things up (and you may want to point to 
http://rishida.net/utils/subtags/ which makes lookup a little more user 

[4] 'modern best practice': well, actually it's in the standards, so 
it's a little more than best practice

[5] it may be better for this audience to link to 
rather than  http://www.w3.org/International/articles/language-tags/
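
To make the tag/subtag distinction in [1] and [2] concrete, here is a 
minimal Python sketch that splits a language tag into its subtags by 
shape alone. The function name split_language_tag is made up for this 
illustration, and the pattern covers only the language, script and region 
subtags; variants, extensions and private-use subtags are left out, and 
nothing is checked against the IANA registry.

```python
import re

# Simplified subtag shapes: language = 2 or 3 letters, optional script =
# 4 letters, optional region = 2 letters or 3 digits. This checks form
# only; it does NOT verify that a subtag is actually registered with IANA.
_TAG_SHAPE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_language_tag(tag):
    """Return a dict of the subtags found in a simple language tag.

    Hypothetical helper for illustration; raises ValueError when the tag
    does not fit the simplified shape above.
    """
    match = _TAG_SHAPE.fullmatch(tag)
    if match is None:
        raise ValueError("not a simple language tag: %r" % tag)
    # Drop the optional subtags that were not present in the tag.
    return {name: value for name, value in match.groupdict().items()
            if value is not None}
```

For example, split_language_tag("en-US") yields one language subtag (en) 
and one region subtag (US): a single tag made of two subtags. A tag such 
as es-419 shows a 3-digit region subtag, and haw a 3-letter language 
subtag.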

"Don't worry too much about this for now. utf-8 is the universal 
character set, which includes pretty much any character that you might 
want to use on a web page, from any common human language, so it is a 
good idea to declare this to make sure you HTML has full international 
capabilities. In addition, you can avoid a serious Internet Explorer 
security risk by declaring it in the first 512 bytes of the page. So 
just below the <head> tag is fine. This is what all the below examples 
will do."

[6] actually they need to worry about it at least enough to ensure that 
they are actually *saving their document* as UTF-8, not just changing 
the encoding declaration - otherwise, a doc saved as iso-8859-1 for 
example will fail to display properly when it comes to accented 
characters. They also need to be aware that the server may be overriding 
their declaration.
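
A small Python sketch of the problem in [6], assuming a document that 
contains the word "café". The helper decode_as_declared is a made-up 
name, and decoding with errors="replace" only approximates what a browser 
does when it trusts the declaration; it is meant to show the mismatch, 
not to model any particular browser.

```python
# The same text saved by an editor in two different encodings.
text = "caf\u00e9"  # "café", written with an escape to keep this file ASCII
latin1_bytes = text.encode("iso-8859-1")  # the accented letter is 1 byte
utf8_bytes = text.encode("utf-8")         # the accented letter is 2 bytes


def decode_as_declared(data, declared):
    """Decode bytes using the declared encoding (hypothetical helper).

    Bytes that are invalid in the declared encoding become U+FFFD, the
    replacement character.
    """
    return data.decode(declared, errors="replace")


# Declared utf-8 and really saved as utf-8: the text survives.
saved_and_declared_utf8 = decode_as_declared(utf8_bytes, "utf-8")

# Declared utf-8 but really saved as iso-8859-1: the accent is lost.
declared_utf8_saved_latin1 = decode_as_declared(latin1_bytes, "utf-8")

# Saved as utf-8 but declared (or overridden by the server) as
# iso-8859-1: each byte of the accented letter shows up separately.
declared_latin1_saved_utf8 = decode_as_declared(utf8_bytes, "iso-8859-1")
```

Only the first case round-trips cleanly. Changing the declaration without 
changing how the file is saved gives the second case, and a server header 
that disagrees with a correctly saved file gives the third.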

I recommend that you step back a little in the wiki, add a brief 
description of what an encoding is and why it's important, and add some 
text to say that authors should ensure that their editor *saves the 
text* in utf-8. If it cannot, they should ensure that the charset 
attribute indicates the encoding that was actually used.  We have 
some articles that can help people understand these concepts at

Hope that helps,

On 04/08/2011 16:56, Chris Mills wrote:
> UPDATE - 4th August 2011: I've updated http://www.w3.org/wiki/The_HTML_head_element to clean up language, add new HTML5 features, and add in a new section about doctypes, to replace Choosing the right doctype for your HTML documents (http://www.w3.org/wiki/Choosing_the_right_doctype_for_your_HTML_documents). The original article was a bit long winded, and needed a lot of updates to account for new thinking about doctypes, HTML5 doctype, etc.
> this is ready for proofing/translation now.
> QUESTION - should this big new doctype section be put into a new article? Does it make the article a bit too long?
> --
> Chris Mills
> Open standards evangelist and dev.opera.com editor
> Opera Software
> * Try our browsers: http://www.opera.com
> * Learn to build a better web, with the Opera web standards curriculum: http://www.opera.com/wsc
> * Learn about the latest open standards technologies and techniques: http://dev.opera.com

Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)


Register for the W3C MultilingualWeb Workshop!
Limerick, 21-22 September 2011
Received on Wednesday, 10 August 2011 12:31:59 UTC
