
i18n Polyglot Markup/in-doc encoding declarations (2nd issue)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 16 Jul 2010 00:55:53 +0400
To: Richard Ishida <ishida@w3.org>
Cc: public-html@w3.org, public-i18n-core@w3.org, Eliot Graff <eliotgra@microsoft.com>
Message-ID: <20100716005553607288.165be8e0@xn--mlform-iua.no>
I am resending my comments, at Richard's request, with one issue per 
message. This is about the 2nd issue on the i18n group's tracking page: 

	Excerpt of the 2nd issue: 

		]] In-document declarations always useful [...] So it's true to say 
that you strictly don't need it, but we would prefer that people do. 
Please could you reflect that in your document. [[

	Comment: I have long since filed bug 9962, which says that only UTF-8 
and UTF-16 should be permitted. (No other encodings should be allowed, 
as there is no HTML5-compatible way to specify them.) There is also an 
ongoing debate about limiting the encodings to UTF-8 only - see Sam's 
message and the replies [1]. In the following, I'll assume that only 
UTF-8 and UTF-16 are relevant.

For UTF-16, there is no HTML5-compatible way to have an in-document 
UTF-16 declaration, at least not yet. The i18n group can of course file 
a bug against HTML5 to make it valid. Until then, your 2nd issue is not 
relevant w.r.t. UTF-16.

When it comes to UTF-8, an in-document declaration is _necessary_ 
unless you want to rely on HTTP or a BOM. Without a BOM, an HTTP 
charset or meta@charset, the HTML parser will most likely default to 
Windows-1252 or another locale-dependent 8-bit encoding - at least in 
off-line parsing and other uncontrolled contexts. Thus I tend to the 
opinion that an in-document declaration is a requirement for UTF-8.
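
To illustrate the fallback described above, here is a minimal Python 
sketch. It is not an implementation of the HTML5 encoding sniffing 
algorithm: the function name guess_encoding, the regular expression, 
the 1024-byte scan window, the fixed Windows-1252 fallback and the 
order in which the three signals are checked are all assumptions made 
for illustration. It only shows that a UTF-8 document with no BOM, no 
HTTP charset and no meta@charset ends up on the 8-bit default, and 
what that does to non-ASCII text.

```python
import re

# Assumed fallback for this sketch; real parsers choose a locale-dependent default.
FALLBACK = "windows-1252"

# Simplified pattern for <meta charset=...>; a real parser is far more lenient.
META_RE = re.compile(rb'''<meta\s+charset\s*=\s*["']?([A-Za-z0-9_-]+)''', re.IGNORECASE)


def guess_encoding(body, http_charset=None):
    """Return the encoding this sketch would pick for an HTML byte stream."""
    # 1. Byte order mark (the check order here is an assumption of the sketch).
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # 2. Charset supplied by the transport layer (HTTP Content-Type).
    if http_charset:
        return http_charset.lower()
    # 3. In-document declaration, searched only near the start of the file.
    match = META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: fall back to the 8-bit default.
    return FALLBACK


text = "blåbær"
raw = text.encode("utf-8")
undeclared = b"<!DOCTYPE html><title>" + raw + b"</title>"
declared = b'<!DOCTYPE html><meta charset="utf-8"><title>' + raw + b"</title>"

print(guess_encoding(undeclared))                        # no signal at all
print(guess_encoding(declared))                          # meta@charset present
print(raw.decode(guess_encoding(undeclared)) == text)    # mis-decoded text differs
print(raw.decode(guess_encoding(declared)) == text)      # declared text round-trips
```

With a file opened from disk there is no HTTP header, so in this 
sketch only the BOM or meta@charset can keep the parser away from the 
fallback.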

[1] http://www.w3.org/mid/4C3F56AB.7030105@intertwingly.net
leif halvard silli
Received on Thursday, 15 July 2010 21:00:11 UTC
