RE: i18n Polyglot Markup/in-doc encoding declarations (2nd issue)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 8 Oct 2010 18:51:41 +0100
To: "'Eliot Graff'" <eliotgra@microsoft.com>, "'Leif Halvard Silli'" <xn--mlform-iua@xn--mlform-iua.no>
Cc: <public-html@w3.org>, <public-i18n-core@w3.org>
Message-ID: <00cb01cb6711$7738b0a0$65aa11e0$@org>
There is another article, "Declaring character encodings in HTML",
http://www.w3.org/International/questions/qa-html-encoding-declarations ,
but there isn't currently a W3C Note or other rec track document that can be


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)


> -----Original Message-----
> From: Eliot Graff [mailto:eliotgra@microsoft.com]
> Sent: 01 October 2010 16:12
> To: Leif Halvard Silli; Richard Ishida
> Cc: public-html@w3.org; public-i18n-core@w3.org
> Subject: RE: i18n Polyglot Markup/in-doc encoding declarations (2nd issue)
> Importance: High
> The Editor's Draft of 29 September contains the following edit, and I have
> therefore resolved bug 10150 as fixed.
> ]]
> Note that the W3C Internationalization (i18n) Group recommends to always
> include a visible encoding declaration in a document, because it helps
> developers, testers, or translation production managers to check the
> encoding of a document visually.
> [[
> I would like to link to a resource for this statement, though. Can you
> recommend one that's better than the i18n article, "Character encodings" [1]?
> Thanks,
> Eliot
> [1] http://www.w3.org/International/O-charset
> -----Original Message-----
> From: Leif Halvard Silli [mailto:xn--mlform-iua@målform.no]
> Sent: Thursday, July 15, 2010 1:56 PM
> To: Richard Ishida
> Cc: public-html@w3.org; public-i18n-core@w3.org; Eliot Graff
> Subject: i18n Polyglot Markup/in-doc encoding declarations (2nd issue)
> I resend my comments, on request from Richard, with one issue per message.
> This is about the 2nd issue on the i18n group's tracking page:
> http://www.w3.org/International/reviews/1007-polyglot/
> 	Excerpt of the 2nd issue:
> 		]] In-document declarations always useful [...] So it's true to
> say that you strictly don't need it, but we would prefer that people do.
> Please could you reflect that in your document. [[
> 	Comment: I have long since filed bug 9962 which says that only UTF-8
> and UTF-16 should be permitted. (No other encodings should be allowed, as
> there is no HTML5-compatible way to specify them.) And also, there is an
> ongoing debate to limit the encodings to only UTF-8 - see Sam's message
> and the replies [1]. In the following, I'll assume that only
> UTF-8 and UTF-16 are relevant.
> For UTF-16, there is no HTML5-compatible way to have an in-document
> UTF-16 declaration. At least not as of yet. The i18n group can file a bug
> against HTML5 to make it valid, of course. Until that day, your 2nd
> issue is not relevant w.r.t. UTF-16.
> When it comes to UTF-8, an in-document declaration is _necessary_,
> unless you want to rely on HTTP or BOM. Without a BOM, HTTP or
> meta@charset, the HTML parser will most likely default to WIN-1252 or
> another locale-dependent 8-bit encoding - at least in off-line parsing and
> other contexts. Thus I tend to have the opinion that an in-document
> declaration is a requirement for UTF-8.
> [1] http://www.w3.org/mid/4C3F56AB.7030105@intertwingly.net
> --
> leif halvard silli
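
The fallback behaviour described in the quoted message (BOM, HTTP,
meta@charset, otherwise a locale-dependent 8-bit default) can be
illustrated with a small sketch. This is not the HTML5 encoding sniffing
algorithm. The function name guess_encoding, the precedence between the
three sources, the 1024-byte scan window and the choice of windows-1252 as
the fallback are all assumptions made for illustration.

```python
import codecs
import re
from typing import Optional

# Assumed fallback: the message says the default is "most likely" WIN-1252
# or another locale-dependent 8-bit encoding; real parsers may differ.
FALLBACK_ENCODING = "windows-1252"

# Simplified patterns; they only recognise the <meta charset=...> form and a
# charset parameter in a Content-Type header value.
_META_CHARSET = re.compile(
    rb"<meta\s+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)
_HTTP_CHARSET = re.compile(
    r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def guess_encoding(body: bytes, content_type: Optional[str] = None) -> str:
    """Return a guessed encoding label for an HTML byte stream (sketch)."""
    # 1. A byte order mark at the very start of the stream.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # 2. A charset parameter in the HTTP Content-Type header, if any.
    if content_type:
        match = _HTTP_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()

    # 3. An in-document <meta charset> near the start of the document.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. Nothing declared: fall back to the assumed 8-bit default.
    return FALLBACK_ENCODING


if __name__ == "__main__":
    undeclared = "<!DOCTYPE html><title>é</title>".encode("utf-8")
    declared = '<!DOCTYPE html><meta charset="utf-8"><title>é</title>'.encode("utf-8")
    print(guess_encoding(undeclared))  # falls through to the 8-bit default
    print(guess_encoding(declared))    # picks up the in-document declaration
    # Decoding UTF-8 bytes with the fallback garbles non-ASCII text.
    print("é".encode("utf-8").decode(FALLBACK_ENCODING))
```

With no BOM and no HTTP header, as in off-line parsing of a file from
disk, only the in-document declaration keeps the UTF-8 document from being
read with the 8-bit fallback.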
Received on Friday, 8 October 2010 17:52:14 GMT
