RE: UTF-16, UTF-16BE and UTF-16LE in HTML5 from Richard Ishida on 2010-07-27 (www-international@w3.org from July to September 2010)

From: Richard Ishida <ishida@w3.org>
Date: Tue, 27 Jul 2010 13:13:32 +0100
To: "'Henri Sivonen'" <hsivonen@iki.fi>
Cc: <public-html@w3.org>, <www-international@w3.org>
Message-ID: <042601cb2d85$221a7290$664f57b0$@org>

> From: Henri Sivonen [mailto:hsivonen@iki.fi]
> Sent: 27 July 2010 11:33
...
> > [2] i18n folks have long advised that you should always include a visible
> > indication of the encoding in a document, HTML or XML, even if you don't
> > strictly need to, because it can be very useful for developers, testers, or
> > translation production managers who want to visually check the encoding of a
> > document.
> 
> That's a bad rationale. It's a *very* bad idea to check the encoding by
> reading a string that doesn't participate in encoding detection at all, since the
> string may be wrong.

Well, any encoding declaration may be wrong: participating in encoding detection doesn't guarantee that the document is actually in the encoding the declaration names. So I don't think it makes much difference. On the other hand, since actually getting your document into a UTF-16 encoding is a little more complicated than using other encodings, the declaration may more often be right, in which case it is extremely useful for people who visually inspect the document, given that they can't see the BOM and may otherwise assume that the encoding is not UTF-16.
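
To make the point about the invisible BOM concrete, here is a rough sketch of how a tester might compare the BOM with the visible declaration. It is only an illustration: the function names (`sniff_bom`, `declared_charset`, `check`) are made up for this example, the declaration is found with a simple regular expression rather than the HTML5 prescan algorithm, and UTF-32 BOMs are ignored.

```python
import codecs
import re

# BOM byte sequences taken from the standard library's codecs module.
# UTF-32 is deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def declared_charset(data: bytes, bom_encoding=None):
    """Return the charset named in the first 1024 bytes, or None.

    Hypothetical helper: decodes with the BOM's encoding when known,
    otherwise with latin-1 so that ASCII-compatible bytes stay readable.
    """
    text = data[:1024].decode(bom_encoding or "latin-1", errors="replace")
    match = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', text, re.I)
    return match.group(1).lower() if match else None


def check(data: bytes):
    """Return (encoding implied by BOM, encoding named in the declaration)."""
    bom = sniff_bom(data)
    return bom, declared_charset(data, bom)


# A real UTF-16LE document: the BOM and the visible declaration agree.
good = codecs.BOM_UTF16_LE + '<meta charset="utf-16">'.encode("utf-16-le")

# ASCII bytes that merely claim to be UTF-16: the declaration is wrong.
bad = b'<meta charset="utf-16">'

print(check(good))
print(check(bad))
```

In the first case the declaration tells a human reader what the BOM already says to the parser; in the second the declaration is present but no BOM backs it up, which is the "may be wrong" case.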

RI

Received on Tuesday, 27 July 2010 12:14:06 UTC