[Bug 13396] i18n-ISSUE-77: HTTP and defaulting to UTF-16LE


Martin Dürst <duerst@it.aoyama.ac.jp> changed:

           What    |Removed                     |Added
                 CC|                            |duerst@it.aoyama.ac.jp

--- Comment #2 from Martin Dürst <duerst@it.aoyama.ac.jp> 2011-07-29 02:02:56 UTC ---
(In reply to comment #1)
> (In reply to comment #0)
> > Character encodings
> > http://www.w3.org/TR/html5/parsing.html#character-encodings-0
> > 
> > Supported by the i18n WG.
> > 
> > "When a user agent is to use the UTF-16 encoding but no BOM has been found,
> > user agents must default to UTF-16LE."
> > 
> > If the HTTP header declares the file to be UTF-16BE, which I believe it can,
> > and in which case a BOM should *not* be used, then I think that this would not
> > be true.
> Then the user agent isn't to use the UTF-16 encoding but the UTF-16BE encoding.
> The quoted sentence shouldn't say "UTF-16LE". It should say "little-endian
> UTF-16", unless the spec intends the reported encoding for the document to
> change and I'm pretty sure that's not the intention.

This would definitely make things clearer. I'd also suggest changing "is to
use the UTF-16 encoding" at the start of the sentence to wording that makes
it clear this applies to content *labeled* "UTF-16" (using explicit
quotes).

> > If the HTTP header declares the file to be UTF-16, then there must be
> > a BOM, so I assume that this is a recovery mechanism if someone does declare
> > UTF-16 in HTTP but omits the BOM.
> Yes.

It may help to state this explicitly in the spec text.
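To make the distinction above concrete, here is a minimal Python sketch built around a hypothetical helper, `decode_utf16_labeled`. The helper receives the encoding label (for example, the HTTP `charset` parameter) and the raw bytes. An explicit "UTF-16BE" or "UTF-16LE" label fixes the byte order, and no BOM is expected. Only a "UTF-16" label consults the BOM, and it falls back to little-endian when no BOM is present. This illustrates the behaviour discussed in this thread. It is not the HTML5 spec algorithm itself.

```python
import codecs


def decode_utf16_labeled(label: str, data: bytes) -> tuple[str, str]:
    """Return (text, byte_order) for bytes carrying a UTF-16 family label.

    Hypothetical helper for illustration only. The label itself is never
    rewritten: content labeled "UTF-16" stays "UTF-16" even when the
    little-endian default is applied.
    """
    label = label.strip().lower()
    if label == "utf-16be":
        # Byte order is explicit in the label; no BOM is expected.
        return data.decode("utf-16-be"), "big"
    if label == "utf-16le":
        return data.decode("utf-16-le"), "little"
    if label != "utf-16":
        raise ValueError(f"not a UTF-16 label: {label!r}")
    # Label "UTF-16": a BOM, if present, determines the byte order.
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return data[2:].decode("utf-16-be"), "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return data[2:].decode("utf-16-le"), "little"
    # Recovery case: labeled "UTF-16" but no BOM, so assume little-endian.
    return data.decode("utf-16-le"), "little"
```

For example, `decode_utf16_labeled("UTF-16BE", "hi".encode("utf-16-be"))` decodes big-endian without looking for a BOM. The same bytes labeled plain "UTF-16" would be read as little-endian, because no BOM is present.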

Regards,   Martin.

Received on Friday, 29 July 2011 02:02:59 UTC