- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 27 May 2007 11:56:29 +0300
"If the encoding is one of UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE, then authors can use a BOM at the start of the file to indicate the character encoding."

That sentence should read:

"If the encoding is one of UTF-8, UTF-16, or UTF-32, then authors can use a BOM at the start of the file to indicate the character encoding."

The encoding labels with LE or BE in them mean BOMless variants where the encoding label on the transfer protocol level gives the endianness. See http://www.ietf.org/rfc/rfc2781.txt

When the spec refers to UTF-16 with BOM in a particular endianness, I think the spec should use "big-endian UTF-16" and "little-endian UTF-16".

Since declaring endianness on the transfer protocol level has no benefit over using the BOM when the label is right and there's a chance to get the label wrong, the encoding labels with explicit endianness are harmful for interchange. In my opinion, the spec should avoid giving authors any bad ideas by reinforcing these labels by repetition.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
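To illustrate the distinction, here is a minimal sketch in Python of sniffing a BOM and mapping it to the plain labels "UTF-8", "UTF-16", and "UTF-32", with the endianness reported separately rather than baked into the label. The function name `sniff_bom` and the shape of its return value are hypothetical, chosen only for this example; it is not taken from the spec or from any particular parser. It also assumes that FF FE 00 00 should be read as a UTF-32 BOM rather than as a UTF-16 BOM followed by U+0000.

```python
import codecs

# BOM byte sequences, checked longest first so that the little-endian
# UTF-32 BOM (FF FE 00 00) is not mistaken for the little-endian
# UTF-16 BOM (FF FE). This ordering is an assumption of the sketch.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32", "big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32", "little-endian"),
    (codecs.BOM_UTF8, "UTF-8", None),
    (codecs.BOM_UTF16_BE, "UTF-16", "big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16", "little-endian"),
]


def sniff_bom(data: bytes):
    """Return (label, endianness, bom_length) or None if there is no BOM.

    Hypothetical helper: the label never carries LE or BE, because the
    BOM itself already states the byte order.
    """
    for bom, label, endianness in _BOMS:
        if data.startswith(bom):
            return label, endianness, len(bom)
    return None


# Python's own codecs follow the same split: the "utf-16" codec consumes
# the BOM, while "utf-16-le" treats the input as BOMless and keeps the
# leading U+FEFF as a character.
sample = codecs.BOM_UTF16_LE + "<".encode("utf-16-le")
with_bom_label = sample.decode("utf-16")
with_le_label = sample.decode("utf-16-le")
```

In this sketch, `sniff_bom(sample)` yields the label "UTF-16" with "little-endian" as separate information, and decoding the same bytes under the "utf-16-le" label leaves a stray U+FEFF at the start of the text, which is the kind of mislabeling risk described above.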
Received on Sunday, 27 May 2007 01:56:29 UTC