Re: BOM (several messages about handling encodings in HTML) from Geoffrey Sneddon on 2008-02-29 (www-archive@w3.org from February 2008)

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Fri, 29 Feb 2008 16:34:58 +0000
To: Brian Smith <brian@briansmith.org>
Cc: www-archive@w3.org
Message-Id: <7AAF27D3-C84F-4B14-942B-1858CB6AE955@googlemail.com>

On 29 Feb 2008, at 13:38, Brian Smith wrote:

>
> Ian Hickson wrote:
>>> However, when the encoding is UTF-16LE or UTF-16BE (i.e.
>>> supposed to be signatureless), do we really want to drop
>>> the BOM silently? Shouldn't it count as a character that
>>> is in error?
>>
>> Do the UTF-16LE and UTF-16BE specs make a leading BOM an error?
>>
>> If yes, then we don't have to say anything, it's already an error.
>>
>> If not, what's the advantage of complaining about the BOM in
>> this case?
>
> See http://unicode.org/faq/utf_bom.html#28:
>
> "In particular, whenever a data stream is declared to be UTF-16BE,
> UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used."
>
> If somebody wants to include a zero-width non-breaking space
> (ZWNBSP) at the beginning of a stream, they have to use U+2060 WORD
> JOINER instead.

Could you possibly give me a pointer to something in the Unicode
standard that requires that? I've never seen such a requirement.
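
For reference, here is a minimal sketch of the distinction being discussed, using Python's built-in codecs. It only shows how one decoder happens to treat a leading U+FEFF under each encoding label; it says nothing about what the Unicode standard or the HTML spec requires. The helper name decode_declared is hypothetical.

```python
import codecs


def decode_declared(data, encoding):
    """Decode `data` using the declared encoding label.

    Returns the decoded text and a flag saying whether a leading
    U+FEFF is still present in the result. (Hypothetical helper,
    for illustration only.)
    """
    text = data.decode(encoding)
    return text, text.startswith("\ufeff")


# The bytes FF FE followed by "A" encoded as little-endian UTF-16.
sample = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")

# Label "utf-16": the decoder treats FF FE as a signature and consumes it.
print(decode_declared(sample, "utf-16"))

# Label "utf-16-le": the decoder keeps U+FEFF as an ordinary character.
print(decode_declared(sample, "utf-16-le"))
```

Python's "utf-16-le" codec neither drops the character nor reports an error, which is a third option beside the two raised above (drop silently, or flag as an error).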


--
Geoffrey Sneddon
<http://gsnedders.com/>

Received on Friday, 29 February 2008 16:35:19 UTC