- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Tue, 09 May 2023 16:45:20 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
Received on Tuesday, 9 May 2023 15:48:42 UTC
> I note in passing that while we think that empirically the unexpected
> appearance of BOMs only occurs in UTF8 data streams, I think that our
> rule can be more general: if a BOM appears as the first character in
> any data stream, it is either definitely (in the case of an input
> grammar) or almost certainly (in the case of an input string) not
> intended as data and better ignored -- that holds true for any
> encoding including UTF-16 not just UTF-8. (It's Norm's action to draft
> this, not mine, so this is just a suggestion.)
I believe that the only way for a BOM to appear at the beginning of a
UTF-16 encoded string, once decoded, would be if the UTF-16 BOM were
followed by *another* U+FEFF character, since the decoder consumes the
first one to determine byte order. In this case, I think it would be an
error to ignore it.
I think a processor is only licensed to ignore a BOM at the beginning of
an input string if it believes that the input is UTF-8 encoded.
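To make the distinction concrete, here is a minimal sketch in Python of
the rule as I understand it. It is an illustration only, not proposed
spec wording: the helper name strip_leading_bom and the idea of passing
the encoding in as a string are assumptions made for the example.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, BYTE ORDER MARK / ZERO WIDTH NO-BREAK SPACE


def strip_leading_bom(text: str, encoding: str) -> str:
    """Hypothetical helper: drop one leading U+FEFF, but only when the
    processor believes the input was UTF-8 encoded."""
    if codecs.lookup(encoding).name == "utf-8" and text.startswith(BOM):
        return text[1:]
    return text


# UTF-8: Python's plain "utf-8" decoder leaves the BOM in the decoded
# text, so it shows up as an unexpected first character.
utf8_text = (b"\xef\xbb\xbf" + b"S: 'a'.").decode("utf-8")

# UTF-16: the "utf-16" decoder consumes the first BOM to pick the byte
# order. A U+FEFF that is still present afterwards was a second
# character in the stream, so it is treated as data here.
utf16_bytes = b"\xff\xfe" + (BOM + "abc").encode("utf-16-le")
utf16_text = utf16_bytes.decode("utf-16")

print(repr(strip_leading_bom(utf8_text, "UTF-8")))
print(repr(strip_leading_bom(utf16_text, "UTF-16")))
```

In the UTF-8 case the leading U+FEFF is dropped; in the UTF-16 case the
remaining U+FEFF is kept, because the encoding's own BOM has already
been consumed by the decoder.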
Hopefully my proposed wording is clear (enough).
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica