- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Tue, 09 May 2023 16:45:20 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
Received on Tuesday, 9 May 2023 15:48:42 UTC
> I note in passing that while we think that empirically the unexpected
> appearance of BOMs only occurs in UTF8 data streams, I think that our
> rule can be more general: if a BOM appears as the first character in
> any data stream, it is either definitely (in the case of an input
> grammar) or almost certainly (in the case of an input string) not
> intended as data and better ignored -- that holds true for any
> encoding including UTF-16 not just UTF-8. (It's Norm's action to draft
> this, not mine, so this is just a suggestion.)

I believe that the only way for a BOM to appear at the beginning of a
UTF-16 encoded string would be if the UTF-16 BOM was followed by
*another* U+FEFF character. In this case, I think it would be an error
to ignore it.

I think a processor is only licensed to ignore a BOM at the beginning of
an input string if it believes that the input is UTF-8 encoded.

Hopefully my proposed wording is clear (enough).

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica
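The distinction drawn above can be illustrated with a small Python sketch. It is not the proposed spec wording, only an illustration under stated assumptions: the helper name `strip_leading_bom` and the sample input are hypothetical, and the processor is assumed to know which encoding it used to decode the input. Python's plain "utf-8" codec leaves a leading BOM in the decoded string, while its "utf-16" codec consumes the BOM to determine byte order, so a U+FEFF that survives UTF-16 decoding must have been a second one.

```python
BOM = "\ufeff"

def strip_leading_bom(text: str, encoding: str) -> str:
    """Hypothetical helper: drop one leading U+FEFF only for UTF-8 input."""
    normalized = encoding.lower().replace("_", "-")
    if normalized in ("utf-8", "utf8") and text.startswith(BOM):
        return text[1:]
    # For any other encoding, a leading U+FEFF is treated as data.
    return text

# UTF-8 bytes with a BOM: the "utf-8" codec keeps it as U+FEFF.
utf8_text = b"\xef\xbb\xbfS: 'a'.".decode("utf-8")

# UTF-16 bytes with a BOM followed by another U+FEFF: the "utf-16"
# codec consumes the first one, so the second remains in the string.
utf16_bytes = b"\xff\xfe" + "\ufeffS: 'a'.".encode("utf-16-le")
utf16_text = utf16_bytes.decode("utf-16")

print(repr(strip_leading_bom(utf8_text, "utf-8")))
print(repr(strip_leading_bom(utf16_text, "utf-16")))
```

In this sketch the UTF-8 input loses its leading U+FEFF, while the UTF-16 input keeps the one that survived decoding.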