- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 02 Jan 2013 03:34:53 +0100
- To: Richard Ishida <ishida@w3.org>
- Cc: www International <www-international@w3.org>
* Richard Ishida wrote:
>http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php

It says:

  You need to be careful to take the BOM into account when using
  scripting to automatically process files that start with a BOM.
  For example, when pattern matching at the start of a file that
  begins with a BOM you need additional code to test for the
  presence of the BOM and ignore it if found.

I do not see why this is under "Scripting", considering it affects text
processing regardless of distinctions between "script" and "compiled"
languages, but more importantly, the issue is more complex than that.

If your own code is responsible for detecting the Unicode signature, it
hardly seems worth mentioning that it should not be treated as text.
And if lower-level code, like an IO library, detects the signature, it
would be unlikely to pass it to your code as text, in which case you do
not need to do anything, and in fact should not.

So this seems to be a confusing way to say that when an initial octet
sequence has been identified as a Unicode signature, the octets should
not be interpreted as text in later processing stages. That would be
good to note, but not under a "Scripting" heading.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
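The two cases described in the message above can be illustrated with a minimal Python sketch. It is not taken from the message or from the article under review: the helper name `strip_utf8_signature` and the sample bytes are hypothetical, and only UTF-8 is considered. In the first case the calling code detects the signature itself and drops the octets before decoding; in the second, the standard `utf-8-sig` codec consumes the signature in the IO layer, so the caller has nothing left to do. A naive decode is included for contrast, since it leaves U+FEFF in the text and defeats an anchored pattern match.

```python
import codecs
import io
import re


def strip_utf8_signature(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 signature, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample input (an assumption for illustration): a UTF-8 signature
# followed by the start of an XML declaration.
raw = codecs.BOM_UTF8 + "<?xml version='1.0'?>".encode("utf-8")

# Case 1: our own code identifies the signature, so it never treats
# those octets as text.
text_own = strip_utf8_signature(raw).decode("utf-8")

# Case 2: the IO layer identifies the signature. The "utf-8-sig" codec
# skips it, so no extra handling is needed (or wanted) afterwards.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig") as stream:
    text_io = stream.read()

# For contrast: decoding as plain UTF-8 keeps the signature as U+FEFF.
text_naive = raw.decode("utf-8")

# A pattern anchored at the start of the text.
starts_with_xml_decl = re.compile(r"^<\?xml")

print(bool(starts_with_xml_decl.match(text_own)))
print(bool(starts_with_xml_decl.match(text_io)))
print(bool(starts_with_xml_decl.match(text_naive)))
```

Which layer is responsible for the signature is a design decision for the application; the sketch only shows that, once some layer has identified it, later stages see text without it.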
Received on Wednesday, 2 January 2013 02:35:22 UTC