Re: For review: The byte-order mark (BOM) in HTML from Bjoern Hoehrmann on 2013-01-02 (www-international@w3.org from January to March 2013)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 02 Jan 2013 03:34:53 +0100
To: Richard Ishida <ishida@w3.org>
Cc: www International <www-international@w3.org>
Message-ID: <8j67e899jcnv3tn0icbg1dv9td6l27fla9@hive.bjoern.hoehrmann.de>

* Richard Ishida wrote:
>http://www.w3.org/International/questions/new/qa-byte-order-mark-new.en.php

It says:

  You need to be careful to take the BOM into account when using
  scripting to automatically process files that start with a BOM.
  For example, when pattern matching at the start of a file that
  begins with a BOM you need additional code to test for the
  presence of the BOM and ignore it if found.

I do not see why this is under "Scripting" considering it affects text
processing regardless of distinctions between "script" and "compiled"
languages, but more importantly, the issue is more complex than that.

If your own code is responsible for detecting the Unicode signature, it
would seem not worth mentioning that it should not be treated as text.
And if lower-level code, like an IO library, detects the signature, it
would be unlikely to pass it to your code as text, in which case you do
not need to, and in fact should not, do anything.
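
To illustrate the two cases, here is a rough Python sketch. It is not
taken from the article; the sample input and the helper name
strip_utf8_signature are made up for illustration, and it only covers
the UTF-8 signature. The "utf-8-sig" codec is the standard library's
decoder that consumes the signature itself.

```python
import codecs
import io
import re

# Hypothetical input: a UTF-8 file that starts with the Unicode signature.
data = codecs.BOM_UTF8 + "<!DOCTYPE html>\n".encode("utf-8")


def strip_utf8_signature(octets: bytes) -> bytes:
    """Case 1: our own code detects the signature, on octets, before decoding."""
    if octets.startswith(codecs.BOM_UTF8):
        return octets[len(codecs.BOM_UTF8):]
    return octets


text_own = strip_utf8_signature(data).decode("utf-8")

# Case 2: the IO layer detects the signature and never passes it on as text.
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig") as stream:
    text_lib = stream.read()

# For contrast: plain "utf-8" decoding does not detect the signature, so
# U+FEFF leaks into the text and breaks matching at the start.
text_naive = data.decode("utf-8")

pattern = re.compile(r"<!DOCTYPE", re.IGNORECASE)
print(bool(pattern.match(text_own)))    # signature handled by our code
print(bool(pattern.match(text_lib)))    # signature handled by the codec
print(bool(pattern.match(text_naive)))  # signature treated as text
```

In both of the first two cases the pattern-matching code needs no
knowledge of the signature; only the third, where nothing identified
it, sees U+FEFF as text.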

So this seems to be a confusing way to say that when an initial octet
sequence has been identified as a Unicode signature, the octets should
not be interpreted as text in later processing stages. That would be
good to note, but not under a "Scripting" heading.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Wednesday, 2 January 2013 02:35:22 UTC