Re: New FAQ: Removing UTF-8 BOM from Jungshik Shin on 2003-11-05 (public-i18n-geo@w3.org from November 2003)

From: Jungshik Shin <jshin@i18nl10n.com>
Date: Thu, 6 Nov 2003 00:41:39 +0900 (KST)
To: Tex Texin <tex@i18nguy.com>
Cc: public-i18n-geo@w3.org
Message-ID: <Pine.LNX.4.58.0311060007180.12721@jshin.net>

On Wed, 5 Nov 2003, Tex Texin wrote:

> 2) yes, the characters will display differently, depending on encoding and font
> of the editor.
> Maybe we should use a graphic to show the mistreatment(s).
> Also, they are being mistreated as characters, but we should refer to them as
> bytes since they are not representing characters.

 Yes, it would be a good idea to include a few images showing how
the BOM bytes look when the encoding is misidentified.

> 3) For the faq we shouldn't use scripts that look "something like..." or have
> too many version dependencies. So we can't use the sed script.

 I should have given a complete version, but I guess not many people
would be interested in it. So, let's forget about it as you suggested.

> If it is not safe and reliable we shouldn't put it in the faq at all.

  With Perl 5.8 already widely deployed, the following is safe and
reliable. I mentioned Perl 5.6 (and earlier) only for the sake of
completeness.

  perl -p -i.bak -e '(1 == $.) && s/^\x{FEFF}//' filename

With '-i.bak' instead of '-i', the original will be backed up
as filename.bak. (Martin used '-i~', which backs it up as filename~.)
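
For readers without Perl, the same idea can be sketched at the byte level
in Python. This is only an illustration and not part of the procedure
described above: the function name strip_utf8_bom and its backup_suffix
parameter are hypothetical. As far as I can tell, the one-liner matches
U+FEFF as a character and so relies on Perl decoding the file as UTF-8,
whereas this sketch compares the raw bytes EF BB BF and does not depend on
any locale or I/O-layer setting. It assumes the file is small enough to read
into memory, and unlike '-i.bak' it writes a backup only when a BOM is
actually found.

```python
import codecs
import os
import shutil
import tempfile


def strip_utf8_bom(path, backup_suffix=".bak"):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file at `path`.

    Hypothetical helper for illustration. Returns True if a BOM was found
    and removed, False if the file did not start with one.
    """
    # Read raw bytes so that no decoding step can hide or alter the BOM.
    with open(path, "rb") as f:
        data = f.read()

    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, file left untouched

    # Keep a copy of the original, similar in spirit to perl's -i.bak.
    shutil.copy2(path, path + backup_suffix)

    # Rewrite the file without the three BOM bytes.
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Small demonstration on a throwaway file.
    workdir = tempfile.mkdtemp()
    sample = os.path.join(workdir, "sample.txt")
    with open(sample, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"hello\n")
    print(strip_utf8_bom(sample))  # BOM present on the first call
    print(strip_utf8_bom(sample))  # already clean on the second call
```

Only a BOM at the very start of the file is removed, which mirrors the
'1 == $.' and '^' conditions in the Perl command; U+FEFF occurring later in
the file is left alone.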

   Jungshik

Received on Wednesday, 5 November 2003 10:44:54 UTC