Re: New FAQ: Removing UTF-8 BOM

From: Jungshik Shin <jshin@i18nl10n.com>
Date: Thu, 6 Nov 2003 00:41:39 +0900 (KST)
To: Tex Texin <tex@i18nguy.com>
Cc: public-i18n-geo@w3.org
Message-ID: <Pine.LNX.4.58.0311060007180.12721@jshin.net>

On Wed, 5 Nov 2003, Tex Texin wrote:

> 2) yes, the characters will display differently, depending on encoding and font
> of the editor.
> Maybe we should use a graphic to show the mistreatment(s).
> Also, they are being mistreated as characters, but we should refer to them as
> bytes since they are not representing characters.

 Yes, it may be a good idea to include a few images showing how the BOM
bytes appear when the encoding is misidentified.
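
As a text-only illustration of the mistreatment discussed above, the following sketch dumps the three bytes of a UTF-8 BOM and then shows them misread as ISO-8859-1. The helper name `bom_bytes` is made up for this example, and it assumes `od` and `iconv` are available and that the terminal displays UTF-8; other misidentified encodings will show different characters.

```shell
# bom_bytes: hypothetical helper that writes the three bytes of a UTF-8 BOM.
bom_bytes() { printf '\357\273\277'; }

# Hex dump of the raw bytes: ef bb bf (exact spacing depends on od).
bom_bytes | od -An -tx1

# The same bytes misread as ISO-8859-1, re-encoded as UTF-8 for display.
# Each byte becomes one Latin-1 character: ï»¿
bom_bytes | iconv -f ISO-8859-1 -t UTF-8
echo
```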


> 3) For the faq we shouldn't use scripts that look "something like..." or have
> too many version dependencies. So we can't use the sed script.

 I should have given a complete version, but I guess not many people
would be interested in it. So, let's forget about it as you suggested.


> If it is not safe and reliable we shouldn't put it in the faq at all.

  With Perl 5.8 already widely deployed, the following
is safe and reliable. I mentioned Perl 5.6 (and earlier) only for the
sake of completeness.

  perl -p -i.bak -e '(1 == $.) && s/^\x{FEFF}//' filename

With '-i.bak' instead of '-i', the original will be backed up
as filename.bak. (Martin used '-i~'.)

   Jungshik
Received on Wednesday, 5 November 2003 10:44:54 GMT
