Re: New FAQ: Removing UTF-8 BOM

On Wed, 5 Nov 2003, Tex Texin wrote:

> 2) yes, the characters will display differently, depending on encoding and font
> of the editor.
> Maybe we should use a graphic to show the mistreatment(s).
> Also, they are being mistreated as characters, but we should refer to them as
> bytes since they are not representing characters.

 Yeah, it may be a good idea to give a couple (or a few) images with
misidentified encodings.


> 3) For the faq we shouldn't use scripts that look "something like..." or have
> too many version dependencies. So we can't use the sed script.

 I should have given a complete version, but I guess not many people
would be interested in it. So, let's forget about it as you suggested.


> If it is not safe and reliable we shouldn't put it in the faq at all.

  With Perl 5.8 already widely deployed, the following
is  safe and reliable. I wrote about Perl 5.6 (or earlier) just for the
sake of completeness.

  perl -p -i.bak -e '(1 == $.) && s/^\x{FEFF}//' filename

With '-i.bak' instead of '-i', the original will be backed up
as filename.bak (Martin used '-i~'.)

   Jungshik

Received on Wednesday, 5 November 2003 10:44:54 UTC