Re: New FAQ: Removing UTF-8 BOM

From: Martin Duerst <duerst@w3.org>
Date: Thu, 06 Nov 2003 01:40:56 -0500
Message-Id: <4.2.0.58.J.20031105224713.05971750@localhost>
To: Jungshik Shin <jshin@i18nl10n.com>
Cc: public-i18n-geo@w3.org

At 23:39 03/11/05 +0900, Jungshik Shin wrote:

>On Wed, 5 Nov 2003, Martin Duerst wrote:
>
> > It can even be typed directly, as:
> >
> > prompt>  perl -pi~ -0777 -e "s/^\xEF\xBB\xBF//s;" filewithbom.html
>
>   Well, this doesn't work with Perl 5.6 or later because in Perl 5.6
>or later, the native representation of characters is UTF-8.

It would very much surprise me if there were no way to say
inside a perl program that input and output should be treated
as binary.
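
To illustrate what "treated as binary" means here, the following is a minimal sketch in Python rather than Perl; it does not show how Perl itself is told to do this. The function names `strip_utf8_bom` and `strip_bom_in_file` are hypothetical. The file is opened in binary mode, so the three bytes EF BB BF are compared as raw bytes and nothing is decoded. Unlike the `-i~` option in the one-liner above, this sketch keeps no backup copy.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its leading BOM; return True if it changed.

    Assumption: the file is small enough to read into memory at once.
    """
    # "rb" / "wb": bytes in, bytes out, no character decoding involved.
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if stripped == data:
        return False
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

For example, `strip_bom_in_file("filewithbom.html")` would rewrite that file only if it starts with the BOM bytes.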


>Even in
>earlier Perl, it has a problem of removing U+FEFF at places other than
>the very beginning of files.

No, that's what the -0777 option is for: it makes the
whole file be treated as a single line, so ^ can match only
at the very beginning of the file.
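
The difference between matching line by line and matching against the whole file can be sketched as follows, again in Python rather than Perl and only as an analogy to the one-liner's behaviour. The names `strip_per_line` and `strip_whole` and the sample data are made up for illustration. In both functions `^` is used without a multi-line flag, so it matches only at the start of whatever string the pattern is applied to.

```python
import re

BOM = b"\xef\xbb\xbf"

# Made-up sample: a BOM at the start of the file and another one
# at the start of the second line.
SAMPLE = BOM + b"line one\n" + BOM + b"line two\n"


def strip_per_line(data: bytes) -> bytes:
    """Apply the substitution to each line separately (no slurping)."""
    lines = data.splitlines(keepends=True)
    return b"".join(re.sub(rb"^\xef\xbb\xbf", b"", line) for line in lines)


def strip_whole(data: bytes) -> bytes:
    """Apply the substitution once to the whole content (slurp-style)."""
    return re.sub(rb"^\xef\xbb\xbf", b"", data)


# Per line: the BOM at the start of every line is removed.
print(strip_per_line(SAMPLE))
# Whole content: only the BOM at the very beginning is removed.
print(strip_whole(SAMPLE))
```

Applied per line, the pattern also strips the sequence at the start of the second line; applied to the whole content, it leaves that one alone.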


Regards,   Martin.
Received on Thursday, 6 November 2003 06:57:00 UTC