
Re: Opening files in Notepad with/without UTF-8 signature

From: Tex Texin <tex@i18nguy.com>
Date: Thu, 06 Nov 2003 05:30:53 -0500
Message-ID: <3FAA22DD.76A1BE55@i18nguy.com>
To: ishida@w3.org
Cc: public-i18n-geo@w3.org

Richard Ishida wrote:
> I have NTFS but Notepad is only able to detect that a file is UTF-8
> before opening with the Open dialog box if the signature is present.
> Remove the signature and it no longer knows.  So it seems to me that
> NTFS doesn't remember the encoding.  Same happens for html files with
> encoding declaration.

Possibly. We were speculating that it also depended on how the file was created.
But since, as you say, it is also doing some detection based on heuristics, the
result is going to depend on the data in the file, which means our studies with
a handful of files are not conclusive.
Apparently it also depends on how you open the file: you say above that the Open
dialog box looks for the signature, but below that with right-click it is always
correct. Odd that they don't use the same detection for both.
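
To make the distinction concrete, here is a minimal sketch of the two kinds of
detection being discussed: a signature check, and a content-based guess. This is
not Notepad's actual algorithm, which we don't know. The function names are
hypothetical, and the "heuristic" is simply a strict UTF-8 decode. It does show
why the answer depends on the data in the file.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """True if the data starts with the UTF-8 signature (BOM), bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def guess_encoding(data: bytes) -> str:
    """Hypothetical guesser: signature first, then a content-based check.

    Assumption: this stands in for "heuristics"; real tools may differ.
    """
    if has_utf8_signature(data):
        return "utf-8-sig"
    if all(b < 0x80 for b in data):
        # Pure ASCII is valid UTF-8 too, so the content proves nothing.
        return "ascii"
    try:
        data.decode("utf-8")  # strict decode; fails on malformed sequences
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

Without a signature, a file that happens to contain only ASCII gives a guesser
nothing to go on, so a handful of test files can easily mislead.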

At least we are together on the conclusion! ;-)

> > And if they use right-click, they won't even get the choice.
> Note that Notepad seems to apply some heuristics to the file when you
> right click.  It always opens correctly as utf-8.
> > The solution to the FAQ is just to include a sentence or two
> > indicating that if you are going to remove a BOM, you should
> > know how the file is used and verify whether it will have an
> > impact, (or remove it and monitor that subsequent processing
> > doesn't break).
> That sounds fair enough.
> RI
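
For the "remove it and verify" advice, a small sketch along these lines might
help. The function name is hypothetical, and the check only confirms that the
text is unchanged. It cannot tell you whether some downstream tool relied on the
signature, which is the part that still needs monitoring.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 signature (BOM), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sanity check: the decoded text is the same with or without the signature.
original = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
stripped = strip_utf8_signature(original)
same_text = original.decode("utf-8-sig") == stripped.decode("utf-8")
```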

Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
Received on Thursday, 6 November 2003 05:31:54 UTC
