RE: Opening files in Notepad with/without UTF-8 signature

From: Richard Ishida <ishida@w3.org>
Date: Thu, 6 Nov 2003 07:44:11 -0000
To: "'Tex Texin'" <tex@i18nguy.com>
Cc: <public-i18n-geo@w3.org>
Message-ID: <000101c3a439$c7d286a0$6501a8c0@w3c40upc3ma3j2>

> With respect to Notepad, one of the considerations we 
> discussed is that Windows has 3 file systems. One of them, 
> NTFS, maintains additional information about files. We 
> speculated it recorded encoding. So, if notepad made use of 
> that, the BOM might be less necessary in that environment. 
> (Which I suspect is also your environment, as NTFS is the 
> default for hard disks.)  

I have NTFS, but in the Open dialog box Notepad can only detect that a
file is UTF-8 before opening it if the signature is present.  Remove the
signature and it no longer knows.  So it seems to me that NTFS doesn't
remember the encoding.  The same happens for HTML files with an
encoding declaration.
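
To make the signature check concrete, here is a minimal Python sketch of
the kind of test a tool could run on the first bytes of a file.  The
function name and sample data are hypothetical, and nothing here is
claimed to be how Notepad itself does it.

```python
import codecs

def has_utf8_signature(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 signature (bytes EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# Hypothetical sample data: the same text with and without the signature.
with_signature = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
without_signature = "caf\u00e9".encode("utf-8")

print(has_utf8_signature(with_signature))
print(has_utf8_signature(without_signature))
```

A check like this only looks at the file's own bytes, which matches the
observation that removing the signature removes the information.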

> And if they use right-click, they won't even get the choice.

Note that Notepad seems to apply some heuristics to the file when you
right-click.  It always opens correctly as UTF-8.
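
Which heuristics Notepad applies is not known here.  As an illustration
only, one common approach is to test whether the bytes form valid UTF-8
and contain at least one non-ASCII sequence.  The Python sketch below
assumes that approach, and the function name is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Guess whether unsigned bytes are UTF-8 (an assumed heuristic, not Notepad's)."""
    # Pure ASCII decodes as UTF-8 too, so it gives no evidence either way.
    if all(byte < 0x80 for byte in data):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A guess of this kind can be wrong for short files or for legacy-encoded
text that happens to be valid UTF-8, which is one reason the signature
is more reliable when it is present.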

> The solution to the FAQ is just to include a sentence or two 
> indicating that if you are going to remove a BOM, you should 
> know how the file is used and verify whether it will have an 
> impact, (or remove it and monitor that subsequent processing 
> doesn't break).

That sounds fair enough.
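
For the "remove it and verify" step, a minimal Python sketch might look
like the following.  The function name is hypothetical, and checking
that the decoded text is unchanged is only one possible verification; it
does not show whether downstream tools still detect the encoding.

```python
import codecs

def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical sample: confirm the text itself is unchanged by the removal.
original = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
stripped = strip_utf8_signature(original)
assert original.decode("utf-8-sig") == stripped.decode("utf-8")
```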

Received on Thursday, 6 November 2003 02:44:40 UTC
