RE: BOM & Unicode editors

Hi,

	Thanks for your response.

	Can you point me to more information on heuristics I can use to
detect a UTF-16 or UTF-8 file? The Microsoft editors I used saved the file
as actual Unicode values (2-byte values). Although I am not familiar with
the UTF-16 encoding, I assume it results in a different sequence than actual
2-byte Unicode values. So could you also help me identify pure Unicode
data?

	Where can I find more information on byte order detection in the
absence of the BOM?

-Saba


> -----Original Message-----
> From: Asmus Freytag [mailto:asmusf@ix.netcom.com]
> Sent: Tuesday, May 09, 2000 5:14 PM
> To: Saba Sundaramurthy; mozilla-i18n@mozilla.org;
> www-international@w3.org; i18n-prog@acoin.com
> Subject: Re: BOM & Unicode editors
> 
> 
> At 04:55 PM 5/9/00 -0700, Saba Sundaramurthy wrote:
> >     Is this something all editors that save files in Unicode or UTF-8 are
> >required to do? Can I depend on the presence of this marker in my code?
> 
> No, it's not a requirement, but it's a convention followed by quite a few
> tools, because otherwise it's harder to use the same .txt extension for
> both ASCII and Unicode (and also it helps to mark the byte order, of course).
> 
> I would recommend that you look for it in your code, if you plan to read
> UTF-16 files. At the minimum you need to be prepared for its presence. But
> you may possibly encounter some un-marked UTF-16. There are some quite
> strong heuristics that one can follow to detect Unicode without a BOM, but
> a signature like this is more reliable.
> 
> A./
> 
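
For illustration only, here is a rough Python sketch of the approach
discussed in the quoted reply: check for the signature (BOM) first, and
only fall back to guessing when it is absent. The function name
sniff_encoding, the zero-byte heuristic, and its thresholds are
assumptions made for this sketch; they do not come from either message
or from any standard. The fallback relies on the fact that mostly-Latin
text in UTF-16 has a zero byte in every 16-bit unit, so the position of
the zeros hints at the byte order.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of raw file bytes (hypothetical helper).

    Returns 'utf-8-sig', 'utf-16-le', 'utf-16-be', 'utf-8' or 'unknown'.
    UTF-32 is deliberately ignored to keep the sketch short.
    """
    # 1. Signature check: the most reliable indicator when present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. No BOM: count zero bytes at even and odd offsets.
    #    Assumption: the text is mostly Latin, so the high byte of
    #    each 16-bit unit is zero. Thresholds are arbitrary.
    pairs = len(data) // 2
    if pairs and len(data) % 2 == 0:
        even_zeros = data[0::2].count(0)
        odd_zeros = data[1::2].count(0)
        if odd_zeros > 0.4 * pairs and even_zeros < 0.1 * pairs:
            return "utf-16-le"  # low byte first, zero high byte second
        if even_zeros > 0.4 * pairs and odd_zeros < 0.1 * pairs:
            return "utf-16-be"  # zero high byte first

    # 3. Strict UTF-8 validation; plain ASCII also passes this test.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

A caller would read the file in binary mode and pass the first few
kilobytes, for example sniff_encoding(open(path, "rb").read(4096)),
where path is whatever file is being examined. The zero-byte guess will
fail on text that is mostly non-Latin, which is one reason a signature
is the more reliable choice.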

Received on Tuesday, 9 May 2000 21:06:31 UTC