- From: Saba Sundaramurthy <ssundaramurthy@verisign.com>
- Date: Tue, 9 May 2000 18:05:54 -0700
- To: "'Asmus Freytag'" <asmusf@ix.netcom.com>, www-international@w3.org
Hi,

Thanks for your response. Can you point me to more information on heuristics I can use to detect a UTF-16 or UTF-8 file? The Microsoft editors I used saved the file as actual Unicode values (2-byte values). Although I am not familiar with UTF-16 encoding, I assume it results in a different sequence than actual 2-byte Unicode values. So could you also help me identify pure Unicode data? Where can I find more info on byte order detection in the absence of the BOM?

-Saba

> -----Original Message-----
> From: Asmus Freytag [mailto:asmusf@ix.netcom.com]
> Sent: Tuesday, May 09, 2000 5:14 PM
> To: Saba Sundaramurthy; mozilla-i18n@mozilla.org;
> www-international@w3.org; i18n-prog@acoin.com
> Subject: Re: BOM & Unicode editors
>
> At 04:55 PM 5/9/00 -0700, Saba Sundaramurthy wrote:
> > Is this something all editors that save files in Unicode or UTF-8 are
> > required to do? Can I depend on the presence of this marker in my code?
>
> No, it's not a requirement, but it's a convention followed by quite a few
> tools, because otherwise it's harder to use the same .txt extension for both
> ASCII and Unicode (and also it helps to mark the byte order, of course).
>
> I would recommend that you look for it in your code, if you plan to read
> UTF-16 files. At the minimum you need to be prepared for its presence. But
> you may possibly encounter some un-marked UTF-16. There are some quite strong
> heuristics that one can follow to detect Unicode without a BOM, but a
> signature like this is more reliable.
>
> A./
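
The thread does not spell out the heuristics it mentions, so the following Python sketch is only one plausible reading of them, not the method Asmus had in mind. It checks for a BOM first. If there is none, it assumes the text is mostly Latin-range, so that one byte of each 16-bit unit is zero, and uses the position of the zero bytes to guess the byte order. Finally it falls back to a strict UTF-8 decode. The function name `sniff_encoding` and the 40% / 10% thresholds are assumptions made for illustration. UTF-32 is deliberately ignored, so a UTF-32 little-endian file would be misreported as UTF-16.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess a Python codec name for `data`, or return "unknown".

    Hypothetical helper for illustration; thresholds are arbitrary.
    """
    # 1. Signature (BOM) check: the most reliable indicator.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 signature
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick byte order

    # 2. No BOM: look for the zero-byte pattern of UTF-16 Latin text.
    #    Assumption: most characters are below U+0100, so the high byte
    #    of each 16-bit unit is 0x00.
    if len(data) >= 2 and len(data) % 2 == 0:
        units = len(data) // 2
        even_zeros = data[0::2].count(0)  # zeros at offsets 0, 2, 4, ...
        odd_zeros = data[1::2].count(0)   # zeros at offsets 1, 3, 5, ...
        if odd_zeros > units * 0.4 and even_zeros < units * 0.1:
            return "utf-16-le"  # low byte first, high (zero) byte second
        if even_zeros > units * 0.4 and odd_zeros < units * 0.1:
            return "utf-16-be"  # high (zero) byte first

    # 3. Strict UTF-8 validation; plain ASCII also passes this test.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

Whenever the result is not `"unknown"`, `data.decode(sniff_encoding(data))` should yield the text. On the question of "actual 2-byte Unicode values" versus UTF-16: for characters in the Basic Multilingual Plane the two are the same byte sequence, and UTF-16 differs only in using surrogate pairs for characters above U+FFFF. The zero-byte heuristic fails for text that is mostly non-Latin (for example CJK), which is one reason a signature is the more reliable signal.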
Received on Tuesday, 9 May 2000 21:06:31 UTC