- From: Yung-Fong Tang <ftang@netscape.com>
- Date: Sat, 13 May 2000 13:40:08 -0700
- To: "Martin J. Duerst" <duerst@w3.org>
- CC: Saba Sundaramurthy <ssundaramurthy@verisign.com>, mozilla-i18n@mozilla.org, www-international@w3.org, i18n-prog@acoin.com
Also, I have a UTF-8 validator. You can upload a file to see whether it is
UTF-8 or not. See http://people.netscape.com/ftang/i18n.html

"Martin J. Duerst" wrote:

> Hello Saba,
>
> For some more information on UTF-8, please see
> http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf.
>
> There are some errors in the slide on page 5, but
> they are not very relevant here.
>
> The paper in particular shows how easy it is to automatically
> detect UTF-8 based on its specific byte patterns. This can
> mostly be done on the fly, i.e. a decoder starts with the
> assumption that it reads only ASCII and decides whether it's
> the local legacy encoding or UTF-8 once the first bytes
> with the 8th bit set are seen.
>
> One big problem of using the BOM as a 'magic number' for UTF-8
> also shouldn't go unmentioned here:
>
> UTF-8 without a BOM has the very important property that it
> encodes ASCII as ASCII, and everything else as something else.
> An ASCII file therefore is automatically UTF-8. All the nice
> things that you can do with text files can be done with UTF-8,
> too. However, once there is a BOM on a file, an ASCII file is
> no longer ASCII, and very simple operations such as a Unix
> 'cat' fail.
>
> Regards, Martin.
>
> At 00/05/09 16:55 -0700, Saba Sundaramurthy wrote:
> >Hi,
> >
> >1) Playing with text editors (FrontPage 2000 and Notepad) in Windows NT
> >and Windows 2000, I noticed that whenever the contents are saved as Unicode or
> >UTF-8 there is a marker FEFF placed at the beginning of the file. Inspecting
> >this marker can give information about the byte ordering of the machine and
> >also if the following bytes are Unicode or UTF-8.
> >
> > Is this something all editors that save files in Unicode or UTF-8 are
> >required to do? Can I depend on the presence of this marker in my code?
> >
> >2) Are there any editors available on Unix to allow you to save text in
> >Unicode or UTF-8?
> >
> >Thanks in advance,
> >-Saba
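
The detection approach Martin describes in the quoted message (treat the
input as ASCII until a byte with the 8th bit set appears, then decide between
UTF-8 and a legacy encoding) can be sketched roughly as below. This is only an
illustration in Python, not the code behind the upload tool mentioned above.
The function name sniff_encoding and its return labels are made up for this
example, and it checks a whole buffer at once rather than decoding on the fly.
It also shows why code should not depend on a BOM being present: the BOM is
treated as an optional hint, never as a requirement.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', 'utf-8-bom', or 'legacy'.

    Hypothetical helper for illustration; the labels are not standard names.
    """
    # A UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded in UTF-8).
    has_bom = data.startswith(codecs.BOM_UTF8)

    # No byte with the 8th bit set: plain ASCII, which is also valid UTF-8.
    if not has_bom and all(b < 0x80 for b in data):
        return "ascii"

    # Python's strict decoder rejects byte sequences that do not follow
    # the UTF-8 lead-byte / continuation-byte patterns.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as
        # the local legacy encoding.
        return "legacy"

    return "utf-8-bom" if has_bom else "utf-8"
```

For instance, sniff_encoding(b"hello") reports "ascii", while the same bytes
with a BOM in front are no longer ASCII and are reported as "utf-8-bom". A
short legacy-encoded string can by chance also be valid UTF-8, so a check of
this kind is a heuristic, not a proof.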
Received on Saturday, 13 May 2000 16:40:19 UTC