Also, I have a UTF-8 validator. You can upload a file to see whether it is
UTF-8 or not. See
http://people.netscape.com/ftang/i18n.html

"Martin J. Duerst" wrote:

> Hello Saba,
>
> For some more information on UTF-8, please see
> http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf.
>
> There are some errors in the slide on page 5, but
> they are not very relevant here.
>
> The paper in particular shows how easy it is to automatically
> detect UTF-8 based on its specific byte patterns. This can
> mostly be done on the fly, i.e. a decoder starts with the
> assumption that it reads only ASCII and decides whether it's
> the local legacy encoding or UTF-8 once the first bytes
> with the 8th bit set are seen.
>
> One big problem of using the BOM as a 'magic number' for UTF-8
> also shouldn't go unmentioned here:
>
> UTF-8 without a BOM has the very important property that it
> encodes ASCII as ASCII, and everything else as something else.
> An ASCII file therefore is automatically UTF-8. All the nice
> things that you can do with text files can be done with UTF-8,
> too. However, once there is a BOM on a file, an ASCII file is
> no longer ASCII, and very simple operations such as a Unix
> 'cat' fail.
>
> Regards, Martin.
>
> At 00/05/09 16:55 -0700, Saba Sundaramurthy wrote:
> >Hi,
> >
> >1) Playing with text editors (FrontPage 2000 and Notepad) in Windows NT
> >and Windows 2000, I noticed that whenever the contents are saved as Unicode or
> >UTF-8 there is a marker FEFF placed at the beginning of the file. Inspecting
> >this marker can give information about the byte ordering of the machine and
> >also if the following bytes are Unicode or UTF-8.
> >
> > Is this something all editors that save files in Unicode or UTF-8 are
> >required to do? Can I depend on the presence of this marker in my code?
> >
> >2) Are there any editors available on unix to allow you to save text in
> >Unicode or UTF-8?
> >
> >Thanks in advance,
> >-Saba

Received on Saturday, 13 May 2000 16:40:19 GMT
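
The detection idea Martin describes in the quoted message could be sketched roughly as follows. This is only an illustration, not code from the referenced paper or from the validator page: the function name `guess_encoding` and the labels it returns are made up for this example, Python's strict UTF-8 decoder stands in for checking the byte patterns by hand, and the whole buffer is examined at once rather than on the fly.

```python
# U+FEFF (the "FEFF" marker) becomes these three bytes when encoded as UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: classify bytes as 'ascii', 'utf-8', or 'legacy'."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data

    # No BOM and no byte with the 8th bit set: plain ASCII,
    # which is automatically valid UTF-8 as well.
    if not has_bom and all(b < 0x80 for b in body):
        return "ascii"

    # Bytes with the 8th bit set must follow the UTF-8 lead/continuation
    # patterns; the strict decoder rejects anything that does not.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy"  # assumed to be some local 8-bit encoding
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding(b"hello"))
    print(guess_encoding("D\u00fcrst".encode("utf-8")))
    print(guess_encoding("D\u00fcrst".encode("latin-1")))
    print(guess_encoding(UTF8_BOM + b"hello"))
```

The last call shows the point about the BOM: the text itself is pure ASCII, but with the three marker bytes in front the file can no longer be treated as an ASCII file. The absence of a BOM says nothing either way, so code should not depend on the marker being present.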