Hello Vinod,

At 00/06/16 14:21 -0700, Vinod Balakrishnan wrote:
>Hi,
>
>How can we distinguish the UTF-8 characters sequence from a
>Latin-1/Latin-? characters.

First, I think you are speaking about a byte sequence, not a character sequence. It is quite easy to have a look at a byte sequence and heuristically decide whether it is UTF-8 or not. Please for example have a look at
http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf

>In case of most of the internet application
>UTF16 characters are prefixed by "0xu" and for the UTF8 characters there
>is no prefix to identify those. Do we HAVE/NEED a standard to represent
>UTF8 ?
>
>For example, if the browser send out a http GET request for a non-Roman
>characters with out the header information, the server application will
>not be able to identify the characters whether they are UTF8 or Latin-1.

Do you mean non-ASCII characters in the URIs (or parts of URIs) in the GET line itself? This is indeed a gray area, but the general tendency is to move towards UTF-8 only. In cases where both UTF-8 and a single 'legacy encoding' are used, the above heuristics may help.

Regards, Martin.

Received on Tuesday, 20 June 2000 02:26:20 GMT
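
The byte-sequence heuristic mentioned in the reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the method described in the linked paper: the function name guess_encoding and the three labels it returns are made up for this example, and "legacy" simply stands for "some single non-UTF-8 encoding such as Latin-1". The idea is that bytes containing non-ASCII values which also form a well-formed UTF-8 sequence are very unlikely to be Latin-1 text by accident.

```python
from urllib.parse import unquote_to_bytes


def guess_encoding(data: bytes) -> str:
    """Heuristically label a byte sequence (hypothetical helper).

    Returns "ascii" if no byte is >= 0x80, "utf-8" if the bytes are
    well-formed UTF-8 with at least one non-ASCII byte, and "legacy"
    otherwise (for example Latin-1).
    """
    # Pure ASCII is valid in UTF-8 and Latin-1 alike, so it tells us nothing.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Strict decoding rejects malformed lead/continuation byte patterns.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy"
    return "utf-8"


# Percent-encoded paths as they might appear in a GET line (made-up examples).
utf8_path = unquote_to_bytes("/caf%C3%A9")   # "é" encoded as UTF-8
latin1_path = unquote_to_bytes("/caf%E9")    # "é" encoded as Latin-1

print(guess_encoding(utf8_path))
print(guess_encoding(latin1_path))
```

The check is a heuristic only: a short Latin-1 string can happen to be valid UTF-8, so the result is more reliable on longer inputs and when only one legacy encoding is in play.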