- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Sun, 12 Jun 2005 08:21:50 -0400
- To: www-tag@w3.org
Another important advantage of UTF-8 over UTF-16, irrespective of size issues: in UTF-8 you always know where you are. That is, given a single byte you can immediately determine whether it is a single-byte character, the first byte of a two-, three-, or four-byte character, or a continuation byte somewhere inside a multi-byte character. In UTF-16, you don't always know that the byte 0x41 is indeed the letter A. Sometimes it is and sometimes it isn't. You have to keep track of enough state to know where you are in the stream. If a single byte gets lost, all data from that point forward is corrupted, at least until another byte is lost.

--
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
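
To make the point about self-synchronization concrete, here is a rough Python sketch. It is only an illustration, not anything from a particular library: the names classify_utf8_byte and decode_after_loss are made up for this example, and the classifier looks only at the high-bit pattern of a byte (a strict validator would also reject 0xC0, 0xC1, and 0xF5 through 0xF7, which this sketch does not).

```python
def classify_utf8_byte(b: int) -> str:
    """Classify one byte from its high bits alone, with no surrounding context.

    Hypothetical helper for illustration; checks the bit pattern only.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value in 0..255")
    if b < 0x80:
        return "single"        # 0xxxxxxx: a complete one-byte (ASCII) character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: a non-initial byte of a multi-byte character
    if b < 0xE0:
        return "lead2"         # 110xxxxx: first byte of a two-byte character
    if b < 0xF0:
        return "lead3"         # 1110xxxx: first byte of a three-byte character
    if b < 0xF8:
        return "lead4"         # 11110xxx: first byte of a four-byte character
    return "invalid"           # 11111xxx never appears in UTF-8


def decode_after_loss(text: str, encoding: str, index: int) -> str:
    """Encode text, drop the byte at index, and decode what is left.

    Undecodable input is replaced with U+FFFD instead of raising.
    """
    data = text.encode(encoding)
    damaged = data[:index] + data[index + 1:]
    return damaged.decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = "héllo wörld"
    # UTF-8: only the damaged character is affected; the rest still decodes.
    print(decode_after_loss(sample, "utf-8", 1))
    # UTF-16: every code unit after the lost byte is misaligned.
    print(decode_after_loss(sample, "utf-16-le", 1))
    # The byte 0x41 is not always the letter A in UTF-16:
    # U+4141 is encoded as the two bytes 0x41 0x41.
    print("\u4141".encode("utf-16-be"))
```

In the UTF-8 case the decoder sees a stray continuation byte, flags it, and picks up again at the next lead byte. In the UTF-16 case there is nothing in the bytes themselves to say where a code unit begins, so the misalignment persists to the end of the data.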
Received on Sunday, 12 June 2005 12:21:56 UTC