Re: [Json] Encoding detection

Henry S. Thompson scripsit:

> (There are, it has to be said, few Unicode characters whose UTF-16-L
> form is 00xx, i.e. U+xx00, the first code point on a code page --
> I had to hunt pretty hard to find the above specimen, which is in
> fact a slight cheat :-) Many code pages have a gap at the 00 point.

There are 68 such characters on the Basic Multilingual Plane.  But many
characters in other planes are encoded with 16-bit code units of that
form.  For example, all of U+10000 to U+103FF are encoded as D800 DC00
through D800 DFFF, and the high surrogate D800 is serialized in UTF-16LE
as the bytes 00 D8.
Currently there are 622 characters in this range alone, and the number
will probably grow.
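A minimal Python sketch of the surrogate arithmetic shows why every code
point from U+10000 to U+103FF starts with a 00 byte in UTF-16LE. The offset
from U+10000 is below 0x400 throughout that range, so the high surrogate is
always D800. The helper name utf16_surrogates is hypothetical and used only
for illustration. The byte check relies on Python's built-in "utf-16-le"
codec.

```python
from typing import Tuple


def utf16_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


# Endpoints of the range discussed above: both share the high surrogate D800.
for cp in (0x10000, 0x103FF):
    high, low = utf16_surrogates(cp)
    le_bytes = chr(cp).encode("utf-16-le")
    print(f"U+{cp:05X} -> {high:04X} {low:04X}  UTF-16LE bytes: {le_bytes.hex()}")

# Every code point in U+10000..U+103FF (assigned or not) begins with byte 00.
print(all(chr(cp).encode("utf-16-le")[0] == 0 for cp in range(0x10000, 0x10400)))
```

The loop covers all 1024 code points in the block, not only the assigned
ones. This is why a zero-byte heuristic cannot assume that only a few
characters produce a 00 byte in UTF-16LE.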

> Not sure about the status of U+4E00, one variant of the ideograph for
> the numeral 1).

Google reports over 3 gigahits for this character.

-- 
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There are
no words left to express his staggerment, since Men changed the language that
they learned of elves in the days when all the world was wonderful. --The Hobbit

Received on Thursday, 14 November 2013 18:30:20 UTC