Re: auto-detecting the character encoding of an uploaded file

Martin Duerst scripsit:

> On the tough end, it's actually impossible to distinguish between
> iso-8859-1 and iso-8859-2 for German texts, because the bytes for
> the characters used are exactly the same. But maybe in this case,
> it doesn't matter too much.
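Martin's point can be made concrete with a small Python sketch. It encodes one German sample string (an arbitrary example, not from the thread) under both charsets and compares the results:

```python
# Typical German text produces identical bytes under ISO-8859-1 and
# ISO-8859-2, so the bytes alone cannot tell the two charsets apart.
# The sample string is an arbitrary illustration.
sample = "Grüße aus München: Äpfel, Öl, Übung"

latin1 = sample.encode("iso-8859-1")
latin2 = sample.encode("iso-8859-2")

# Every character above sits at the same octet in both charsets,
# so the two byte strings compare equal.
print(latin1 == latin2)
```

Any detector that only looks at the bytes will see the same input either way. For text like this, picking the "wrong" one of the two decodes to the same characters.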

It is a curious fact, not mentioned by anybody but me AFAIK, that
over the joint repertoire of iso-8859-[1-4], every character
is encoded in each charset either with the same octet or else not at all.
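The claim can be checked mechanically in a rough way. This sketch assumes that Python's built-in iso8859_1 through iso8859_4 codecs faithfully reflect the standards. For each charset it decodes every octet, records the octet where each character lands, and flags any character that lands on different octets in different charsets:

```python
# Check: over the union of the ISO-8859-1..4 repertoires, each character
# has either the same octet or no octet at all in each charset.
# Relies on Python's codec tables, not on the ISO documents themselves.
CHARSETS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]


def octet_table(charsets):
    """Map each character to {charset: octet} for the charsets that encode it."""
    table = {}
    for cs in charsets:
        for octet in range(256):
            try:
                ch = bytes([octet]).decode(cs)
            except UnicodeDecodeError:
                continue  # octet unassigned in this charset (e.g. 0xA5 in Latin-3)
            table.setdefault(ch, {})[cs] = octet
    return table


def conflicts(table):
    """Return characters that are encoded with different octets in different charsets."""
    return {ch: where for ch, where in table.items() if len(set(where.values())) > 1}


table = octet_table(CHARSETS)
print("characters in joint repertoire:", len(table))
print("conflicts:", conflicts(table))
```

If the claim holds, the conflict dictionary is empty. A character missing from some charsets (e.g. Ħ, which only Latin-3 has) simply has no entry for those charsets. That is the "or else not at all" half of the statement.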

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
Please leave your values        |       Check your assumptions.  In fact,
   at the front desk.           |          check your assumptions at the door.
     --sign in Paris hotel      |            --Miles Vorkosigan

Received on Wednesday, 5 September 2001 10:39:20 UTC