
Re: auto-detecting the character encoding of an uploaded file

From: John Cowan <cowan@mercury.ccil.org>
Date: Wed, 5 Sep 2001 10:39:01 -0400 (EDT)
To: Martin Duerst <duerst@w3.org>
CC: Lenny Turetsky <LTuretsky@salesforce.com>, "W3intl (E-mail)" <www-international@w3.org>
Message-Id: <E15edpV-0004rU-00@mercury.ccil.org>
Martin Duerst wrote:

> On the tough end, it's actually impossible to distinguish between
> iso-8859-1 and iso-8859-2 for German texts, because the bytes for
> the characters used are exactly the same. But maybe in this case,
> it doesn't matter too much.
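Duerst's point can be seen directly with Python's standard codecs. This is a minimal sketch. The sample string is an arbitrary illustration, and it deliberately sticks to the umlauts and ß. Characters such as the guillemets « » exist in ISO 8859-1 but not in ISO 8859-2, so text using them would not encode identically.

```python
# German text that uses only ä, ö, ü, Ä, Ö, Ü and ß (sample string is arbitrary)
SAMPLE = "Größe, Übermaß, Ärger, Öl, Mädchen"

as_latin1 = SAMPLE.encode("iso8859_1")
as_latin2 = SAMPLE.encode("iso8859_2")

# The byte sequences are identical, so nothing in the data itself tells a
# detector which of the two labels the sender intended.
print(as_latin1 == as_latin2)  # True

# Decoding under either label gives back the same text.
print(as_latin1.decode("iso8859_2") == SAMPLE)  # True
```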

It is a curious fact, not mentioned by anybody but me AFAIK, that
over the joint repertoire of iso-8859-[1-4], every character
is encoded in each charset either with the same octet or else not at all.
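The claim is easy to check mechanically. This is a minimal sketch that assumes Python's standard iso8859_N codecs faithfully mirror the published charset tables. It builds a character-to-octet map for each charset by decoding every byte, then looks for any character that lands on different octets in different charsets. ISO 8859-15 is added only as a control. It relocates Š, š, Ž and ž, which shows that the check can actually fail.

```python
# For each character in the union of several single-byte repertoires, is it
# always encoded at the same octet (or not at all)?

def octet_map(codec):
    """Return {character: octet} for every byte the single-byte codec assigns."""
    table = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte left unassigned in this charset (e.g. 0xA5 in 8859-3)
        table[ch] = b
    return table


def conflicts(codecs):
    """List (character, octets) pairs where the charsets disagree on the octet."""
    maps = [octet_map(c) for c in codecs]
    repertoire = set().union(*maps)  # union of all dict keys
    found = []
    for ch in sorted(repertoire):
        octets = {m[ch] for m in maps if ch in m}
        if len(octets) > 1:
            found.append((ch, sorted(octets)))
    return found


LATIN_1_TO_4 = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]

print(conflicts(LATIN_1_TO_4))  # []

# Control: adding ISO 8859-15 breaks the property.
print([ch for ch, _ in conflicts(LATIN_1_TO_4 + ["iso8859_15"])])
# ['Š', 'š', 'Ž', 'ž']
```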

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
Please leave your values        |       Check your assumptions.  In fact,
   at the front desk.           |          check your assumptions at the door.
     --sign in Paris hotel      |            --Miles Vorkosigan
Received on Wednesday, 5 September 2001 10:39:20 GMT
