Russian charsets (was Re: Injured tex, injured engine)

Hello Clement,

> Apparently the remote Web Server, tells www4mail that the
> character set for the document is Windows-1521.

Many Russian web servers serve pages in different charsets depending on
HTTP_USER_AGENT, and www4mail 2.2 and 3 send different USER_AGENT strings.

Some Russian web servers specify an incorrect charset in the HTTP header, and
some Russian webmasters specify an incorrect charset in
<meta http-equiv="Content-Type" content="text/html; charset=...">
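
To illustrate the two places where a charset can be declared (and can
disagree), here is a rough Python sketch. It is not www4mail code; the names
meta_charset and declared_charsets, the 4096-byte scan limit and the regular
expression are assumptions made for the example.

```python
import re

# Hypothetical pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE
)

def meta_charset(html_bytes):
    """Return the charset named in a <meta> tag, lowercased, or None."""
    # Only scan the start of the page; the limit is an arbitrary choice.
    match = META_CHARSET.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None

def declared_charsets(header_charset, html_bytes):
    """Collect both declarations so a mismatch is visible to the caller."""
    return {
        "http": header_charset.lower() if header_charset else None,
        "meta": meta_charset(html_bytes),
    }
```

Neither value is trusted here; the sketch only reports what the server and
the page claim, since either of them may be wrong.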

> www4mail tries to do a dump of the page into the character set
> Windows-1521 and sends the resulting page as an attachment due to the fact
> Windows-1521 is different from the user's character set koi-r

IMO it's counterproductive. Please make www4mail never send attachments
for GET/SEND and never recode from one charset to another.
Specifying the charset in the header of plain-text letters from www4mail
(Content-Type: text/plain; charset=...), taken from the charset specified
in the header of the HTTP response, is useful but optional.
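
As a sketch of that optional step, the mail header could simply copy the
charset named in the HTTP response, leaving the body bytes untouched. This is
Python for illustration only, not how www4mail is written; the function name
mail_content_type is an assumption, and the parsing relies on the standard
email.message.Message class.

```python
from email.message import Message

def mail_content_type(http_content_type):
    """Build a text/plain Content-Type value that carries over the charset
    from an HTTP Content-Type header value, without any recoding."""
    parsed = Message()
    parsed["Content-Type"] = http_content_type
    charset = parsed.get_content_charset()  # lowercased, or None if absent
    if charset is None:
        # The server named no charset, so the mail header names none either.
        return "text/plain"
    return "text/plain; charset=" + charset
```

If the server's declaration is wrong, the letter is mislabeled exactly once,
and the recipient can still correct it with a single recoding.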

If the web server or webmaster specified an incorrect charset, the needed
recoding is better done by the mail client when the letter is received;
that guarantees at most one recoding is needed. If www4mail
tries to guess the needed recoding and guesses wrong (which is easy, given
web servers' and webmasters' mistakes), the recipient may need
to perform several consecutive recodings. Mail clients can't do that;
a special standalone program is needed that can try all combinations
and guess which of them gives text most like Russian (a non-trivial task).
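
A standalone program of the kind described above might look roughly like
this Python sketch. Everything in it is an assumption made for illustration:
the list of charsets, the crude letter-frequency score, and the names
russian_score, undo_step and best_repair. A real tool would need a much
better test for "looks like Russian".

```python
from itertools import permutations

# Charsets commonly used for Russian text (an illustrative, incomplete list).
CHARSETS = ["koi8-r", "cp1251", "cp866", "iso8859-5"]

# Some of the most frequent Russian letters; a deliberately crude heuristic.
COMMON = set("оеаинтсрвлкмдпу")

def russian_score(text):
    """Fraction of non-space characters that are frequent Russian letters."""
    chars = [ch for ch in text.lower() if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ch in COMMON for ch in chars) / len(chars)

def undo_step(text, wrong, right):
    """Undo one bad recoding: bytes in `right` were decoded as `wrong`."""
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # this pair cannot explain the text

def best_repair(text, max_depth=2):
    """Try every chain of up to max_depth recodings; keep the best score.

    Returns (score, repaired_text, path), where path lists the
    (wrong, right) pairs that were undone, in order.
    """
    best = (russian_score(text), text, [])
    frontier = [(text, [])]
    for _ in range(max_depth):
        next_frontier = []
        for current, path in frontier:
            for wrong, right in permutations(CHARSETS, 2):
                fixed = undo_step(current, wrong, right)
                if fixed is None:
                    continue
                new_path = path + [(wrong, right)]
                next_frontier.append((fixed, new_path))
                score = russian_score(fixed)
                if score > best[0]:
                    best = (score, fixed, new_path)
        frontier = next_frontier
    return best
```

The number of candidates grows quickly with max_depth (12 pairs per step for
four charsets), which is one reason to avoid stacking recodings in the first
place.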

-- 
Лена

P.S. Thanks for handling [ ] in multi-line textareas; it works.

Received on Tuesday, 2 October 2001 11:59:10 UTC