Re: Russian charsets (was Re: Injured tex, injured engine)

Hi Lena,

On Tue, 2 Oct 2001 Lena@lena.kiev.ua wrote:

> Hello Clement,
> 
> > Apparently the remote Web server tells www4mail that the
> > character set for the document is Windows-1251.
> 
> Many Russian web servers serve pages in different charsets depending on
> HTTP_USER_AGENT. www4mail 2.2 and 3 have different USER_AGENT strings.

Well, this is allowed by the HTTP protocol.

> 
> Some Russian web servers specify an incorrect charset in the HTTP header,
> and some Russian webmasters specify an incorrect charset in
> <meta http-equiv="Content-Type" content="text/html; charset=...">
> 
> > www4mail tries to do a dump of the page into the character set
> > Windows-1251 and sends the resulting page as an attachment because
> > Windows-1251 is different from the user's character set, koi8-r.
> 

> IMO it's counterproductive. Please make www4mail never create attachments
> for GET/SEND and never recode from one charset to another.
> Specifying the charset in the header of plain-text letters from www4mail
> (Content-Type: text/plain; charset=...) according to the charset specified
> in the header of the HTTP response is useful, but optional.

For proper support of multilingual Web pages, www4mail needs to
attempt a transformation for the GET/SEND commands, as follows.

Regardless of what the remote Web server specifies as the character set,
www4mail should transform the page into a form compatible with the user's
e-mail client or local configuration. This procedure will ensure that the
results of GET/SEND requests are not sent as attachments.

The following order is used:

1. XCHARSET command (the user specified a character set to use).
2. E-mail message header (most e-mail clients send a charset header by
   default).
3. XLANGUAGE command (the user specified a language; some languages have
   a character set associated with them).
   Here us-ascii is ignored, and www4mail currently only has
   language definitions for Russian, Spanish, German and French.
4. The character set configured by the local administrator.
5. The remote Web server's idea of the charset.
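
The order above amounts to a first-match lookup. Here is a minimal sketch of it in Python; it is an illustration only, not the www4mail code. The function name `pick_charset`, its parameters, and the `LANGUAGE_CHARSETS` table (including which charset goes with which language) are all assumptions made for the example, and the us-ascii check is only one possible reading of step 3.

```python
# Hypothetical language-to-charset table; the real www4mail language
# definitions may map these languages differently.
LANGUAGE_CHARSETS = {
    "russian": "koi8-r",
    "spanish": "iso-8859-1",
    "german": "iso-8859-1",
    "french": "iso-8859-1",
}


def pick_charset(xcharset=None, mail_charset=None, xlanguage=None,
                 admin_charset=None, server_charset=None):
    """Return the first charset available, in the order of the list above."""
    # Step 3: look up the charset tied to the requested language, if any.
    language_charset = LANGUAGE_CHARSETS.get((xlanguage or "").lower())
    # Assumed reading of "us-ascii is ignored": a language that only
    # yields us-ascii does not count as a choice.
    if language_charset == "us-ascii":
        language_charset = None

    candidates = [
        xcharset,          # 1. XCHARSET command
        mail_charset,      # 2. charset from the user's e-mail header
        language_charset,  # 3. XLANGUAGE command
        admin_charset,     # 4. local administrator's setting
        server_charset,    # 5. what the remote Web server reported
    ]
    for charset in candidates:
        if charset:
            return charset.lower()
    return None


# The user's explicit XCHARSET wins over the server's claim.
print(pick_charset(xcharset="KOI8-R", server_charset="windows-1251"))
```

With nothing from the user or the administrator, the function falls through to the remote server's charset, which is the last resort in the list.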

> 
> If the web server or webmaster specified an incorrect charset, then the
> needed recoding is better done by the mail client when the letter is
> received; this guarantees that at most one recoding is needed. If www4mail
> tries to guess the needed recoding and guesses wrong (easily, because
> of the web server's or webmaster's mistakes), then the recipient may need
> to perform several consecutive recodings. Mail clients can't do that;
> a special standalone program is needed that can try all combinations
> and guess which of them gives text most like Russian (a non-trivial task).
> 
Well, as presented in the list above, the user has complete control over
which character set is finally used, by means of the XCHARSET command.

With the above procedure, most GET/SEND requests will be sent in the body
of the mail message.
The www4mail server will add extra information to indicate the original
character set of the document.
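
To make the idea concrete, here is a rough sketch in Python of transcoding a page and sending it in the message body while keeping a note of the original charset. It is not the www4mail implementation: the function name `build_reply` and the `X-Original-Charset` header are hypothetical, chosen only for this example, and the way www4mail actually records the original charset may be different.

```python
from email.message import EmailMessage


def build_reply(page_bytes, source_charset, target_charset):
    """Transcode a fetched page and wrap it in a plain-text mail message."""
    # Decode using the charset the remote server reported; bytes that do
    # not decode become U+FFFD instead of raising an error.
    text = page_bytes.decode(source_charset, errors="replace")
    # Replace characters the target charset cannot represent with "?",
    # so that set_content() below cannot fail on them.
    text = text.encode(target_charset, errors="replace").decode(target_charset)

    msg = EmailMessage()
    # The page goes into the body, labelled with the user's charset.
    msg.set_content(text, charset=target_charset)
    # Hypothetical header recording the document's original charset.
    msg["X-Original-Charset"] = source_charset
    return msg


page = "Привет".encode("windows-1251")
reply = build_reply(page, "windows-1251", "koi8-r")
print(reply["Content-Type"])
print(reply["X-Original-Charset"])
```

If the server's charset label is wrong, the decode step produces garbage that no later step can detect, which is why the original charset is worth keeping in the message.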

Thanks
Clement

> -- 
> Лена
> 
> P.S. Thanks for the handling of [ ] in multi-line textareas; it works.
> 

Received on Wednesday, 3 October 2001 03:10:13 UTC