Re: Russian charsets

Hi Lena,

I would appreciate it if your tests were performed on
www4mail@wm.ictp.trieste.it

The corrections that I have made so far only apply to
www4mail@wm.ictp.trieste.it

BTW, any ideas as to why http://chat.ru/~sercons/menu.html
times out? I can't even reach it with a normal web browser and/or proxy
servers, either from here in Italy or via a proxy in the U.K.

Some explanations:
The GET/SEND command attempts a visualisation of a remote web page on your
behalf.

Suppose you were sitting in front of a terminal (vt100 or something
similar) and using lynx to view the remote web page:
lynx would try to present the page in the character set of the terminal, not
the character set of the remote document! (www4mail actually uses lynx,
so here we are at the mercy of whatever version of lynx is installed.)

In www4mail, there is no terminal, so we simply tell lynx to display
the visualisation in the character set requested by the remote user.
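To make the lynx step concrete, here is a minimal sketch of how a mail gateway might ask lynx to render a page into the requester's charset rather than a terminal's. It assumes a lynx build that accepts the `-dump`, `-display_charset=` and `-assume_charset=` options (lynx 2.8.x lists these in `lynx -help`, but check the installed version). The helper name `build_lynx_dump_command` is hypothetical, and this is not necessarily how www4mail itself invokes lynx.

```python
from typing import List, Optional


def build_lynx_dump_command(url: str, display_charset: str,
                            assume_charset: Optional[str] = None) -> List[str]:
    """Return an argv list for a non-interactive lynx dump of `url`.

    Hypothetical helper. -display_charset stands in for the terminal's
    charset (here: the charset requested by the remote user);
    -assume_charset only applies to documents that declare no charset.
    """
    cmd = ["lynx", "-dump", f"-display_charset={display_charset}"]
    if assume_charset:
        cmd.append(f"-assume_charset={assume_charset}")
    cmd.append(url)
    return cmd
```

The returned list could then be passed to `subprocess.run(cmd, capture_output=True, timeout=60)`. Building it as a list rather than a shell string avoids quoting problems with URLs.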

The modification I have made (only on www4mail@wm.ictp.trieste.it)
does the following:
	Visualisation is done using a character set obtained as follows:

		1. As indicated by the XCHARSET command
		2. As indicated by the client's mail program
		3. As inferred from XLANGUAGE command (us-ascii ignored)
		4. As configured by the local site Administrator
		5. As specified by the remote Web Server 
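One way to read the numbered list above is as a fallback chain, where the first source that supplies a charset wins. The sketch below assumes that reading. The function and parameter names are hypothetical. As in the list, a `us-ascii` value is ignored only when it is inferred from XLANGUAGE.

```python
from typing import Optional


def choose_charset(xcharset: Optional[str] = None,
                   client_charset: Optional[str] = None,
                   xlanguage_charset: Optional[str] = None,
                   site_default: Optional[str] = None,
                   server_charset: Optional[str] = None) -> Optional[str]:
    """Return the charset to render into, or None if no source gives one."""
    # Item 3: a us-ascii value inferred from XLANGUAGE is ignored.
    if xlanguage_charset and xlanguage_charset.lower() == "us-ascii":
        xlanguage_charset = None
    # Items 1-5 in order of precedence; the first non-empty value wins.
    for cs in (xcharset, client_charset, xlanguage_charset,
               site_default, server_charset):
        if cs:
            return cs.lower()
    return None
```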

Unfortunately, it is not possible to specify "no conversion", because the
entire visualisation process (showing characters on the screen) requires
one. What I can do is add special handling for XCHARSET=AUTO,
in which case the visualisation/reply will be in the HTML document's
character set and the reply mail message will carry the character set
header of the HTML document.
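A minimal sketch of the proposed XCHARSET=AUTO behaviour, using Python's standard `email` package. With AUTO, the reply keeps the document's charset and labels the text/plain part with it. Otherwise, the text is recoded into the requested charset. `build_reply` is a hypothetical name, and replacing unrepresentable characters with `?` is an assumption, not something www4mail is known to do.

```python
from email.message import EmailMessage


def build_reply(body: bytes, doc_charset: str, xcharset: str) -> EmailMessage:
    """Wrap a rendered page in a text/plain reply (hypothetical helper)."""
    text = body.decode(doc_charset, errors="replace")
    # AUTO: no conversion, reply in the HTML document's own charset.
    out_charset = doc_charset if xcharset.upper() == "AUTO" else xcharset
    # Recode; characters the target charset lacks become '?'.
    text = text.encode(out_charset, errors="replace").decode(out_charset)
    msg = EmailMessage()
    # Sets Content-Type: text/plain; charset=<out_charset>
    msg.set_content(text, charset=out_charset)
    return msg
```

Because the charset ends up in the `Content-Type` header, a mail client that understands several Cyrillic charsets can display either variant without the user having to guess an XCHARSET value.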

Please perform all tests only on www4mail@wm.ictp.trieste.it, as I
don't expect any updates to other www4mail servers before next week.

Thanks
Clement

On Wed, 3 Oct 2001 Lena@lena.kiev.ua wrote:

> Hi Clement,
> 
> > > Some Russian web-servers specify incorrect charset in HTTP header,
> > > some Russian webmasters incorrectly specify charset in
> > > <meta http-equiv="Content-Type" content="text/html; charset=...">
> > >
> > > > www4mail tries to do a dump of the page into the character set
> > > > Windows-1251 and sends the resulting page as an attachment because
> > > > Windows-1251 is different from the user's character set koi8-r
> >
> > > IMO it's counterproductive. Please make www4mail never make attachments
> > > for GET/SEND and never recode from one charset to another.
> > > Specifying the charset in the header of plain-text letters from www4mail
> > > (Content-Type: text/plain; charset=...) according to the charset specified
> > > in the header of the HTTP response is useful, but optional.
> >
> > For the proper support of multi-lingual Web Pages, it is necessary for
> > www4mail to attempt a transformation for the GET/SEND commands as follows
> 
> What happens if you insist on "proper" and "necessary":
> 
> Let's take, for example, the URL
> http://chat.ru/~sercons/menu.html
> (it redirects to http://sercons.chat.ru/menu.html).
> Only <www4mail@collaborium.org> can handle the chat.ru site;
> all other www4mail servers give "timeout" or similar.
> SOURCE command gives the page OK, and I see a mistake of the webmaster:
> <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
> though the webserver actually gives the page to collaborium (3.0/pre3.0rc12b)
> in koi8-r charset, irrespective of whether I specify "xcharset koi8-r"
> and "xlanguage ru" or not (my letter to www4mail doesn't contain 8bit
> characters, therefore my mail client specifies us-ascii charset in
> the header of letter). My mail client receives plain text letter from
> www4mail in koi8-r charset OK.
What, is this a smart mail client??
Interesting ;-(
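The failure described above, a page labelled windows-1251 in its `<meta>` tag but actually served as koi8-r, can be reproduced in a few lines. The sample page is made up for illustration and is not the real chat.ru page. The regular expression is a rough assumption rather than a proper HTML parser. The sketch only shows that trusting the declared charset garbles the Cyrillic text, while decoding with the real one does not.

```python
import re
from typing import Optional

# Rough pattern for <meta ... charset=NAME>; not a full HTML parser.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)',
                          re.IGNORECASE)


def declared_meta_charset(html: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, lower-cased, if any."""
    m = META_CHARSET.search(html)
    return m.group(1).decode("ascii").lower() if m else None


# Made-up page: the meta tag says windows-1251, the bytes are koi8-r.
page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1251">'
        'Привет').encode("koi8-r")

declared = declared_meta_charset(page)   # what the webmaster claims
as_declared = page.decode(declared)      # trusting the meta tag: garbled
as_served = page.decode("koi8-r")        # the charset actually used
```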

> 
> -----
> xnostat
> get http://chat.ru/~sercons/menu.html
> -----
> xcharset koi8-r
> xlanguage ru
> xnostat
> get http://chat.ru/~sercons/menu.html
> -----
> Both these requests return letters with attachments, headers of attachments
> have 'charset="windows-1251"', the text is transliterated into Latin letters
> (do you like Greek text transliterated into Latin letters?),
> and the transliteration is done incorrectly, so the text is completely unreadable.
> 
> > Regardless of what the Remote Web Server specifies as the Character Set,
> > www4mail should transform into a form compatible with the user's e-mail
> > client or local configuration!
> 
> Most russified mail clients can handle letters in any of
> several charsets, including the two most widespread: the standard koi8-r
> and Micr0$oft's windows-1251. What should these users specify in XCHARSET?
> 
> If you still insist on "proper", then please make a new command forbidding
> www4mail from doing any charset conversion or making attachments for GET/SEND.
> For example, XNOCONV or XNOCONVERSION. For me this conversion
> only makes problems.
> 
> Thanks,
> 
> Lena
> 
> P.S. kabissa has DNS problems: it refuses to receive letters from
> my ISP's SMTP server, incorrectly claiming that my domain lena.kiev.ua
> doesn't exist - you can check yourself: http://lena.kiev.ua
> 

Received on Thursday, 4 October 2001 05:18:17 UTC