Re: charset issues

At 20:30 05/12/96 +0000, J.Larmouth wrote:
>>Differences are larger for e.g. pure Japanese, it's about a
>>50% overhead. For Indic scripts, the overhead is 200%.
>>But then again, compression will reduce that overhead very
>>nicely.
>
>Again,  a very good (compression) point.  BUT ....  have I missed something? 
>Is compression for HTTP transfers becoming a de facto standard?  (Or even a
>technically agreed approach?)  I think not.

As was explained to me in one of the ISO 10646 meetings, compression should
be considered a low level transmission issue, not a character code or high
level protocol issue. When I ask my communications function to transmit some
information, it should do so in the most efficient manner, and if
compression helps, it should be applied without my involvement.

Actually, most modern modems compress automatically, and you can see it
happening by watching the effective transfer rate of large files - my 28.8
kbps modem often shows figures of 4 to 5K bytes/sec for uncompressed data,
as opposed to 1.5 to 2.5K bytes/sec for .zip files, which are already
compressed.
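
As a rough check on those figures, the throughput gain they imply can be
worked out directly. The sketch below is only arithmetic on the rates quoted
above; the function and constant names are made up for illustration, and
nothing here was measured.

```python
# Rates quoted in the message, in K bytes/sec (observations, not new data).
COMPRESSIBLE_RATE = (4.0, 5.0)   # files the modem can still compress
ZIP_RATE = (1.5, 2.5)            # .zip files, already compressed


def implied_gain(compressible, incompressible):
    """Return the (lowest, highest) throughput ratio the two ranges allow."""
    lowest = compressible[0] / incompressible[1]
    highest = compressible[1] / incompressible[0]
    return lowest, highest


if __name__ == "__main__":
    lowest, highest = implied_gain(COMPRESSIBLE_RATE, ZIP_RATE)
    print(f"implied gain from modem compression: {lowest:.1f}x to {highest:.1f}x")
```

On the quoted figures this works out to somewhere between 1.6x and 3.3x.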

I remember reading that the compressed (zipped) size of Greek text was
measured to be almost the same for the different encoding schemes, which
isn't such a surprise considering Shannon's law: the information content of
the text is the same whichever encoding carries it.
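
A minimal sketch of how one might repeat that kind of measurement follows.
It assumes Python's zlib module (deflate, the method commonly used inside
.zip files) as a stand-in for the ZIP tool, and a made-up Greek sample
sentence repeated many times. Repetition flatters any compressor, so a real
corpus would be needed before drawing conclusions; the names SAMPLE,
ENCODINGS and sizes are illustrative only.

```python
import zlib

# Hypothetical sample: one Greek sentence repeated to give the compressor
# something to work with. Replace with real text for a meaningful test.
SAMPLE = "Η γρήγορη καφετιά αλεπού πηδάει πάνω από το τεμπέλικο σκυλί. " * 40

# An 8-bit Greek code page and two Unicode encoding forms.
ENCODINGS = ("iso8859_7", "utf-8", "utf-16-le")


def sizes(text, encodings=ENCODINGS):
    """Map each encoding name to (raw byte count, deflate-compressed byte count)."""
    result = {}
    for name in encodings:
        raw = text.encode(name)
        packed = zlib.compress(raw, 9)
        # Sanity check: compression must be lossless.
        assert zlib.decompress(packed) == raw
        result[name] = (len(raw), len(packed))
    return result


if __name__ == "__main__":
    for name, (raw_len, packed_len) in sizes(SAMPLE).items():
        print(f"{name:10s} raw={raw_len:6d} bytes  compressed={packed_len:5d} bytes")
```

The raw sizes differ by roughly a factor of two between the 8-bit and the
Unicode encodings; the point of interest is how much of that gap remains in
the compressed column.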


--

Jonathan Rosenne
JR Consulting
P O Box 33641, Tel Aviv, Israel
Phone: +972 50 246 522 Fax: +972 9 956 7353
http://ourworld.compuserve.com/homepages/Jonathan_Rosenne/

Received on Saturday, 7 December 1996 04:42:05 UTC