- From: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Date: Tue, 29 Jan 2008 16:21:46 +0100
- To: <public-html-comments@w3.org>
- Cc: "Henri Sivonen" <hsivonen@iki.fi>
Henri Sivonen wrote:

> It would be interesting to see a large-scale study of the
> compactness of UTF-8 vs. UTF-16 vs. BOCU-1 vs. SCSU vs.
> well-supported applicable legacy encodings vs. and all
> of them gzipped as applied to real-world *Web* content.

Yes, same here. Apart from what is covered in UTN #14, here's my
own test result for permutations of MES-1 + BOM:

  Encoding  BOM (hex)   Bytes
  UTF-32    0000FEFF     1344
  UTF-16    FEFF          672
  UTF-8     EFBBBF        595
  UTF-7     2B2F76382D    836
  UTF-4     849F9E9F9F    789
  UTF-1     F7644C        578
  BOCU-1    FBEE28        514
  B( 80)                  627
  B( 1)                   377

For BOCU-1 I tried to catch the worst (627) and best (377) cases,
but it is quite possible that I missed worse / better cases.

Of course "permutation of MES-1 + BOM" is totally unrelated to
"real world Web content". For the script of this test see
<http://purl.net/xyzzy/src/bocu.cmd> - but its BOCU-1 code is
unsuited for real applications (no error handling).

Frank
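A rough way to repeat this kind of size comparison, including the gzipped variant Henri asked about, might look like the Python sketch below. It is not the bocu.cmd script referenced above. The sample string and the `encoded_sizes` helper are made up for illustration, and only encodings with a codec in the Python standard library are covered, so UTF-1, UTF-4, BOCU-1 and SCSU are left out. As a sanity check on the figures above, 1344 / 4 = 672 / 2 = 336 code points including the BOM.

```python
import gzip

BOM = "\ufeff"

# Hypothetical sample text; NOT the MES-1 permutation used in the message.
SAMPLE = "Grüße, Ελλάδα, Привет"

# Big-endian codecs are used so that the only BOM is the one we prepend.
ENCODINGS = {
    "UTF-32": "utf-32-be",
    "UTF-16": "utf-16-be",
    "UTF-8": "utf-8",
    "UTF-7": "utf-7",
}


def encoded_sizes(text, with_bom=True):
    """Return {label: (raw_bytes, gzipped_bytes)} for each encoding."""
    data = (BOM + text) if with_bom else text
    sizes = {}
    for label, codec in ENCODINGS.items():
        raw = data.encode(codec)
        sizes[label] = (len(raw), len(gzip.compress(raw)))
    return sizes


if __name__ == "__main__":
    for label, (raw, gz) in encoded_sizes(SAMPLE).items():
        print(f"{label:7} {raw:6} {gz:6}")
```

On inputs this short the gzip header overhead dominates, so the compressed column only becomes meaningful on larger, real-world samples.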
Received on Tuesday, 29 January 2008 15:21:45 UTC