Henri Sivonen wrote:

> It would be interesting to see a large-scale study of the
> compactness of UTF-8 vs. UTF-16 vs. BOCU-1 vs. SCSU vs.
> well-supported applicable legacy encodings vs. and all
> of them gzipped as applied to real-world *Web* content.

Yes, same here. Apart from what is covered in UTN #14, here's my own test result for permutations of MES-1 + BOM:

  UTF-32   0000FEFF     1344
  UTF-16   FEFF          672
  UTF-8    EFBBBF        595
  UTF-7    2B2F76382D    836
  UTF-4    849F9E9F9F    789
  UTF-1    F7644C        578
  BOCU-1   FBEE28        514
  B( 80)                 627
  B(  1)                 377

For BOCU-1 I tried to catch the worst (627) and best (377) cases, but it is quite possible that I missed worse or better cases.

Of course "permutation of MES-1 + BOM" is totally unrelated to "real world Web content". For the script of this test see <http://purl.net/xyzzy/src/bocu.cmd>, but its BOCU-1 code is unsuited for real applications (no error handling).

Frank

Received on Tuesday, 29 January 2008 15:21:45 GMT
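A comparison of this kind can be roughly sketched in Python for the encodings that have a standard-library codec. This is not the bocu.cmd script referenced above, only an illustration: the function name `encoded_sizes` and the sample string are made up here, and UTF-1, UTF-4, BOCU-1 and SCSU are left out because Python ships no codec for them. The gzipped size is included since the quoted question asks about it.

```python
import gzip

# Codecs available in the Python standard library. The big-endian variants
# are used so that the only signature is the U+FEFF prepended below.
CODECS = {
    "UTF-32": "utf-32-be",
    "UTF-16": "utf-16-be",
    "UTF-8": "utf-8",
    "UTF-7": "utf-7",
}


def encoded_sizes(text):
    """Return {name: (raw_bytes, gzipped_bytes)} for text prefixed with U+FEFF."""
    sample = "\ufeff" + text
    sizes = {}
    for name, codec in CODECS.items():
        data = sample.encode(codec)
        sizes[name] = (len(data), len(gzip.compress(data)))
    return sizes


if __name__ == "__main__":
    # Hypothetical sample text, not the MES-1 repertoire used in the message.
    for name, (raw, packed) in encoded_sizes("Grüße, Ελλάδα, Россия").items():
        print(f"{name:7} {raw:5} {packed:5}")
```

For very short inputs the gzip container overhead dominates, so the compressed column only becomes meaningful on realistically sized documents.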