- From: Fred P. <fprog26@hotmail.com>
- Date: Sun, 20 Feb 2005 06:04:06 -0500
- To: RogerCutler@chevrontexaco.com, public-xml-binary@w3.org
Hi Mr. Cutler,

I did some experiments with two files:

hi.html:

    <html><body>HI</body></html>

hi.pl:

    #!/usr/bin/perl
    print "hi\n";
    exit;

    pkzip -a -e0 hi.zip hi.html
    pkzip -a -e0 hi.zip hi.pl
    copy hi.zip hi2.zip
    pkzip -a -e0 hi2.zip hi.zip

The last command is there to check whether there are any "translation/encoding" issues when a zip is stored inside a zip.

It gives the following zip output in binary:

    PK\x3\x4\xA \0\0\0\0\0D)T2\x10\x91\0\xDD\x1E \0\0\0\x1E\0\0\0\x7\0\0\0 hi.html <html><body>HI</body></html>\xD\xA
    PK\x3\x4\xA \0\0\0\0\0[)T2\xAE\xD8S0' \0\0\0'\0\0\0\x5\0\0\0 hi.pl #!/usr/bin/perl\xD\xA print "hi\n";\xD\xA exit;\xD\xA
    PK\x3\x4\xA \0\0€\0\0i)T2{2\xE5F\xB\x1\0\0 \xB\x1\0\0\x6\0\0\0 hi.zip
    PK\x3\x4\xA \0\0\0\0\0D)T2\x10\x91\0\xDD \x1E\0\0\0\x1E\0\0\0\x7\0\0\0 hi.html <html><body>HI</body></html>\xD\xA
    PK\x3\x4\xA \0\0\0\0\0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5\0\0\0 hi.pl #!/usr/bin/perl\xD\xA print "hi\n";\xD\xA exit;\xD\xA
    PK\x1\x2\x19\0\xA \0\0\0\0 \0D)T2\x10\x91\0\xDD\x1E\0\0\0\x1E\0\0\0\x7 \0\0\0\0\0\0\0\x1\0 \0\0\0\0\0\0\0 hi.html
    PK\x1\x2\x19\0\xA \0\0\0\0\0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5 \0\0\0\0\0\0\0\x1\0 \0\0\0C\0\0\0 hi.pl
    PK\x5\x6 \0\0\0\0 \x2\0\x2\0h\0\0\0\x8D\0\0\0\0\0
    PK\x1\x2\x19\0\xA \0\0\0\0 \0D)T2\x10\x91\0\xDD \x1E\0\0\0 \x1E\0\0\0\x7 \0\0\0\0\0\0\0\x1\0 \0\0\0\0\0\0\0 hi.html
    PK\x1\x2\x19\0\xA \0\0\0\0 \0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5 \0\0\0\0\0\0\0\x1\0 \0\0\0C\0\0\0 hi.pl
    PK\x1\x2\x19\0\xA \0\0\0\0 \0i)T2{2\xE5F\xB\x1\0\0\xB\x1\0\0\x6 \0\0\0\0\0\0\0\x1\0 \0\0\0\x8D\0\0\0 hi.zip
    PK\x5\x6 \0\0\0\0 \x3\0\x3\0\x9C\0\0\0\xBC\x1 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0

As you can see, the text is not modified or encoded in any way, so you could fseek() and fread() your data directly within the file. You could also fwrite() over it in place, as long as the size of the file does not change, but the stored CRC and similar fields would no longer match, so they need to be recomputed.

The header "PK\x3\x4\xA" of each entry is not even translated when you add a zip inside a zip.
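As a rough illustration of the direct-access idea, here is a minimal C sketch. It assumes an archive written with -e0 (stored, method 0), no data descriptors, and entries laid out back to back from offset 0, as in the dump above. The names find_stored_entry, le16, le32 and LOCAL_HDR_SIZE are made up for this example and are not taken from funzip.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOCAL_HDR_SIZE 30  /* fixed part of a local file header, signature included */

/* Decode little-endian fields from a raw header buffer. */
static unsigned long le16(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8);
}

static unsigned long le32(const unsigned char *p)
{
    return le16(p) | (le16(p + 2) << 16);
}

/*
 * Hypothetical helper: walk the local headers from the start of the archive
 * and locate the stored (uncompressed) entry called `name`.
 * On success, returns 0 and sets *data_off / *data_len to the position and
 * size of the raw file bytes. Returns -1 if the entry is not found.
 */
int find_stored_entry(FILE *fp, const char *name,
                      long *data_off, unsigned long *data_len)
{
    unsigned char hdr[LOCAL_HDR_SIZE];
    char fname[256];
    size_t want;
    long pos = 0;

    assert(fp != NULL && name != NULL);
    want = strlen(name);

    for (;;) {
        if (fseek(fp, pos, SEEK_SET) != 0)
            return -1;
        if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
            return -1;                      /* end of file */
        if (memcmp(hdr, "PK\x03\x04", 4) != 0)
            return -1;                      /* central directory or unknown data */

        unsigned long method = le16(hdr + 8);   /* 0 means stored */
        unsigned long csize  = le32(hdr + 18);  /* compressed size */
        unsigned long usize  = le32(hdr + 22);  /* uncompressed size */
        unsigned long nlen   = le16(hdr + 26);  /* file name length */
        unsigned long xlen   = le16(hdr + 28);  /* extra field length */
        long payload = pos + LOCAL_HDR_SIZE + (long)nlen + (long)xlen;

        /* The name follows the fixed header and is not encoded. */
        if (method == 0 && nlen == want && want < sizeof fname) {
            if (fread(fname, 1, nlen, fp) != nlen)
                return -1;
            if (memcmp(fname, name, nlen) == 0) {
                *data_off = payload;
                *data_len = usize;
                return 0;
            }
        }
        pos = payload + (long)csize;        /* hop to the next local header */
    }
}
```

With the offset in hand, fseek(fp, off, SEEK_SET) followed by fread() pulls the file bytes straight into a buffer. The sketch does not verify the CRC, so it says nothing about whether the data has been overwritten in place.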
So you could simply search for the "PK\x3\x4\xA" string to find any zip entry, using memchr(buf, 'P', len) and then !memcmp(p, "PK\x3\x4\xA", 5) on the candidate position p, although it's not safe: stored data is not escaped in any way, so the same byte sequence can appear inside an entry's data, as the nested hi.zip shows.

Notice also that the filename/path is not encoded, so it could be loaded via memcpy and searched for.

The conventional way is to use the uncompressed length field (LOCLEN, at byte offset 22 of the entry header) to fseek() forward in the file stream to the next entry and check its header info.

Since files are appended, you need to visit every entry header using an O(n) algorithm to find your desired file. However, you may cache this index information in memory for future retrieval, since those headers are quite small (30 bytes each plus the filename/path). So even for 1000 files, you get something under 64 KB.

Once you have found your desired /seisdata/trace[1].bin, you can fread() it directly into a float array and use it in no time.

As I said before, there's no encoding/translation/compression with -e0, so the data is packed as is.

The unzip algo can be found here, in less than 200 lines of code:
http://www.koders.com/c/fidC5CE35109E7F4A32464FB8B809E311E324085A6F.aspx

The funzip.c file content can be found here:
http://computing.ee.ethz.ch/sepp/unzip-551-rs.SEPP/src/unzip-5.51/funzip.c

Sincerely yours,
Fred.
Received on Sunday, 20 February 2005 11:05:33 UTC