- From: Fred P. <fprog26@hotmail.com>
- Date: Sun, 20 Feb 2005 06:04:06 -0500
- To: RogerCutler@chevrontexaco.com, public-xml-binary@w3.org
Hi M. Cutler,
I did an experiment with two files:
hi.html:
<html><body>HI</body></html>
hi.pl:
#!/usr/bin/perl
print "hi\n";
exit;
pkzip -a -e0 hi.zip hi.html
pkzip -a -e0 hi.zip hi.pl
copy hi.zip hi2.zip
pkzip -a -e0 hi2.zip hi.zip
The last one is to check whether there are any "translation/encoding" issues.
It gives the following zip output in binary:
PK\x3\x4\xA
\0\0\0\0\0D)T2\x10\x91\0\xDD\x1E
\0\0\0\x1E\0\0\0\x7\0\0\0
hi.html
<html><body>HI</body></html>\xD\xA
PK\x3\x4\xA
\0\0\0\0\0[)T2\xAE\xD8S0'
\0\0\0'\0\0\0\x5\0\0\0
hi.pl
#!/usr/bin/perl\xD\xA
print "hi\n";\xD\xA
exit;\xD\xA
PK\x3\x4\xA
\0\0€\0\0i)T2{2\xE5F\xB\x1\0\0
\xB\x1\0\0\x6\0\0\0
hi.zip
PK\x3\x4\xA
\0\0\0\0\0D)T2\x10\x91\0\xDD
\x1E\0\0\0\x1E\0\0\0\x7\0\0\0
hi.html
<html><body>HI</body></html>\xD\xA
PK\x3\x4\xA
\0\0\0\0\0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5\0\0\0
hi.pl
#!/usr/bin/perl\xD\xA
print "hi\n";\xD\xA
exit;\xD\xA
PK\x1\x2\x19\0\xA
\0\0\0\0
\0D)T2\x10\x91\0\xDD\x1E\0\0\0\x1E\0\0\0\x7
\0\0\0\0\0\0\0\x1\0 \0\0\0\0\0\0\0
hi.html
PK\x1\x2\x19\0\xA
\0\0\0\0\0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5
\0\0\0\0\0\0\0\x1\0 \0\0\0C\0\0\0
hi.pl
PK\x5\x6
\0\0\0\0
\x2\0\x2\0h\0\0\0\x8D\0\0\0\0\0
PK\x1\x2\x19\0\xA
\0\0\0\0
\0D)T2\x10\x91\0\xDD
\x1E\0\0\0
\x1E\0\0\0\x7
\0\0\0\0\0\0\0\x1\0 \0\0\0\0\0\0\0
hi.html
PK\x1\x2\x19\0\xA
\0\0\0\0
\0[)T2\xAE\xD8S0'\0\0\0'\0\0\0\x5
\0\0\0\0\0\0\0\x1\0 \0\0\0C\0\0\0
hi.pl
PK\x1\x2\x19\0\xA
\0\0\0\0
\0i)T2{2\xE5F\xB\x1\0\0\xB\x1\0\0\x6
\0\0\0\0\0\0\0\x1\0 \0\0\0\x8D\0\0\0
hi.zip
PK\x5\x6
\0\0\0\0
\x3\0\x3\0\x9C\0\0\0\xBC\x1
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
As you can see, the text is not modified or encoded in any way,
so you could fseek() and fread() your data directly within the file.
You could also fwrite() it in place without changing the size of the file,
but the CRC (and similar) would no longer match, so it needs to be recomputed.
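
To make the "recompute the CRC" step concrete, here is a minimal sketch of the CRC-32 used by zip (the same polynomial as zlib's crc32). The name zip_crc32 is mine, not something from pkzip or funzip.c. After an in-place fwrite() you would run this over the new data and store the result, little-endian, at offset 14 of the entry's local header and at offset 16 of its central directory record.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
 * Short rather than fast; a table-driven version would be quicker. */
uint32_t zip_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a sanity check, the standard test string "123456789" should give 0xCBF43926.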
The header "PK\x3\x4\xA" for each entry is not even translated
when you add a zip inside a zip.
So you could simply search for any zip entry using this string, with
memchr(buf, 'P', len) followed by !memcmp(buf, "PK\x3\x4\xA", 5),
although it's not safe, since stored data is not encoded at all and may
contain the same byte sequence.
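
A rough sketch of that memchr()/memcmp() scan over a buffer already loaded in memory follows. The function name find_local_sig is hypothetical. It matches only the four signature bytes "PK\3\4", because the \xA that follows is the low byte of the "version needed" field and could differ in other archives.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first "PK\3\4" at or after `from`,
 * or -1 if there is none. A hit is only a candidate: stored data
 * (for example a zip inside a zip) can contain the same bytes. */
long find_local_sig(const unsigned char *buf, size_t len, size_t from)
{
    static const unsigned char sig[4] = { 'P', 'K', 0x03, 0x04 };
    size_t pos = from;

    while (pos + sizeof sig <= len) {
        const unsigned char *p = memchr(buf + pos, 'P', len - pos);
        if (p == NULL)
            return -1;
        pos = (size_t)(p - buf);
        if (pos + sizeof sig <= len && memcmp(p, sig, sizeof sig) == 0)
            return (long)pos;
        pos++;                      /* a 'P' that was not a header */
    }
    return -1;
}
```

Calling it again with from set to the previous hit + 1 finds the next candidate. On hi2.zip this would also report the two headers embedded inside the stored hi.zip, which is exactly the false-positive risk.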
Notice also that the filename/path is not encoded,
so it could be loaded via memcpy and searched for.
The conventional way is to read the uncompressed length stored at byte
offset 22 of each entry header (LOCLEN), then fseek() past the data
into the file stream and check for the next header.
Since files are appended, you need to visit every entry header
using an O(n) algorithm to find your desired file.
However, you may cache this index information in memory for future
retrieval, since those headers are quite small (32 bytes each + filename/path).
So even for 1000 files, you get something under 64KB.
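
The header walk and the in-memory cache could look roughly like the sketch below. The struct and function names and the 63-byte name cap are my own assumptions, not taken from funzip.c. It assumes stored (-e0) entries whose sizes are present in the local header (no trailing data descriptor), and it skips each entry's data using the compressed size at offset 18, which equals the uncompressed length at offset 22 for stored data.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOCAL_HDR_SIZE 30   /* fixed part of a local file header */
#define NAME_MAX_LEN   63   /* hypothetical cap for this sketch */

struct zip_index_entry {
    char     name[NAME_MAX_LEN + 1];
    long     data_off;      /* file offset of the entry's data */
    uint32_t size;          /* uncompressed length (header offset 22) */
};

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Visit every local header once (O(n)) and record up to `max` entries.
 * Returns the number of entries stored. Stops at the first block that
 * is not a local header, which is normally the central directory. */
size_t zip_build_index(FILE *fp, struct zip_index_entry *idx, size_t max)
{
    unsigned char hdr[LOCAL_HDR_SIZE];
    size_t n = 0;
    long pos = 0;

    while (n < max) {
        if (fseek(fp, pos, SEEK_SET) != 0 ||
            fread(hdr, 1, sizeof hdr, fp) != sizeof hdr ||
            memcmp(hdr, "PK\003\004", 4) != 0)
            break;
        uint32_t csize = le32(hdr + 18);    /* bytes of data to skip */
        uint16_t nlen  = le16(hdr + 26);    /* filename length */
        uint16_t xlen  = le16(hdr + 28);    /* extra field length */
        long data_off  = pos + LOCAL_HDR_SIZE + nlen + xlen;

        if (nlen <= NAME_MAX_LEN) {         /* longer names are not indexed */
            if (fread(idx[n].name, 1, nlen, fp) != nlen)
                break;
            idx[n].name[nlen] = '\0';
            idx[n].data_off = data_off;
            idx[n].size = le32(hdr + 22);
            n++;
        }
        pos = data_off + (long)csize;
    }
    return n;
}

/* Linear lookup in the cached index; no file I/O needed. */
const struct zip_index_entry *
zip_lookup(const struct zip_index_entry *idx, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(idx[i].name, name) == 0)
            return &idx[i];
    return NULL;
}
```

Because the walk jumps over each entry's data by its recorded size, headers embedded in a nested zip are never mistaken for entries of the outer archive.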
Once you have found your desired /seisdata/trace[1].bin,
you can directly fread() it into a float array and use it in no time.
As I said before, there's no encoding/translation/compression for -e0,
so the data is packed as is.
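
As a small illustration of that last step, here is a sketch that loads a stored entry straight into a float array. The name read_floats is hypothetical, and data_off is the offset of the entry's data as found by whatever header scan you use. It assumes the floats were written in the byte order and float format of the machine reading them; otherwise a byte swap would be needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read `count` floats stored at `data_off` in `fp`.
 * Returns a malloc'd array (caller frees) or NULL on failure.
 * Assumes host byte order and float format match the file. */
float *read_floats(FILE *fp, long data_off, size_t count)
{
    float *out = malloc(count * sizeof *out);

    if (out == NULL)
        return NULL;
    if (fseek(fp, data_off, SEEK_SET) != 0 ||
        fread(out, sizeof *out, count, fp) != count) {
        free(out);
        return NULL;
    }
    return out;
}
```

For a trace file, count would be the entry's uncompressed length divided by sizeof(float).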
The unzip algo can be found here, less than 200 lines of code:
http://www.koders.com/c/fidC5CE35109E7F4A32464FB8B809E311E324085A6F.aspx
funzip.c file content can be found here:
http://computing.ee.ethz.ch/sepp/unzip-551-rs.SEPP/src/unzip-5.51/funzip.c
Sincerely yours,
Fred.
Received on Sunday, 20 February 2005 11:05:33 UTC