RE: [XML-Binary] ZIP file format using XPATH for directory entries proposal

From: Cutler, Roger (RogerCutler) <RogerCutler@chevrontexaco.com>
Date: Fri, 18 Feb 2005 13:47:40 -0600
Message-ID: <71C38086EA230D43941DD0A3BAFF8CA929C4A2@bocnte2k3.hou150.chevrontexaco.net>
To: "Stephen D. Williams" <sdw@lig.net>
cc: "Fred P." <fprog26@hotmail.com>, public-xml-binary@w3.org

Thanks for the info.  Bottom line, it sounds as though you agree with me
that making each trace a separate file is pretty dubious.

As for your specific suggestion, that was not really an option for other
reasons.  I was using the OS I pretty much had to use.  However, I will
forward your comments to our lead Unix systems guy.

-----Original Message-----
From: public-xml-binary-request@w3.org
[mailto:public-xml-binary-request@w3.org] On Behalf Of Stephen D.
Williams
Sent: Friday, February 18, 2005 11:29 AM
To: Cutler, Roger (RogerCutler)
Cc: Fred P.; public-xml-binary@w3.org
Subject: Re: [XML-Binary] ZIP file format using XPATH for directory
entries proposal


As an aside, you should have been considering Linux+ReiserFS rather than
Solaris+UFS.

ReiserFS has been the best filesystem for a number of purposes since
about 1999 or 2000, especially for handling very many files, and
particularly very many small files.  Not only can you put 100,000 files
in a directory with no problems, but the overhead of small files is
under about 64 bytes on average.  (I believe the overhead was 17 bytes
plus the length of the filename.)  The filesystem, in its default mode,
combines 'tails' just like a database would.  In fact, its use of
btrees and hashes along with journaling pretty much makes it a
database/filesystem.

In 2000 I benchmarked a 400 MHz system with a single 10,000 RPM drive
that was able to create/write, read, or delete small (64, 128, 256,
1024, 2048, etc.) files at about 1100 per second.  For this test, I was
operating on 1 million files in 10 directories of 100,000 each.
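
For scale: at about 1100 files per second, one pass over 1 million
files takes roughly 15 minutes.  A minimal sketch of this kind of
small-file test follows.  It is not the original benchmark; the
function name time_small_files, the file naming, and the file count are
assumptions chosen for illustration, and the read phase will mostly hit
the page cache unless the cache is flushed between phases.

```python
import os
import tempfile
import time


def time_small_files(directory, count, size):
    """Create, read, then delete `count` files of `size` bytes each.

    Returns a dict of operations per second for each phase.
    Hypothetical helper for illustration only.
    """
    payload = b"x" * size
    paths = [os.path.join(directory, f"f{i:07d}") for i in range(count)]
    rates = {}

    # Phase 1: create and write every file.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as handle:
            handle.write(payload)
    rates["create"] = count / max(time.perf_counter() - start, 1e-9)

    # Phase 2: read every file back.
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as handle:
            handle.read()
    rates["read"] = count / max(time.perf_counter() - start, 1e-9)

    # Phase 3: delete every file.
    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    rates["delete"] = count / max(time.perf_counter() - start, 1e-9)

    return rates


if __name__ == "__main__":
    # Sizes taken from the message; 2000 files per run is an arbitrary
    # small count so the sketch finishes quickly.
    with tempfile.TemporaryDirectory() as scratch:
        for size in (64, 128, 256, 1024, 2048):
            print(size, time_small_files(scratch, 2000, size))
```

Raising the count toward 100,000 per directory is what exposes
filesystems whose directory lookups slow down as a directory grows.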

Hans Reiser, Stephen Tweedie (Ext2/ext3 author), and I debated the need
for better multithreaded models for ReiserFS at one of the first
LinuxWorlds.  It will be interesting to see how Solaris's new
filesystem compares.

Still (back on the subject), in general it's bad to create that many 
files unless you have a good reason.  It can't be required in the 
processing of a generalized data format.

sdw

Cutler, Roger (RogerCutler) wrote:

>...
>About your specific proposal for handling the seismic data (which is 
>our contribution -- including an example dataset), compression aside, I
>still don't know.  Is it really reasonable to fling millions of small 
>files around?  I recall that some operating systems don't like that at 
>all.  As a specific example, I have experience on Solaris Unix systems 
>making directories containing hundreds of thousands of small 
>auto-generated files.  The OS choked -- really fundamentally choked -- 
>if you tried to put them all in one directory.  I was forced to make 
>directory trees with leaf directories that had some max number of files
>in them (I used 1000, if I recall correctly).  This necessitated, of 
>course, a bunch of pain-in-the-neck logic and code.
>
>This was a while ago, so maybe things have improved -- I throw the 
>experience out for what it is worth.  But I am dubious and would 
>certainly want to see demonstrations before committing to this 
>approach.
>
>  
>
...
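
The directory-tree workaround described in the quoted message (leaf
directories capped at some maximum number of files, 1000 as recalled)
might look roughly like the sketch below.  This is not the code that
was actually used; the names sharded_path and write_trace, the two-level
layout, and the "trace...dat" file naming are all assumptions made for
illustration.

```python
import os

MAX_PER_LEAF = 1000  # the cap recalled in the quoted message


def sharded_path(root, index, max_per_leaf=MAX_PER_LEAF):
    """Map a sequential file index to root/<mid>/<leaf>/<name>.

    Each leaf directory receives at most max_per_leaf files, and each
    mid-level directory receives at most max_per_leaf leaf directories.
    The layout and file name pattern are hypothetical.
    """
    leaf = index // max_per_leaf
    mid = leaf // max_per_leaf
    return os.path.join(
        root, f"{mid:04d}", f"{leaf:07d}", f"trace{index:010d}.dat"
    )


def write_trace(root, index, data):
    """Write one small file, creating its leaf directory on demand."""
    path = sharded_path(root, index)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

Because the path is computed from the index alone, a reader can locate
any file without scanning directories, at the cost of carrying this
mapping logic in every tool that touches the data.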

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
Received on Friday, 18 February 2005 19:48:24 GMT
