
Re: [XML-Binary] ZIP file format using XPATH for directory entries proposal

From: Stephen D. Williams <sdw@lig.net>
Date: Fri, 18 Feb 2005 12:29:12 -0500
Message-ID: <421625E8.4060700@lig.net>
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevrontexaco.com>
Cc: "Fred P." <fprog26@hotmail.com>, public-xml-binary@w3.org
As an aside, you should have been considering Linux+ReiserFS rather than 
Solaris+UFS.

ReiserFS has been the best filesystem for a number of purposes since 
about 1999 or 2000, especially for handling very many files, and very 
many small files in particular.  Not only can you put 100,000 files in 
a directory with no problems, but the overhead of a small file is under 
64 bytes on average.  (I believe the overhead was 17 bytes plus the 
length of the filename.)  The filesystem, in its default mode, combines 
'tails' just like a database would.  In fact, its use of btrees and 
hashes along with journaling pretty much makes it a 
database/filesystem.

In 2000 I benchmarked a 400 MHz system with a single 10,000 RPM drive 
which was able to create/write, read, or delete small files (64, 128, 
256, 1024, 2048, etc. bytes) at about 1100 per second.  For this test, 
I was operating on 1 million files in 10 directories of 100,000 each.
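
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a small-file benchmark.  It is not the program I ran in 2000; the function name, directory naming, and default sizes are all assumptions made for illustration, and the defaults are kept small so it finishes quickly.

```python
import os
import tempfile
import time


def bench_small_files(root, n_dirs=10, files_per_dir=100, size=256):
    """Time create/write, read, and delete passes over many small files.

    Hypothetical helper: returns a dict mapping each phase name to the
    measured rate in files per second.
    """
    payload = b"x" * size
    paths = []
    for d in range(n_dirs):
        dir_path = os.path.join(root, f"d{d:02d}")
        os.makedirs(dir_path, exist_ok=True)
        paths.extend(
            os.path.join(dir_path, f"f{i:06d}") for i in range(files_per_dir)
        )

    def timed(op):
        # Apply op to every path and convert elapsed time to a rate.
        start = time.perf_counter()
        for p in paths:
            op(p)
        elapsed = time.perf_counter() - start
        return len(paths) / elapsed if elapsed > 0 else float("inf")

    def write(p):
        with open(p, "wb") as f:
            f.write(payload)

    def read(p):
        with open(p, "rb") as f:
            f.read()

    return {
        "write": timed(write),
        "read": timed(read),
        "delete": timed(os.remove),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for phase, rate in bench_small_files(root).items():
            print(f"{phase}: {rate:.0f} files/s")
```

To approximate the test described above, pass files_per_dir=100000 and point root at the filesystem under test.  Note that this sketch does not flush or drop caches, so the read pass will largely measure the page cache rather than the disk.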

Hans Reiser, Stephen Tweedie (ext2/ext3 author), and I debated the need 
for better multithreaded models for ReiserFS at one of the first 
LinuxWorlds.  It will be interesting to see how Solaris's new 
filesystem compares.

Still (back on the subject), in general it's bad to create that many 
files unless you have a good reason.  It can't be required in the 
processing of a generalized data format.

sdw

Cutler, Roger (RogerCutler) wrote:

>...
>About your specific proposal for handling the seismic data (which is our
>contribution -- including an example dataset), compression aside, I
>still don't know.  Is it really reasonable to fling millions of small
>files around?  I recall that some operating systems don't like that at
>all.  As a specific example, I have experience on Solaris Unix systems
>making directories containing hundreds of thousands of small
>auto-generated files.  The OS choked -- really fundamentally choked --
>if you tried to put them all in one directory.  I was forced to make
>directory trees with leaf directories that had some max number of files
>in them (I used 1000, if I recall correctly).  This necessitated, of
>course, a bunch of pain-in-the-neck logic and code.
>
>This was a while ago, so maybe things have improved -- I throw the
>experience out for what it is worth.  But I am dubious and would
>certainly want to see demonstrations before committing to this approach.
>
>  
>
...
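
The leaf-directory workaround described in the quoted text (capping each leaf directory at some maximum number of files, 1000 in that case) can be sketched roughly as follows.  This is only an illustration of the idea, not the code that was actually used; the function name, the two-level layout, the fanout parameter, and the ".dat" suffix are all assumptions.

```python
import os


def leaf_path(root, index, max_per_dir=1000, fanout=1000):
    """Map a file index to root/<top>/<leaf>/<name>.

    Hypothetical layout: each leaf directory holds at most max_per_dir
    files, and each top-level directory holds at most fanout leaves.
    """
    leaf = index // max_per_dir   # which leaf directory this file lands in
    top = leaf // fanout          # which top-level directory holds that leaf
    return os.path.join(
        root, f"{top:04d}", f"{leaf:07d}", f"{index:010d}.dat"
    )


if __name__ == "__main__":
    for i in (0, 999, 1000, 1_000_000):
        print(leaf_path("data", i))
```

The caller still has to create the directories (for example with os.makedirs(os.path.dirname(path), exist_ok=True)) before writing, which is part of the extra logic complained about above.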

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw


Received on Friday, 18 February 2005 17:27:01 GMT
