- From: Stephen D. Williams <sdw@lig.net>
- Date: Fri, 18 Feb 2005 12:29:12 -0500
- To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevrontexaco.com>
- Cc: "Fred P." <fprog26@hotmail.com>, public-xml-binary@w3.org
- Message-ID: <421625E8.4060700@lig.net>
As an aside, you should have been considering Linux+ReiserFS rather than Solaris+UFS. ReiserFS has been the best filesystem for a number of purposes since about 1999 or 2000, particularly for handling very many files, and especially very many small files. Not only can you put 100,000 files in a directory with no problems, but the overhead of small files is under about 64 bytes on average. (I believe the overhead was 17 bytes plus the length of the filename.) The filesystem, in its default mode, combines 'tails' just like a database would. In fact, its use of btrees and hashes along with journaling pretty much makes it a database/filesystem.

In 2000 I benchmarked a 400 MHz system with a single 10,000 RPM drive which was able to create/write, read, or delete small files (64, 128, 256, 1024, 2048, etc.) at about 1100 per second. For this test, I was operating on 1 million files in 10 directories of 100,000 each.

Hans Reiser, Stephen Tweedie (Ext2/ext3 author), and I debated the need for better multithreaded models for ReiserFS at one of the first LinuxWorlds.

It will be interesting to see how Solaris's new filesystem compares.

Still (back on the subject), in general it's bad to create that many files unless you have a good reason. It can't be required in the processing of a generalized data format.

sdw

Cutler, Roger (RogerCutler) wrote:

>...
>About your specific proposal for handling the seismic data (which is our
>contribution -- including an example dataset), compression aside, I
>still don't know. Is it really reasonable to fling millions of small
>files around? I recall that some operating systems don't like that at
>all. As a specific example, I have experience on Solaris Unix systems
>making directories containing hundreds of thousands of small
>auto-generated files. The OS choked -- really fundamentally choked --
>if you tried to put them all in one directory. I was forced to make
>directory trees with leaf directories that had some max number of files
>in them (I used 1000, if I recall correctly). This necessitated, of
>course, a bunch of pain-in-the-neck logic and code.
>
>This was a while ago, so maybe things have improved -- I throw the
>experience out for what it is worth. But I am dubious and would
>certainly want to see demonstrations before committing to this approach.
>
>
> ...

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
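For anyone wanting to repeat the kind of small-file test mentioned in the message above, here is a rough sketch in Python. It is not the original 2000 benchmark: the names `shard_path` and `bench_small_files`, the directory naming scheme, and the default sizes are all assumptions made for illustration. It spreads files across directories with a fixed maximum per directory (the layout both the benchmark and the quoted Solaris workaround describe), then times the create, read, and delete phases. Reads will mostly be served from the page cache, so the numbers are only indicative.

```python
import os
import tempfile
import time


def shard_path(root, index, files_per_dir):
    """Map a file index to root/dNNNN/fNNNNNNN so that no directory
    holds more than files_per_dir files (hypothetical naming scheme)."""
    return os.path.join(root, f"d{index // files_per_dir:04d}", f"f{index:07d}")


def bench_small_files(root, n_files=1000, files_per_dir=100, size=256):
    """Create, read, then delete n_files files of `size` bytes each.
    Returns a dict of files-per-second rates for each phase."""
    payload = b"x" * size
    paths = [shard_path(root, i, files_per_dir) for i in range(n_files)]

    # Create the directories up front so only file operations are timed.
    for d in sorted({os.path.dirname(p) for p in paths}):
        os.makedirs(d, exist_ok=True)

    def timed(operation):
        start = time.perf_counter()
        for p in paths:
            operation(p)
        elapsed = max(time.perf_counter() - start, 1e-9)
        return n_files / elapsed

    def create(p):
        with open(p, "wb") as f:
            f.write(payload)

    def read(p):
        with open(p, "rb") as f:
            f.read()

    return {
        "create": timed(create),
        "read": timed(read),
        "delete": timed(os.remove),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for phase, rate in bench_small_files(tmp).items():
            print(f"{phase}: {rate:,.0f} files/s")
```

To approximate the layout described in the message, call `bench_small_files(root, n_files=1_000_000, files_per_dir=100_000)` on the filesystem under test; setting `files_per_dir=1000` gives the capped leaf-directory layout from the quoted workaround instead.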
Received on Friday, 18 February 2005 17:27:01 UTC