Re: file types

Scott E. Preece (preece@predator.urbana.mcd.mot.com)
Fri, 25 Oct 1996 08:33:46 -0500


Date: Fri, 25 Oct 1996 08:33:46 -0500
Message-Id: <199610251333.IAA22782@predator.urbana.mcd.mot.com>
From: "Scott E. Preece" <preece@predator.urbana.mcd.mot.com>
To: davidp@earthlink.net
CC: www-html@w3.org
In-reply-to: "David Perrell"'s message of Thu, 24 Oct 1996 23:01:28 -0700
Subject: Re: file types

 From: "David Perrell" <davidp@earthlink.net>
| 
| Scott E. Preece wrote:
| > I guess I don't understand your argument - TIFF has an embedded file
| > type code (the same kind of thing I was talking about).
| 
| Either 49492Ah or 4D4D2Ah, depending on byte order of multi-byte
| values. The 2Ah is the never-changing version number. (Are we still at
| Rev 5.0, circa 1988?)
---

Actually, it appears to be rev 6, circa 1993.
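
For reference, here is a minimal sketch of that embedded-type check in
Python.  The function name is made up for illustration.  Note that the
42 (2Ah) is stored as a 16-bit value in the file's own byte order, so
the first four bytes are 49 49 2A 00 or 4D 4D 00 2A.

```python
import struct


def tiff_header(data: bytes):
    """Return (byte_order, magic) if data starts with a TIFF header, else None.

    Illustrative helper only; it looks at the first four bytes and nothing else.
    """
    if len(data) < 4:
        return None
    order = data[:2]
    if order == b"II":        # 49 49: little-endian multi-byte values
        fmt = "<H"
    elif order == b"MM":      # 4D 4D: big-endian multi-byte values
        fmt = ">H"
    else:
        return None
    (magic,) = struct.unpack(fmt, data[2:4])
    if magic != 42:           # 2Ah, the value that never changes
        return None
    return ("little" if order == b"II" else "big", magic)
```

The check identifies the file as TIFF and gives the byte order, but it
says nothing about which revision of the spec the contents rely on.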

---
| > You only need a new type (or to use a type+version coding) if the
| > types are incompatible between versions.  TIFF isn't (at least at one
| > level), though in terms of vectoring a double-clicked icon to the
| > appropriate application, TIFF is of limited help, since the
| > application may need to do substantial parsing before it can decide
| > whether it can handle the file or not, which is why I suggested the
| > utility of having version numbers.
| 
| That's supposed to be the beauty of TIFF. A well-designed reader could
| make some sense out of just about anything. But how often do you see
| "well-designed"?
---

Yes, a well-designed reader may be able to make sense (in the terms it
understands) of any compliant file.  But that's not the point.  If I
have a file that contains embedded data in a new format, an old tool may
be able to open the file but use only the old-format data in it.  The
user may also have a tool that knows how to deal with the new-format
data.  With sufficient type information available, the system could
direct a file containing new-format data to a tool that handles it.
Without it, the system must either ask the user or guess.

As I said, the TIFF approach fully covers the case where the user starts
up the tool and then opens the file inside it, or the case where the
user drops the file on the tool's icon (in those cases the user has
selected the tool), but it doesn't help very much with the double-click
action, which requires that the system be able to determine the right
tool to use based only on the file type.  For that to work, you really
need to have a filetype that includes versioning, so that you can select
a tool appropriate to the particular version opened.  On one of our
machines we have Frame 3, Frame 4, and Frame 5 installed and deal with
files created by all three.  The MIF and binary file types associated
with all three are different - Frame 3 cannot open a Frame 5 binary
file.  For double-click opening to work in that environment, you need to
be able to use the version information in the tool selection decision,
either by embedding it in the type itself (assigning a new type code for
each new rev of the type) or by separating it into a type part and a
version part.  I prefer the latter, since it provides a little more
information structurally, rather than depending on the registry to
provide distinguishable names for different versions, but the two are
functionally equivalent.
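
To make the type-part-plus-version-part option concrete, here is a rough
sketch in Python.  The type name "frame-binary", the registry contents,
and the function name are all invented for illustration, and the
fallback rule (a newer tool can read an older file, never the reverse)
is my assumption about how such a registry might behave, not a
description of any real system.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry keyed by (type, version); the names are illustrative.
REGISTRY: Dict[Tuple[str, int], str] = {
    ("frame-binary", 3): "Frame3",
    ("frame-binary", 4): "Frame4",
    ("frame-binary", 5): "Frame5",
}


def select_tool(file_type: str, version: int,
                registry: Dict[Tuple[str, int], str] = REGISTRY) -> Optional[str]:
    """Pick a tool for a double-clicked file from its type and version alone."""
    exact = registry.get((file_type, version))
    if exact is not None:
        return exact
    # Assumption: a newer tool can read an older file, never the reverse.
    # Choose the closest newer version that has a tool registered.
    newer = sorted(v for (t, v) in registry if t == file_type and v > version)
    if newer:
        return registry[(file_type, newer[0])]
    return None  # the system has to ask the user or guess
```

Because the version is a separate field, the fallback can reason about
ordering.  With one opaque type code per revision, the registry would
have to spell out every such relationship by name.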

---
| ...
| 
| All fine and dandy, except that to avoid the overhead of opening and
| reading a file to find out the type, the type must be part of the file
| system, not embedded in the file data. I know of no possible mechanism
| for this in the FAT system besides the extension and a single attribute
| byte. I believe the same is true of NTFS, though here you've got long
| filenames and support for UNICODE characters. Can you imagine having to
| open tens of thousands of files to construct a readable
| folder/directory listing?
---

It is, as a couple of people mentioned, possible to graft capabilities
onto the FAT filesystem by adding separate datastores that the system
(or tools) can use in addition to the built-in data.  In the UNIX
world, the CDE uses such databases to manage file typing on top of
UNIX's untyped filesystems.  In a new filesystem either approach (in the
meta-data or in the file data) would work equally well - to avoid
needing to open files to check the type, you just cheat by storing the
first n bytes of the file with the meta-data.  It makes the accessing
routines in the OS a little hairier (though that can be mostly avoided
if the data is just replicated with the meta-data and is still stored in
the file's own space as well), but it also radically shortens
access times for files small enough to fit into that little chunk.
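
A rough sketch of the grafted-on variant, in Python: a side datastore
keeps the first n bytes of each file next to its meta-data, so a
directory listing only opens a file the first time it is seen or after
it changes.  The class name, the in-memory dict standing in for a
persistent store, the value of n, and the use of modification time plus
size to detect changes are all assumptions for illustration; this is
not how CDE or any particular filesystem does it.

```python
import os

PREFIX_LEN = 8  # hypothetical "first n bytes" kept with the meta-data


class TypeStore:
    """Side datastore: path -> ((mtime_ns, size), first PREFIX_LEN bytes)."""

    def __init__(self):
        self._entries = {}
        self.opens = 0  # how many times a file actually had to be opened

    def prefix(self, path):
        st = os.stat(path)  # meta-data only; the file is not opened
        key = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]
        with open(path, "rb") as f:  # cache miss: read the leading bytes once
            head = f.read(PREFIX_LEN)
        self.opens += 1
        self._entries[path] = (key, head)
        return head


def list_types(directory, store):
    """Map each regular file in directory to its cached leading bytes."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[name] = store.prefix(path)
    return result
```

A second listing of an unchanged directory touches only the meta-data
and the store, which is the point: the type bytes stay embedded in the
file, and the listing still avoids opening tens of thousands of files.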

scott

--
scott preece
motorola/mcg urbana design center	1101 e. university, urbana, il   61801
phone:	217-384-8589			  fax:	217-384-8550
internet mail:	preece@urbana.mcd.mot.com