To: NED@sigurd.innosoft.com, nsb@thumper.bellcore.com,
In-Reply-To: Ned Freed's message of Thu, 29 Oct 1992 08:53:10 -0800 <01GQIM2YWA8I91VWYH@SIGURD.INNOSOFT.COM>
Subject: Re: misconceptions about MIME [long]
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <92Oct30.155508pst.101795@poplar.parc.xerox.com>
Date: Fri, 30 Oct 1992 15:54:56 PST

>> The arguments that in-band designation of document format is better
>> than out-of-band information may apply in the electronic mail
>> scenarios, where there is a single sender, multiple recipients, and
>> the recipient has no control over what the sender might send.

>The argument is identical for most file servers, which have even less control
>over the specifics of what files they offer for retrieval. File servers usually
>rely on contributed material and only rarely have anything resembling precise
>control over the material they offer.

But we are not discussing 'file servers' in general; we are discussing something more specific, over which we presumably have more control: the use of MIME content-type identifiers in World-Wide Web and WAIS servers.

Even in the case of file servers, while you might not have control over the material offered, you do have control over how that material is described: which version of a purported standard format it is in and even, in some cases, which profile of that standard applies.

>> If I wish to retrieve the document, say to view it, I might want to
>> choose the available representation that is most appropriate for my
>> purpose. Imagine my dismay to retrieve a 50 megabyte postscript file
>> from an anonymous FTP archive, only to discover that it is in the
>> newly announced Postscript level 4 format, or to try to edit it only
>> to discover that it is in the (upwardly compatible but not parsable by
>> my client) version 44 of Rich Text.
>> In each case, the appropriateness
>> of alternate sources and representations of a document would depend on
>> information that is currently only available in-band.

>Even if this happens (I have strong doubts that it will since documents made
>available for public retrieval tend to converge rapidly to lowest-common
>denominator usage) you have failed to propose an alternative that solves this
>usefully.

Documents made available for public retrieval do not and cannot 'tend to converge rapidly to lowest-common denominator usage', because *old documents do not go away*! If there is diversity today in the available formats for RFCs, tech reports and PhD theses, that diversity can only get worse! It is foolish to think that it will diminish any time in the near future; the number of 'conference proceedings on CD-ROM' is certainly increasing, as people want to share Mathematica documents, various forms of hypertext, audio content and the like.

As for a proposal that 'solves this usefully', I have a fairly mild one that, while it does not solve all of the problems of interoperability, does reduce the amount of uncertainty. I propose (once again) that instead of saying 'application/postscript', the label say, at a minimum, 'application/postscript 1985' vs 'application/postscript 1994', or whatever you would like to designate as a way to uniquely identify which edition of the PostScript reference manual you are talking about; and that instead of being identified as 'image/tiff', files be identified as 'image/tiff 5.0 Class F' vs 'image/tiff 7.0 class QXB'.

> Finally, let me point out that I speak as one of the maintainers of one of the
> largest archive of TeX material available anywhere. This material has been
> available via MIME-compliant mail server (and of course FTP) for over six
> months now. This archive contains hundreds of PostScript documents as well
> as all sorts of other stuff.
> The problems you seem to think are endemic to
> this sort of services have yet to materialize.

I think you need to take a longer-term and broader perspective than six months' experience with a single document representation. We've been developing a document archive service that can cope with 20 years of collected electronic documents. We have not only PostScript 1 and 2, but also several versions of Interpress and Press format, two versions of DVI, and revisable formats from 20 years of editor development: several versions of TeX, LaTeX, FrameMaker, Microsoft Word, Tioga, GlobalView, ViewPoint, Bravo, BravoX, TEdit, troff, Interleaf, WordPerfect, etc., along with images in multiple variations of RES, AIS, TIFF, Sun raster, PCX, MacPaint, ad nauseam.

In trying to deal with documents over the longer term, it has become apparent that merely marking documents with a simple 'format' tag like 'interpress' or 'postscript' or 'tiff' isn't adequate for most purposes. Standards can evolve over periods as short as five years; even the method of internally tagging standard versions changes, and it is certainly impossible to rely on in-band version information for all formats.

I have more to say about the problem of 'external references', but I'll save that for another message.

It would be nice to have a calm discussion about possible solutions to these problems, and I hope you will forgo future sarcasm.

Thanks,

Larry
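As an illustration only (not part of the message, and not MIME syntax), here is a minimal sketch of how a client might use the version labels proposed in the message, such as 'application/postscript 1985' or 'image/tiff 5.0 Class F', to pick a representation it can handle before retrieving anything. It assumes labels of the form type/subtype followed by optional whitespace-separated version and profile tokens, as in the message's examples; the names `FormatLabel`, `parse_label`, `accepts`, `choose` and the client capability table are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical capability table: media type -> set of (version, profile) pairs
# the client can handle; a profile of None means "any profile of that version".
Capabilities = Dict[str, Set[Tuple[str, Optional[str]]]]


@dataclass(frozen=True)
class FormatLabel:
    media_type: str          # e.g. "application/postscript"
    version: Optional[str]   # e.g. "1985"; None if the label has no version
    profile: Optional[str]   # e.g. "Class F"; None if the label has no profile


def parse_label(label: str) -> FormatLabel:
    """Split a label assumed to look like 'type/subtype [version [profile ...]]'."""
    parts = label.split()
    if not parts or "/" not in parts[0]:
        raise ValueError(f"not a type/subtype label: {label!r}")
    media_type = parts[0].lower()  # MIME type names are case-insensitive
    version = parts[1] if len(parts) > 1 else None
    profile = " ".join(parts[2:]) if len(parts) > 2 else None
    return FormatLabel(media_type, version, profile)


def accepts(fmt: FormatLabel, caps: Capabilities) -> bool:
    """True if the client lists this exact version (and, if required, profile)."""
    for version, profile in caps.get(fmt.media_type, set()):
        if fmt.version != version:
            continue
        if profile is None:
            return True
        # Profile names are compared case-insensitively ("Class F" vs "class f").
        if fmt.profile is not None and fmt.profile.lower() == profile.lower():
            return True
    return False


def choose(available: Iterable[str], caps: Capabilities) -> Optional[str]:
    """Return the first available label the client can handle, or None."""
    for label in available:
        if accepts(parse_label(label), caps):
            return label
    return None


if __name__ == "__main__":
    offered = [
        "application/postscript 1994",
        "application/postscript 1985",
        "image/tiff 7.0 class QXB",
    ]
    client = {
        "application/postscript": {("1985", None)},
        "image/tiff": {("5.0", "Class F")},
    }
    print(choose(offered, client))  # the 1985 PostScript edition is chosen
```

In this sketch a bare 'application/postscript' label carries no version and so never matches, which is the uncertainty the message describes; a more lenient client could instead treat an unversioned label as a last-resort fallback.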