Re: HTML should not be a file format, but an output format

nemo/Joel N. Weber II (devnull@gnu.ai.mit.edu)
Sat, 22 Mar 1997 17:54:22 -0500 (EST)


Date: Sat, 22 Mar 1997 17:54:22 -0500 (EST)
Message-Id: <199703222254.RAA07910@duality.gnu.ai.mit.edu>
From: "nemo/Joel N. Weber II" <devnull@gnu.ai.mit.edu>
To: BruceLeban@aol.com
CC: www-html@w3.org
In-reply-to: <970322170004_180893866@emout15.mail.aol.com>
Subject: Re: HTML should not be a file format, but an output format

   From: BruceLeban@aol.com
   Date: Sat, 22 Mar 1997 17:00:05 -0500 (EST)

   For another perspective on "HTML should be an output format", check out 
   Globetrotter Web Publisher designed with that philosophy in mind:
       http://www.akimbo.com/globetrotter
   Of course, Globetrotter writes HTML right now because that's what the web 
   uses, but if the web were to change overnight to a new language, 
   Globetrotter users would just republish (as easy as reprinting a 
   document) and go. (After of course upgrading to the new version of 
   Globetrotter that writes the new output format.)

A couple points:

1) There is no way the web would change overnight.  From what I can see,
   HTML will probably always be needed for backwards compatibility.

2) Globetrotter has to store the document in some format, right?  My opinion
   is that HTML is essentially a standard, open format.  What happens if
   I write something in Globetrotter today, and ten years from now I have
   some Globetrotter files, but I no longer have a computer that can run
   Globetrotter?

I actually have the problem of converting AppleWorks files to something
that my Intel machines can read.  Moving the files to a DOS disk is not
a problem.  Since I have documentation of the internal format AppleWorks
uses, reading the files shouldn't be a problem.

But then there's the question of what format to generate.  Plain text
will work fine, but it destroys the formatting.  I could use TeX
or HTML as the output format easily enough.  But what would really
be optimal for the current use is creating files in the native format
of a particular proprietary word processor, for which I have no documentation.

And I wonder what I'll have to do another ten years from now.

(I've become a free UN*X user; but for some reason the rest of my
family insts that UNIX is too hard to use.)

So I conclude that something standard like HTML is a great storage format.

   In particular, Globetrotter rejects document=web page. After all, no one 
   edits word processing documents with each page in a separate file. Can 
   you imagine anyone believing that was the *right* way to do it? A single 
   Globetrotter document (1 file) can publish many different HTML pages 
   (many files) on the web.

That's not really a valid analogy.

One page = one file on the web because a web page can be as long or as short
as is desired.

There's a compelling reason to not have one file corrrespond to one printed
page: if I insert another word in the middle of my document, I want words
to spill over to the next page automagically.  Keeping the whole document
in one file makes sense for this.

But I don't see why one file == one page ever causes lossage on the web.

   Apologies if anyone thinks this message is too self-serving. We've been 
   trying to promote the "HTML should be an output format" message for quite 
   some time and it's difficult. Most people just don't seem to get it. Most 
   of the people publishing documents on the web are the early adopters who 
   have been forced to learn HTML. There's some reluctance for them to 
   accept that there might be easier ways to do things, that would deny them 
   their web"master" status. My personal frustration is how many people are 
   impressed with tools that basically just type < and > for you.

I agree that many web page editing tools are not as useful as some
people think they are.

But I don't see why using HTML as an internal format causes problems.

I also get frustrated by people who don't understand the underlying
technology and are confused when any lossage happens.  HTML is not
that hard.  Let people learn it.  Then we don't have problems with
slightly broken tools.  (Admittedly the tools should be fixed.
But how do you let people intellegenetly choose between relative
and absolute links without explaining them?)