Re: HTML should not be a file format, but an output format

BruceLeban@akimbo.com
Sat, 22 Mar 1997 20:59:39 -0500 (EST)


Message-Id: <199703230159.UAA21530@mail.internet.com>
To: www-html@w3.org


>From:	devnull@gnu.ai.mit.edu (nemo/Joel N. Weber II)
>2) Globetrotter has to store the document in some format, right?  My opinion
>   is that HTML is essentially a standard, open format.  What happens if
>   I write something in Globetrotter today, and ten years from now I have
>   some Globetrotter files, but I no longer have a computer that can run
>   Globetrotter?

>From:	papresco@calum.csclub.uwaterloo.ca (Paul Prescod)
>And what about in the meantime? What if I decide not to upgrade to the
>new version, or if Akimbo goes out of business? Is my data stored in an
>open, ASCII-based format like XML or SGML, or in something proprietary
>and binary that I will not be able to work with?

Yes, just like a word processor stores the doc in some format. Your 
document is stored in Globetrotter's own format AND in HTML. You can 
always edit the HTML directly if you decide not to use Globetrotter 
anymore.

PostScript is a standard, open format, but for good reason no word 
processor uses it as its internal format. Globetrotter documents store 
lots of things that you can't or don't want to store in your HTML files. 
For example, Globetrotter has posted notes, which let you write private 
notes to yourself about whatever you like. You certainly don't want 
those readable by anyone who knows about the View Source command. 
(Although I've seen such things in actual documents on the web.) 
Globetrotter stores pictures in their original 
formats, converting them to GIF or JPEG when you publish. That way you 
can go back to the original application you used to create the picture 
without problems. (If you use the same argument about pictures that you 
use about HTML, of course you'd only use graphics programs that use GIF 
or JPEG as their internal format.)
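
As a rough illustration of keeping private notes out of the published 
HTML, here is a minimal Python sketch. It is not Globetrotter's actual 
code or file format: the Paragraph class, its notes field, and 
publish_html are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Paragraph:
    """One paragraph of the source document (hypothetical model)."""
    text: str
    # Private notes live only in the source document, never in the output.
    notes: list = field(default_factory=list)


def publish_html(paragraphs):
    """Emit only the public text; the notes are simply never written."""
    return "\n".join(f"<P>{escape(p.text)}</P>" for p in paragraphs)


doc = [Paragraph("Prices go up in May.", notes=["check with legal first"])]
print(publish_html(doc))
```

The note stays in the source document, so nobody using View Source on 
the published page can see it.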

You can of course use HTML as an interchange format. This loses 
information, but no more than you'd have lost if you created the document 
in HTML in the first place. Just like you can always scan a printed 
document back in if you have a disk crash. Or if you prefer, you can edit 
the printed document by hand using correction fluid and a typewriter. OK, 
that's a bit sarcastic: the point is that people don't use typewriters 
because they're afraid that using a word processor is "risky".

>From:	devnull@gnu.ai.mit.edu (nemo/Joel N. Weber II)
>   In particular, Globetrotter rejects document=web page. After all, no one 
>   edits word processing documents with each page in a separate file. Can 
>   you imagine anyone believing that was the *right* way to do it? A single 
>   Globetrotter document (1 file) can publish many different HTML pages 
>   (many files) on the web.
>
>That's not really a valid analogy.

Why not? The structure of a web site is arbitrarily divided up into 
multiple files. Each page and each picture must be in a separate file. 
Server-side image maps have to be in separate files. CGI scripts have to 
be in separate files. Is this the best way to edit a site?

>One page = one file on the web because a web page can be as long or as short
>as is desired.

Many web sites have more than one web page. A Globetrotter document is a site, not 
a page. Yes, "One page = one file on the web", but you don't have to edit 
a single web page at a time.

>There's a compelling reason to not have one file correspond to one printed
>page: if I insert another word in the middle of my document, I want words
>to spill over to the next page automagically.  Keeping the whole document
>in one file makes sense for this.

There are compelling reasons to edit an entire site at once. When pages 
are edited one file at a time, a change like inserting a new section in 
the middle breaks the internal prev/next links in other pages.
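
To make the prev/next point concrete, here is a small Python sketch of 
publishing one document as many pages, with the navigation links 
recomputed on every publish. It is an assumption-laden illustration, not 
how Globetrotter is implemented: the (slug, title, body) tuples, the 
publish function, and the slug-based file names are all made up for the 
example.

```python
from html import escape


def publish(sections):
    """Return {filename: html} for a list of (slug, title, body) sections.

    Prev/Next links are derived from the current section order, so they
    cannot go stale when a section is inserted or removed.
    """
    names = [f"{slug}.html" for slug, _, _ in sections]
    pages = {}
    for i, (_, title, body) in enumerate(sections):
        nav = []
        if i > 0:
            nav.append(f'<A HREF="{names[i - 1]}">Prev</A>')
        if i < len(sections) - 1:
            nav.append(f'<A HREF="{names[i + 1]}">Next</A>')
        pages[names[i]] = (
            f"<HTML><HEAD><TITLE>{escape(title)}</TITLE></HEAD>\n"
            f"<BODY><H1>{escape(title)}</H1>\n"
            f"<P>{escape(body)}</P>\n"
            f"<P>{' | '.join(nav)}</P></BODY></HTML>\n"
        )
    return pages


site = [("intro", "Intro", "Welcome."), ("details", "Details", "Fine print.")]
site.insert(1, ("setup", "Setup", "Getting started."))  # new middle section
for name in publish(site):
    print(name)
```

After the insert, intro.html links forward to setup.html and 
details.html links back to it, without anyone touching those two pages 
by hand.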

>But I don't see why one file == one page ever causes lossage on the web.
>But I don't see why using HTML as an internal format causes problems.

I said nothing about lossage on the web. And it's not so much that there 
are problems with editing HTML directly as there are disadvantages and 
limitations. For example, in Globetrotter small caps is a true style. It 
doesn't exist in HTML. When Globetrotter publishes a document it writes 
the correct HTML to produce the desired result. That's the simplest 
advantage. Take a look at the advantages document on our web site for 
more.
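
For the curious, one plausible way to fake small caps in HTML is to 
uppercase everything and shrink the letters that were originally 
lowercase. The Python sketch below shows that idea only; it is an 
assumption about how such output could look, not a description of the 
HTML Globetrotter really writes, and small_caps is a hypothetical name.

```python
from html import escape
from itertools import groupby


def small_caps(text):
    """Approximate small caps: lowercase runs become smaller capitals."""
    parts = []
    # Split the text into runs of lowercase and non-lowercase characters.
    for is_lower, run in groupby(text, key=str.islower):
        chunk = escape("".join(run).upper())
        if is_lower:
            # Originally lowercase: render as capitals one size smaller.
            parts.append(f'<FONT SIZE="-1">{chunk}</FONT>')
        else:
            parts.append(chunk)
    return "".join(parts)


print(small_caps("Akimbo Systems"))
```

The author just picks the small caps style; the markup needed to 
approximate it is the publisher's problem.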

>From:	papresco@calum.csclub.uwaterloo.ca (Paul Prescod)
>Your product would also be more impressive if it produced valid HTML.

I don't want this thread to turn into defending/attacking Globetrotter, 
but I will make one more statement in our defense. The HTML produced by 
Globetrotter passes a more pragmatic test: it produces the desired 
results in every browser we've tested with.

I think everyone recognizes that all browsers ignore tags and attributes 
they don't recognize. This meta rule is essential to backward 
compatibility. If you apply that rule you'll find that Globetrotter 
generates perfectly good HTML that does what it's supposed to. I have no 
idea why this rule isn't in the HTML standards, since everyone knows damn 
well it exists.
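
Here is a toy Python sketch of that meta rule, using the standard 
html.parser module. It is only an illustration under stated assumptions: 
the KNOWN_TAGS and KNOWN_ATTRS sets stand in for whatever a given 
browser understands, and TolerantRenderer and render are hypothetical 
names, not any real browser's code.

```python
from html.parser import HTMLParser

# Hypothetical: the tags and attributes this pretend browser understands.
KNOWN_TAGS = {"p", "b", "i", "a"}
KNOWN_ATTRS = {"href"}


class TolerantRenderer(HTMLParser):
    """Drop unknown tags and attributes but keep the text inside them."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            kept = "".join(
                f' {name}="{value}"'
                for name, value in attrs
                if name in KNOWN_ATTRS and value is not None
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, even inside a tag we did not recognize.
        self.out.append(data)


def render(html):
    renderer = TolerantRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out)


print(render("<p>Hello <blink>big</blink> <b>world</b></p>"))
```

The unrecognized tag disappears but its text survives, which is what 
lets newer markup degrade gracefully in older browsers.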

    --- Bruce Leban
    Akimbo Systems
    http://www.akimbo.com/globetrotter
    Publish on the web without learning HTML! (Really.)