Re: HTML should not be a file format, but an output format

Paul Prescod (papresco@calum.csclub.uwaterloo.ca)
Sun, 23 Mar 1997 04:45:35 -0500


Message-ID: <3334FBBF.2EF@csclub.uwaterloo.ca>
Date: Sun, 23 Mar 1997 04:45:35 -0500
From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
To: "nemo/Joel N. Weber II" <devnull@gnu.ai.mit.edu>, www-html@w3.org
Subject: Re: HTML should not be a file format, but an output format

nemo/Joel N. Weber II wrote:
> You raise some interesting issues.
> 
> They seem valid.
> 
> How would you like to solve those problems?

I've already solved this problem. And guess what! I'm not selling
anything: I use all free software (not, unfortunately, all available for
the Mac, yet). The solution is to encode your documents in a storage
format that reflects the needs of the documents: one that reflects the
internal structure of the documents and is based on SGML or XML. My
documents correspond closely to the TEI structure, so I use that. If you
are doing computer software documentation, DocBook may be closer to your
needs. If you are doing something completely different, like Peter
Murray-Rust, who is applying SGML to the problem of computer modelling
of chemical molecules, then you start with the closest existing DTD and
change it to suit your needs. Soon we will have XML, which will not
require you to describe the structure of your documents in advance
with a DTD, so the step of creating an actual DTD will be optional
(but recommended -- validation is your friend).

So you encode each logical document in one file. It is usually
relatively clear where to make that break, but not necessarily. I am not
yet 100% clear what is logically a document. It is tempting to call an
entire site a document, as Akimbo does, but I'm not yet 100% clear what
a "web site" is, either. And anyhow, some of them are certainly too
large for processing all at once in any meaningful manner.
http://www.yahoo.com is certainly a web site, but I don't want to wait
for it to load into my text editor! So anyhow, at some point you make a
break and say these things are this document and those things are that
document. 

Now you have them in a format that is tailored for them, with a file
breakdown that is optimal for them, and you use Jade (or another DSSSL
processor that supports HTML) to filter them into HTML by mapping SGML
tags to HTML tags, and to break them up into the optimal viewing
"chunks" that your audience will appreciate. The mapping device
is called a "style sheet" (although it is stretching the usual use of
that term), and you can execute two different (or n different) style
sheets on the same data. The first could make the "holy scroller"
version and the second could make the "little chunks" version. You could
even make Netscape and Microsoft versions. At the same time you can make
a DSSSL stylesheet to filter your documents to high quality print
through RTF or TeX. Not a single byte of your original document needs to
change (once it is in SGML) to make high quality print or web versions.
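
To make the "n style sheets, one source" idea concrete, here is a
minimal sketch in Python rather than DSSSL. It is only an analogy for
what a DSSSL processor does: the element names (doc, div, head, para)
and the function names are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are illustrative only.
SOURCE = """<doc>
<div><head>One</head><para>First part.</para></div>
<div><head>Two</head><para>Second part.</para></div>
</doc>"""

def render_div(div):
    """Render one div as an HTML fragment."""
    parts = ["<h1>%s</h1>" % escape(div.findtext("head", default=""))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in div.findall("para")]
    return "".join(parts)

def long_scroll(doc):
    """'Holy scroller' style: every div on a single page."""
    body = "".join(render_div(d) for d in doc.findall("div"))
    return ["<html><body>%s</body></html>" % body]

def little_chunks(doc):
    """'Little chunks' style: one page per div."""
    return ["<html><body>%s</body></html>" % render_div(d)
            for d in doc.findall("div")]

doc = ET.fromstring(SOURCE)      # the source is parsed once and never modified
pages_a = long_scroll(doc)       # one long page
pages_b = little_chunks(doc)     # one page per div
print(len(pages_a), len(pages_b))
```

Both functions read the same parsed source; only the output layout
differs, which is the point of keeping the style sheet separate from
the document.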

If the only feature of HTML that does not fit your document's needs is
the one file per page problem, then you can do as I have done in the
past, and simply make a DTD that allows multiple HTML "elements" and
then write a DSSSL stylesheet that breaks them up and spews out the
smaller files. Jade will fix up links for you automatically. You will
probably also want stylesheet code to generate headers, footers and the
navigation buttons between pages. Jon Bosak already has a DSSSL
stylesheet for HTML, so it would probably work fine for this
HTML-enhanced DTD. And since the original source document does not have the
redundant headers, footers and buttons, they would not show up in your
print version.
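
The generated navigation can be pictured with another small Python
sketch. This is not Jade's own link fix-up, just an illustration of
the idea; the file naming scheme (page1.html, page2.html, ...) and the
function name are hypothetical.

```python
def add_navigation(bodies, stem="page"):
    """Wrap each body in a page with generated previous/next links.

    Returns a dict mapping (hypothetical) file names to HTML text.
    The links exist only in the output, never in the source bodies.
    """
    names = ["%s%d.html" % (stem, i + 1) for i in range(len(bodies))]
    pages = {}
    for i, body in enumerate(bodies):
        links = []
        if i > 0:
            links.append('<a href="%s">Previous</a>' % names[i - 1])
        if i < len(bodies) - 1:
            links.append('<a href="%s">Next</a>' % names[i + 1])
        nav = "<p>%s</p>" % " | ".join(links)
        pages[names[i]] = "<html><body>%s%s</body></html>" % (body, nav)
    return pages

pages = add_navigation(["<h1>One</h1>", "<h1>Two</h1>", "<h1>Three</h1>"])
for name in pages:
    print(name)
```

Because the links are produced at output time, a print style sheet
that never calls this step simply has no buttons to leave out.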

Note that writing a non-trivial stylesheet is actually programming. I
don't recommend it, therefore, as a replacement for tools like
GlobeTrotter. I recommend, instead, that they provide a graphical front
end to the stylesheet creation process that allows end users to keep
their data in completely ISO standard formats: SGML and DSSSL, both of
which (under the guise of XML and DSSSL-XML) will soon be W3C standard
formats too.

So what does this stylesheet mapping process look like? Well, in simple
cases it is just like this:

(element para (make-html-element "P")) ;creates an HTML paragraph

This maps the para element in the input SGML document to a "P" element
in HTML. 

If you are using an HTML-enhanced DTD, then you will not need many such
mappings because most can be handled by a default mapping that just maps
input to output. If you are using some other DTD, then you will have
many statements like the one above. 
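
As a rough picture of "explicit mappings plus a default", here is a
minimal Python sketch. It is an analogy, not DSSSL; the mapping table
and the element names in it are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: source element name -> HTML element name.
TAG_MAP = {"para": "p", "emph": "em", "head": "h2"}

def to_html(elem, tag_map=TAG_MAP):
    """Return a copy of elem with each tag renamed via tag_map.

    Tags missing from tag_map pass through unchanged, which plays the
    role of the default mapping for an HTML-enhanced DTD.
    """
    out = ET.Element(tag_map.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_html(child, tag_map))
    return out

source = ET.fromstring(
    "<div><head>Intro</head><para>Some <emph>text</emph>.</para></div>")
html = ET.tostring(to_html(source), encoding="unicode")
print(html)  # <div><h2>Intro</h2><p>Some <em>text</em>.</p></div>
```

With an HTML-enhanced source the table would be nearly empty; with a
DTD like TEI it would need an entry for most elements.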

To make a new web page for each occurrence of an element, you make a
"scroll" flow object:

(element div (make scroll)) ; for something like TEI
or
(element html (make scroll)) ; for something like HTML-enhanced

Of course actual useful code tends to be a little more complex, because
you want to do more interesting things:

(element div
  (make scroll
    scroll-title:
      (data                             ; gets the data out of an element
        (node-list-first                ; gets the first matching element
          (select-elements              ; searches for particular elements
            (children (current-node))   ; among the div's child elements
            "TITLE")))))                ; with the GI "TITLE"

As each scroll is created, this code gets a list of the div element's
children, looks for the first one labelled "TITLE" and uses its contents
as the web page's <TITLE></TITLE>. Note that by removing the
duplication of <TITLE></TITLE> and the document's "first" <H1> (in HTML)
I have eliminated the possibility of them getting out of "sync". Of
course, if this is not what you want, you are not at all
restricted to making them the same: you can continue to call one element
<TITLE> and one <H1>. That's the power of SGML: you design your own
document format to support the structure of your own documents.
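
The "one title in the source, two places in the output" idea can be
sketched in Python as follows. Again this is an analogy rather than
DSSSL, and the element names (div, title, para) and the function name
are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

def page_for(div):
    """Build a page whose <title> and <h1> come from one source element."""
    title_elem = div.find("title")   # first child named "title", or None
    title = ""
    if title_elem is not None:
        title = escape(title_elem.text or "")
    paras = "".join("<p>%s</p>" % escape(p.text or "")
                    for p in div.findall("para"))
    return ("<html><head><title>%s</title></head>"
            "<body><h1>%s</h1>%s</body></html>" % (title, title, paras))

div = ET.fromstring(
    "<div><title>Chunking &amp; you</title><para>Body text.</para></div>")
page = page_for(div)
print(page)
```

Since both output elements are filled from the same variable, they
cannot drift apart; a source format that kept them separate would just
read two different children instead.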

I have a DSSSL tutorial (mostly oriented toward print, but also
applicable to online publishing) at
http://itrc.uwaterloo.ca/~papresco/dsssl/tutorial.html . 

The tutorial is a simple long-scroll document. Its source is in TEI-Lite. I used
essentially the same DTD and a slightly variant style sheet to make
http://www.cohdn.ca , which is a website of many small documents. For a
third twist on the same theme, I used another slightly variant
stylesheet to make a newsletter: http://www.cohdn.ca/news/1-1/6.html .
These documents are not yet "live" (they are internal prototypes). While
you are there, however, please read the important information at the
last URL so that you can participate in the boycott of Shell Oil.

 Paul Prescod