Re: HTML should not be a file format, but an output format

F. E. Potts (fepotts@fepco.com)
Sun, 23 Mar 1997 13:05:31 -0700


Message-Id: <97Mar23.123310mst.18433@gw2.fepco.com>
To: www-html@w3.org

On Sat, 22 Mar 1997 19:46:07 -0700, F. E. Potts wrote:
> > SGML is a good storage medium, because it is a stable standard that
> > has the capability to be converted into "the markup language of the
> > moment".  One markup language of the moment is HTML 3.2, but HTML
> > is a moving target and must be treated as such.

On Sun, 23 Mar 1997 09:33:07 -0700, Steven Champeon ("Web Guru/Intranet
Builder") wrote:
> I must ask - what is SGML to you? I thought it was a standard for 
> defining document types such as HTML. HTML, therefore, would be an
> instance of an SGML DTD. There is no such thing as ``tagging files
> in SGML'' apart from using a specific tagset. 

What is SGML to me?  Well, I use one variant of ISO 12083:1994 to write
some of my books.  From that master document (with its file-entities,
DTD, and Catalog), I can then convert the instance into whatever format
is currently required by the job at hand.  One such format is HTML 3.2,
for documents that are to be published electronically on the web at
this point in time; another is paper, for which I use various FOSIs to
convert to print (I am kinda behind the times and haven't started to use
DSSSL yet).  Other uses would be publishing the work on CD-ROM if the
market warranted it, or (because of SGML's long-term storage
capabilities) converting it to formats not yet invented, perhaps not yet
even dreamed of.
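The workflow described above (one structured master, many renderings) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the element names `book`, `title`, and `p` are hypothetical rather than taken from ISO 12083, and the sample is kept XML-compatible so the standard `xml.etree.ElementTree` module can parse it. A real SGML instance with omitted tags, entities, and a DTD would need an actual SGML parser (for example James Clark's SP/nsgmls).

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny master instance. Element names are hypothetical; the markup is
# kept XML-compatible so the standard library can parse it.
MASTER = """<book>
<title>A Sample Book</title>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</book>"""


def to_html(root):
    """Render the master instance as a simple HTML 3.2 page."""
    title = escape(root.findtext("title", default=""))
    parts = ["<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>" % title,
             "<H1>%s</H1>" % title]
    for p in root.findall("p"):
        parts.append("<P>%s" % escape(p.text or ""))
    parts.append("</BODY></HTML>")
    return "\n".join(parts)


def to_text(root):
    """Render the same instance as plain text, e.g. for a paper draft."""
    title = root.findtext("title", default="")
    lines = [title.upper(), ""]
    for p in root.findall("p"):
        lines.append(p.text or "")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# One master, several output formats: each format is just another renderer.
RENDERERS = {"html": to_html, "text": to_text}

if __name__ == "__main__":
    root = ET.fromstring(MASTER)
    for name, render in RENDERERS.items():
        print("=== %s ===" % name)
        print(render(root))
```

Adding a new output format means adding another renderer; the master instance itself never changes.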

As to HTML, though it is an *application* of SGML--and I use it as
though it were true SGML--I have a difficult time accepting it as true
SGML for several reasons:

1). HTML is a *presentation* DTD, and basically ignores structure.
    This makes it unsuitable for long-term storage. (I know, ISO
    12083:1994 gets perilously close to presentation in certain areas
    too, such as:

      <!ENTITY % e.types "(1|2|3|4|5|6) #IMPLIED"
           -- Suggestions for emphasis types:
           1=bold, 2=italic, 3=bold italic, 4=underline, 
           5=non proportional, 6=smallcaps; if more needed 
           modify or extend this list as necessary.              -->

      <!ENTITY % l.types "(1 | 2 | 3 | 4 | 5 | 6 | 7 | 8) #IMPLIED"
           -- Suggestions for list types:
           1=arabic, 2=upper alpha, 3=upper roman, 4=bullet, 5=dash, 
           6=unlabelled, 7=lower alpha, 8=lower roman; if more needed,
           modify or extend this list as necessary.               -->
      <!-- 7=lower alpha and 8=lower roman have been added to what was 
      in 12083  -->

    but those exceptions are useful to authors (as you point out in
    another message), and the lineage of this DTD goes all the way back
    to Z39.59-1988, which in turn descends from the original AAP
    [Association of American Publishers] efforts of 1983-1987).
    And, as for structure, well, there is no way anyone could say that
    ISO 12083:1994 ignores structure. :-)

2). Unlike true SGML DTDs, HTML is not a DTD that users commonly
    modify, because the currently available user agents (UAs) define
    which elements will be rendered.  This makes HTML fairly useless, as
    well as inconvenient, for large, complex documents.

3). Perhaps most telling of all, HTML is a wimpy application that is
    totally useless for large documents (such as, in my case,
    full-length books).
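Point 1 can be made concrete by rendering the suggested e.types and l.types codes from the ISO 12083 comments quoted above into the fixed HTML 3.2 tag set. The sketch below is a minimal illustration; the mapping tables and the function names `emph` and `list_open` are assumptions made for this example, not part of either standard. HTML 3.2 does provide `B`, `I`, `U`, `TT`, and the `OL TYPE` values `1`, `A`, `a`, `I`, `i`, but nothing for smallcaps, dash-labelled, or unlabelled lists.

```python
# Suggested codes from the ISO 12083 e.types / l.types comments, mapped
# onto HTML 3.2. The mapping itself is an illustrative assumption.
EMPH_TO_HTML = {
    "1": ("<B>", "</B>"),         # bold
    "2": ("<I>", "</I>"),         # italic
    "3": ("<B><I>", "</I></B>"),  # bold italic
    "4": ("<U>", "</U>"),         # underline
    "5": ("<TT>", "</TT>"),       # non proportional
    # "6" (smallcaps) has no HTML 3.2 element, so it falls through.
}

LIST_TO_HTML = {
    "1": ("OL", "1"),   # arabic
    "2": ("OL", "A"),   # upper alpha
    "3": ("OL", "I"),   # upper roman
    "4": ("UL", None),  # bullet
    "7": ("OL", "a"),   # lower alpha
    "8": ("OL", "i"),   # lower roman
    # "5" (dash) and "6" (unlabelled) have no HTML 3.2 equivalent.
}


def emph(text, etype):
    """Wrap text for HTML 3.2; unmappable emphasis types lose their styling."""
    start, end = EMPH_TO_HTML.get(etype, ("", ""))
    return start + text + end


def list_open(ltype):
    """Return an HTML 3.2 list start tag; unmappable types fall back to <UL>."""
    tag, numbering = LIST_TO_HTML.get(ltype, ("UL", None))
    if numbering is None:
        return "<%s>" % tag
    return '<%s TYPE="%s">' % (tag, numbering)
```

The fall-through cases are where information is lost: once smallcaps or a dash-labelled list has been flattened into HTML 3.2, the distinction cannot be recovered from the output, which is one practical reason to keep the structured instance as the master.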

As to ``tagging files in SGML,'' while it appears as though you are
quoting me, I did not make that remark.

On Sun, 23 Mar 1997 11:45:19 -0700, Steven Champeon (Web Guru/Intranet
Builder) wrote:
> I miss these religious wars ;^) My life has been somewhat cheaper and
> more empty since I stopped reading comp.text.sgml...

Religious wars may be fun, but this issue is not religious.  It is a
practical matter concerning fitness for certain purposes, specifically
long-term storage and the ability to convert one's instances from a
master document into many different types of presentation formats.  For
this, HTML is a very poor choice.  HTML has many valid uses--and yes, I
use it myself--but this is not one of them.

-fep (who gave up on USENET years ago, and has never bothered to read
     comp.text.sgml :-)

--
fepotts@fepco.com
http://www.fepco.com/