Make HTML a real SGML application

I don't understand the terrible resistance to allowing (encouraging) HTML files
to contain SGML prologues, and to using the power a prologue provides
to achieve useful results.  Most of my serious HTML *already*
has a <!DOCTYPE section; I just have to run everything through SPAM
before I put it out.  The standard HTML DTD can contain some of the popular
notations; if you want to do anything funky, you have to embed some funky
syntax.  OK.  And the problem with this is...?

Why is a concept that comes from SGML always presumed "too hard" but some
random half-baked hack considered "easy enough for the masses"?
Why is "<!--#include" easy enough for the masses to understand, but "<!ENTITY
foo SYSTEM" is too hard?  Why is long distance naming in "<A NAME=foo>...<A
HREF="#foo">" easy enough for the masses to master but that in 
"<!ENTITY foo...>...&foo;" too hard?  Why is "// <!-- ... // -->" easy enough 
for the masses to understand but "<![ CDATA [...]]>" too hard?   Why do we have
to put up with people inventing "<!--XXX IFDEF FOO-->...<!--XXX ENDIF-->" but
refusing to encourage "<![ %FOO; [ ...]]>" which does the job just as well, 
and can be processed by standard tools?
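
To make the comparison concrete, here is a minimal sketch of a document that
uses an external entity and a conditional marked section in place of the
comment conventions above.  The entity names (footer, draft) and the file name
footer.html are made up for illustration, and the HTML 2.0 public identifier is
assumed as the base DTD.

```sgml
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN" [
  <!-- Shared boilerplate kept in an external file (hypothetical name) -->
  <!ENTITY footer SYSTEM "footer.html">
  <!-- Set to "INCLUDE" to keep the draft notes, "IGNORE" to drop them -->
  <!ENTITY % draft "IGNORE">
]>
<HTML>
<HEAD><TITLE>Example</TITLE></HEAD>
<BODY>
<P>Body text.</P>
<![ %draft; [
<P>Note to self: check this paragraph before release.</P>
]]>
&footer;
</BODY>
</HTML>
```

Flipping the one "draft" declaration switches every such section in the
document on or off, and any conforming SGML parser resolves both the marked
section and &footer; without knowing anything special about them.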

I think it's time to fish or cut bait: if HTML is to be an SGML application, 
use the features of SGML that are required to make it workable.  There is
much I would have changed about SGML if I had been its inventor, but the
fact is that it is here, it has solutions to a lot of these problems, and
if HTML is an SGML application a lot of nice tools can be used to handle it.
Tracking changes from version to version of HTML with these tools becomes a
matter of dropping in a new DTD instead of hacking up the tool to understand
the significance of some new semantics embedded in comments or some special
handling required for the FOOBAR element.  It is very clear to me that we
cannot go much further without putting (allowing, defaulting, supporting) the
SGML prologue into HTML. 

In particular:
    NOTATION could be used quite nicely for both SCRIPT and MATH (NOTATION=TeX,
anyone?).  It would allow for direct experimentation with other scripting
notations.  Parameter ENTITYs (particularly if you support URL SYSTEM
identifiers) allow you to very neatly encapsulate common boilerplate or
decorations and ease maintenance.
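
As a rough sketch of what that might look like, the declarations below define
a TeX notation, attach it to a MATH element through a NOTATION-typed attribute,
and pull in a shared set of declarations through a parameter entity with a URL
system identifier.  None of this comes from an existing HTML DTD: the MATH
content model, the attribute name, the system identifiers, and the
example.com URL are all assumptions chosen for illustration.

```sgml
<!-- Hypothetical DTD fragment, not part of any published HTML DTD -->
<!NOTATION TeX SYSTEM "tex">

<!ELEMENT MATH - - CDATA>
<!ATTLIST MATH
          NOTATION NOTATION (TeX) #IMPLIED>

<!-- Shared declarations fetched by URL (assumes the parser resolves
     URL system identifiers; the address is made up) -->
<!ENTITY % shared SYSTEM "http://www.example.com/sgml/shared.ent">
%shared;
```

A document instance could then write <MATH NOTATION=TeX>\frac{a}{b}</MATH>, and
adding a second notation would mean one more NOTATION declaration and one more
name in the attribute's list, rather than a change to the tools.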

While we're at it, can't we at least have a sentence somewhere official
encouraging support of processing instruction syntax instead of random comment
hackery?  Please?  
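
For comparison, here is the same request written both ways.  SGML leaves the
content of a processing instruction to the application, so the "include"
keyword and file attribute in the second form are only an assumed convention,
not an existing one.

```sgml
<!-- Comment hackery: a parser is entitled to discard this entirely -->
<!--#include file="footer.html" -->

<?include file="footer.html">
```

The difference is that a parser hands a processing instruction to the
application as an instruction, while a comment carries no meaning it is
obliged to preserve.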

                -- Mary

Mary Holstege, PhD  
Chief Technologist, Online Engineering
KnowledgeSet Corporation
555 Ellis Street                    Tel: (415) 254-5452
Mountain View, CA 94043             FAX: (415) 254-5451

Received on Tuesday, 30 July 1996 12:32:21 UTC