Re: Why a DTD for HTML 3.2? (fwd)

Harold A. Driscoll (harold@driscoll.chi.il.us)
Wed, 22 May 1996 14:09:57 +0000


Message-Id: <2.2.16.19960522140957.3f4fdea0@pop.interaccess.com>
Date: Wed, 22 May 1996 14:09:57 +0000
To: marc@pele.ckm.ucsf.edu (Marc Salomon)
From: "Harold A. Driscoll" <harold@driscoll.chi.il.us>
Subject: Re: Why a DTD for HTML 3.2? (fwd)
Cc: www-html@w3.org

At 21:15 21/5/96 -0700, Marc Salomon wrote:
>megazone wrote:
>|Validators are SGML based and need a DTD to work.  Validators are vital to
>|good code.  Some HTML editors have a core of SGML technology for use in 
>|authoring and can be 'upgraded' with a new DTD.
>|And there are browsers that use SGML parsers as their core code.
>
>Applications that can index the volumes of or render current variant non-
>conforming web content must be liberal in what they accept for input, mostly 
>with forgiving (read: non-conformant) "parsers."  

Browsers being liberal about what they accept is a double-edged sword.
Without the other half of the "Internet robustness principle" (being
conservative about what you provide) we've an engineering disaster. (The
Netscape 1.x handling of mismatched quotes illustrates an extreme case of
this.) Relatively strict authoring tools are important, among other things
to restore some balance of engineering discipline.
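
To make the two halves of the principle concrete, here is a minimal Python
sketch. The helper names and the regular expression are hypothetical, made up
for illustration; this is not how Netscape 1.x or any real validator handled
quotes. It only contrasts a reader that quietly accepts a mismatched quote
with a checker that refuses it.

```python
import re

# Hypothetical helpers for illustration; not taken from any real browser
# or validator. Only double-quoted attribute values are considered.
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def lenient_attrs(tag_body):
    """Liberal side: take whatever pairs up and never complain."""
    return dict(ATTR.findall(tag_body))

def strict_attrs(tag_body):
    """Conservative side: refuse a tag whose double quotes do not pair up."""
    if tag_body.count('"') % 2 != 0:
        raise ValueError("mismatched quote in: " + tag_body)
    return dict(ATTR.findall(tag_body))

good = 'A HREF="x.html" NAME="top"'
bad = 'A HREF="x.html NAME="top"'   # closing quote after x.html is missing

print(lenient_attrs(good))  # both attributes recovered
print(lenient_attrs(bad))   # HREF swallows the text up to the next quote
try:
    strict_attrs(bad)
except ValueError as err:
    print("rejected:", err)
```

The lenient reader returns a plausible-looking but wrong result for the bad
tag, so the author never learns of the mistake. The strict checker is the
sort of behavior one would want from an authoring tool.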

>Authoring tools validating content coded to a poor DTD can easily produce 
>impoverished yet syntactically valid markup.  The goal is to assemble a 
>good DTD instead of formally attaching fluff to an already compromised DTD.

Authoring tools don't of themselves produce quality--they can't. Rather they
strive to make it easier for creative authors to produce quality
products--and often somewhat harder to eschew quality.

>The proposed 3.2 DTD is an arbitrary subset of "current practice."  

Yes, it is "an arbitrary subset," but certainly a considered subset. I think
most of us would prefer a more "perfect" DTD. But that has not been
forthcoming--look at how bogged-down in [insert polite words here] the IETF
HTML WG got, while the industry moved forward (in several directions)
with limited influence from their standards activities.

>So declaring an admittedly short-lived DTD that defines a selective subset of 
>current practice does not solve the problem of the unindexability of the 
>significant body of content unfortunate enough to contain markup outside that 
>subset.  

Doesn't every student of computer science know that it is impossible to
write a program (algorithm) which can analyze _any_ program (algorithm)? But
it _is_ possible to write programs which can analyze some useful subset of
programs.

Anybody who thinks that HTML 3.2 will solve all of the problems of the HTML
world is kidding themselves. Will it solve many of them?--likely. Will it
cause a few?--likely.

>And if it is to be superseded in the short-term, what bridge function 
>does it serve?  This seems more like a move to define the standards to the 
>implementation or to justify box-side claims of standards compliance than a 
>collaborative product of the best minds in information science.

Let's face it, HTML has become a presentation language, as well as a markup
language, with the practical short-term needs having been allowed to become
the cart driving the horse. If nothing else, someone needs to follow the
horses with a shovel, before we can have a clean pavement on which to work.

>I don't mind syntactic sugar (with my java disabled, au lait, s'il vous plait), 
>but we already saw how much fun cobbling a DTD out of existing (bad) practice 
>was.  Doing that again and consecrating it with standards status doesn't do 
>much as an interim measure to extend HTML for those of us who wish to provide 
>truly rich, structured content on the web w/o strict SGML clients.  What good
>does <DIV> do me structure-wise without CLASS?

What does the DTD do for us was the original question...

    I see it does three important things...

    * It provides a disciplined notation for describing HTML with
considerable rigor. Let's take the infamous <CENTER> element. From the
Netscape documentation, we've only a very rough idea of how it can (and
can't) be used. This hurts authors in two ways: they don't know what to
expect from a browser, and the various browser authors are unlikely to
implement it the same way.

    * It (the 3.2 project) defines a subset of "current practice" which HTML
authors can use, with a reasonable (and increasing) expectation that their
creations will be processed reasonably by most browsers.

    * It provides a DTD which becomes a focus for development tools. Two
important aspects of this are validation tools and HTML generation tools.
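
As a rough illustration of what a DTD-driven validation tool does with such
a notation, here is a toy Python sketch. The content-model table is invented
for the example and is NOT transcribed from the HTML 3.2 DTD. The tree
representation and the function name are assumptions too. A real validator
would read the models from the DTD with an SGML parser.

```python
# Illustrative content models only; NOT transcribed from the HTML 3.2 DTD.
# Each element maps to the set of children it may directly contain.
ALLOWED_CHILDREN = {
    "BODY": {"CENTER", "P", "H1"},
    "CENTER": {"P", "H1", "#text"},
    "P": {"#text"},
    "H1": {"#text"},
}

def validate(node, errors=None):
    """Collect (parent, child) pairs that the table above does not allow.

    A node is a (name, children) tuple; character data is the string "#text".
    """
    if errors is None:
        errors = []
    name, children = node
    for child in children:
        child_name = child if isinstance(child, str) else child[0]
        if child_name not in ALLOWED_CHILDREN.get(name, set()):
            errors.append((name, child_name))
        if not isinstance(child, str):
            validate(child, errors)
    return errors

good = ("BODY", [("CENTER", [("P", ["#text"])])])
bad = ("BODY", [("P", [("CENTER", ["#text"])])])

print(validate(good))  # no violations under this toy table
print(validate(bad))   # reports CENTER appearing inside P
```

With the rules written down in one place, an author and a browser writer can
both check the same document against the same table, which is the point of
having the notation at all.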

In order to be successful, I see that 3.2 needs to honor two principles:

    + It needs to limit itself to established practice. It can't be (or be
perceived to be) a vehicle which allows "pet feature ideas" to be slipped in
for the ride, circumventing other standards processes; and

    + It needs to encompass established practice for HTML, not just for what
is used by certain popular browsers.

HTML 3.2 may still need some work. Let's get it there, which will clear up
enough pressing problems that other (often more interesting) ones can be
addressed.

Standards activities are a balancing of various forces, much like a bridge
over a river. It likely isn't exactly where you'd want it to be, but it sure
beats not having one.

/Harold
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Harold A. Driscoll                       mailto:harold@driscoll.chi.il.us
#include <std/disclaimer>      http://homepage.interaccess.com/~driscoll/