Re: Why a DTD for HTML 3.2? (fwd)

Marc Salomon (marc@pele.ckm.ucsf.edu)
Tue, 21 May 1996 21:15:28 -0700


Date: Tue, 21 May 1996 21:15:28 -0700
From: marc@pele.ckm.ucsf.edu (Marc Salomon)
Message-Id: <199605220415.VAA22473@pele.ckm.ucsf.edu.UCSF-LIBRARY>
To: www-html@w3.org
Subject: Re: Why a DTD for HTML 3.2? (fwd)

megazone wrote:
|Validators are SGML based and need a DTD to work.  Validator's are vital to
|good code.  Some HTML editors have a core of SGML technology for use in 
|authoring and can be 'upgraded' with a new DTD.
|And there are browsers than use SGML parsers as their core code.

Applications that index or render the volumes of variant, non-conforming web
content out there must be liberal in what they accept as input, which mostly
means forgiving (read: non-conformant) "parsers."  Otherwise, given how much
content conforms to no DTD at all, they would lose market share over all the
content they couldn't grok.  Authoring tools that validate against a poor DTD
can easily produce impoverished yet syntactically valid markup.  The goal
should be to assemble a good DTD, not to formally attach fluff to an already
compromised one.

The proposed 3.2 DTD is an arbitrary subset of "current practice."  Strict
validators are quite likely to choke on documents that use widely deployed
markup that was *not* included in the 3.2 DTD, so a significant body of content
will be unindexable no matter which of these bits end up in some "standard"
DTD.  For instance, documents containing Netscape's first-to-market <FRAME>
implementation will choke most rigid ISO 8879 compliant indexers/databases
that use the 3.2 DTD.
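
To make the strict-versus-forgiving point concrete, here is a rough sketch in
Python.  It is not SGML validation: the ALLOWED set is a hypothetical,
abbreviated stand-in for the element names a DTD declares (it is not the real
3.2 element list), and the function names are made up for illustration.  It
only shows that a strict consumer drops any document using an undeclared
element, while a forgiving one takes it anyway.

```python
from html.parser import HTMLParser

# Hypothetical, heavily abbreviated stand-in for the elements a DTD declares.
# This is NOT the actual HTML 3.2 element list.
ALLOWED = {"html", "head", "title", "body", "h1", "p", "a", "div"}


class TagCollector(HTMLParser):
    """Records every start tag seen; HTMLParser lowercases tag names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def undeclared_elements(markup, allowed=ALLOWED):
    """Return the sorted element names used in markup but not in allowed."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return sorted({tag for tag in collector.tags if tag not in allowed})


def strict_index(markup):
    """Reject the whole document if any element is undeclared."""
    bad = undeclared_elements(markup)
    if bad:
        raise ValueError("undeclared elements: " + ", ".join(bad))
    return "indexed"


def lenient_index(markup):
    """Accept whatever arrives, as forgiving "parsers" do."""
    return "indexed"
```

Feeding a frameset document to strict_index raises ValueError, while
lenient_index accepts it; adding "frame" and "frameset" to the allowed set only
moves the line, since the next vendor extension lands outside it again.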

So declaring an admittedly short-lived DTD that defines a selective subset of 
current practice does not solve the problem of the unindexability of the 
significant body of content unfortunate enough to contain markup outside that 
subset.  And if it is to be superseded in the short term, what bridge function 
does it serve?  This seems more like a move to define the standards to the 
implementation or to justify box-side claims of standards compliance than a 
collaborative product of the best minds in information science.

I don't mind syntactic sugar (with my java disabled, au lait, s'il vous plaît), 
but we already saw how much fun cobbling a DTD out of existing (bad) practice 
was.  Doing that again and consecrating it with standards status doesn't do 
much as an interim measure to extend HTML for those of us who wish to provide 
truly rich, structured content on the web w/o strict SGML clients.  What good
does <DIV> do me structure-wise without CLASS?

-marc