Re: rethinking the HTML DTD.

Dan Connolly (
Tue, 14 Jul 92 17:25:56 CDT

Message-Id: <>
To: (Tim Berners-Lee)
Subject: Re: rethinking the HTML DTD. 
In-Reply-To: Your message of "Wed, 15 Jul 92 00:03:56 +0700."
             <9207142203.AA02008@ > 
Date: Tue, 14 Jul 92 17:25:56 CDT
From: Dan Connolly <>

Ok, so we really do want to use SGML. Good. I agree. I just
wanted to hear from the WWW community.

>You say HTML is not SGML.  It is true that the HTML generated by the NeXT editor
>is not good (for example, lack of quotes around attributes which need them).
>However, the current parser will parse real SGML.

The biggest problem with HTML files is that they have only one of the three
basic parts of an SGML document: the SGML declaration, the prologue,
and the instance. HTML documents have only the instance. It's legal
to omit the SGML declaration (there's a default), but you've got
to have a prologue, or you end up with a non-standard way of inferring
the prologue (for example, every WWW client infers the DTD described
in "".)

So if we're committed to SGML, let's start putting something like


at the front of every HTML file (we don't have to store it in the
file; servers that distribute HTML could generate it on the fly).
And let's put _some_ kind of DTD there.
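As a rough sketch of the on-the-fly idea: a server checks whether the stored file already starts with a document type declaration and, if not, prepends one to the copy it sends. The declaration text and its system identifier "html.dtd" are hypothetical placeholders, not an agreed identifier, and the function names are illustrative. The check is also simplified: it ignores comments or processing instructions that could legally precede the declaration.

```python
import re

# Hypothetical prologue; "html.dtd" is a placeholder system identifier.
DEFAULT_DOCTYPE = '<!DOCTYPE HTML SYSTEM "html.dtd">'

# A <!DOCTYPE ...> declaration at the very start of the text (after
# optional whitespace). SGML keywords are case-insensitive here.
_DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\s", re.IGNORECASE)


def has_prologue(text: str) -> bool:
    """Return True if the text begins with a document type declaration."""
    return _DOCTYPE_RE.match(text) is not None


def serve_html(stored: str, doctype: str = DEFAULT_DOCTYPE) -> str:
    """Return the stored HTML with a prologue in front if it lacks one.

    The file on disk is never modified; only the outgoing copy gets
    the declaration.
    """
    if has_prologue(stored):
        return stored
    return doctype + "\n" + stored
```

For example, `serve_html("<TITLE>Example</TITLE>")` returns the placeholder declaration, a newline, and then the original text, while a file that already carries its own declaration passes through unchanged.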

>In the future, the web will include more complex DTDs, and dynamically
>loaded DTDs, and people will want to use the same parser for it.
Interesting! There are plans to support more than one DTD!
This makes SGML a clear winner.
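With more than one DTD around, a parser has to read the document type name and external identifier out of the declaration before it can pick (or fetch) the right DTD. A minimal sketch, assuming simple declarations of the form `<!DOCTYPE name SYSTEM "sysid">` or `<!DOCTYPE name PUBLIC "pubid" "sysid">` with no internal subset; the `DTD_REGISTRY` table and its file names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DoctypeInfo(NamedTuple):
    name: str                 # document type name, e.g. "HTML"
    public_id: Optional[str]  # formal public identifier, if given
    system_id: Optional[str]  # system identifier (file name, etc.), if given


# Only SYSTEM/PUBLIC external identifiers; no internal subset.
_DOCTYPE_RE = re.compile(
    r'\s*<!DOCTYPE\s+(?P<name>[A-Za-z][A-Za-z0-9.-]*)'
    r'(?:\s+PUBLIC\s+"(?P<pub>[^"]*)"(?:\s+"(?P<sys1>[^"]*)")?'
    r'|\s+SYSTEM(?:\s+"(?P<sys2>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)

# Hypothetical mapping from document type name to a local DTD file.
DTD_REGISTRY = {
    "HTML": "html.dtd",
    "HTML2": "html2.dtd",
}


def parse_doctype(text: str) -> Optional[DoctypeInfo]:
    """Extract the name and identifiers from a leading DOCTYPE declaration."""
    m = _DOCTYPE_RE.match(text)
    if m is None:
        return None
    sys_id = m.group("sys1") if m.group("sys1") is not None else m.group("sys2")
    return DoctypeInfo(
        name=m.group("name").upper(),  # reference concrete syntax folds names to upper case
        public_id=m.group("pub"),
        system_id=sys_id,
    )


def choose_dtd(text: str) -> Optional[str]:
    """Prefer the declared system identifier, else look the name up."""
    info = parse_doctype(text)
    if info is None:
        return None  # no prologue: the caller is back to guessing
    return info.system_id or DTD_REGISTRY.get(info.name)
```

Returning `None` when there is no declaration keeps the "infer the DTD" fallback visible to the caller instead of hiding it inside the parser.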

>So I feel RTF would be a backward step. It is true that the current
>W3 software is at a point level with RTF rather than general SGML.
>But why tie ourselves to that point?

I guess that's what I wanted to hear: that the goals of WWW and the
features of SGML really _do_ have a lot in common, but the current
implementation doesn't support many of them.

Just to make sure I've beaten this horse to death: let's begin to
formalize HTML and validate existing HTML documents before the
distance between HTML and SGML gets too big.
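One concrete check a validator could start with is the quoting problem Tim mentions. In SGML's reference concrete syntax, an attribute value may be left unquoted only if it is a name token, i.e. made entirely of name characters (letters, digits, period, hyphen). A minimal sketch of that single check, using a deliberately simplified tag scanner (it ignores comments, marked sections, and `>` inside quoted values); it is nowhere near a full SGML validator, and the function name is illustrative.

```python
import re
from typing import List, Tuple

# Name characters in the reference concrete syntax: letters, digits, ".", "-".
_NAME_TOKEN = re.compile(r"[A-Za-z0-9.-]+\Z")

# Simplified start tag: "<NAME ...>" with no "<" or ">" inside.
# End tags ("</...") and declarations ("<!...") do not match.
_START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)([^<>]*)>")

# name=value pairs; the value is "...", '...', or an unquoted run.
_ATTR = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)"""
)


def unquoted_attribute_problems(html: str) -> List[Tuple[str, str, str]]:
    """Return (element, attribute, value) for unquoted values that need quotes."""
    problems = []
    for tag in _START_TAG.finditer(html):
        element, attrs = tag.group(1), tag.group(2)
        for attr in _ATTR.finditer(attrs):
            name, value = attr.group(1), attr.group(2)
            if value[0] in "\"'":
                continue  # already a quoted literal
            if not _NAME_TOKEN.match(value):
                problems.append((element.upper(), name.upper(), value))
    return problems
```

For instance, `<A HREF=dir/x.html>` is flagged because "/" is not a name character, while `<IMG SRC=logo.gif>` passes since `logo.gif` is a name token.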


p.s. I'm working on a DTD that reflects the structure of most existing
word-processor documents: a sequence of paragraphs (maybe broken
into flows, sections, or whatever). I'll have RTF and MIF translators
for the DTD when it's ready. Maybe HTML2 can use some of the features:
the low-level character-set related features, anyway.