Message-Id: <9207142225.AA07409@pixel.convex.com>
To: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Cc: www-talk@nxoc01.cern.ch
Subject: Re: rethinking the HTML DTD.
In-Reply-To: Your message of "Wed, 15 Jul 92 00:03:56 +0700." <9207142203.AA02008@nxoc01.cern.ch>
Date: Tue, 14 Jul 92 17:25:56 CDT
From: Dan Connolly <connolly@pixel.convex.com>

Ok, so we really do want to use SGML. Good. I agree. I just wanted
to hear from the WWW community.

>
>You say HTML is not SGML. It is true that the HTML generated by the NeXT editor
>is not good. (example, lack of quotes around attributes which need them.)
>However, the current parser will parse real SGML.
>

The biggest problem with HTML files is that they have only 1 of the
3 basic parts of an SGML document: the SGML declaration, the prologue,
and the instance. HTML documents only have the instance.

It's legal to omit the SGML declaration -- there's a default. But
you've got to have a prologue, or you end up with a non-standard
way of inferring the prologue (for example, every WWW client infers
the DTD described in
"http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html".)

So if we're committed to SGML, let's start putting something like

<!DOCTYPE HTML SYSTEM "http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd">

at the front of every HTML file (we don't have to store it in the
file -- servers that distribute HTML could generate it on the fly.)
And let's put _some_ kind of DTD there.

>In the future, the web will include more complex DTDs, and dynamically
>loaded DTDs, and people will want to use the same parser for it.
>

Interesting! There are plans to support more than one DTD! This
makes SGML a clear winner.

>So I feel RTF would be a backward step. It is true that the current
>W3 software is at a point level with RTF rather than general SGML.
>But why tie ourselves to that point?
>

I guess that's what I wanted to hear: that the goals of WWW and
the features of SGML really _do_ have a lot in common, but the
current implementation doesn't support many of them.

Just to make sure I've beat this horse to death: let's begin to
formalize HTML and validate existing HTML documents before the
distance between HTML and SGML gets too big.

Dan

p.s. I'm working on a DTD that reflects the structure of most
existing word-processor documents: a sequence of paragraphs
(maybe broken into flows, sections, or whatever). I'll have
RTF and MIF translators for the DTD when it's ready. Maybe HTML2
can use some of the features -- the low level character-set
related features, anyway.
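
To make the "generate it on the fly" suggestion above concrete, here is a minimal sketch in Python of what a server might do just before sending a document. It is only an illustration under stated assumptions: the function name with_prologue and the constant HTML_DOCTYPE are hypothetical, the check for an existing declaration is a simple prefix match rather than real SGML parsing, and the DTD URL is the one proposed above (no DTD is assumed to exist there yet).

```python
import re

# The declaration proposed in this message. The URL comes from the text;
# whether a DTD is actually published there is an open question.
HTML_DOCTYPE = (
    '<!DOCTYPE HTML SYSTEM '
    '"http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd">'
)

# Simple check (an assumption, not an SGML parser): does the text begin,
# after optional whitespace, with a document type declaration?
_DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\b", re.IGNORECASE)


def with_prologue(instance: str, doctype: str = HTML_DOCTYPE) -> str:
    """Return the document with a DOCTYPE line in front.

    The stored file is left as a bare instance; the prologue is added
    only if the text does not already start with one.
    """
    if _DOCTYPE_RE.match(instance):
        return instance
    return doctype + "\n" + instance


if __name__ == "__main__":
    stored = "<TITLE>Test</TITLE>\n<H1>Hello</H1>\n"
    print(with_prologue(stored), end="")
```

Doing this at send time keeps the existing files untouched, and applying it twice is harmless because a document that already carries a declaration is returned unchanged.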