Message-Id: <9207141839.AA10602@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: rethinking the HTML DTD.
Date: Tue, 14 Jul 92 13:39:53 CDT
From: Dan Connolly <connolly@pixel.convex.com>

I have been troubled by the fact that HTML documents look like SGML documents, but technically, they are not. So I have tried to come up with a DTD that captures the features of HTML.

I have come to the conclusion that HTML has very little structure, and that this is by design.

I am beginning to wonder how much the needs of WWW have in common with the features of SGML.

It seems to me that SGML is the technology of choice when you have a community of information consumers and producers that share a common structure. e.g. the construction industry might use SGML to exchange bills of materials, parts lists, inventories, etc. The SGML parser would be used to verify part numbers, make sure every widget has a corresponding gadget, etc.

The WWW project is a form of electronic publishing, however, and publishing is a natural application of SGML. But the value of SGML is that you can verify the structure of the text. A publisher can specify in his DTD the format of references, bibliography entries, the placement of the abstract, etc.

The WWW project has no such editorial policies to enforce. The editorial policies set forth in the HTML tag set are things like "you can have a title, if you want, and we'll keep it visible for the user; you can have headings and paragraphs and glossaries and lists and menus, and as long as you use them in pretty much the traditional way, they'll be formatted reasonably. And you can have anchors -- references from/to other documents."

The question that recently came into my mind is: why is the WWW project defining such a tag set? The practical answer is that the NeXT implementation has a nifty editor, and we'd like to be able to write nicely formatted documents and display them nicely on nice terminals and simply on simple terminals.

Honestly, for that purpose, RTF is a more mature technology. The NeXT has extensive support for RTF, and the Mac and the PC have some support. I think all we're lacking is public implementations of RTF->ASCII, RTF->PostScript, and RTF->X Windows renderers. MS Word and NeXT edit would be fine editors.

Really, for the kind of casual documents the WWW project deals with, SGML is not a good match. Who really uses all the "format independent" features of WWW? I haven't seen anything that the RTF stylesheet features can't handle.

Unless we want some part of the WWW system to verify the structure of documents, why are we using SGML (and using it poorly)?

Granted, RTF doesn't have very good hypertext and multimedia features, but that's what the WWW project is all about: experimenting with hypertext and multimedia. We could standardize multimedia RTF conventions as well as we have done for SGML.

Comments?

Dan