- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Sat, 28 Sep 1996 10:25:19 -0400
- To: w3c-sgml-wg@w3.org
An Odd Idea
===========

At first, the Good SGMLer in me said that DTDs should be absolutely required during authoring and that every document should validate to some DTD (and I stated that on this list a few weeks ago). But then I thought a little more about it, and I'm no longer convinced that that is the case.

DTDs are absolutely necessary for most applications where SGML is used today. SGML is almost always used today to enforce conformance to a document structure. Since you couldn't, before XML, do much useful stuff with SGML documents without DTD-specific coding, a document without a DTD would be basically useless. But in the age of XML/DSSSL you can deliver such a document online and in print, and full-text index it. In other words, you can do all of the things people expect to do with proprietary formats like Word for Windows and PDF and with trivial formats like HTML and RTF. If we loosen the requirement for a DTD, XML could do everything these formats do without making the creation process of these documents more expensive.

The Advantages
==============

For instance, let's say Jane Author is working in Word for Windows. The document she is creating does not conform to any DTD I know of. When it's done, however, she wants to deliver it as XML. Why?

 * Because XML is widely supported (we hope).
 * Because XML is compact (we hope).
 * Because XML will preserve whatever structure exists in the document (and Word allows quite a bit).
 * Because XML is more portable, more device-independent and more widely supported than Word for Windows format.
 * Because XML is "open" and standardized.
 * Because XML is easy to full-text index.

So doesn't it make sense for her to do a "save as XML" and get a structured, portable, device-independent XML document for delivery on the World Wide Web? I would expect the XML document to have one element for each paragraph and character style.
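
As a rough illustration of the "save as XML" idea (a sketch under stated assumptions, not anything this post specifies): the element names Heading1, BodyText and Emphasis are hypothetical stand-ins for Word style names, and Python's standard xml.etree.ElementTree is assumed as the non-validating parser. The sample has no DOCTYPE declaration, so there is no DTD to consult, yet it can still be parsed and full-text indexed.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical "save as XML" output: one element per paragraph style
# (Heading1, BodyText) and per character style (Emphasis).
# There is no DOCTYPE declaration, so no DTD is involved.
SAMPLE = """<document>
  <Heading1>Quarterly Report</Heading1>
  <BodyText>Sales rose in the <Emphasis>third</Emphasis> quarter.</BodyText>
  <BodyText>Costs were flat.</BodyText>
</document>"""


def build_index(xml_text):
    """Map each lowercased word to the paragraph-level element names it occurs in."""
    # fromstring() only checks that the text is well-formed XML.
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for block in root:
        # itertext() also collects text inside nested character-style elements.
        text = "".join(block.itertext())
        for word in re.findall(r"[A-Za-z]+", text):
            index[word.lower()].add(block.tag)
    return index


index = build_index(SAMPLE)
print(sorted(index["third"]))
```

Nothing in this sketch knows what the element names mean; it relies only on the markup being well-formed.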
This document may not conform to any existing DTD, but it might still take advantage of all of the other benefits of XML described above.

The Alternative
===============

If we require a DTD of this author in this situation:

 a) she will decide that encoding in XML is too much work and give up (and she will lose out on its non-validation related benefits) OR

 b) she (or Word) will create a trivial DTD that is of no value to anyone, and actually ends up obscuring the fact that this document is NOT MEANT TO CONFORM to any particular application of XML OR

 c) she will throw away data by encoding the document in a very general DTD like HTML (which is equivalent to not using a DTD at all) or a highly non-prescriptive DTD like TEI (which is equivalent to creating a trivial DTD) OR

 d) she will shoehorn her document into a DTD that is completely wrong for it.

In other words, the cause of Good Documents will probably not be advanced at all.

What does an author lose if we do not require a DTD for every document?

 a) the document type is no longer self-describing. But how meaningful is a document type description like "<HTML>" or "<TEI.2>"? They tell you very little about the type or structure of the document. (Contrast this with the next point.)

 b) the document format is no longer self-describing, so code cannot know the semantics of the markup. That is a fairly big loss. In response, I would say that there are thousands, perhaps hundreds of thousands, of documents in the world that were never meant to be processed by anything beyond a browser/printer and a full-text indexer. The authors of these documents will never research and choose a correct document type. Why shouldn't we make a standard that encompasses their needs too? What do we lose?

 c) the document no longer conforms to some external standard. But as mentioned before, the author may not care.

But what about Rigour?
======================

Of course there are massive benefits to standardization.
And in many situations it makes a lot of sense to _require_ standardization. But merely requiring a DTD does not enforce standardization, because there are so many trivial and useless DTDs in the world (and an infinite number of them still to be written).

I believe that the structural quality of the average XML document will actually GO DOWN under my proposal. A lot of quasi-structured documents will be put on the web as XML. But isn't it better that they be in XML (which allows fine-grained descriptive markup) than in HTML, RTF or PDF?

I hope that under my proposal, the structural quality of the _average document_ will go UP because so many documents that would have had their structural features buried in a proprietary format will be encoded in XML instead. I think that this is what we should be aiming for.

"Encoded in XML/SGML" was never a guarantee of quality structural markup. This situation is neither better nor worse under my proposal.

But aren't DTDs integral to structural markup?
===============================================

My reading of Annex A.3 indicates that DTDs were primarily intended to be used for markup minimization. Goldfarb says: "Document type definitions have uses in addition to markup minimization." (a slight understatement =) )

But what about XML editors?
===========================

As demonstrated by my example above, I am trying to blow open the definition of an XML editor to allow all standard word processors to become XML editors without the expense of incorporating SGML parsers and validators. XML could be the "trivial" ASCII export format of all of these tools.

SGML editors would of course have a major place in the production of XML. They would be known as "DTD-validating" XML editors. In other words, when you DID care about conforming to a particular DTD standard, you would use an SGML editor. And many structure-serious authors would want the control over structure that an SGML editor would allow.
After all, allowing Word for Windows to export something that has SGML-style tags does not turn Word into a great tool for creating structural documents.

There would also be a huge market for "porting" these "trivial dumps" into DTDs for incorporation into an SGML system (just as people currently port proprietary formats into SGML, except that the other formats would be easier to parse). Increasing the number of descriptive-encoded documents in the world can only help the SGML vendors (as the HTML experience has shown).

Could you summarize?
====================

I think that we should only require DTDs when DTDs are required. Which means that the requirement for DTDs should not be in the XML standard but in the standards built on top of it ("XML applications").

 Paul Prescod
Received on Saturday, 28 September 1996 10:30:49 UTC