- From: Charles F. Goldfarb <Charles@SGMLsource.com>
- Date: Sun, 20 Oct 1996 00:12:23 GMT
- To: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
- Cc: w3c-sgml-wg@w3.org, bosak@atlantic-83.Eng.Sun.COM
On Fri, 18 Oct 1996 20:42:30 -0700, bosak@atlantic-83.Eng.Sun.COM (Jon Bosak) wrote:

>[Charles Goldfarb:]
>
>| Questions like this are more easily analyzed if we avoid the confusing
>| term "optional DTD". A missing DTD is really an implied DTD (like
>| SGML's implied SGML Declaration) in which all element types have mixed
>| content consisting of "(#pcdata | any element type)*" and all
>| attributes are CDATA #REQUIRED (except for special conventions for ID,
>| etc. that we might adopt).
>
>There is no "implied DTD" of this nature in XML.

Good.

>It is possible for a well-formed XML document to be parsed in the
>absence of a DTD. In such a case, the DTD does not maintain a
>noumenal existence; it just really isn't there at all.

Now you've got an implied DTD again.

Here is the full set of SGML possibilities:

1. Full internal: A DTD for the type of document exists and all the
   declarations that define it are in the internal subset of the
   DOCTYPE declaration.

2. Full external: A DTD for the type of document exists and all the
   declarations that define it are in the external subset of the
   DOCTYPE declaration. That is, they are in an external SGML text
   entity that is automatically declared and is automatically
   referenced at the end of the DOCTYPE declaration (just before the
   MDC).

3. Partial internal: A DTD for the type of document exists. Some of
   the declarations that define it are in the external subset of the
   DOCTYPE declaration. That is, they are in an external SGML text
   entity that is automatically declared and is automatically
   referenced at the end of the DOCTYPE declaration (just before the
   MDC). The remainder of the declarations are in the internal subset
   of the DOCTYPE declaration. They are parsed before the external
   subset.

In nature, a document is ALWAYS an instance of a document type, even
if it is the only instance. SGML recognizes this fact and allows the
DTD to be represented explicitly. (That is what makes SGML usefully
different from word processors, because it permits rule-based
processing.)

Therefore, an XML document with no declarations in the internal
subset, is really an example of case 2 (full external) in which the
DTD has only one instance.

Now, a DTD with only one instance is, quite literally, meaningless.
That is, it can add no meaning that is not already determinable from
the instance. So the notion is essentially fatuous as well.

Postulating, as has been done on this list, that the "style sheet"
will know what to do with unspecified attributes or references to
undeclared entities JUST MEANS THAT THE DTD IS IN THE STYLE SHEET
rather than in the standardized declarations used in SGML (and XML).

I believe it would be useful (perhaps essential) for an XML document
to be parsable without *reference* to its DTD, or with reference only
to the internal subset, or both. But that is very different from
saying that the DTD "just really isn't there at all".

There is always a DTD, it is a law of nature. The issue is whether the
DTD is represented by standardized declarations, or in a proprietary
style sheet, or (as with HTML "extensions"), buried in the browser
code. For XML, a standardized representation of DTDs is the only way
to go.

--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
13075 Paramount Drive * Saratoga CA 95070 * USA
International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
Prentice-Hall Series Editor * CFG Series on Open Information Management
--
Received on Saturday, 19 October 1996 20:12:17 UTC
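
For readers who want to see the three DOCTYPE cases from the message in concrete form, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is an illustration only, not part of the original message. The `memo` element, the `org` entity, the `memo.dtd` system identifier, and the `memo_text` helper are all hypothetical names chosen for the example. The sketch assumes the default parser behaviour, which reads the internal subset but does not fetch the external subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element, entity, and file names are illustrative.

# Case 1: all declarations are in the internal subset.
FULL_INTERNAL = """<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY org "Example Corp">
]>
<memo>From &org;</memo>"""

# Case 2: all declarations are in the external subset (not fetched here).
FULL_EXTERNAL = """<!DOCTYPE memo SYSTEM "memo.dtd">
<memo>From Example Corp</memo>"""

# Case 3: some declarations external, the remainder in the internal subset.
MIXED = """<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!ENTITY org "Example Corp">
]>
<memo>From &org;</memo>"""

# No declarations at all, but the instance still refers to an entity.
NO_DECLARATIONS = "<memo>From &org;</memo>"


def memo_text(document):
    """Return the root element's text, or None if the parse fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for name, doc in [
        ("full internal", FULL_INTERNAL),
        ("full external", FULL_EXTERNAL),
        ("mixed", MIXED),
        ("no declarations", NO_DECLARATIONS),
    ]:
        print(name, "->", memo_text(doc))
```

Under these assumptions, the first three documents parse using only what is in the instance and the internal subset, while the last one fails because nothing declares `org`. That is one small case of the point made in the message: something has to say what an undeclared entity means, and if it is not a declaration, it ends up somewhere else.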