- From: Arjun Ray <aray@q2.net>
- Date: Sat, 2 Oct 1999 02:23:48 -0400 (EDT)
- To: W3C HTML <www-html@w3.org>
Some recent threads have raised issues regarding the doctype declaration in HTML documents (e.g. Bert Bos' mention of its omissibility per RFC 1866, and my citation of a usenet post by Dan Connolly clarifying that internal subsets are ostensibly verboten.) I find the relevant sections of the 4.01 spec (which seems to be the same as 4.0 in this respect), 7.1 and 7.2, quite unhelpful, if not seriously misleading. By confusing an SGML syntactic function with an HTML semantic function, the spec has made a mystical incantation out of the doctype declaration.

1. Section 7.1 states

: An HTML 4.01 document is composed of three parts:
:
:   1. a line containing HTML version information,
:   [...]

and Section 7.2 explains this as

: A valid HTML document declares what version of HTML is used in the
: document.

which looks fine, merely as a desideratum, but then there's this:

: The document type declaration names the document type definition (DTD)
: in use for the document (see [ISO8879]).

Unfortunately, this statement - as an assertion about naming - has *no* basis in ISO8879. Moreover, in its relatively obvious semantic intent, it is flat out wrong. If a normative reference to ISO8879 is to be invoked at all, then at the least it needs to be made very clear that, for the purposes of HTML alone, neither required nor sanctioned by ISO8879, certain extra application-specific conventions are being mandated. This is because, per ISO8879, it is *not* a function of the doctype declaration to identify a "version", much less to do so specifically in the form of a tactically convenient FPI with a public text class of DTD.

2. From ISO8879 Clause 4 "Definitions":

| 4.103 (document) type declaration: A markup declaration that formally
| specifies a portion of a document type definition.
| NOTE - A document type declaration does not specify all of a document
| type definition because part of the definition, such as the semantics
| of elements and attributes, cannot be expressed in SGML. [...]

| 4.105 document (type) definition: Rules, determined by an application,
| that apply SGML to the markup of documents of a particular type.
| NOTE - Part of a document type definition can be specified by an SGML
| document type declaration. Other parts, such as the semantics of
| elements and attributes, cannot be expressed formally in SGML. [...]

The basic point is that the purpose - indeed, the only purpose - of a doctype declaration is *syntactic*: to incorporate the machine-processable part of a document type definition (the declaration subset.) This subset is logically and syntactically an integral part of the document: it is needed in order to complete a parse according to SGML rules. (That's why the WebSGML TC has made doctype declarations optional, for cases where no information beyond the instance data is needed in order to complete an unambiguous parse.) That part or all of the definition may come through an external reference (analogous to #include in C) is irrelevant.

Nowhere in the HTML spec is this syntactic function specifically pointed out. Its importance lies in the fact that, only for SGML conformance, there is a necessary relation between the declaration subset and the instance markup: they must be mutually consistent. So, if there is to be a declaration subset at all, it must describe the markup actually used in the document. There is no ISO8879-sanctioned reason to have a doctype declaration (for the DTD it incorporates) at all, otherwise.

Of course, this leaves open the real issue, which is how to convey the *semantic* import of a version specification. Unfortunately, ISO8879 doesn't provide a way. All we know is that the doctype declaration definitely does not qualify.

<URL:http://www.deja.com/=dnc/getdoc.xp?AN=325927738>

Arjun
Received on Saturday, 2 October 1999 01:43:28 UTC