Re: C.4 Undeclared entities?

On Fri, 18 Oct 1996 20:42:30 -0700, bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
wrote:

>[Charles Goldfarb:]
>
>| Questions like this are more easily analyzed if we avoid the confusing
>| term "optional DTD". A missing DTD is really an implied DTD (like
>| SGML's implied SGML Declaration) in which all element types have mixed
>| content consisting of "(#pcdata | any element type)*" and all
>| attributes are CDATA #REQUIRED (except for special conventions for ID,
>| etc. that we might adopt).
>
>There is no "implied DTD" of this nature in XML.

Good.

>It is possible for a well-formed XML document to be parsed in the
>absence of a DTD.  In such a case, the DTD does not maintain a
>noumenal existence; it just really isn't there at all.

Now you've got an implied DTD again.

Here is the full set of SGML possibilities:

1. Full internal: A DTD for the type of document exists and all the declarations
that define it are in the internal subset of the DOCTYPE declaration.

2. Full external:  A DTD for the type of document exists and all the
declarations that define it are in the external subset of the DOCTYPE
declaration. That is, they are in an external SGML text entity that is
automatically declared and is automatically referenced at the end of the DOCTYPE
declaration (just before the MDC).

3. Partial internal:  A DTD for the type of document exists. Some of the
declarations that define it are in the external subset of the DOCTYPE
declaration. That is, they are in an external SGML text entity that is
automatically declared and is automatically referenced at the end of the DOCTYPE
declaration (just before the MDC). The remainder of the declarations are in the
internal subset of the DOCTYPE declaration. They are parsed before the external
subset.
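
As a rough illustration, the three cases can be told apart by looking at the
DOCTYPE declaration alone. The sketch below is in Python. The function name
classify_doctype and the sample declarations are hypothetical, and the pattern
assumes that no "[" or ">" occurs inside the literals of the external
identifier. It is not a conforming SGML or XML parser.

```python
import re

# Hypothetical pattern: document type name, optional external identifier,
# optional internal subset in square brackets. Keyword matching is made
# case-insensitive here as a simplifying assumption.
_DOCTYPE = re.compile(
    r'\s*<!DOCTYPE\s+[^\s\[>]+\s*'
    r'(?P<ext>(?:SYSTEM|PUBLIC)\b[^\[>]*)?'
    r'(?:\[(?P<int>.*)\])?\s*>\s*$',
    re.DOTALL | re.IGNORECASE,
)


def classify_doctype(decl: str) -> str:
    """Name which of the three cases a DOCTYPE declaration falls under."""
    m = _DOCTYPE.match(decl)
    if m is None:
        raise ValueError("not a DOCTYPE declaration this sketch understands")
    has_external = m.group("ext") is not None
    # An empty "[ ]" is treated as having no internal declarations.
    has_internal = bool((m.group("int") or "").strip())
    if has_external and has_internal:
        return "partial internal"
    if has_external:
        return "full external"
    if has_internal:
        return "full internal"
    return "no declarations"


if __name__ == "__main__":
    samples = [
        '<!DOCTYPE memo [<!ELEMENT memo (#PCDATA)>]>',
        '<!DOCTYPE memo SYSTEM "memo.dtd">',
        '<!DOCTYPE memo SYSTEM "memo.dtd" [<!ENTITY sig "CFG">]>',
    ]
    for s in samples:
        print(classify_doctype(s), "<-", s)
```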

In nature, a document is ALWAYS an instance of a document type, even if it is
the only instance. SGML recognizes this fact and allows the DTD to be
represented explicitly. (That is what makes SGML usefully different from word
processors, because it permits rule-based processing.) Therefore, an XML
document with no declarations in the internal subset is really an example of
case 2 (full external) in which the DTD has only one instance.

Now, a DTD with only one instance is, quite literally, meaningless. That is, it
can add no meaning that is not already determinable from the instance. So the
notion is essentially fatuous as well. Postulating, as has been done on this
list, that the "style sheet" will know what to do with unspecified attributes or
references to undeclared entities JUST MEANS THAT THE DTD IS IN THE STYLE SHEET
rather than in the standardized declarations used in SGML (and XML).

I believe it would be useful (perhaps essential) for an XML document to be
parsable without *reference* to its DTD, or with reference only to the internal
subset, or both. But that is very different from saying that the DTD "just
really isn't there at all".
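
To make "parsable without reference to its DTD" concrete, here is a minimal
sketch using Python's xml.etree.ElementTree, a non-validating parser that by
default does not read an external subset. The helper names and the sample
documents are hypothetical. The sketch shows only that parsing can succeed
with no declarations, or with the internal subset alone, and that a reference
to an undeclared entity is rejected. It says nothing about where the rest of
the DTD lives.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents.
NO_DECLARATIONS = "<memo><to>list</to>Hello</memo>"
INTERNAL_ONLY = '<!DOCTYPE memo [<!ENTITY sig "CFG">]><memo>&sig;</memo>'
UNDECLARED_REF = "<memo>&sig;</memo>"


def parse_without_external_dtd(text: str) -> ET.Element:
    """Parse using only what is in the document itself."""
    return ET.fromstring(text)


def undeclared_entity_is_rejected(text: str) -> bool:
    """Return True if the parser refuses the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


if __name__ == "__main__":
    print(parse_without_external_dtd(NO_DECLARATIONS).tag)
    print(parse_without_external_dtd(INTERNAL_ONLY).text)
    print(undeclared_entity_is_rejected(UNDECLARED_REF))
```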

There is always a DTD; it is a law of nature. The issue is whether the DTD is
represented by standardized declarations, or in a proprietary style sheet, or
(as with HTML "extensions") buried in the browser code. For XML, a standardized
representation of DTDs is the only way to go.
--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--

Received on Saturday, 19 October 1996 20:12:17 UTC