Re: C.4 Undeclared entities?

On Sat, 19 Oct 1996 22:14:49 -0700, bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
wrote:

>[Charles Goldfarb:]
>
>| In nature, a document is ALWAYS an instance of a document type
>| [...]
>| There is always a DTD, it is a law of nature. 
>
>This assertion (upon which most of your argument seems to hinge) is so
>strange that I hardly know how to approach it.
>
>Documents do not exist in nature.  They are human constructions.
>Whether they have a DTD or not depends on whether a human being
>has constructed one.

O.K., "in nature" *was* a bit poetic, but my assertion is still true. I should
clarify, however, that by "DTD" I mean the abstraction called a "document type
definition", not the SGML markup declarations that describe the portion of a
DTD that SGML knows how to deal with. (Most SGML practitioners use "DTD" to mean
what I called the "external subset" in my posting.)

In other words, every object is an instance of a class; therefore, every
document is an instance of (at least one) class of documents, known as a
"document type". A DTD is the description of the properties of members of that
class. 

The foregoing statements are true even in the absence of SGML. SGML, however,
provides a declaration syntax for making a portion of the DTD explicit and
processable.

>If we substitute "C program" for "document" then your argument would
>be that every C program implies a DTD.  There may be some very odd
>sense in which this is true, but that doesn't make an implied DTD part
>of the ANSI C language specification.  It doesn't make it part of the
>XML specification, either.

Actually, the DTD for C programs is *explicit* in the ANSI C language
specification -- it is the rules for a well-formed C program. To me, "C
programs" is a particular document type (i.e., a particular class of documents)
and "C program" is an instance of that type. 

>If what you are saying is that for every document, you can imagine a
>DTD to which that document conforms, then you are understating the
>case.  For any document, one can imagine an infinite number of DTDs to
>which it conforms.  You are free to imagine as many of these DTDs as
>you like.  That still doesn't make any of them part of the XML
>specification.

That's true. I was referring to the DTD that most closely accounts for the
particular instance. For want of a better term, let's call it the "maximally
precise" DTD. For example, if an instance were:

<doc><p>data</p><p>data</p></doc>

The SGML representation of the maximally precise DTD would be:

<!element doc (p,p)>
<!element p      (#pcdata)>

not

<!element doc (p+)>
<!element p      (#pcdata)>

The latter would be just one of an unbounded set of guesses, as you say. The
former has the property that it describes this instance exactly, and no other.

Note: I am aware that the SGML representation of the DTD allows the data to
vary, but that is just a weakness in the current syntax. When lexical modeling
is used, the maximally precise DTD could be represented as:

<!element doc (p,p)>
<!element p      (#lextype "data")>
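A minimal Python sketch of deriving such a "maximally precise" DTD from a single instance follows. It rests on assumptions that are not part of the argument above: the instance is parsed as XML with the standard library's xml.etree.ElementTree; each element's content is reduced to its exact sequence of child element names, or to #pcdata when it holds only text; mixed content is rejected; and when one element type occurs with different content in different places, the observed models are simply joined with "|". The `lexical` flag emits the `#lextype "..."` form from the note above, which is the proposed lexical-modeling notation used in this message, not ISO 8879 syntax. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def content_signature(elem, lexical=False):
    """Describe one element's content exactly, as a hashable tuple.

    Element content -> tuple of child tag names in document order.
    Text-only content -> ("#PCDATA",), or ("#LEXTYPE", text) in lexical mode.
    """
    children = list(elem)
    text = (elem.text or "").strip()
    if children:
        # Sketch limitation: mixed content (text between child elements)
        # is rejected rather than modeled.
        if text or any((c.tail or "").strip() for c in children):
            raise ValueError(f"mixed content in <{elem.tag}> is not handled")
        return tuple(c.tag for c in children)
    # No child elements: the whitespace-stripped text (possibly empty) is data.
    return ("#LEXTYPE", text) if lexical else ("#PCDATA",)


def render_model(sig):
    """Render one signature in the declaration style used in this message."""
    if sig == ("#PCDATA",):
        return "(#pcdata)"
    if sig[0] == "#LEXTYPE":
        # Assumption: the data contains no double quotes.
        return f'(#lextype "{sig[1]}")'
    return "(" + ",".join(sig) + ")"


def maximally_precise_dtd(xml_text, lexical=False):
    """Derive one declaration per element type from the content observed."""
    root = ET.fromstring(xml_text)
    observed = {}  # tag -> distinct signatures, in first-seen order
    for elem in root.iter():  # document order, root first
        sigs = observed.setdefault(elem.tag, [])
        sig = content_signature(elem, lexical)
        if sig not in sigs:
            sigs.append(sig)
    decls = []
    for tag, sigs in observed.items():
        models = [render_model(s) for s in sigs]
        # Assumption: differing occurrences of one element type are joined
        # with "|"; the result is not guaranteed to be valid SGML syntax.
        model = models[0] if len(models) == 1 else "(" + "|".join(models) + ")"
        decls.append(f"<!element {tag} {model}>")
    return "\n".join(decls)
```

Applied to the instance above, `maximally_precise_dtd("<doc><p>data</p><p>data</p></doc>")` yields `<!element doc (p,p)>` and `<!element p (#pcdata)>`; with `lexical=True` the second declaration becomes `<!element p (#lextype "data")>`.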

>Even if we grant for purposes of argument that the infinite number of
>DTDs to which any document is conformant have some kind of existence
>(in the mind of God, say), their ontological status is equal; there is
>nothing to distinguish any one of this infinitude as more real than
>the rest of them.

The maximally precise DTD is more real than the rest of them because it is the
only one that can be justified. In the above example, we can't use p+ because
there is no evidence that a "doc" exists with other than two "p"s in its
content.
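The "only one that can be justified" point can be checked mechanically by reading each content model as a regular expression over the sequence of child element names. The sketch below hand-translates (p,p) and (p+) into Python `re` patterns (the translation and the names CANDIDATES, accepts and language_up_to are illustrative assumptions, not an SGML parser) and enumerates which sequences of up to five p children each model accepts. Both accept the observed instance, but (p,p) accepts nothing else, while (p+) also admits sequences for which the instance gives no evidence.

```python
import re
from itertools import product

# Hand-translated content models (illustrative assumption): each SGML model
# becomes a regex over child tag names separated by single spaces.
CANDIDATES = {
    "(p,p)": r"p p",
    "(p+)": r"p(?: p)*",
}

OBSERVED = ("p", "p")  # children of <doc> in the example instance


def accepts(pattern, children):
    """True if the child-name sequence matches the model's regex exactly."""
    return re.fullmatch(pattern, " ".join(children)) is not None


def language_up_to(pattern, alphabet, max_len):
    """All child sequences of length <= max_len that the model accepts:
    a finite window onto what may be an infinite set of documents."""
    return [
        seq
        for n in range(max_len + 1)
        for seq in product(alphabet, repeat=n)
        if accepts(pattern, seq)
    ]


if __name__ == "__main__":
    for name, pattern in CANDIDATES.items():
        accepted = language_up_to(pattern, ["p"], 5)
        print(f"{name}: fits instance={accepts(pattern, OBSERVED)}, "
              f"accepts {len(accepted)} sequence(s) of length <= 5")
```

Run as a script, it reports that both models fit the instance, with (p,p) accepting 1 sequence and (p+) accepting 5; raising the length bound grows only the second count.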

>It may be that there is a practical purpose to be served by inventing
>the idea of an implied DTD and specifying rules whereby it is to be
>constructed from the document, but you can't make such a thing pop out
>of the void by saying that nature will provide it every time a
>document is created.

My point is stronger than that, Jon. The maximally precise DTD is the one that
pops out of the void. If XML wants something different, it had better spell out
the rules for constructing it, rather than leaving it to the application or
stylesheet or browser to do so.
--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--

Received on Sunday, 20 October 1996 14:30:19 UTC