Re: C.4 Undeclared entities?

Michael Sperberg-McQueen wrote:
> [Medium-long note.  Executive summary:  the discussion of 'implicit'
> DTDs has served its purpose and can be ended without damage to
> the process of designing and documenting XML 1.0.]
> The discussion of explicit and implicit DTDs has confused me a
> lot, mostly because the replies seem to be about topics very
> different from the postings to which they are ostensibly replying.

Thank you, Michael.  These summaries do much to keep me focused
and others, I suspect, who are reading 200 plus email messages
a day.  For me, the appeal to the DeadPoetsSociety is too much 
like my own appeals to Chaos Theory:  they make sense to me 
but are a clear indicator that the thread is tired.

>   - the ERB asked what to do about references to undeclared entities.
>   - Charles Goldfarb suggested they should be treated as in 8879,
>     i.e. as errors.  

This appears correct.  It can be debated where they should be declared.
Could an XML instance without *some mechanism* for declaring 
an entity do otherwise?

>     Along the way, he referred to the fact that
>     any document can be viewed as being of some type, and that
>     if that document type is not explicitly defined, one can
>     nevertheless regard it as having some definition implicit in
>     its usage.

Having a document declared as a type and having an entity 
declared to exist and having a mechanism for resolving its
location do not seem to be the same of necessity. 

> Unfortunately, Charles used the term 'DTD' for the set of rules
> governing the use of a document type ...

Use?  I won't argue 8879 legalese, but that is a strong 
term that to me, implies, semantics.  A DTD can't do that.
It can tell me if a production is valid with respect 
to hierarchy, occurrence, frequency, id/idref, etc.,
but says next to nothing, ye, nothing, about semantic use.
That is, in classical data structure fundamentals, the 
operations that can be applied formally.
> Implicit declarations are a reasonably common phenomenon in many
> formal languages (the rule in C that an undeclared identifier is
> assumed to be a variable or a function of type int is an easy
> example); the charges that they involve mysticism or magic seem way
> overblown to me.  

That is not mysticism.  It is behavior defined externally to 
the document instance.  It is defined in the C standard.  
XML can easily do this.  Gavin Nicol has made a suggestion regarding the 
catalog.  We use a *domain* file in IDE/AS for this.  Same 
thing more or less.  The application chokes if this does not

> It is easy to define the behavior of an XML
> processor working with an empty or incomplete set of declarations as
> being governed by an *implicit* set of declarations -- I can say
> that, because I have already sketched out language that does so.

Are they implicit?  Or are they explicitly declared in an
external document.  What I am asking is, resolved by rule or 
by mechanism?  Rule is the standard (XML spec).  Mechanism 
is a catalog, etc. explicitly defined as extant or an error 
is issued.
> An XML processor which has no explicit markup declarations has no
> warrant to flag any errors except those which violate some basic rule
> of XML, like element nesting or attribute quoting.  That is, an XML
> processor which has no explicit markup declarations must behave
> pretty much identically to an XML processor which has explicit markup
> declarations in which every element is declared
>   <!ELEMENT foo - - ANY >
> and every attribute is declared ATTNAME CDATA #IMPLIED.  (This is the
> form which Lou Burnard and I call the Waterloo DTD, in honor of the
> Waterloo Centre for the Oxford English Dictionary, though Tim Bray
> has pointed out with some edge that no such DTD was ever used at
> Waterloo.

Ok.  I thought Waterloo was the title of a country song 
by George Hamilton IV.  Silly me.

> It seems to me that the only point of interest here is whether
> explicit reference to some notion of an implicit DTD is useful or
> not, in explaining the behavior of a processor faced with incomplete
> or nonexistent declarations.

I think it beside the point.  I agree that in some sense, an 
explicit DTD of some form does exist be it as an 8879 DTD, 
or in the form of structures or classes in the application.
I think it desirable, even mandatory, that some form of 
DTD must be able to exist for validation.  My experience 
with large ASCII files of markup tells me without it, 
it is deuce difficult to chase down errors.  But, this 
is not the same as saying an entity has to be declared 
in the DTD.  This is inconvenient because then the DTD 
must be transmitted with the document, or the record 
of declaration must exist and be accessed.  We ran round 
this with the MID.  Notation declarations with attributes 
are also inconvenient.  That lead to xenoforms.  I can 
only quote Lee:  "ugly".  It's very hard to get around 
the definition of a formal data structure as having 
data and operations declared.

> It seems Charles and I agree in our instincts that it is, or could
> be.  It seems clear from the confused reactions of others that our
> instincts are wrong in this case:  the idea is simple, but the
> obvious way of expressing it conveys something other than that simple
> idea to many readers who ought, if possible, to be able to read the
> XML spec with comprehension, if not always with the highest pitch of
> aesthetic pleasure.  That's an argument against using the notion in
> the documentation.

Ummm, with my preference for pop lyrics over dead poets, I will fail
to meet the aesthetic requirements of this list. I am not qualified 
to comment on the pleasures of others unless they paid their cover 
at the door and send up their requests clearly written on a tenner, 
(Feelings or Sweet Home Alabama:  $100).

>... it's not
> always clear which of the many possible explicit DTDs is the most
> useful for further work with the document in question, or other
> documents.

No, but I charge the author with that task by putting a <!doctype 
in the instance if he/she/it wishes to express that notion.

>... the resulting behavior (accept a well-formed
> document as legal) is the same in all cases.  So it's misleading
> to *define* the particular implicit DTD which a processor is supposed
> to assume.

The issue was, is an instance with an undeclared entity legal.
Is it?  Can it be?

> As an editorial, not a technical matter, the editors of the XML spec
> have no intention of appealing to the notion of an implicit DTD.
> The technical aspects of the question are not relevant, since the
> required behavior can be explained with or without such an appeal,
> and it's clear that the notion of implicit DTDs will confuse, not
> clarify, the issues for some readers.

It will confuse them because it assumes the beast is in the pen.

Still, the entity is undeclared.  That seems to be a reportable error.

Len Bullard

Follow-Ups: References: