Re: D.1 Distinguish partial and full DTDs?

[Summary. Disagreement with case analysis. I see two cases: DTD given, in
which case we can validate (if we can get the DTD), or not. DTD not given,
in which case validation is never an option. No need to tell application
about existence of DTDs absent some kind of access handle (agreement with
Lee, I think).]

At 5:37 PM 10/24/96, lee@sq.com wrote:
>> D.1 What behavior should XML systems exhibit if no DTD, or only a
>> partial DTD, is given with a document?
>There are four cases worth considering for partial DTDs:
>(1) the DTD covers everything in the instance
>    (e.g. it's a subset of DOCBOOK)
>    This is a full DTD as far as the instance is concerned...

   Sure. I wish we could complain about missing element declarations, but
we can't so we might as well live with it.

>(2) there are elements in the instance not in the DTD, or
>    attributes, entities, or other wilde beasties.  But
>    at least some names in the DTD correspond to the instance.
>    The DTD, in other words, is a strict subset of the one that
>    an SGML parser would need to parse and accept this instance.
>    In this case, there are two main possibilities.
>    (2a) the undefined names are errors; the instance (or DTD) is invalid
>    (2b) the undefined names are accepted, as if they were declared
>	 automatically and implicitly.
>    I would strongly favour 2b.

I would strongly favor 2a. By making the DTD optional, we let people be
lazy about declarations. But once they create declarations, they've made a
commitment to structure, and should be (at least) informed of violations of that
structure. A DTD is supposed to constrain documents. That's why we made it
optional, but that's also why, if present, it should have teeth. That's
also why I don't like the notion of "partial DTD" that creates a situation
where I have to do 2b, because I'm lacking _some_ declarations.

   I guess that this means that it's a validation error, as we will not be
able to prevent non-validating applications from implementing 2b.
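
To make the 2a/2b difference concrete, here is a minimal sketch in Python. It
assumes a DTD can be reduced to a plain set of declared names, which ignores
content models and attribute types; the function names and the `strict` flag
are my own invention for illustration, not part of any proposal.

```python
def undeclared_names(declared, used):
    """Return the names used in the instance but missing from the DTD, sorted."""
    return sorted(set(used) - set(declared))


def check(declared, used, strict=True):
    """Apply policy 2a (strict) or 2b (lenient) to undeclared names.

    Hypothetical helper: `declared` stands in for the names a DTD declares,
    `used` for the names that appear in the instance.
    """
    missing = undeclared_names(declared, used)
    if strict and missing:
        # Policy 2a: every undeclared name is reported as a validation error.
        return [f"validation error: undeclared name '{name}'" for name in missing]
    # Policy 2b: undeclared names are accepted as if implicitly declared.
    return []
```

A validating application would call `check(..., strict=True)`; a
non-validating one effectively behaves like `strict=False`, which is why 2b
cannot be prevented in practice.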

>(3) there is a DTD, but no names in the instance are actually used
>    in the document.  This is indistinguishable from having the
>    wrong DTD, and arguably could be an error.  But in practice it
>    is the same as having no DTD, for all the good it does.

   A validation error because of undeclared elements (in the presence of a DTD).

>(4) there is no DTD.  This is the case if a partial DTD is actually
>    empty, for example.

This is OK, because without a DTD, the notion of validation is vacuous. All
instances are legal, if they meet instance syntax restrictions.

>It seems to me that cases 2, 3 and 4 are all similar in that there are
>names in the instance that have not been explicitly declared in an
>actual manifestation of a DTD.  If case (4) is supported, cases (2) and
>(3) also should be supported, and route (2b) should be taken.

I lump them differently, into two basic cases:
DTD given: validation reports undeclared instance stuff, and partial
DTDs are malformed if the instance refers to undeclared stuff.

DTD not given: Anything goes, and validation errors are never given (only
basic XML instance syntax errors).

>> In particular, should there
>> be a way to distinguish a document instance for which a DTD exists
>> but is not given from an instance for which no DTD exists?

>I think that's a question of philosophy.  You can only distinguish
>between where you have or can get a DTD,
>and where you don't have one and can't get one.

I agree. If the DTD exists, you should have some kind of reference to it
(that you may ignore, or that might be broken).

>If you don't have a DTD, you're in the same position as if you had
>an empty DTD.  You can read (as long as we don't screw up the
>XML definition) but not validate.

>If you work with SGML tools, you'll of course have a program to
>create an SGML DTD from an XML instance, as otherwise you couldn't
>use XML files with SGML tools.
Yep. I think this will be a useful XML tool, as well.

So I think what we really have is a two phase process:

Parse, which uses instance syntax, and produces a basic data structure.
This can always be done with no DTD information.

Validate, which takes a DTD and determines whether the instance matches it.
Only possible when the DTD exists and is accessible.
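
The two phases can be sketched roughly as follows, using Python's standard
xml.etree.ElementTree as a stand-in non-validating parser. This is only an
illustration under a strong simplification: the "DTD" is modelled as a set of
declared element names (an assumption of mine), so content models and
attributes are not checked. The names `parse` and `validate` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse(text):
    """Phase 1: build a basic tree from instance syntax alone; no DTD needed."""
    return ET.fromstring(text)


def validate(root, declared_elements):
    """Phase 2: compare the parsed instance against the declared element names.

    Returns None when no DTD is accessible (validation is not an option),
    otherwise the sorted list of element names the DTD does not declare.
    """
    if declared_elements is None:
        return None
    used = {element.tag for element in root.iter()}
    return sorted(used - set(declared_elements))
```

Note that `parse` succeeds or fails purely on instance syntax, while
`validate` has a distinct "cannot validate" result (None) that is different
from "validated with no errors" (an empty list).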

Applications can't use information of the form "there is a DTD, but I'm not
giving you a chance to access it", so it should not be included.

   -- David

RE delenda est.
I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________