Re: C.4 Undeclared entities?

At 10:31 AM 10/28/96 EST, lee@sq.com wrote:
>> I think you're missing Charles' point: that one goal is for XML documents
>> to *also* be SGML documents.  To do that, they *must* have at least
>> "<!DOCTYPE typename SYSTEM>" at their start.
>
>They must also have a DTD to be valid SGML documents.
>
>> Note that any *existing* SGML processor can consider this to be valid by
>> defining its algorithm for resolving the omitted SYSTEM identifier to be
>> "parse the document
>
>Please name any existing SGML processor that works this way today.
>If you have to change the code, you're not talking about an
>existing parser.

I didn't say existing processors *do* work this way, I said they *can* work
this way--in other words, there's nothing in 8879 that prescribes *how* you
get the data that makes up an entity or how you determine what the actual
system identifier is (and therefore what the data is that is addressed by
that system identifier) when you omit the system identifier.  Therefore,
*it is technically possible* to modify the entity manager of any SGML
processor to behave as I've described.
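
To make that concrete, here is a minimal sketch in Python of an entity manager whose behavior for an omitted system identifier is pluggable. Everything in it is an assumption for illustration: the names `EntityManager`, `resolve_doctype`, and `empty_subset` are hypothetical and come from no existing SGML processor, and treating an explicit system identifier as a local file path is a simplification.

```python
from typing import Callable, Optional

# Hypothetical hook: given the document type name, return the text of the
# external subset to use when the DOCTYPE declaration gave no system identifier.
Fallback = Callable[[str], str]


class EntityManager:
    """Illustrative entity manager; not modeled on any real processor."""

    def __init__(self, fallback: Fallback):
        self.fallback = fallback

    def resolve_doctype(self, typename: str, system_id: Optional[str]) -> str:
        if system_id:
            # Assumption: an explicit system identifier is a local file path.
            with open(system_id, encoding="utf-8") as f:
                return f.read()
        # Omitted system identifier: the resolution policy is up to the system.
        return self.fallback(typename)


def empty_subset(typename: str) -> str:
    # One possible policy: supply no declarations at all.
    return ""


manager = EntityManager(empty_subset)
subset = manager.resolve_doctype("report", None)
```

A receiver that wants stricter behavior could pass a different fallback, for example one that looks the type name up in a local catalog.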

Remember too that one of the goals of XML is to define a syntax that
doesn't require *explicit* declaration of element types.  Therefore, by
definition, you never have to resolve the omitted system identifier in
order to parse the document (although you may need it for validation).
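
As a rough illustration of parsing with no element type declarations at all, the sketch below uses Python's standard `xml.etree.ElementTree` module on a small invented document. The document content and element names are assumptions made up for the example; the point is only that the element structure can be recovered from the instance alone.

```python
import xml.etree.ElementTree as ET

# Invented sample document with no DOCTYPE declaration and no DTD.
doc = "<report><title>Q3</title><para>Sales rose.</para></report>"

# Parsing succeeds using only the markup in the instance.
root = ET.fromstring(doc)

# Element type names in document order, recovered without any declarations.
names = [el.tag for el in root.iter()]
```

Nothing here checks whether `para` is allowed inside `report`; that is the validation step, which is where declarations would come in.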

In the context of document delivery, explicit document type declarations
primarily serve the *information receiver* by letting them check to see
whether or not the data they get meets whatever requirements they have.
When a receiver has this requirement they can provide an explicit DTD (or
architecture) and kick back any documents that don't validate against it.
Or, they can choose to accept only documents with explicit DTDs.

Most of the time, information receivers don't care (e.g., casual Web
browsers). But when the information is really data involved in some
controlled business process, you do care.  

Note that one thing we haven't yet discussed (as far as I know) is the
concept of being able to say "I don't care what the element type
declarations for the document are, but I *do* care what architecture it
claims to conform to."  By putting the schema rules at the architecture
level, you can have a system that allows individual documents to be
idiosyncratic while still keeping a measure of schema identification and
validation.  

For example, if on your intranet you want to manage "reports" but you don't
want to define some all-encompassing DTD for report documents that will meet
all the requirements of report writers (and the high maintenance cost that
implies), you can define a general "report architecture" from which any
report document must be derived.  Validation of documents *without explicit
document types* can be done *by the receiver* through a combination of
defaulting and a simple explicit mapping, done either on elements or
through some prolog (such as you can do in SGML with LINK process
declarations--not that I'd suggest actually using LINK for XML).
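
A minimal sketch of receiver-side validation by defaulting plus a simple explicit mapping, in Python. The "report architecture" below, its form names, and the functions `form_of` and `validate` are all hypothetical; a real architecture would define far richer rules than "which forms may appear inside which".

```python
import xml.etree.ElementTree as ET

# Hypothetical report architecture: each architectural form lists the
# forms allowed as its children.
REPORT_ARCH = {
    "report": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


def form_of(el, mapping):
    # An explicit mapping wins; otherwise default to the element's own name.
    return mapping.get(el.tag, el.tag)


def validate(root, mapping, arch=REPORT_ARCH, doc_form="report"):
    errors = []
    if form_of(root, mapping) != doc_form:
        errors.append(f"{root.tag} is not derived from {doc_form}")
        return errors

    def walk(el):
        allowed = arch[form_of(el, mapping)]
        for child in el:
            if form_of(child, mapping) not in allowed:
                errors.append(f"{child.tag} not allowed in {el.tag}")
            else:
                walk(child)

    walk(root)
    return errors


# The receiver supplies the mapping; the documents carry no DTD.
mapping = {"memo": "report", "chapter": "section"}
good = ET.fromstring(
    "<memo><title>T</title><chapter><title>C</title><para>p</para></chapter></memo>"
)
bad = ET.fromstring("<memo><para>x</para></memo>")

good_errors = validate(good, mapping)
bad_errors = validate(bad, mapping)
```

Here `title` and `para` are matched by defaulting, while `memo` and `chapter` are matched through the explicit mapping, so the document's own element type names can stay idiosyncratic.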

One problem with DTDs as defined in 8879 is *that they don't tell you
anything* about a document except how to validate it.  However,
architectures, because they represent a stand-alone set of defined
semantics and schema rules, *do* tell you something because the name of an
architecture points back to a *fixed* set of definitions (unlike the name
of a document type, which only tells you the element type of the document
element).  

Or said more simply: the idea in SGML that document types tell you
something more useful about documents than how to parse and validate them
syntactically is a Big Lie.  Once you remove from the syntax those
things for which you *must* have explicit element type declarations, you
don't need DTDs for parsing, only for validation.

Cheers,

E.
--
W. Eliot Kimber (kimber@passage.com) 
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., (512)339-1400
10596 N. Tantau Ave., Cupertino, CA 95014-3535 (408) 366-0300, (408)
366-0320 (fax)
2608 Pinewood Terrace, Austin, TX 78757 (512) 339-1400 (fone/fax)
http://www.passage.com (work) http://www.drmacro.com (home)
"If I never had existed, would you still remember me?..."
                                   --Austin Lounge Lizards, "1984 Blues"

Received on Monday, 28 October 1996 13:45:24 UTC